Creating a Spark Dataset from Java objects in Scala, Spark 1.6
Problem description:
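I have POJOs defined in an external jar, and I want to create a Dataset from these objects. If I create the Dataset from a Scala case class, I get the Dataset I expect. If I try to do the same with the Java objects, all the data ends up in a single column as one object.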
case class patientDiagnosis(patientId: Long, visitId: Long, diagnosisCode: String, isPrimaryDiagnosis: String, patientDiagnosisId: Long, sourceSystemUniqueIdentifier: String, diagnosisCodeSystem: String) {}
println("case Dataset from scala object :")
joinDf.as[patientDiagnosis].show()
OUTPUT:
case Dataset from scala object :
+---------+-------+-------------+------------------+------------------+----------------------------+-------------------+
|patientId|visitId|diagnosisCode|isPrimaryDiagnosis|patientDiagnosisId|sourceSystemUniqueIdentifier|diagnosisCodeSystem|
+---------+-------+-------------+------------------+------------------+----------------------------+-------------------+
|  1388158|1764555|       296.20|                 1|           1247383|                     1247383|               ICD9|
+---------+-------+-------------+------------------+------------------+----------------------------+-------------------+
When I try to do the same with the Java object, I get the following output:
Java object:
import java.io.Serializable;

public class PatientDiagnosis implements Serializable {
    private static final long serialVersionUID = -7971192387675901350L;
    private long patientId;
    private long visitId;
    private String diagnosisCode;
    private String isPrimaryDiagnosis;
    private long patientDiagnosisId;
    private String sourceSystemUniqueIdentifier;
    private int isDeleted;
    private String diagnosisCodeSystem;
    // Getters, setters and toString() are not shown in the question.
}
Scala code:
import org.apache.spark.sql.{Encoder, Encoders}
import sqlContext.implicits._
val p: Encoder[com....PatientDiagnosis] = Encoders.bean(classOf[com....PatientDiagnosis])
println("case Java Encoder :")
joinDiagnf.as[com....PatientDiagnosis](p).show(false)
OUTPUT:
case Java Encoder :
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------+------------------+------------------+---------+----------------------------+-------+
|diagnosisCode |diagnosisCodeSystem|isDeleted|isPrimaryDiagnosis|patientDiagnosisId|patientId|sourceSystemUniqueIdentifier|visitId|
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------+------------------+------------------+---------+----------------------------+-------+
|PatientDiagnosis [patientId=0, visitId=1764555, diagnosisCode=296.20, isPrimaryDiagnosis=1, patientDiagnosisId=1247383, sourceSystemUniqueIdentifier=1247383, isDeleted=0, diagnosisCodeSystem=ICD9]|
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------+---------+------------------+------------------+---------+----------------------------+-------+
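The column order in the header above (diagnosisCode, diagnosisCodeSystem, isDeleted, ...) is alphabetical rather than the field declaration order. A plausible explanation, stated here as an assumption about Spark 1.6 internals, is that `Encoders.bean` discovers columns through `java.beans.Introspector`, keeping only properties that have both a public getter and a setter. The plain-Java sketch below (no Spark involved) lists those properties for a hypothetical complete version of the bean; the class name `BeanPropertyCheck` and the method `beanProperties` are illustrative only.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BeanPropertyCheck {

    // Hypothetical full version of the question's bean: a public no-arg
    // constructor plus a public getter/setter pair for every field.
    public static class PatientDiagnosis implements Serializable {
        private static final long serialVersionUID = -7971192387675901350L;
        private long patientId;
        private long visitId;
        private String diagnosisCode;
        private String isPrimaryDiagnosis;
        private long patientDiagnosisId;
        private String sourceSystemUniqueIdentifier;
        private int isDeleted;
        private String diagnosisCodeSystem;

        public PatientDiagnosis() {}

        public long getPatientId() { return patientId; }
        public void setPatientId(long v) { patientId = v; }
        public long getVisitId() { return visitId; }
        public void setVisitId(long v) { visitId = v; }
        public String getDiagnosisCode() { return diagnosisCode; }
        public void setDiagnosisCode(String v) { diagnosisCode = v; }
        public String getIsPrimaryDiagnosis() { return isPrimaryDiagnosis; }
        public void setIsPrimaryDiagnosis(String v) { isPrimaryDiagnosis = v; }
        public long getPatientDiagnosisId() { return patientDiagnosisId; }
        public void setPatientDiagnosisId(long v) { patientDiagnosisId = v; }
        public String getSourceSystemUniqueIdentifier() { return sourceSystemUniqueIdentifier; }
        public void setSourceSystemUniqueIdentifier(String v) { sourceSystemUniqueIdentifier = v; }
        public int getIsDeleted() { return isDeleted; }
        public void setIsDeleted(int v) { isDeleted = v; }
        public String getDiagnosisCodeSystem() { return diagnosisCodeSystem; }
        public void setDiagnosisCodeSystem(String v) { diagnosisCodeSystem = v; }
    }

    // Returns the sorted names of properties that have both a getter and a setter
    // (this also drops the read-only "class" property inherited from Object).
    public static List<String> beanProperties(Class<?> beanClass) {
        try {
            BeanInfo info = Introspector.getBeanInfo(beanClass);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
            Collections.sort(names);
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the eight property names in alphabetical order.
        System.out.println(beanProperties(PatientDiagnosis.class));
    }
}
```

If a field lacked either its getter or its setter, it would not appear in this list, which is worth checking when a bean-encoded Dataset is missing a column.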
Am I making a syntax error, or does Spark 1.6 not support creating a Dataset from Java objects in Scala?
Answer
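Sorry, my mistake: it does give the correct output. I didn't notice this earlier because the `dataset.show` view did not display the data properly. When I select specific columns, those columns contain the expected values.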
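One likely reason for the confusing `show` output, offered as an assumption about Spark 1.6 internals rather than a confirmed fact: when rendering a typed Dataset, `show` expands a Scala `Product` (such as a case class) into one cell per field, but places any other object, including a Java bean, in a single cell using its `toString()`. The plain-Java sketch below mimics that difference without Spark, using a small hypothetical three-field bean `Diagnosis` in place of `PatientDiagnosis`. `singleCell` corresponds to what the question saw; `perProperty` reads each property through its getter, which is roughly what selecting individual columns shows.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ShowRenderingSketch {

    // Hypothetical three-field bean standing in for PatientDiagnosis.
    public static class Diagnosis {
        private long patientId;
        private String diagnosisCode;
        private String diagnosisCodeSystem;

        public Diagnosis() {}

        public long getPatientId() { return patientId; }
        public void setPatientId(long v) { patientId = v; }
        public String getDiagnosisCode() { return diagnosisCode; }
        public void setDiagnosisCode(String v) { diagnosisCode = v; }
        public String getDiagnosisCodeSystem() { return diagnosisCodeSystem; }
        public void setDiagnosisCodeSystem(String v) { diagnosisCodeSystem = v; }

        @Override
        public String toString() {
            return "Diagnosis [patientId=" + patientId + ", diagnosisCode=" + diagnosisCode
                    + ", diagnosisCodeSystem=" + diagnosisCodeSystem + "]";
        }
    }

    // Non-Product rendering: the whole object collapsed into one cell via toString().
    public static List<String> singleCell(Object bean) {
        List<String> cells = new ArrayList<>();
        cells.add(String.valueOf(bean));
        return cells;
    }

    // Per-column rendering: one cell per getter/setter property, sorted by name.
    public static List<String> perProperty(Object bean) {
        try {
            List<PropertyDescriptor> props = new ArrayList<>();
            for (PropertyDescriptor pd : Introspector.getBeanInfo(bean.getClass()).getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    props.add(pd);
                }
            }
            props.sort(Comparator.comparing(PropertyDescriptor::getName));
            List<String> cells = new ArrayList<>();
            for (PropertyDescriptor pd : props) {
                cells.add(String.valueOf(pd.getReadMethod().invoke(bean)));
            }
            return cells;
        } catch (IntrospectionException | IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Diagnosis d = new Diagnosis();
        d.setPatientId(1388158L);
        d.setDiagnosisCode("296.20");
        d.setDiagnosisCodeSystem("ICD9");
        System.out.println("single cell : " + singleCell(d));
        System.out.println("per property: " + perProperty(d));
    }
}
```

In Spark itself, converting the typed Dataset back to a DataFrame before calling `show`, or selecting the columns explicitly as the answer describes, should give the per-column view.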
What is the schema of 'joinDiagnf'? –
It matches the fields of each object. – Kalpesh