Loading a CSV file into a Hive table

Problem description:

I have a CSV file with contents like this:

"DepartmentID","Name","GroupName","ModifiedDate" 
"1","Engineering","Research and Development","2008-04-30 00:00:00" 

create external table if not exists AdventureWorks2014.Department 
( 
    DepartmentID smallint , 
    Name string , 
    GroupName string, 
    rate_code string, 
    ModifiedDate timestamp 
) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '","' lines terminated by '\n' 
STORED AS TEXTFILE LOCATION 'wasb:///ds/Department' TBLPROPERTIES('skip.header.line.count'='1');

and loaded the data:

LOAD DATA INPATH 'wasb:///ds/Department.csv' INTO TABLE AdventureWorks2014.Department; 

After that, no data appears to have been loaded.

select * from AdventureWorks2014.Department; 

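The select above returns nothing.

I think the double quotes around each field are the problem. Is there a way to load data from a file like this into a Hive table without having to strip the double quotes?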

You should not put double quotes around a value for a smallint column. Use 1 instead of "1". – Venky

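So if I change the table definition from DepartmentID smallint to DepartmentID string, will that fix the problem? I will try it. The double quotes are a result of the extraction process using SSIS. – user5299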

FIELDS TERMINATED BY '","' is incorrect. Your fields are terminated by a comma (,), not by ",". Change your DDL to FIELDS TERMINATED BY ','.
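
A minimal sketch of that suggestion, assuming the file has exactly the four fields shown in the header and that no field contains a comma. With a plain comma delimiter the double quotes stay in each value, so the columns are declared as string here and the quotes are stripped at query time. The table name Department_raw is hypothetical, and the rate_code column from the question is left out because the sample data has no matching field.

```sql
-- Sketch only: Department_raw is a hypothetical table name.
-- All columns are string because the surrounding quotes remain in the data.
CREATE EXTERNAL TABLE IF NOT EXISTS AdventureWorks2014.Department_raw
(
    DepartmentID string,
    Name string,
    GroupName string,
    ModifiedDate string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 'wasb:///ds/Department'
TBLPROPERTIES('skip.header.line.count'='1');

-- Remove the double quotes when reading, then cast to the wanted types.
SELECT
    CAST(regexp_replace(DepartmentID, '"', '') AS SMALLINT)  AS DepartmentID,
    regexp_replace(Name, '"', '')                            AS Name,
    regexp_replace(GroupName, '"', '')                       AS GroupName,
    CAST(regexp_replace(ModifiedDate, '"', '') AS TIMESTAMP) AS ModifiedDate
FROM AdventureWorks2014.Department_raw;
```

This only holds up while the quoted values never contain a comma themselves; for real quoted CSV a CSV-aware SerDe is the safer route.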

LOAD DATA LOCAL INPATH '/home/hadoop/hive/log_2013805_16210.log' INTO TABLE table_name;

Try this (typed on a phone...)

create external table if not exists AdventureWorks2014.Department (DepartmentID smallint , Name string , GroupName string, rate_code string, ModifiedDate timestamp) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'  
STORED AS TEXTFILE 
LOCATION 'wasb:///ds/Department';
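
For what it's worth, the separator, quote, and escape characters can also be set explicitly through SERDEPROPERTIES. The sketch below only spells out the values the CSV SerDe page lists as defaults and adds the header-skipping property from the question. Department_csv is a made-up table name, and the four string columns assume the table should mirror the CSV header exactly.

```sql
-- Sketch only: Department_csv is a hypothetical table name.
-- Columns are declared as string because this SerDe reports every column as string anyway.
CREATE EXTERNAL TABLE IF NOT EXISTS AdventureWorks2014.Department_csv
(
    DepartmentID string,
    Name string,
    GroupName string,
    ModifiedDate string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ',',   -- field separator
    'quoteChar'     = '"',   -- character that wraps each field
    'escapeChar'    = '\\'   -- escape character (a single backslash)
)
STORED AS TEXTFILE
LOCATION 'wasb:///ds/Department'
TBLPROPERTIES('skip.header.line.count'='1');
```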

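Limitation
This SerDe treats all columns as being of type String. Even if you create a table with non-string column types using this SerDe, the DESCRIBE TABLE output will show string column types. The type information is retrieved from the SerDe. To convert columns to the desired type, you can create a view over the table that CASTs each column to the desired type.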

https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
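
A rough sketch of the view approach mentioned under Limitation, assuming the OpenCSVSerde table's columns line up with the four fields in the CSV header (the rate_code column in the DDL has no counterpart in the sample data, so it is not selected here). The view name Department_typed is hypothetical.

```sql
-- Sketch only: Department_typed is a hypothetical view name.
-- The base table returns every column as string, so cast where a type matters.
CREATE VIEW IF NOT EXISTS AdventureWorks2014.Department_typed AS
SELECT
    CAST(DepartmentID AS SMALLINT)  AS DepartmentID,
    Name,
    GroupName,
    CAST(ModifiedDate AS TIMESTAMP) AS ModifiedDate
FROM AdventureWorks2014.Department;

-- Query the view instead of the base table to get typed columns.
SELECT * FROM AdventureWorks2014.Department_typed;
```

Values that cannot be converted come back as NULL from CAST, so a quick check for unexpected NULLs after loading is worthwhile.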