How to skip the header when reading data from a CSV file in S3 and creating a table in AWS Athena

Question:

I am trying to read CSV data from an S3 bucket and create a table in AWS Athena. When the table is created, the header row of my CSV file is not skipped.

Sample query:

CREATE EXTERNAL TABLE IF NOT EXISTS table_name (
    `event_type_id` string,
    `customer_id` string,
    `date` string,
    `email` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = "|", "quoteChar" = "\"")
LOCATION 's3://location/'
TBLPROPERTIES ("skip.header.line.count"="1");

skip.header.line.count does not seem to work. I think AWS has an issue here. Is there another way to solve this?

Answer

Here is what works in Redshift:

You want to use table properties ('skip.header.line.count'='1'), along with other properties if you want, for example 'numRows'='100'. Here is a sample:

create external table exreddb1.test_table 
(ID BIGINT 
,NAME VARCHAR 
) 
row format delimited 
fields terminated by ',' 
stored as textfile 
location 's3://mybucket/myfolder/' 
table properties ('numRows'='100', 'skip.header.line.count'='1');

Here is the AWS Redshift SQL documentation for "create external table": http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_TABLE.html – TheWalkingData

Answer

This is a known deficiency.

The best approach I have seen was tweeted by Eric Hammond:

...WHERE date NOT LIKE '#%'

This appears to skip the header rows at query time. I'm not sure how it works, but it may be a way of skipping NULLs.
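
As a rough local illustration of the filter idea, here is a minimal sketch that uses SQLite through Python's sqlite3 module as a stand-in for Athena. That substitution is an assumption, and the table name `events` and the sample rows are hypothetical. The header line is loaded as an ordinary row, which is what happens when the header is not skipped, and a WHERE clause then excludes it by comparing a column with its own header text. The `NOT LIKE '#%'` form from the tweet is run as well, to show that it only drops rows whose `date` value starts with `#`.

```python
import sqlite3

# SQLite is only a local stand-in for Athena here (an assumption);
# the table name and the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events "
    "(event_type_id TEXT, customer_id TEXT, date TEXT, email TEXT)"
)

# Simulate a table whose first row is the CSV header read in as data.
rows = [
    ("event_type_id", "customer_id", "date", "email"),  # header line
    ("1", "42", "2017-01-01", "a@example.com"),
    ("2", "43", "2017-01-02", "b@example.com"),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)

# Drop the header by comparing a column with its own header text.
data_rows = conn.execute(
    "SELECT event_type_id, customer_id, date, email "
    "FROM events WHERE date <> 'date' ORDER BY event_type_id"
).fetchall()

# The '#%' pattern only removes rows whose date value begins with '#'.
hash_filtered = conn.execute(
    "SELECT COUNT(*) FROM events WHERE date NOT LIKE '#%'"
).fetchone()[0]

print(data_rows)
print(hash_filtered)
```

In this sample the `'#%'` pattern leaves the header row in place, because the header does not start with `#`. That pattern fits files whose header lines begin with `#`, while the comparison against the header text fits a plain column-name header like the one in the question.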
