Problem reading data in R with read.csv / read.table
Problem description:
I have data that was exported from MySQL using the following command:
SELECT
id_code,info_text INTO OUTFILE '/tmp/company-desc.csv'
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM
dx_company WHERE LENGTH(id_code) = 8 AND
id_code REGEXP '^[0-9]+$';
But when I try to load the CSV into R using the following command,
dt.companydesc <- read.csv("company-desc.csv",sep=';',fill=T, encoding = "UTF-8",quote="\n",header=FALSE)
or
dt.companydesc <- read.csv("company-desc.csv",sep=';',fill=T, encoding = "UTF-8",quote="\"",header=FALSE)
it yields a result like this:
Id code description
2345 This is the description \n344555 \n737384 \n388383 \n000083
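Some of the ids get mixed in with the descriptions. Basically, the read has trouble with the quotes and the \n characters. My attempts to work around it disturb the whole table. I have also tried gsub and readLines. Any help is appreciated.
Snapshot of the data (CSV file):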
"1000004";"general"
"1000000";"licensed version, and products"
"1000007";""
"1000003";""
"1000002";""
"1000006";""
"1000002";"automobiles; well organised"
Desired output:
Id_code Description
1000004 general
1000000 licensed version, and products
1000007 NA
1000003 NA
1000002 NA
1000006 NA
1000002 automobiles and industry; well organised
Answer
Here is one way using data.table::fread, which is also fast:
require(data.table) # v1.9.6+
fread(' "1000004";"general"
"1000000";"licensed version, and products"
"1000007";""
"1000003";""
"1000002";""
"1000006";""
"1000002";"automobiles; well organised"', na.strings="",
header=FALSE, col.names=c("Id_code", "Description"))
# Id_code Description
# 1: 1000004 general
# 2: 1000000 licensed version, and products
# 3: 1000007 NA
# 4: 1000003 NA
# 5: 1000002 NA
# 6: 1000006 NA
# 7: 1000002 automobiles; well organised
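For comparison, here is a minimal base R sketch that does not need data.table. It assumes the file looks like the snapshot in the question. The name `csv_lines` and the explicit empty-string-to-NA step are illustrative choices, not part of the original answer.

```r
# Sample rows copied from the question's snapshot
# (an illustrative stand-in for company-desc.csv).
csv_lines <- c(
  '"1000004";"general"',
  '"1000000";"licensed version, and products"',
  '"1000007";""',
  '"1000003";""',
  '"1000002";""',
  '"1000006";""',
  '"1000002";"automobiles; well organised"'
)

dt.companydesc <- read.table(
  text = csv_lines,
  sep = ";",            # field separator used in the MySQL export
  quote = "\"",         # fields are optionally enclosed by double quotes
  comment.char = "",    # do not treat '#' inside a description as a comment
  header = FALSE,
  col.names = c("Id_code", "Description"),
  stringsAsFactors = FALSE
)

# Turn empty descriptions into NA explicitly.
dt.companydesc$Description[!nzchar(dt.companydesc$Description)] <- NA

print(dt.companydesc)
```

To read the real file, it should be enough to replace `text = csv_lines` with the file path and add `encoding = "UTF-8"` as in the question.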
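Please post an example along with the expected output. –
My guess is that your 'quote' argument is incorrect, but I can't be sure without seeing a sample of the CSV file. – Benjamin
'quote = "\n"' is something I have never seen. You told MySQL that the separator is a comma, and then you use ';' when calling 'read.csv'. Are you sure? – nicola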
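One hedged way to check the point about the 'quote' argument is to count the fields per line under different quote settings with `utils::count.fields`. The sketch below uses the snapshot rows from the question. `csv_lines` and the helper `fields_per_line` are hypothetical names introduced only for this illustration.

```r
# Sample rows copied from the question's snapshot.
csv_lines <- c(
  '"1000004";"general"',
  '"1000000";"licensed version, and products"',
  '"1000007";""',
  '"1000003";""',
  '"1000002";""',
  '"1000006";""',
  '"1000002";"automobiles; well organised"'
)

# Hypothetical helper: number of ';'-separated fields on each line
# for a given set of quote characters.
fields_per_line <- function(quote_chars) {
  con <- textConnection(csv_lines)
  on.exit(close(con))
  count.fields(con, sep = ";", quote = quote_chars)
}

# Quoting disabled: the ';' inside the last description splits the field.
no_quote <- fields_per_line("")

# Double quote as the quote character: every line has two fields.
with_quote <- fields_per_line("\"")

print(no_quote)
print(with_quote)
```

If some lines report a different number of fields than the rest, the quote or separator settings probably do not match how the file was written.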