Importing a comma-separated CSV: how do I handle quotes?
I have a CSV file that I'm importing, but I've run into a problem. The data is in this format:
TEST 690, "This is a test 1, 2 and 3" ,$14.95 ,4
I need to be able to explode on commas that are not inside quotes...
See the fgetcsv function.
If you already have the data in a string, you can create a stream that wraps it and then use fgetcsv on that stream. See http://code.google.com/p/phpstringstream/source/browse/trunk/stringstream.php
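As a rough sketch of the stream-wrapping idea, PHP's built-in `php://memory` stream can stand in for a custom wrapper class. The helper name `csvLineFromString` is made up for this illustration, and the `trim` step is an assumption about what the asker wants, since whitespace around fields may survive parsing.

```php
<?php
// Hypothetical helper (not part of PHP): parse one CSV record held in a string
// by writing it to an in-memory stream and reading it back with fgetcsv().
function csvLineFromString(string $line): array
{
    $stream = fopen('php://memory', 'r+');
    fwrite($stream, $line);
    rewind($stream);
    // Length 0 means "no line-length limit". Separator, enclosure and escape
    // are passed explicitly so the call does not rely on version defaults.
    $fields = fgetcsv($stream, 0, ',', '"', '\\');
    fclose($stream);
    return $fields === false ? [] : $fields;
}

// The sample row from the question.
$row = csvLineFromString('TEST 690, "This is a test 1, 2 and 3" ,$14.95 ,4');
// Assumption: stray whitespace around fields is unwanted, so trim it afterwards.
$row = array_map('trim', $row);
print_r($row);
```

The commas inside the quoted description stay within a single field, so the row comes back with four elements.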
I'd rather use a regular expression, because there's some special functionality involved here. – Webnet 2010-05-14 15:24:52
Don't use regular expressions. It's not as simple as it looks. You may have line breaks inside strings. You may have escaped characters. – Artefacto 2010-05-14 15:32:35
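To illustrate the two pitfalls mentioned in that comment, here is a small sketch with made-up sample data: one record whose first field contains a comma and a line break, and whose second field contains doubled quotes. It compares naive splitting with fgetcsv.

```php
<?php
// One logical record (sample data invented for this illustration):
// field 1 contains a comma and a line break, field 2 contains doubled quotes,
// which is the usual CSV way to write a literal quote inside a quoted field.
$csv = '"line one,' . "\n" . 'line two","say ""hi"""' . "\n";

// Naive splitting treats the line break as a record boundary and the comma
// inside the quoted field as a separator.
$naive = explode(',', explode("\n", $csv)[0]);

// fgetcsv() keeps reading until the quoted field is closed.
$stream = fopen('php://memory', 'r+');
fwrite($stream, $csv);
rewind($stream);
$record = fgetcsv($stream, 0, ',', '"', '\\');
fclose($stream);

var_dump($naive);
var_dump($record);
```

The naive split only ever sees a fragment of the first field, while fgetcsv returns the two intended fields with the line break and the literal quotes intact.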
Once the CSV has been parsed (via fgetcsv), you can process each individual field to your heart's content. – Roadmaster 2010-05-14 15:36:34
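A minimal sketch of that "parse first, then process" approach, using the built-in `str_getcsv`. The field meanings (code, description, price, quantity) are only guessed from the sample row in the question, and the clean-up steps are assumptions about what the "special functionality" might involve.

```php
<?php
$line = 'TEST 690, "This is a test 1, 2 and 3" ,$14.95 ,4';

// Step 1: let the CSV parser deal with delimiters and quotes.
$fields = array_map('trim', str_getcsv($line, ',', '"', '\\'));

// Step 2: per-field processing on the already separated values.
// Assumed column order, inferred from the sample row only.
[$code, $description, $price, $quantity] = $fields;
$item = [
    'code'        => $code,
    'description' => $description,
    'price'       => (float) ltrim($price, '$'), // strip the currency sign
    'quantity'    => (int) $quantity,
];

var_dump($item);
```

Any regular expressions you still need can then be applied to a single field at a time, where quoting is no longer a concern.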
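If you really want to do this by hand, here is a rough reference implementation I wrote for exploding a complete line of CSV text into an array. Be warned: this code does NOT handle multi-line fields! With this implementation, the entire CSV row must sit on a single line, with no line breaks!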
<?php
//-----------------------------------------------------------------------
function csvexplode($str, $delim = ',', $qual = "\"")
// Explode a single CSV string (line) into an array.
{
    $len = strlen($str);   // Store the complete length of the string for easy reference.
    $inside = false;       // Maintain state when we're inside quoted elements.
    $lastWasDelim = false; // Maintain state if we just started a new element.
    $word = '';            // Accumulator for current element.
    $out = [];             // Collected elements (initialized so it is always defined).
    for($i = 0; $i < $len; ++$i)
    {
        // We're outside a quoted element, and the current char is a field delimiter.
        if(!$inside && $str[$i]==$delim)
        {
            $out[] = $word;
            $word = '';
            $lastWasDelim = true;
        }
        // We're inside a quoted element, the current char is a qualifier, and the next char is a qualifier.
        // The bounds check is on $i+1 so we never read past the end of the string.
        elseif($inside && $str[$i]==$qual && ($i+1<$len && $str[$i+1]==$qual))
        {
            $word .= $qual; // Add one qual into the element,
            ++$i;           // then skip the second qual of the escaped pair.
        }
        // The current char is a qualifier (so we're either entering or leaving a quoted element.)
        elseif ($str[$i] == $qual)
        {
            $inside = !$inside;
        }
        // We're outside a quoted element, the current char is whitespace and the 'last' char was a delimiter.
        elseif(!$inside && ($str[$i]==" ") && $lastWasDelim)
        {
            // Just skip the char because it's leading whitespace in front of an element.
        }
        // Outside a quoted element, the current char is whitespace: it is either trailing
        // whitespace or whitespace in the middle of an unquoted element.
        elseif(!$inside && ($str[$i]==" ") )
        {
            // Look ahead for the next non-whitespace char.
            $lookAhead = $i+1;
            while(($lookAhead < $len) && ($str[$lookAhead] == " "))
            {
                ++$lookAhead;
            }
            // If we hit the end of the line or the next char is formatting, we're dealing with trailing whitespace.
            if($lookAhead >= $len || $str[$lookAhead] == $delim || $str[$lookAhead] == $qual)
            {
                $i = $lookAhead-1; // Jump the pointer ahead to right before the delimiter, qualifier or end of line.
            }
            // Otherwise we're still in the middle of an element, so add the whitespace to the output.
            else
            {
                $word .= $str[$i];
            }
        }
        // If all else fails, add the character to the current element.
        else
        {
            $word .= $str[$i];
            $lastWasDelim = false;
        }
    }
    $out[] = $word;
    return $out;
}
// Examples:
$csvInput = 'Name,Address,Phone
Alice,123 First Street,"555-555-5555"
Bob,"345 Second Place, City ST",666-666-6666
"Charlie ""Chuck"" Doe", 3rd Circle ," 777-777-7777"';
// explode() emulates file() in this context.
foreach(explode("\n", $csvInput) as $line)
{
    var_dump(csvexplode($line));
}
?>
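That said, I would still recommend relying on PHP's built-in functions. They will (hopefully) prove more reliable in the long run. Artefacto and Roadmaster are right: whatever you need to do to the data is best done after you have imported it.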
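One thing I would try, if you can change the input file so that every field is in quotes, is to explode on '","' and then strip the first and last quote afterwards. That way it won't explode on commas that are not next to a quote. Of course, that is only if you don't want to use 'fgetcsv' as Artefacto suggested and you want to challenge yourself with it. – 2010-05-14 15:21:20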
I can't wrap everything in quotes; the file is exported by another system. – Webnet 2010-05-14 15:25:53
Are the quotes only ever on the second field? – Armstrongest 2010-05-14 15:30:30
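For completeness, the "quote everything, then explode on '","'" idea suggested earlier in the thread could look roughly like the sketch below. It only applies if the exporting system can be made to quote every field, which the asker says is not the case here. The helper name `explodeFullyQuoted` and the sample line are invented for this illustration.

```php
<?php
// Hypothetical helper: split a line in which EVERY field is wrapped in quotes
// and the separators are written exactly as "," with no spaces around them.
function explodeFullyQuoted(string $line): array
{
    // Drop surrounding whitespace, then the first and the last quote.
    $inner = substr(trim($line), 1, -1);
    // Split on the quote-comma-quote sequence that sits between two fields.
    return explode('","', $inner);
}

$fields = explodeFullyQuoted('"TEST 690","This is a test 1, 2 and 3","$14.95","4"');
print_r($fields);
```

This shortcut breaks as soon as a field itself contains the sequence `","`, uses doubled quotes as an escape, or has whitespace between a quote and a comma, which is one more reason to prefer fgetcsv.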