CSV import, splitting on commas - how to handle quotes?

Problem description:

I have a CSV file that I'm importing, but I've run into a problem. The data is in this format:

TEST 690, "This is a test 1, 2 and 3" ,$14.95 ,4

I need to be able to explode on the commas that are not inside quotes...

+0

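One thing I would try, if you can change the input file, is to put every field in quotes; then you can explode on '","' and strip the first and last quote afterwards. That way it won't split on commas that aren't directly next to quotes. Of course, this only applies if you don't want to use 'fgetcsv' as Artefacto suggested and you want to challenge yourself. – 2010-05-14 15:21:20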
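The all-quoted approach from this comment could look roughly like the sketch below. It assumes every field is wrapped in double quotes, with no spaces around the commas and no quote characters inside a field; the sample line is an invented, fully quoted variant of the question's data.

```php
<?php
// Assumption: every field is quoted and fields are separated by exactly `","`.
$line = '"TEST 690","This is a test 1, 2 and 3","$14.95","4"';

// Strip the first and last quote, then split on the quote-comma-quote sequence.
$fields = explode('","', substr($line, 1, -1));

print_r($fields);
```

This breaks as soon as a field contains `","` or a doubled quote, which is one reason the replies recommend fgetcsv.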

+0

I can't wrap everything in quotes; the file is exported by another system. – Webnet 2010-05-14 15:25:53

+0

Will the quotes only ever be on the second field? – Armstrongest 2010-05-14 15:30:30

Use the fgetcsv function.

If you already have the data in a string, you can create a stream that wraps it and then use fgetcsv. See http://code.google.com/p/phpstringstream/source/browse/trunk/stringstream.php
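As a minimal sketch of the same idea without the linked class, PHP's built-in php://memory stream can hold the string so that fgetcsv can read from it. The helper name parseCsvString() is hypothetical, and the sample rows are taken from the example data later in this thread.

```php
<?php
// parseCsvString() is a hypothetical helper, not a PHP built-in.
function parseCsvString($csv, $delim = ',', $qual = '"')
{
    // php://memory is a read/write stream held entirely in RAM.
    $stream = fopen('php://memory', 'r+');
    fwrite($stream, $csv);
    rewind($stream);

    $rows = [];
    // fgetcsv() returns false at end of input; the escape character is passed explicitly.
    while (($row = fgetcsv($stream, 0, $delim, $qual, '\\')) !== false) {
        $rows[] = $row;
    }

    fclose($stream);
    return $rows;
}

$rows = parseCsvString("Name,Address,Phone\nBob,\"345 Second Place, City ST\",666-666-6666\n");
print_r($rows);
```

The comma inside the quoted address stays part of the second field instead of starting a new one.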

+0

I'd rather use a regular expression, because there is some special functionality involved here – Webnet 2010-05-14 15:24:52

+6

Don't use a regular expression. It's not as simple as it looks. You may have line breaks inside strings. You may have escaped characters. – Artefacto 2010-05-14 15:32:35
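To illustrate the two pitfalls named here, the sketch below feeds invented sample data (one quoted field containing a line break, one containing doubled quotes) to a naive line split and to fgetcsv. The data and variable names are assumptions made for the example.

```php
<?php
// Invented sample: record 1 spans two physical lines, record 2 has escaped quotes.
$csv = <<<'CSV'
id,note
1,"line one
line two"
2,"say ""hi"""
CSV;

// Naive approach: splitting on newlines sees four "lines" for three records.
$naiveLines = explode("\n", $csv);

// fgetcsv() keeps reading while a quoted field is still open.
$stream = fopen('php://memory', 'r+');
fwrite($stream, $csv);
rewind($stream);

$rows = [];
while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
    $rows[] = $row;
}
fclose($stream);

printf("naive lines: %d, parsed records: %d\n", count($naiveLines), count($rows));
```

A regular expression that works line by line has the same problem as the naive split: it never sees the second half of the multi-line field.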

+2

Once the CSV has been parsed (via fgetcsv), you can process each individual field to your heart's content. – Roadmaster 2010-05-14 15:36:34
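A rough sketch of that parse-first, process-afterwards order, using the line from the question. The field names (sku, description, price, quantity) are guesses at what the columns mean, not something the question states, and str_getcsv is used here as the single-string counterpart of fgetcsv.

```php
<?php
$line = 'TEST 690, "This is a test 1, 2 and 3" ,$14.95 ,4';

// Step 1: let the built-in parser split the line, respecting the quotes.
$fields = str_getcsv($line, ',', '"', '\\');

// Step 2: clean up each field only after parsing.
$fields = array_map('trim', $fields);
list($sku, $description, $price, $quantity) = $fields;

// Hypothetical record layout; the column meanings are assumed.
$record = [
    'sku'         => $sku,
    'description' => $description,
    'price'       => (float) ltrim($price, '$'),
    'quantity'    => (int) $quantity,
];

print_r($record);
```

Any special handling (currency symbols, trimming, type casts) lives in step 2, so no custom splitting logic is needed.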

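If you really want to do this by hand, here is a rough reference implementation I wrote that explodes a complete line of CSV text into an array. Be warned: this code does NOT handle multi-line fields! With this implementation, the entire CSV row must sit on a single line, with no line breaks inside it!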

<?php 
//----------------------------------------------------------------------- 
function csvexplode($str, $delim = ',', $qual = "\"") 
// Explode a single CSV string (line) into an array. 
{ 
    $len = strlen($str); // Store the complete length of the string for easy reference. 
    $inside = false; // Maintain state when we're inside quoted elements. 
    $lastWasDelim = false; // Maintain state if we just started a new element. 
    $word = ''; // Accumulator for current element. 
    $out = []; // Collected elements, initialized explicitly. 

    for($i = 0; $i < $len; ++$i) 
    { 
     // We're outside a quoted element, and the current char is a field delimiter. 
     if(!$inside && $str[$i]==$delim) 
     { 
      $out[] = $word; 
      $word = ''; 
      $lastWasDelim = true; 
     } 

     // We're inside a quoted element, the current char is a qualifier, and the next char (if any) is a qualifier. 
     elseif($inside && $str[$i]==$qual && ($i+1<$len && $str[$i+1]==$qual)) 
     { 
      $word .= $qual; // Add one qual into the element, 
      ++$i; // Then skip ahead to the next non-qual char. 
     } 

     // The current char is a qualifier (so we're either entering or leaving a quoted element.) 
     elseif ($str[$i] == $qual) 
     { 
      $inside = !$inside; 
     } 

     // We're outside a quoted element, the current char is whitespace and the 'last' char was a delimiter. 
     elseif(!$inside && ($str[$i]==" ") && $lastWasDelim) 
     { 
      // Just skip the char because it's leading whitespace in front of an element. 
     } 

     // Outside a quoted element and the current char is whitespace: decide whether it is trailing or interior whitespace. 
     elseif(!$inside && ($str[$i]==" ") ) 
     { 
      // Look ahead for the next non-whitespace char. 
      $lookAhead = $i+1; 
      while(($lookAhead < $len) && ($str[$lookAhead] == " ")) 
      { 
       ++$lookAhead; 
      } 

      // If the line ends here or the next char is formatting, we're dealing with trailing whitespace. 
      if($lookAhead >= $len || $str[$lookAhead] == $delim || $str[$lookAhead] == $qual) 
      { 
       $i = $lookAhead-1; // Jump the pointer ahead to right before the delimiter or qualifier. 
      } 

      // Otherwise we're still in the middle of an element, so add the whitespace to the output. 
      else 
      { 
       $word .= $str[$i]; 
      } 
     } 

     // If all else fails, add the character to the current element. 
     else 
     { 
      $word .= $str[$i]; 
      $lastWasDelim = false; 
     } 
    } 

    $out[] = $word; 
    return $out; 
} 


// Examples: 

$csvInput = 'Name,Address,Phone 
Alice,123 First Street,"555-555-5555" 
Bob,"345 Second Place, City ST",666-666-6666 
"Charlie ""Chuck"" Doe", 3rd Circle ," 777-777-7777"'; 

// explode() emulates file() in this context. 
foreach(explode("\n", $csvInput) as $line) 
{ 
    var_dump(csvexplode($line)); 
} 
?> 

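That said, I would still recommend relying on PHP's built-in functions. That will (hopefully) be more reliable in the long run. Artefacto and Roadmaster are right: anything you have to do to the data is best done after you import it.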