Python's CSV reader and iteration

Question:

I have a CSV file that looks like this:

"Company, Inc.",,,,,,,,,,,,10/30/09 
A/R Summary Aged Analysis Report,,,,,,,,,,,,10:35:01 
All Clients,,,,,,,,,,,,USER 

Client Account,Customer Name,15-Jan,16 - 30,31 - 60,61 - 90,91 - 120,120 - Over,Total,Status,Credit Limit 
1000001111,CLIENT A,0,0,"3,711.32",0,0,"18,629.64","22,340.96",COD,"20,000.00" 
1000002222,CLIENT B,0,0,0,"3,591.27",0,0,"3,591.27",COD,0 
1000003333,CLIENT C,536.78,0,0,0,0,"11,216.60","11,753.38",COD,0 
1000004444,CLIENT D,0,514.94,"3,147.45",690,0,0,"4,352.39",COD,0 

Grand Total,,"139,203,856.06","84,607,749.30","110,746,640.18","58,474,379.45","52,025,869.06","292,653,734.82","737,712,228.87",,,, 

But I only want to process the rows after the "Client Account..." line and before the "Grand Total..." line. This is the code I'm using now:

import csv

inputFile = csv.reader(open(filename), dialect='excel')
records = [line for line in inputFile if line and line[0].isdigit()]
That works. What's the problem? – 2010-01-07 11:28:50

import csv
import io
import re

with open(filename) as f:
    text = f.read()

# Capture everything after the "Client Account" header line and before "Grand Total".
data = re.search(r"Client Account[^\r\n]+[\r\n]+(.*)(?=Grand Total)", text, re.DOTALL).group(1)
datafile = io.StringIO(data)

inputFile = csv.reader(datafile, dialect='excel')
records = [line for line in inputFile if line and line[0].isdigit()]
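I like your approach; it's quick and simple. How do I convert the contents of datafile into a list? – Francis 2010-01-07 17:44:18

When I tried your suggestion I got this error message: "TypeError: coercing to Unicode: need string or buffer, instance found" – Francis 2010-01-07 17:50:49

Sorry for the delay. open(datafile) should be just datafile, since it is already a file instance. Updated. – YOU 2010-01-08 00:38:19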
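
For the question in the comments about getting a list out of the extracted text, here is a minimal Python 3 sketch of the regex approach from this answer. The `SAMPLE` text and the `rows_between` helper are made up for illustration; they stand in for the real file and are not part of the original answer.

```python
import csv
import io
import re

# Hypothetical sample text standing in for the real file's contents.
SAMPLE = (
    '"Company, Inc.",,,,,,,,,,,,10/30/09\n'
    'Client Account,Customer Name,Total\n'
    '1000001111,CLIENT A,"22,340.96"\n'
    '1000002222,CLIENT B,"3,591.27"\n'
    '\n'
    'Grand Total,,"25,932.23"\n'
)


def rows_between(text):
    """Return the account rows between the header line and the Grand Total line."""
    match = re.search(
        r"Client Account[^\r\n]+[\r\n]+(.*)(?=Grand Total)", text, re.DOTALL
    )
    if match is None:
        return []  # header or footer not found
    # csv.reader accepts any iterable of lines, so a StringIO works directly.
    reader = csv.reader(io.StringIO(match.group(1), newline=""), dialect="excel")
    # The list comprehension is what turns the reader into a plain list.
    return [row for row in reader if row and row[0].isdigit()]


records = rows_between(SAMPLE)
```

Since `csv.reader` is just an iterator of rows, wrapping it in a list comprehension (or `list(...)`) is enough to materialize everything at once.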

You can do it like this, by setting a flag:

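import csv

filename = "file"
found_header = False
with open(filename) as f:
    for row in csv.reader(f, delimiter=','):
        if "Grand Total" in row:
            break
        if "Client Account" in row:
            found_header = True  # start collecting from the next row
            continue
        # skip blank rows (empty lists) and anything that is not an account row
        if found_header and row and row[0].isdigit():
            print(row)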
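Edit: if "Grand Total" in row: break. Also, I think your continue will jump back to "for row in reader" and never process anything. – KevinDTimm 2010-01-07 11:06:34

I have a very similar problem, except my "total" row is not always "Grand Total"; it may be some other field, but there is always a blank line. How can I break out of the loop by detecting the blank line? – LWZ 2013-01-13 20:35:03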
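
For the blank-line case raised in the last comment, one possible sketch is to stop at the first blank row after the header instead of looking for a footer label. The helper name `rows_until_blank` and the sample text are hypothetical, and this assumes there is no blank line between the header row and the data rows.

```python
import csv
import io


def rows_until_blank(reader, header="Client Account"):
    """Yield rows after the header row, stopping at the first blank row."""
    reader = iter(reader)
    # Skip everything up to and including the header row.
    for row in reader:
        if row and row[0] == header:
            break
    for row in reader:
        # csv.reader yields [] for an empty line; a line of only commas
        # yields a list of empty strings, so treat both as blank.
        if not any(cell.strip() for cell in row):
            break
        yield row


# Hypothetical sample data with a footer that is not labelled "Grand Total".
sample = io.StringIO(
    "Report title,,\n"
    "Client Account,Customer Name,Total\n"
    "1000001111,CLIENT A,10\n"
    "1000002222,CLIENT B,20\n"
    ",,\n"
    "Totals,,30\n",
    newline="",
)
rows = list(rows_until_blank(csv.reader(sample)))
```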

Use a neat little generator for this sort of thing. This one can be generalized a bit more if your needs change:

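def lines_between(source, first, second):
    source = iter(source)  # share one iterator between both loops

    # skip everything up to and including the row that starts with `first`
    for line in source:
        if line and line[0] == first:
            break

    for line in source:
        if line and line[0] == second:
            break

        if line:  # only non-empty lines
            yield line

for record in lines_between(inputFile, 'Client Account', 'Grand Total'):
    print(record)  # process record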

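You didn't explicitly ask for the "non-empty lines" filter, but your own approach does it, so I assume you want it. If you don't want to process the rows "lazily" like this and just want everything built into a list up front, do this: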

records = list(lines_between(inputFile, 'Client Account', 'Grand Total')) 

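By the way, on Windows under Python 2, be sure to open the real source file in binary mode, using csv.reader(open(filename, 'rb'), dialect='excel'), as the csv docs note. On Python 3 the csv docs instead call for opening the file in text mode with newline=''.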
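
As a rough Python 3 sketch of the file-opening advice above: the `read_rows` helper is a made-up name, and the UTF-8 encoding is an assumption about the file, not something stated in the question.

```python
import csv


def read_rows(path):
    """Read every row of a CSV file into a list (Python 3)."""
    # newline="" lets the csv module handle line endings itself, which is
    # the Python 3 counterpart of the old 'rb' advice for Python 2.
    # encoding="utf-8" is an assumption; adjust it to match the real file.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, dialect="excel"))
```

The resulting list can then be passed to any of the filtering functions in this thread in place of inputFile.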

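Use generators. You can build up all kinds of complexity from simple generator filter functions. Although this is far more complex than the simple filter, it is more extensible and can easily handle really complex spreadsheets.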

def skip_blank(rdr):
    # drop empty rows and rows whose columns are all empty
    for row in rdr:
        if len(row) == 0: continue
        if all(len(col) == 0 for col in row): continue
        yield row

def after_heading(text, rdr):
    # yield only the rows that follow the row containing `text`
    i = iter(rdr)
    for row in i:
        if any(column == text for column in row):
            break
    for row in i:
        yield row

def before_footing(text, rdr):
    # yield rows until the row containing `text` is reached
    for row in rdr:
        if any(column == text for column in row):
            break
        yield row

def between(start, end, rdr):
    for row in before_footing(end, after_heading(start, rdr)):
        yield row

# start marker first, end marker second
for row in between('Client Account', 'Grand Total', skip_blank(inputFile)):
    print(row)
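
The same pipeline can also be sketched with the standard library's itertools instead of hand-written generators. This is a swapped-in alternative, not the answer's own method; the `between` function here and the sample text are illustrative only.

```python
import csv
import io
from itertools import dropwhile, islice, takewhile


def between(start, end, rows):
    """Lazily yield rows after the one containing start, up to the one containing end."""
    def lacks(text):
        # Predicate factory: true while no column in the row equals text.
        return lambda row: text not in row

    # dropwhile stops dropping at the header row itself, so skip that one row.
    after_start = islice(dropwhile(lacks(start), rows), 1, None)
    return takewhile(lacks(end), after_start)


# Hypothetical sample data shaped like the report in the question.
sample = io.StringIO(
    "All Clients,,USER\n"
    "\n"
    "Client Account,Customer Name,Total\n"
    '1000001111,CLIENT A,"22,340.96"\n'
    "\n"
    'Grand Total,,"22,340.96"\n',
    newline="",
)
reader = csv.reader(sample)
# any(row) is false for [] and for rows made only of empty strings.
result = [row for row in between("Client Account", "Grand Total", reader) if any(row)]
```

Everything stays lazy until the final list comprehension, so it works the same way on a large file.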