Parse a CSV file and aggregate the values
I want to parse a CSV file and aggregate the values. The CITY column contains repeated values (sample):
CITY,AMOUNT
London,20
Tokyo,45
London,55
New York,25
After parsing, the result should look like this:
CITY, AMOUNT
London,75
Tokyo,45
New York,25
I have written the code below to extract the unique city names:
import csv

def main():
    contrib_data = list(csv.DictReader(open('contributions.csv','rU')))
    combined = []
    for row in contrib_data:
        if row['OFFICE'] not in combined:
            combined.append(row['OFFICE'])
How do I sum the values?
Tested in Python 3.2.2:
import csv
from collections import defaultdict

# Sum AMOUNT per CITY; a missing city starts at 0.
cities = defaultdict(int)
with open('test.csv', newline='') as infile:
    for row in csv.DictReader(infile):
        cities[row["CITY"]] += int(row["AMOUNT"])

# The with block closes the file, so the output is flushed to disk.
with open('out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["CITY", "AMOUNT"])
    writer.writerows([city, cities[city]] for city in cities)
Result:
CITY,AMOUNT
New York,25
London,75
Tokyo,45
As for your additional requirements:
import csv
from collections import defaultdict

def default_factory():
    # [sum, max, min, count]; the count slot is replaced by the mean after the loop
    return [0, None, None, 0]

cities = defaultdict(default_factory)
with open('test.csv', newline='') as infile:
    for row in csv.DictReader(infile):
        amount = int(row["AMOUNT"])
        stats = cities[row["CITY"]]
        stats[0] += amount
        stats[1] = amount if stats[1] is None else max(stats[1], amount)
        stats[2] = amount if stats[2] is None else min(stats[2], amount)
        stats[3] += 1

for stats in cities.values():
    stats[3] = stats[0] / stats[3]  # calculate mean

with open('out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["CITY", "AMOUNT", "max", "min", "mean"])
    writer.writerows([city] + cities[city] for city in cities)
This gives you:
CITY,AMOUNT,max,min,mean
New York,25,25,25,25.0
London,75,55,20,37.5
Tokyo,45,45,45,45.0
Note that under Python 2 you need to add the line from __future__ import division at the top to get correct results.
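To see why the division note matters, here is a small illustration written for Python 3. It uses London's total and row count from the sample data. The `//` operator is used here only to mimic what `/` does to two integers in Python 2 without the `__future__` import; the variable names are made up for the example.

```python
# Illustration only: the two division operators in Python 3.
total, count = 75, 2          # London's sum and number of rows in the sample

true_mean = total / count     # true division, which the mean needs
floor_mean = total // count   # floor division, what Python 2's "/" does for ints

print(true_mean)   # 37.5
print(floor_mean)  # 37
```

With floor division the mean for London would come out as 37 instead of 37.5.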
Tested it on Python 2.7 and it works fine. I am wondering why a plain dict does not work and why I have to use defaultdict()? – jwesonga 2012-01-10 08:35:29
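'defaultdict(int)' lets you use keys that have not been defined yet and automatically gives them the value '0'. So you can do 'cities["Bolton"] += 10', which either creates a new key '"Bolton"' with the value '10' or adds '10' if the key already exists. If you did this with a plain 'dict', you would get a lot of 'KeyError's. – 2012-01-10 08:39:20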
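A minimal sketch of the difference described in this comment, using the "Bolton" key from the comment itself. The `dict.get` line at the end is one possible way to get the same effect with a plain dict; it is an addition for comparison, not something the answer proposes.

```python
from collections import defaultdict

plain = {}
try:
    plain["Bolton"] += 10      # the key does not exist yet
except KeyError as exc:
    error = exc                # a plain dict raises KeyError here

counts = defaultdict(int)      # int() returns 0, used for missing keys
counts["Bolton"] += 10         # creates "Bolton" with 0, then adds 10
counts["Bolton"] += 10         # the key exists now, so this just adds 10

# A plain dict can do the same with dict.get and an explicit default.
plain["Bolton"] = plain.get("Bolton", 0) + 10

print(dict(counts))  # {'Bolton': 20}
print(plain)         # {'Bolton': 10}
```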
Thanks for the feedback. Is it possible to get the MAX, MIN and MEAN values in the same loop? – jwesonga 2012-01-10 09:03:29
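A dict that maps each city to its running AMOUNT total might do the trick. Something like the following.
This assumes you read one line at a time, that city holds the current city and that amount holds the current amount:
main_dict = {}
# rows is a hypothetical iterable of (city, amount) pairs read from the file
for city, amount in rows:
    if city in main_dict:
        main_dict[city] = main_dict[city] + amount
    else:
        main_dict[city] = amount
At the end of the loop, main_dict holds the aggregated totals.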
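The plain-dict approach above can be sketched end to end as follows. To keep the example self-contained it reads the question's sample data from an in-memory string through `io.StringIO` rather than from a file; the `sample` variable is an assumption for the sketch, and with a real file you would pass the opened file to `csv.DictReader` instead.

```python
import csv
import io

# Sample data from the question, held in memory for this sketch.
sample = "CITY,AMOUNT\nLondon,20\nTokyo,45\nLondon,55\nNew York,25\n"

main_dict = {}
for row in csv.DictReader(io.StringIO(sample)):
    city = row["CITY"]
    amount = int(row["AMOUNT"])   # CSV fields arrive as strings
    if city in main_dict:
        main_dict[city] = main_dict[city] + amount
    else:
        main_dict[city] = amount

print(main_dict)
```

Converting AMOUNT with `int()` matters here: without it, the `+` would concatenate the strings instead of adding the numbers.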
Hint: use a dictionary instead of a list, with the city as the key and sum(amount) as the value. – 2012-01-10 08:03:25