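Parsing a CSV file into a dictionary with nested array values: best approach?

Problem description:

I want to parse this CSV file and store it as a dictionary (sorry if I am using the terminology incorrectly; I am still learning). The first element of each row is the key, and the remaining elements become the value, as a nested array.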

targets_value,11.4,10.5,10,10.8,8.3,10.1,10.7,13.1 
targets,Cbf1,Sfp1,Ino2,Opi1,Cst6,Stp1,Met31,Ino4 
one,"9.6,6.3,7.9,11.4,5.5",N,"8.4,8.1,8.1,8.4,5.9,5.9",5.4,5.1,"8.1,8.3",N,N 
two,"7.0,11.4,7.0","4.8,5.3,7.0,8.1,9.0,6.1,4.6,5.0,4.6","6.3,5.9,5.9",N,"4.3,4.8",N,N,N 
three,"6.0,9.7,11.4,6.8",N,"11.8,6.3,5.9,5.9,9.5","5.4,8.4","5.1,5.1,4.3,4.8,5.1",N,N,11.8 
four,"9.7,11.4,11.4,11.4",4.6,"6.2,7.9,5.9,5.9,6.3","5.6,5.5","4.8,4.8,8.3,5.1,4.3",N,7.9,N 
five,7.9,N,"8.1,8.4",N,"4.3,8.3,4.3,4.3",N,N,N 
six,"5.7,11.4,9.7,5.5,9.7,9.7","4.4,7.0,7.7,7.5,6.9,4.9,4.6,4.9,4.6","7.9,5.9,5.9,5.9,5.9,6.3",6.7,"5.1,4.8",N,7.9,N 
seven,"6.3,11.4","5.2,4.7","6.3,6.0",N,"8.3,4.3,4.8,4.3,5.1","9.8,9.5",N,8.4 
eight,"11.4,11.4,5.9","4.4,6.3,6.0,5.6,7.6,7.1,5.1,5.3,5.1,4.9","6.3,6.3,5.9,5.9,6.6,6.6","5.3,5.2,7.0","8.3,4.3,4.3,4.8,4.3,4.3,8.3,4.8,8.3,5.1","9.2,7.4","9.4,9.3,7.9",N 
nine,"9.7,9.7,11.4,9.7","5.2,4.6,5.5,6.5,4.5,4.6,5.5","6.3,5.9,5.9,9.5,6.5",N,"4.3,5.1,8.3,8.3,4.3,4.3,4.3,4.8",8.0,8.6,N 
ten,"9.7,9.7,9.7,11.4,7.9","5.2,4.6,5.5,6.5,4.5,4.6,5.5","6.3,5.9,5.9,9.5,6.5",5.7,"4.3,4.3,4.3,5.1,8.3,8.3,4.3,4.3,4.3,4.8",8.0,8.6,N 
YPL250C_Icy2,"11.4,6.1,11.4",N,"6.3,6.0,6.6,7.0,10.0,6.5,9.5,7.0,10.0",7.1,"4.3,4.3",9.2,"10.7,9.5",N 
,,,,,,,, 
,,,,,,,, 

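The problem is that, in each row, some columns are quoted because the cell holds multiple values, while others hold a single entry and are not quoted. Cells with no value are filled in with N. So the file mixes quoted and unquoted cells as well as numeric and non-numeric ones.

The desired output looks like this: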

{'eight': ['11.4,11.4,5.9', '4.4,6.3,6.0,5.6,7.6,7.1,5.1,5.3,5.1,4.9', '6.3,6.3,5.9,5.9,6.6,6.6', '5.3,5.2,7.0', '8.3,4.3,4.3,4.8,4.3,4.3,8.3,4.8,8.3,5.1', '9.2,7.4', '9.4,9.3,7.9', 'N'], 

'ten': ['9.7,9.7,9.7,11.4,7.9', '5.2,4.6,5.5,6.5,4.5,4.6,5.5', '6.3,5.9,5.9,9.5,6.5', '5.7', '4.3,4.3,4.3,5.1,8.3,8.3,4.3,4.3,4.3,4.8', '8.0', '8.6', 'N'], 

'nine': ['9.7,9.7,11.4,9.7', '5.2,4.6,5.5,6.5,4.5,4.6,5.5', '6.3,5.9,5.9,9.5,6.5', 'N', '4.3,5.1,8.3,8.3,4.3,4.3,4.3,4.8', '8.0', '8.6', 'N'] 
} 

I wrote a script that cleans the data up and stores it, but I suspect my script is "too long for no reason". Any tips?

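import re

motif_dict = {}
with open(filename, "r") as file:
    for line in file:
        # skip the empty filler rows at the end of the file
        if ',,,,,,,,' in line:
            continue
        # collect the contents of the quoted multi-value cells, in order
        quoted_holder = re.findall(r'"(\d.*?\d)"', line)
        # reverse the list so that pop() returns the cells in their original order
        quoted_holder = quoted_holder[::-1]
        # swap each quoted cell for a placeholder so that split(',') is safe;
        # strip() drops the trailing whitespace and newline from the last cell
        new_line = re.sub(r'"\d.*?\d"', 'h', line.strip()).split(',')
        for position, element in enumerate(new_line):
            if element == 'h':
                new_line[position] = quoted_holder.pop()
        motif_dict[new_line[0]] = new_line[1:]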

There is a csv module that makes working with CSV files much easier. In your case, your code becomes

import csv 

with open("motif.csv","r",newline="") as fp: 
    reader = csv.reader(fp) 
    data = {row[0]: row[1:] for row in reader if row and row[0]} 

where if row and row[0] lets us skip rows that are empty or that have an empty first element. This produces (line breaks added)

>>> data["eight"] 
['11.4,11.4,5.9', '4.4,6.3,6.0,5.6,7.6,7.1,5.1,5.3,5.1,4.9', 
'6.3,6.3,5.9,5.9,6.6,6.6', '5.3,5.2,7.0', 
'8.3,4.3,4.3,4.8,4.3,4.3,8.3,4.8,8.3,5.1', 
'9.2,7.4', '9.4,9.3,7.9', 'N'] 
>>> data["ten"] 
['9.7,9.7,9.7,11.4,7.9', '5.2,4.6,5.5,6.5,4.5,4.6,5.5', 
'6.3,5.9,5.9,9.5,6.5', '5.7', '4.3,4.3,4.3,5.1,8.3,8.3,4.3,4.3,4.3,4.8', 
'8.0', '8.6', 'N'] 
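
To see how csv.reader copes with the mix of quoted and unquoted cells, here is a small self-contained sketch. It reads a shortened copy of the sample data from an in-memory string instead of motif.csv; the helper name rows_to_dict and the truncated rows are made up for illustration.

```python
import csv
import io

# A shortened copy of the sample data (illustrative only), ending with
# one of the empty filler rows.
sample = (
    'targets,Cbf1,Sfp1,Ino2\n'
    'five,7.9,N,"8.1,8.4"\n'
    'seven,"6.3,11.4","5.2,4.7","6.3,6.0"\n'
    ',,,\n'
)

def rows_to_dict(lines):
    """Map the first cell of each row to the list of the remaining cells."""
    reader = csv.reader(lines)
    # row is [] for a blank line; row[0] is '' for a filler row like ",,,".
    return {row[0]: row[1:] for row in reader if row and row[0]}

data = rows_to_dict(io.StringIO(sample))
print(data["five"])  # ['7.9', 'N', '8.1,8.4']
```

The quoted cell "8.1,8.4" comes back as a single string with the quotes removed, so no regular expressions are needed.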

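In practice, for processing, I think you would want to replace "N" with None or some other object as a missing-value marker, and to make each value a list of floats (even if it has only one element), but that is up to you.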
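
One possible way to do that conversion is sketched below. The helper parse_cell and its missing parameter are hypothetical names, not part of the csv module, and the sketch assumes every non-missing cell contains only comma-separated numbers. The header row with names such as Cbf1 would raise a ValueError and needs to be handled separately.

```python
def parse_cell(cell, missing="N"):
    """Turn '8.1,8.4' into [8.1, 8.4] and the missing marker into None."""
    cell = cell.strip()
    if cell == missing:
        return None
    # A single value such as '7.9' still becomes a one-element list.
    return [float(part) for part in cell.split(",")]

row = ['7.9', 'N', '8.1,8.4']
parsed = [parse_cell(cell) for cell in row]
print(parsed)  # [[7.9], None, [8.1, 8.4]]
```

Using None for missing cells keeps every present value the same type (a list of floats), so later code only needs a single "is None" check.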

*pounds table* This reminds me of one of Raymond Hettinger's examples. It's beautiful. – melwil

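Thank you so much! I didn't know this module existed; it will save me a lot of time. Sorry, I have one more question: is it possible to use the module to store the values as tuples instead of lists? That way I could do calculations on them without worrying about matching up the wrong elements, since each key has 8 elements that correspond to another list of 8 values. –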
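
As a rough sketch for the tuple question: csv.reader always yields each row as a list, but the slice can be wrapped in tuple() when building the dictionary. The shortened sample data and the name by_target are made up for illustration; zip is one way to keep each value paired with its column.

```python
import csv
import io

# A shortened copy of the sample data (illustrative only).
sample = (
    'targets,Cbf1,Sfp1,Ino2\n'
    'five,7.9,N,"8.1,8.4"\n'
)

reader = csv.reader(io.StringIO(sample))
# tuple(...) makes each value immutable; the reader itself returns lists.
data = {row[0]: tuple(row[1:]) for row in reader if row and row[0]}

# Pair each column name with the value in the same position.
by_target = dict(zip(data["targets"], data["five"]))
print(by_target["Ino2"])  # 8.1,8.4
```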

Wow, thanks melwil. I found the code from his "Transforming Code into Beautiful, Idiomatic Python" video. Awesome, it's beautiful. –