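String comparison with NA fails in if statement
Problem description:
I am using Python 2.7 to convert my tabular data into a matrix. As part of the analysis, I check whether a cell contains NA (the data is R output, and I use NA for missing data points). If a cell is NA, I skip the analysis for it and move on to the next one.
This works for some rows (the first three), but not for the fourth. The value there is also NA, and I check it in exactly the same way.
Code: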
def findMax(l, i):
    r = []
    for x in range(0, 3):
        if not l[i] == "NA": # Problem
            print l[i]
            if float(l[i]) <= 15:
                if not l[i-1] == "NA":
                    if float(l[i-1]) <= 0.05:
                        if not l[i-2] == "NA":
                            r.append(float(l[i-2]))
        i = i+12
    if len(r) != 0:
        return max(r)
    else:
        return 0

fIn = open("D:/projects/salmon/rawData_full.csv", "r")
fOut = open("D:/projects/salmon/dataAsMatrix.txt", "w")
fOut.write("Prot"+"\t"+"2 min"+"\t"+"5 min"+"\t"+"10 min"+"\t"+"20 min"+"\n")
for line in fIn:
    cols = line.split(";")
    if cols[6] != "NA":
        hgnc_symbol = cols[6]
        vals = [findMax(cols, 9), findMax(cols, 12), findMax(cols, 15), findMax(cols, 18)]
        m = max(vals)
        if m != 0:
            mi = [i for i, j in enumerate(vals) if j == m] # Problem
            if mi == [0]:
                fOut.write(hgnc_symbol+"\t"+"1"+"\t"+"0"+"\t"+"0"+"\t"+"0"+"\n")
            elif mi == [1]:
                fOut.write(hgnc_symbol+"\t"+"0"+"\t"+"1"+"\t"+"0"+"\t"+"0"+"\n")
            elif mi == [2]:
                fOut.write(hgnc_symbol+"\t"+"0"+"\t"+"0"+"\t"+"1"+"\t"+"0"+"\n")
            elif mi == [3]:
                fOut.write(hgnc_symbol+"\t"+"0"+"\t"+"0"+"\t"+"0"+"\t"+"1"+"\n")
fIn.close()
fOut.close()
Output from CMD for this code:
D:\projects\salmon>python processDataAsMatrix.py
17.278
16.37
13.072
11.251
23.81
4.3903
8.284
22.255
5.9456
25.727
15.511
13.448
18.857
17.056
15.106
33.84
3.9582
5.4985
18.857
17.056
15.106
33.84
3.9582
5.4985
NA
Traceback (most recent call last):
  File "processDataAsMatrix.py", line 29, in <module>
    vals = [findMax(cols, 9), findMax(cols, 12), findMax(cols, 15), findMax(cols, 18)]
  File "processDataAsMatrix.py", line 8, in findMax
    if float(l[i]) <= 15:
ValueError: could not convert string to float: NA
Table:
1st row: ZYX 0.030963842 0.44073 17.278 0.026328939 0.34735 11.251 -0.020729408 0.40571 8.284 0.12169113 0.047 25.727 -0.038389092 0.23603 16.37 -0.028881936 0.39508 23.81 0.017909396 0.41499 22.255 0.258158193 0.021821 15.511 -0.01200769 0.33594 13.072 0.049101678 0.34596 43.903 0.019365575 0.44196 59.456 0.157124196 0.19583 13.448
2nd row: ZYX 0.046846204 0.31797 18.857 0.146097014 0.0034837 15.106 0.221048912 0.0011114 33.84 0.492229415 3.61e-07 39.582 NA NA NA NA NA NA NA NA NA NA NA NA 0.011612729 0.49258 17.056 -0.076600534 0.071586 NA 0.371141778 7.49e-05 NA 0.507383556 0.0017682 54.985
3rd row: ZYX 0.046846204 0.32115 18.857 0.146097014 0.0032917 15.106 0.221048912 0.00099106 33.84 0.492229415 2.27e-07 39.582 NA NA NA NA NA NA NA NA NA NA NA NA 0.011612729 0.49293 17.056 -0.128999496 0.01102 NA 0.220709405 0.011875 NA 0.507383556 0.0017682 54.985
4th row: ZYX NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
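Answer
Thanks to the commenters' help, `repr` showed that each line ends with `\n`, so the last field was actually "NA\n" rather than "NA". I just needed to do this: `line = line.rstrip()`. Now it works.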
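A minimal sketch of what was going wrong, using a made-up semicolon-separated line rather than the real file (the sample line and the variable names `raw_cols` and `clean_cols` are assumptions for illustration). The last field keeps the trailing newline after `split`, so it does not compare equal to "NA" until the line is stripped:

```python
# Hypothetical sample line; the real data comes from rawData_full.csv.
line = "ZYX;0.5;NA;NA\n"

# Without stripping, the last field still carries the newline.
raw_cols = line.split(";")
print(repr(raw_cols[-1]))      # repr makes the hidden "\n" visible
print(raw_cols[-1] == "NA")    # False, so float() gets called on it later

# Stripping the line first removes the trailing newline.
clean_cols = line.rstrip().split(";")
print(repr(clean_cols[-1]))
print(clean_cols[-1] == "NA")  # True
```

Note that `rstrip()` with no argument removes all trailing whitespace, which also covers "\r\n" line endings.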
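Is it possible that the string value contains whitespace (e.g. `"NA "` or similar)? – BrenBarn 2014-09-19 07:50:53
Try printing the value with `repr(l[i])` and see what is actually in the string. – 2014-09-19 07:55:46
You should really learn [boolean operations](https://docs.python.org/2/reference/expressions.html#boolean-operations) instead of nesting ifs five levels deep. – 2014-09-19 07:58:31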
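As a rough sketch of what the last comment suggests, the nested ifs in findMax can be collapsed into one condition joined with `and`. The names `not_na` and `find_max` and the sample row are made up for illustration; the thresholds 15 and 0.05 and the stride of 12 are taken from the question's code. Because `and` short-circuits, `float()` is never called on a cell that failed the NA check. Each cell is also stripped before the comparison, which covers the stray-whitespace case raised in the first comment.

```python
def not_na(cell):
    """Return True unless the cell is the missing-value marker "NA"."""
    return cell.strip() != "NA"


def find_max(cols, i):
    """Same logic as findMax in the question, with one flat condition."""
    results = []
    for _ in range(3):
        c0, c1, c2 = cols[i], cols[i - 1], cols[i - 2]
        # `and` stops at the first False, so float() only sees non-NA cells.
        if (not_na(c0) and float(c0) <= 15
                and not_na(c1) and float(c1) <= 0.05
                and not_na(c2)):
            results.append(float(c2))
        i += 12
    return max(results) if results else 0


# Hypothetical row: everything is NA except one triple at indices 7..9.
row = ["NA"] * 40
row[7], row[8], row[9] = "0.3", "0.01", "12.0"
print(find_max(row, 9))  # 0.3
```

This keeps the behavior of the original function while making the chain of checks readable at a glance.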