In Python 2.7.11, why can't I remove the file-open code?

Problem description:

The .txt file holds data like this (source: "datingTestSet2.txt" from Chapter 2 here):

40920 8.326976 0.953952 largeDoses 
14488 7.153469 1.673904 smallDoses 
26052 1.441871 0.805124 didntLike 
75136 13.147394 0.428964 didntLike 
38344 1.669788 0.134296 didntLike 
... 

Code:

from numpy import * 
import operator 
from os import listdir 

def file2matrix(filename): 
    fr = open(filename) 
    # arr = fr.readlines() # Code1!!!!!!!!!!!!!!!!!!! 
    numberOfLines = len(fr.readlines())  #get the number of lines in the file 
    returnMat = zeros((numberOfLines,3))  #prepare matrix to return 
    classLabelVector = []      #prepare labels return 
    fr = open(filename) # Code2!!!!!!!!!!!!!!!!!!!!! 
    index = 0 
    for line in fr.readlines():
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector 

datingDataMat, datingLabels = file2matrix('datingTestSet2.txt') 

The result of this function is:

datingDataMat                  datingLabels
40920   8.326976   0.953952    3
14488   7.153469   1.673904    2
26052   1.441871   0.805124    1
75136  13.147394   0.428964    1
38344   1.669788   0.134296    1
72993  10.141740   1.032955    1
35948   6.830792   1.213192    3
42666  13.276369   0.543880    3
67497   8.631577   0.749278    1
35483  12.273169   1.508053    3
50242   3.723498   0.831917    1
...     ...        ...         ...

My questions are:

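  1. When I delete only Code2 (the fr = open(filename) just above index = 0), the function returns an all-zero matrix and an empty label vector. Why can't I delete Code2? Doesn't the first fr = open(filename) already open the file?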

  2. When I only add Code1 (arr = fr.readlines()), I get an error. Why?

    returnMat[index,:] = listFromLine[0:3] 
    
    IndexError: index 0 is out of bounds for axis 0 with size 0 
    

1) You can't remove the Code2 line because of this line:

numberOfLines = len(fr.readlines())  #get the number of lines in the file 

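That line reads the file all the way to the end, so the file object's position is left at the end of the file. Reopening the file gives you a new file object positioned at the beginning.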
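A minimal sketch of this behavior, assuming a temporary two-row file in the tab-separated layout the code expects (the rows and labels are copied from the question's output; the temporary file is only for illustration). The second readlines() on the same file object returns an empty list, while a freshly opened file object reads from the start again:

```python
import os
import tempfile

# Write a small tab-separated sample file (two rows taken from the question)
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    out.write('40920\t8.326976\t0.953952\t3\n')
    out.write('14488\t7.153469\t1.673904\t2\n')

fr = open(path)
first = fr.readlines()   # reads every line; the position is now at the end of the file
second = fr.readlines()  # nothing is left to read on the same file object
fr.close()

fr = open(path)          # a new file object starts at the beginning again
third = fr.readlines()
fr.close()
os.remove(path)

print(len(first), len(second), len(third))  # Python 3 prints: 2 0 2
```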

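2) Similar to the answer above: calling readlines() reads all the lines and moves the file cursor to the end of the file. With Code1 in place, the readlines() inside len(...) has nothing left to read and returns an empty list, so numberOfLines is 0 and zeros((numberOfLines,3)) creates a matrix with no rows. Code2 then reopens the file, so the loop does get lines, and the very first assignment to returnMat[0,:] fails with the IndexError.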
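The line counts in that scenario can be checked with a short sketch, again assuming a temporary two-row sample file (illustration only; numpy is left out, so the sketch stops at the counts that lead to the zero-row matrix):

```python
import os
import tempfile

# Tab-separated sample file with two rows taken from the question
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    out.write('40920\t8.326976\t0.953952\t3\n14488\t7.153469\t1.673904\t2\n')

fr = open(path)
arr = fr.readlines()                 # Code1: consumes the whole file
numberOfLines = len(fr.readlines())  # file already exhausted, so this is len([]) == 0
fr.close()

fr = open(path)                      # Code2: reopens, so the loop still sees both lines
lines_in_loop = fr.readlines()
fr.close()
os.remove(path)

print(numberOfLines, len(lines_in_loop))  # Python 3 prints: 0 2
```

A matrix built with zeros((0,3)) has size 0 along axis 0, so writing to row 0 while the loop still has two lines to process is exactly the out-of-bounds access the traceback reports.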

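You are currently at the end of the file, so your second attempt to read its contents finds nothing. You need to go back to the beginning of the file. Use: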

fr.seek(0) 
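A minimal sketch of rewinding with seek(0) on the same file object instead of reopening it, assuming the same hypothetical two-row temporary file as above:

```python
import os
import tempfile

# Tab-separated sample file with two rows taken from the question
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    out.write('40920\t8.326976\t0.953952\t3\n14488\t7.153469\t1.673904\t2\n')

with open(path) as fr:
    numberOfLines = len(fr.readlines())  # leaves the position at the end of the file
    fr.seek(0)                           # rewind to the start instead of calling open() again
    rows = [line.strip().split('\t') for line in fr.readlines()]
os.remove(path)

print(numberOfLines, len(rows))  # Python 3 prints: 2 2
```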

instead of your:

fr = open(filename) # Code2!!!!!!!!!!!!!!!!!!!!! 

You only need to call readlines once:

from numpy import zeros

def file2matrix(filename):
    fr = open(filename)
    lines = fr.readlines()  # read the file only once
    fr.close()
    numberOfLines = len(lines)  #get the number of lines in the file
    returnMat = zeros((numberOfLines,3))  #prepare matrix to return
    classLabelVector = []      #prepare labels return
    index = 0
    for line in lines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        # careful here: returnMat is initialized as floats,
        # while listFromLine is a list of strings
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector

I would also suggest a few other changes:

import numpy as np

def file2matrix(filename):
    with open(filename) as f:
        lines = f.readlines()
    returnList = []
    classLabelList = []
    for line in lines:
        listFromLine = line.strip().split('\t')
        returnList.append(listFromLine[0:3])
        classLabelList.append(int(listFromLine[-1]))
    returnMat = np.array(returnList, dtype=float)
    return returnMat, classLabelList

or even:

import numpy as np

def file2matrix(filename):
    with open(filename) as f:
        lines = f.readlines()
    ll = [line.strip().split('\t') for line in lines]  # one list of fields per line
    returnMat = np.array([l[0:3] for l in ll], dtype=float)
    classLabelList = [int(l[-1]) for l in ll]
    # classLabelVec = np.array([l[-1] for l in ll], dtype=int)
    return returnMat, classLabelList
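Another option, sketched here as a hypothetical variant (the name file2lists is made up for illustration), is to iterate over the file object directly. Each line is read exactly once, so there is no line count to compute up front and no need to reopen or rewind. It returns plain lists; passing the first result to numpy.array() would give the matrix. Skipping blank or short lines is an added assumption, not part of the original code:

```python
import os
import tempfile


def file2lists(filename):
    """Hypothetical single-pass reader: float features and int labels as lists."""
    features = []
    labels = []
    with open(filename) as f:
        for line in f:                 # iterating the file reads each line once
            fields = line.strip().split('\t')
            if len(fields) < 4:        # assumption: ignore blank or malformed lines
                continue
            features.append([float(x) for x in fields[0:3]])
            labels.append(int(fields[-1]))
    return features, labels


# Usage with a temporary two-row sample file (rows taken from the question)
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out:
    out.write('40920\t8.326976\t0.953952\t3\n14488\t7.153469\t1.673904\t2\n')
features, labels = file2lists(path)
os.remove(path)
print(labels)  # prints: [3, 2]
```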