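Extracting data from multiple files with Python

Problem description:

I am trying to extract data from a directory of 12 .txt files. Each file contains 3 columns of data (X, Y, Z) that I want to extract. I want to collect all of the data in one df (InfoDF), but so far I have only managed to create a df that holds all of the X, Y and Z data in the same column. Here is my code: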

import pandas as pd
import numpy as np
import os
import fnmatch

path = os.getcwd()

file_list = os.listdir(path)

InfoDF = pd.DataFrame()

for file in file_list:
    try:
        if fnmatch.fnmatch(file, '*.txt'):
            filedata = open(file, 'r')
            df = pd.read_table(filedata, delim_whitespace=True, names={'X','Y','Z'})

    except Exception as e:
        print(e)

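What am I doing wrong?

As an aside, don't catch general exceptions (always catch a specific exception type).

You are overwriting df on every iteration.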

Answer

I think you need glob to select all the files, a list comprehension to build dfs, a list of DataFrames, and then concat:

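import glob
import pandas as pd

files = glob.glob('*.txt')
# sep=r'\s+' splits on any whitespace (delim_whitespace is deprecated since pandas 2.2)
dfs = [pd.read_csv(fp, sep=r'\s+', names=['X','Y','Z']) for fp in files]

df = pd.concat(dfs, ignore_index=True)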
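
To make the glob and concat approach easy to try, here is a minimal self-contained sketch. The helper name load_xyz and the two sample files are hypothetical and exist only for illustration; it assumes each file is whitespace-delimited with exactly three columns and no header row.

```python
import glob
import os
import tempfile

import pandas as pd


def load_xyz(directory):
    """Read every .txt file in `directory` into one DataFrame with columns X, Y, Z."""
    # sorted() makes the row order reproducible; glob alone gives no ordering guarantee
    files = sorted(glob.glob(os.path.join(directory, "*.txt")))
    frames = [
        pd.read_csv(fp, sep=r"\s+", header=None, names=["X", "Y", "Z"])
        for fp in files
    ]
    # pd.concat raises ValueError if no files matched, i.e. frames is empty
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample data: two small whitespace-delimited files.
with tempfile.TemporaryDirectory() as tmp:
    for name, rows in [("a.txt", "1 2 3\n4 5 6\n"), ("b.txt", "7 8 9\n")]:
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(rows)
    combined = load_xyz(tmp)

print(combined)
```

Passing names as a list rather than a set matters here, because a set has no defined order and the X, Y, Z labels have to map to the columns positionally.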

Answer

df = pd.read_table(filedata, delim_whitespace=True, names={'X','Y','Z'})

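This line replaces df on every pass through the loop, which is why you end up with only the last file's data when the program finishes. As 金瑞利 mentioned above, you can build the DataFrames in a list comprehension and concatenate them: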

# names must be an ordered list, not a set; passing the path lets pandas open and close the file
df = pd.concat([pd.read_table(file, sep=r'\s+', names=['X','Y','Z']) for file in file_list if fnmatch.fnmatch(file, '*.txt')])

Answer

What you can do is store all of the DataFrames in a list and concatenate them at the end:

df_list = []
for file in file_list:
    if fnmatch.fnmatch(file, '*.txt'):
        # given a path, read_table opens and closes the file itself
        df_list.append(pd.read_table(file, sep=r'\s+', names=['X','Y','Z']))
df = pd.concat(df_list)

As you have written it, you are overwriting df in your loop.

Also, there is no point in catching a general exception.
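
As one possible illustration of catching specific exceptions instead of a bare Exception, the sketch below skips files that fail for anticipated reasons and records why. The function name read_xyz_files and the choice of exception types are assumptions made for this example: OSError covers missing or unreadable files, and the two pd.errors classes are ones pandas may raise for empty or malformed input.

```python
import os
import tempfile

import pandas as pd


def read_xyz_files(paths):
    """Read each path into a DataFrame, skipping files that fail for known reasons."""
    frames = []
    skipped = []
    for path in paths:
        try:
            frames.append(
                pd.read_csv(path, sep=r"\s+", header=None, names=["X", "Y", "Z"])
            )
        except (OSError, pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
            # Only anticipated failures are caught; anything else still propagates
            skipped.append((path, exc))
    return frames, skipped


# Hypothetical usage: one readable file and one path that does not exist.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.txt")
    with open(good, "w") as fh:
        fh.write("1 2 3\n")
    missing = os.path.join(tmp, "missing.txt")  # never created
    frames, skipped = read_xyz_files([good, missing])

print(len(frames), "read;", len(skipped), "skipped")
```

Unexpected errors, such as a typo in a variable name, are not swallowed this way, which is the main practical argument against except Exception.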

Solution: create an empty DataFrame InfoDF before the loop, then use append or concat to fill it with the smaller dfs.

import pandas as pd
import os
import fnmatch

path = os.getcwd()

file_list = os.listdir(path)

InfoDF = pd.DataFrame(columns=['X','Y','Z']) # create empty dataframe
for file in file_list:
    if fnmatch.fnmatch(file, '*.txt'):
        df = pd.read_table(file, sep=r'\s+', names=['X','Y','Z'])
        # concat returns a new DataFrame, so the result must be assigned back
        # (DataFrame.append was removed in pandas 2.0)
        InfoDF = pd.concat([InfoDF, df], ignore_index=True)
print(InfoDF)
