Merging files in subdirectories according to an outside table

Question:

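I have a folder (split_libs) whose subfolders are named after the sample_name listed in SraRunTable3.txt (columns 9 and 32 hold sample_name and sra_study), and each sample is associated with an sra_study. Each subfolder contains a seqs.fna file whose name I cannot change, because it is the output of a QIIME command.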

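I want to merge the seqs.fna files inside the subfolders by sra_study, using the subfolder names (= sample_name) to look up the study. In other words, all seqs.fna files that come from the same SRA study should be merged into one file.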

Example of the directory layout:

split_libs
    sample1
        seqs.fna
    sample2
        seqs.fna
    sample3
        seqs.fna

Example of the SraRunTable layout:

(...)Sample_Name(...)SRA_Study(...) 
    sample_1  study_1 
    sample_2  study_1 
    sample_3  study_2 

Here is what I have tried so far:

import os
from operator import itemgetter

fields = itemgetter(9, 32)

with open('/home/andre/Desktop/PRJEB0000/SraRunTable3.txt') as csvfile:
    next(csvfile)
    for line in csvfile:
        sample_name, sra_study = fields(line.split())
    for folder in os.listdir('./split_libs'):
        if folder == sample_name:
            with open('seqs.fna') as infile, open('/home/andre/Desktop/PRJEB0000/cat_fna/' + sra_study + ".fna", 'a') as outfile:
                outfile.write(infile.read())

This question is a spin-off of "Joining files by corresponding columns in outside table".

Any contribution would be greatly appreciated!


@mhawke, as we discussed, here is the improved repost!

import os
from operator import itemgetter

fields = itemgetter(9, 32)

with open('/home/andre/Desktop/PRJEB0000/SraRunTable3.txt') as csvfile:
    next(csvfile)
    for line in csvfile:
        sample_name, sra_study = fields(line.split())
        # open the folder corresponding to sample_name and add the seqs to the appropriate study file
        with open('split_libs/' + sample_name + '/seqs.fna') as infile, open('/home/andre/Desktop/PRJEB0000/cat_fna/' + sra_study + ".fna", 'a') as outfile:
            outfile.write(infile.read())
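
One thing to watch in the code above: line.split() splits on any run of whitespace, so a field that itself contains a space shifts every later column, and itemgetter(9, 32) would then pick the wrong values. Here is a minimal sketch of the difference, assuming the table is tab-delimited (the post does not show the delimiter) and using a made-up three-column row:

```python
# Hypothetical row for illustration only: the middle field contains a space.
row = "sample_1\tIllumina MiSeq\tstudy_1\n"

# Splitting on any whitespace also splits inside "Illumina MiSeq".
by_whitespace = row.split()

# Splitting on tabs keeps each tab-delimited field whole.
by_tab = row.rstrip("\n").split("\t")

print(len(by_whitespace), len(by_tab))
```

If the real table has no spaces inside its fields, both approaches give the same columns and the original code is fine as written.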

All credit goes to Amanda Clare (not registered on Stack Overflow)!
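
For anyone who wants something a little more defensive, here is a sketch of the same idea wrapped in a function. It is not the code from the answer: the function name merge_by_study and its parameters are hypothetical, and it assumes that the table is tab-delimited with a header row containing columns named Sample_Name and SRA_Study. It looks columns up by name instead of by position, reports samples whose seqs.fna is missing, and truncates each study file on its first write so that re-running does not append the same sequences twice.

```python
import csv
from pathlib import Path


def merge_by_study(table_path, split_dir, out_dir,
                   sample_col="Sample_Name", study_col="SRA_Study"):
    """Concatenate <split_dir>/<sample>/seqs.fna into <out_dir>/<study>.fna.

    Returns the list of sample names whose seqs.fna was not found.
    """
    split_dir, out_dir = Path(split_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    started = set()  # studies whose output file was already created in this run

    with open(table_path, newline="") as table:
        # Assumption: tab-delimited table with a header row.
        for row in csv.DictReader(table, delimiter="\t"):
            sample, study = row[sample_col], row[study_col]
            seqs = split_dir / sample / "seqs.fna"
            if not seqs.is_file():
                missing.append(sample)
                continue
            # Overwrite on the first write per study, append afterwards.
            mode = "a" if study in started else "w"
            started.add(study)
            with open(seqs) as infile, open(out_dir / f"{study}.fna", mode) as outfile:
                outfile.write(infile.read())
    return missing
```

It could be called as missing = merge_by_study('SraRunTable3.txt', 'split_libs', 'cat_fna'), after which missing lists any sample names from the table that had no matching subfolder.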