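Python: subsetting a sequence using a list of numbers

Problem description:

I am trying to write a program that takes a file containing a list of numbers and uses each of those numbers to slice out part of a string. When I call my function (below), I get this error: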

TypeError: unsupported operand type(s) for -: 'str' and 'int'

I tried changing i to int(i) in the for loop, in case i was for some reason not an integer, but that leads to the following error:

ValueError: invalid literal for int() with base 10: ''

Code:

#Function Collects Sequences and Writes to a File
def gen_insertion_seq(index, seq, gene):
    output = open("%s_insertion_seq.txt" % gene, 'w')
    indices = index.read()
    sequence = seq.read()
    for i in indices:
        site = sequence[i-9:i+15]
        output.write(site + '\n')

#Open Index Files 
shaker_index = open("212_index.txt") 
kir2_index = open("214_index.txt") 
asic1a_index = open("216_index.txt") 
nachra7_index = open("252_index.txt") 

#Open Sequence Files 
shaker_seq = open("212_seq.txt") 
kir2_seq = open("214_seq.txt") 
asic1a_seq = open("216_seq.txt") 
nachra7_seq = open("252_seq.txt") 
#Call function on Index and Sequence Files - Should output list of generated Sequences for insertion sites. 
#Must hand check first couple 
gen_insertion_seq(shaker_index, shaker_seq, 'shaker')

Sample input files:

212_index.txt

212_seq.txt

ATGGCCGCCGTGGCACTGCGAGAACAACAGCTCCAACGAAATAGTCTGGATGGATACGGTTCACTGCCTAAACTGTCTAGCCAAGACGAAGAAGGTGGCGCCGGCCATGGCTTCGGTGGGGGC

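Do you know what 'read' actually returns? –

I guess not. Before I added it, if I called them in my function I couldn't get any of the text file's contents to print. Doesn't it return the contents of the variable? – Willow

Can you show that? –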

Answer

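The error in your code is caused by read not doing what you expect. Called without parameters, it reads the entire file into a single string. You then iterate over the characters of that string rather than over the numbers in the file. The TypeError occurs when you evaluate '1' - 9 in the index of the sequence.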
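A minimal sketch of this behavior, using io.StringIO as a stand-in for the opened index file. The contents "25\n40\n" are made up for illustration; the real 212_index.txt is not shown in the question.

```python
import io

# Stand-in for open("212_index.txt"); the contents are an assumption.
index = io.StringIO("25\n40\n")

indices = index.read()   # one string holding the whole file
chars = list(indices)    # iterating a string yields single characters

try:
    chars[0] - 9         # the same operation as i - 9 in the slice
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
```

Here chars holds the individual characters, newlines included, and the subtraction fails in the same way as in the question.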

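Your instinct to convert the iteration value directly to int was basically correct. However, since you are still iterating over characters, you get int('1'), int('3'), int('1'), int('2'), and then a ValueError from int('\n'). read reads the whole file as-is, newlines and all.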

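Fortunately, a file object is iterable over the lines of the file. This means you can write something like for line in file: ..., and line will take on the string value of each index you want to parse. Each line still ends with its newline, but int ignores leading and trailing whitespace, so you can pass the line directly to int without further modification.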
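A small sketch of line iteration, again with io.StringIO standing in for the file and made-up contents. It also shows one possible way to skip blank lines, which int would otherwise reject; that filter is an addition for illustration, not something the question requires.

```python
import io

# Iterating a file-like object yields one line at a time, newline included.
lines = list(io.StringIO("25\n40\n"))

# int() tolerates the trailing newline on each line.
numbers = [int(line) for line in lines]

# A blank line would raise ValueError, so filter those out if they can occur.
with_blank = io.StringIO("25\n\n40\n")
safe_numbers = [int(line) for line in with_blank if line.strip()]
```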

You can make several other improvements to your code, including the corrections that will make it work properly.

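As @Acccumulation suggested, open the files in a with block to make sure they are cleaned up properly if the program crashes, for example because of an I/O error. The block also closes the file automatically when it ends, which is something you are currently not doing at all (but should be).
Conceptually, there is no need to pass file objects around at all. You use each one in only one place, for one purpose. I would even take this further and suggest that you write a small function to parse each file type into a usable format, and pass that result around instead.
Files can be iterated line by line in Python. This is very convenient for the index file, which is a very line-oriented format. You don't need to do a full read at all, and you can save a few steps compared with @MaximTitarenko's comment.
You can use str.join on the lines of the file to merge a sequence that is split across lines. Each line keeps its newline, so strip it before joining.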
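To illustrate the first point, here is a sketch showing that a with block closes the file both on normal exit and when an error is raised inside it. The temporary file and the simulated failure are assumptions made so that the example is self-contained.

```python
import os
import tempfile

# Create a throwaway file so the sketch does not depend on the real inputs.
fd, path = tempfile.mkstemp(suffix="_index.txt")
os.close(fd)

with open(path, "w") as output:
    output.write("25\n40\n")
closed_after_block = output.closed       # closed on leaving the block

try:
    with open(path) as index:
        raise IOError("simulated I/O failure")
except IOError:
    closed_after_error = index.closed    # closed even though the block raised

os.remove(path)
```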

Combining all of these suggestions, you can do the following:

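def read_indices(fname):
    # Iterating over a file yields one line at a time; int() ignores the trailing newline
    with open(fname, 'r') as file:
        return [int(index) for index in file]

def read_sequence(fname):
    # Strip each line so that line breaks do not end up inside the sequence
    with open(fname, 'r') as file:
        return ''.join(line.strip() for line in file)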

Since files are iterables of strings, you can use them in list comprehensions and in string joins like this. The rest of your code now looks much cleaner:

 
def gen_insertion_seq(index, seq, gene):
    indices = read_indices(index)
    sequence = read_sequence(seq)
    with open("%s_insertion_seq.txt" % gene, 'w') as output:
        for i in indices:
            site = sequence[i-9:i+15]
            output.write(site + '\n')

gen_insertion_seq('212_index.txt', '212_seq.txt', 'shaker')
gen_insertion_seq('214_index.txt', '214_seq.txt', 'kir2')
gen_insertion_seq('216_index.txt', '216_seq.txt', 'asic1a')
gen_insertion_seq('252_index.txt', '252_seq.txt', 'nachra7')

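Your main function is now easier to understand, because it deals only with sequences and not with things like I/O and parsing. You also don't have a bunch of open file handles lying around waiting for an error. In fact, the file operations are all self-contained, away from the real task.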
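One thing worth hand-checking in the slice itself: when i is smaller than 9, i - 9 is negative, and Python counts a negative start from the end of the string, which can give an empty result. The helper below is a hypothetical sketch of one way to clamp the start at 0. Whether clamping is the right behavior for your insertion sites is an assumption that you would need to confirm.

```python
def window(sequence, i, before=9, after=15):
    """Hypothetical helper: sequence[i-before:i+after] with the start clamped at 0."""
    start = max(i - before, 0)
    return sequence[start:i + after]

# First 30 bases of the sample sequence from the question.
sequence = "ATGGCCGCCGTGGCACTGCGAGAACAACAG"

plain = sequence[5 - 9:5 + 15]   # start is -4, counted from the end
clamped = window(sequence, 5)    # start is clamped to 0
```

For indices of 9 or more, window returns the same text as the plain slice.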

If you have sequences (in the Python sense) of file IDs and gene names, you can simplify further by calling your function in a loop:

for id, gene in zip((212, 214, 216, 252), ('shaker', 'kir2', 'asic1a', 'nachra7')): 
    gen_insertion_seq('%d_index.txt' % id, '%d_seq.txt' % id, gene)

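P.S. The I/O section of the Python tutorial is very good. The part about files is probably the one of particular interest to you.

That's what happens when I reply too quickly. Thank you for the detailed response! I will read through the Python tutorial and improve my code. Again, I really appreciate it. – Willow

@Willow. If this answer helped you, I would suggest selecting it. It will mark your question as answered and give you some rep points (me too :). –

Awesome! Sorry, I'm new to all this (obviously ;)) – Willow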

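Answer

Try passing 'shaker' with double quotes, "shaker". Or use str(gene) in your function.

OK, I just realized this is Python, so the quotes shouldn't matter, I think.

Or open("{}_insertion_seq.txt".format(gene), 'w')

If the problem is in the write, change output.write(site + '\n') to output.write(str(site) + '\n')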

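You are looking in the wrong place. The unsupported operand type error occurs on the variable "i", which depends on the contents of his file. – anupsabraham

@anupsabraham riiight, OK. Edited. –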

@anupsabraham That's what happens when you from loo –
