How do I extract the text between two strings using Python?

Question:

I want to use Python to extract the text blocks delimited by header lines from the file temp.txt.

temp.txt looks like this, where header1 (the year) and header2 (the month) are separated by a tab (written as "/t" below):

header1="2016"/theader2="Jan" 
Lion Animal 
Apple Food 
.end 

header1="2016"/theader2="Feb" 
Tiger Animal 
Orange Food 
.end

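I wrote the script below, which works well (invoked as: python script.py [year] [month], with the values read from sys.argv). However, it only lets me extract the data for one specific (month, year) pair, and I cannot use a wildcard for the month (or the year) to extract all matching text. (For example, python script.py [year] * with a wildcard month does not work.) Is there a better way?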

import sys

import pandas as pd

year = sys.argv[1]
month = sys.argv[2]

with open('./temp.txt') as infile, open('./output', 'w') as outfile:
    copy = False  # True while we are inside the requested block
    for line in infile:
        if line.strip() == 'header1="%s"\theader2="%s"' % (year, month):
            copy = True   # matching header found: start copying
        elif line.strip() == '.end':
            copy = False  # end of block: stop copying
        elif copy:
            outfile.write(line)

# Convert the extracted text to an Excel sheet
pd.read_csv('./output', encoding='utf8', sep=r'\;', dtype='unicode').to_excel('./output.xlsx', sheet_name='sheet2', index=False)

Answer

You can add wildcard support to the script by replacing the header test with:

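stripped = line.strip()
# Only header lines may start a block; without this check, passing '*' for
# both arguments would match every line and nothing would be written.
if (stripped.startswith('header1=') and
        (year == '*' or ('header1="%s"' % year) in stripped) and
        (month == '*' or ('header2="%s"' % month) in stripped)):
    copy = True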

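When you call the script from bash, you need to escape or quote the asterisk so that the shell does not expand it to a list of files, for example: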

python script.py [year] \* 
python script.py [year] '*'
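Instead of a substring test, the header line can also be parsed explicitly, which keeps data lines from ever being mistaken for headers. The following is only a sketch of that idea: HEADER_RE and header_matches are hypothetical names that do not appear in the original script, and it assumes every header has the form header1="..." followed by whitespace (a tab in the sample) and header2="...".

```python
import re

# Assumed header layout: header1="<year>" <whitespace> header2="<month>"
HEADER_RE = re.compile(r'^header1="([^"]*)"\s+header2="([^"]*)"$')


def header_matches(line, year, month):
    """Return True if line is a header whose year and month match.

    A value of '*' for year or month matches any header value.
    """
    m = HEADER_RE.match(line.strip())
    if m is None:
        return False  # not a header line at all
    found_year, found_month = m.groups()
    return year in ('*', found_year) and month in ('*', found_month)
```

In the loop, the condition would then read: if header_matches(line, year, month): copy = True.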

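The overall shape of the program is right; at a minimum, you need to:

iterate through the lines
keep track of whether or not you are in a matching block
write to the outfile when needed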

Your script pretty much does that, so I wouldn't worry too much about optimizing it.
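The three steps listed above can be sketched as a small generator. This is a minimal illustration rather than the original poster's code: extract_blocks is a hypothetical name, the sketch assumes that every block ends with a line containing only .end and that the two header fields are separated by a single tab, and it swaps the substring test for the standard-library fnmatch module so that shell-style patterns such as * or J* can be used for either field.

```python
from fnmatch import fnmatchcase


def extract_blocks(lines, year='*', month='*'):
    """Yield the body lines of every block whose header matches year and month."""
    # Shell-style pattern for a header line; '*' in year or month matches any value.
    pattern = 'header1="%s"\theader2="%s"' % (year, month)
    copy = False  # True while we are inside a matching block
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('header1='):
            # A header line: decide whether the block that follows is wanted.
            copy = fnmatchcase(stripped, pattern)
        elif stripped == '.end':
            copy = False  # end of the current block
        elif copy:
            yield line
```

Because the function accepts any iterable of lines, it can be fed an open file directly, for example outfile.writelines(extract_blocks(infile, year, month)).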

Thanks, this really helped solve the problem! – cinemania
