How do I read tokens from a file one by one in Python?

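Problem description:

My problem is that my code does not give me individual words/tokens that I can match against the stopwords in order to remove them from the original text. Instead I get a whole sentence, which cannot be matched against the stopwords. Please tell me a way to get the individual tokens, match them against the stopwords, and remove them. Please help.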

from nltk.corpus import stopwords 
import string, os 
def remove_stopwords(ifile): 
    processed_word_list = [] 
    stopword = stopwords.words("urdu") 
    text = open(ifile, 'r').readlines() 
    for word in text: 
     print(word) 
     if word not in stopword: 
       processed_word_list.append('*') 
       print(processed_word_list) 
       return processed_word_list 

if __name__ == "__main__": 
    print ("Input file path: ") 
    ifile = input() 
    remove_stopwords(ifile)

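The reason you are not getting the words of the text is that you are using the `readlines()` method. It gives you a list of the lines/sentences in the file, so when you write `for word in text` you get those lines one at a time.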
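To illustrate the comment above, here is a minimal sketch comparing what `readlines()` returns with what you get after splitting each line on whitespace. The file name `sample.txt` and its contents are made up for the example and are not the asker's data.

```python
import os
import tempfile

# Hypothetical two-line sample file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("this is a test\nanother line here\n")

# readlines() returns one string per line, trailing newline included.
with open(path, encoding="utf-8") as f:
    lines = f.readlines()

# Splitting each line on whitespace yields the individual tokens instead.
with open(path, encoding="utf-8") as f:
    tokens = [word for line in f for word in line.split()]

print(lines)
print(tokens)
```

Looping over `lines` compares whole sentences against the stopword list, while looping over `tokens` compares one word at a time.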

Answer

Try this:

from nltk.corpus import stopwords
import ast

def remove_stopwords(ifile):
    processed_word_list = []
    stopword = stopwords.words("urdu")
    # literal_eval parses the file contents as a Python literal, so the
    # file is expected to hold a list of tokens such as ['a', 'b'].
    with open(ifile, 'r') as f:
        words = ast.literal_eval(f.read())
    for word in words:
        print(word)
        if word not in stopword:
            # Same branch logic as in the question: non-stopwords become '*'.
            processed_word_list.append('*')
        else:
            processed_word_list.append(word)
    # Print and return only after every word has been processed.
    print(processed_word_list)
    return processed_word_list

if __name__ == "__main__":
    print("Input file path: ")
    ifile = input()
    remove_stopwords(ifile)
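The answer's use of `ast.literal_eval` only works if the file holds a Python literal, for example a list of strings. The thread does not show the asker's file, so that is an assumption. Below is a small sketch of the idea with made-up tokens and a hypothetical `stopword_set` standing in for the NLTK list; unlike the answer, it drops the stopwords rather than masking words with `'*'`.

```python
import ast

# Made-up file contents: a Python list literal of already-tokenized words.
file_contents = "['the', 'cat', 'sat', 'on', 'the', 'mat']"

# Hypothetical stopword set standing in for stopwords.words(...).
stopword_set = {"the", "on"}

# literal_eval parses the literal into a real list of strings
# without executing arbitrary code.
tokens = ast.literal_eval(file_contents)

# Keep only the tokens that are not stopwords.
filtered = [t for t in tokens if t not in stopword_set]

print(tokens)
print(filtered)
```

If the file is plain text rather than a list literal, `literal_eval` will most likely raise an error, and splitting on whitespace is the simpler route.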

This won't work, because `line` is a string, so you would be iterating over the characters in `line`. Swap `line` for `line.split()`, though, and we're good to go.
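A quick sketch of the point made in this comment, using a made-up string: iterating over a string yields single characters, while iterating over the result of `split()` yields words.

```python
line = "this is a test"

# Iterating over the string itself visits one character at a time.
chars = [ch for ch in line]

# split() with no arguments breaks the string on runs of whitespace.
words = [w for w in line.split()]

print(chars[:4])
print(words)
```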

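This code gives me only the first word and then terminates. I cannot get the whole list, only the first word in the file. I want it to iterate over all the words in the given text file, match them against the stopwords, and show the list with or without the stopwords. – user3778289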
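One possible cause of this symptom, judging by the code in the question, is a `return` placed inside the `for` loop, which ends the function during the first iteration that reaches it. A minimal sketch with hypothetical function names:

```python
def first_only(words):
    result = []
    for w in words:
        result.append(w)
        return result  # inside the loop: the function exits right here
    return result


def all_words(words):
    result = []
    for w in words:
        result.append(w)
    return result  # after the loop: every word has been processed


sample = ["a", "b", "c"]
print(first_only(sample))
print(all_words(sample))
```

Moving `print` and `return` out to the function's indentation level lets the loop run to completion.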

Also, the `.split()` function tokenizes, whereas the file I provided has already been tokenized. – user3778289
