Counting word frequency in a text file in Python
Question:
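I'm trying to figure out how to write a program that opens a file chosen by the user (by entering its file name) and counts how often each word the user enters appears in it.
I have most of it working, but when I enter several words for the program to find, only the first word shows the correct frequency; the rest show as "0 occurrences".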
file_name = input("What file would you like to open? ")
f = open(file_name, "r")
the_full_text = f.read()
words = the_full_text.split()
search_word = input("What words do you want to find? ").split(",")
len_list = len(search_word)
word_number = 0
print()
print ('... analyzing ... hold on ...')
print()
print ('Frequency of word usage within', file_name+":")
for i in range(len_list):
    frequency = 0
    for word in words:
        word = word.strip(",.")
        if search_word[word_number].lower() == word.lower():
            frequency += 1
    print (" ",format(search_word[word_number].strip(),'<20s'),"/", frequency, "occurrences")
    word_number = word_number + 1
An example of the output:
What file would you like to open? assignment_8.txt
What words do you want to find? wey, rights, dem
... analyzing ... hold on ...
Frequency of word usage within assignment_8.txt:
wey /96 occurrences
rights /0 occurrences
dem /0 occurrences
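What's wrong with my program? Please help :o
Answer:
You need to strip the whitespace from your search words. Splitting "wey, rights, dem" on "," gives 'wey', ' rights' and ' dem', so every word after the first keeps a leading space and can never equal a word from the text. Your code strips the search word only when printing it, not when comparing it.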
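To make the problem concrete, here is what split(",") returns for the input from the question, next to a cleaned-up version. This is a minimal sketch; the helper name parse_search_words is hypothetical, and dropping empty entries (e.g. from a trailing comma) is an extra assumption:

```python
def parse_search_words(raw: str) -> list[str]:
    """Split a comma-separated input string and normalize each word."""
    # split(",") keeps the space that follows each comma
    pieces = raw.split(",")
    # strip() removes surrounding whitespace, lower() makes matching case-insensitive;
    # empty pieces (e.g. from a trailing comma) are dropped
    return [piece.strip().lower() for piece in pieces if piece.strip()]


raw_input_text = "wey, rights, dem"
print(raw_input_text.split(","))           # ['wey', ' rights', ' dem']
print(parse_search_words(raw_input_text))  # ['wey', 'rights', 'dem']
```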
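However, your current algorithm is quite inefficient, because it rescans the entire text once for every search word. Here is a more efficient approach. First, we clean up the search words and put them in a list. Then we build a dictionary from that list to hold the count of each of those words, and update it in a single pass as we find them in the text file.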
file_name = input("What file would you like to open? ")
with open(file_name, "r") as f:
    words = f.read().split()

search_words = input("What words do you want to find? ").split(',')
search_words = [word.strip().lower() for word in search_words]
#print(search_words)

search_counts = dict.fromkeys(search_words, 0)

print ('\n... analyzing ... hold on ...')
for word in words:
    word = word.rstrip(",.").lower()
    if word in search_counts:
        search_counts[word] += 1

print ('\nFrequency of word usage within', file_name + ":")
for word in search_words:
    print(" {:<20s}/{} occurrences".format(word, search_counts[word]))
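The same single-pass idea can also be written with collections.Counter from the standard library. The sketch below is an alternative, not part of the answer: the function name count_search_words is hypothetical, it assumes the search words are already stripped and lowercased, and it assumes that stripping all of string.punctuation from both ends of each token (rather than only trailing "," and ".") is acceptable:

```python
import string
from collections import Counter


def count_search_words(text: str, search_words: list[str]) -> dict[str, int]:
    """Count how often each (already normalized) search word appears in text."""
    # Normalize every token once: strip leading/trailing punctuation, lowercase
    tokens = (token.strip(string.punctuation).lower() for token in text.split())
    counts = Counter(tokens)
    # Counter returns 0 for missing keys, so absent words need no special case
    return {word: counts[word] for word in search_words}


sample = "Wey dem rights. Dem wey, wey!"
for word, n in count_search_words(sample, ["wey", "rights", "dem"]).items():
    print(" {:<20s}/{} occurrences".format(word, n))
```

Unlike the answer's search_counts dictionary, Counter stores a count for every distinct word in the file. That is simpler to write but uses more memory; counting only the requested words, as the answer does, stays small no matter how large the file's vocabulary is.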
If you're splitting on ',', shouldn't your input be 'wey,rights,dem', without the spaces?
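Alternatively, the input can be split on commas together with any whitespace around them, so both "wey,rights,dem" and "wey, rights, dem" work. A minimal sketch using the standard re module; the helper name split_commas and the pattern are assumptions for illustration, and the result still needs lower() if matching should be case-insensitive:

```python
import re


def split_commas(raw: str) -> list[str]:
    """Split on commas, absorbing any whitespace around each comma."""
    # \s*,\s* matches a comma with optional whitespace on either side;
    # the outer strip() handles whitespace at the very start and end,
    # and empty parts (e.g. from ",,") are dropped
    return [part for part in re.split(r"\s*,\s*", raw.strip()) if part]


print(split_commas("wey,rights,dem"))      # ['wey', 'rights', 'dem']
print(split_commas(" wey , rights,dem "))  # ['wey', 'rights', 'dem']
```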