How can I use a list comprehension to get words and their occurrence counts?

Problem description. The input is:

a aba aaa 
dd ddd dd

My output should contain:

[[a,1],[dd,2],[aba ,1],[ddd,1],[aaa,1]]

But it is:

[[a,1],[dd,2],[aba ,1],[dd,2],[ddd,1],[aaa,1]]

Here is the full code:

(the input above is in 1.txt)

import re 

def get_words_from_string(s): 
    return (re.findall(re.compile('\w+'), s.lower())) 


def merge(seq): 
    merged = [] 
    for s in seq: 
     for x in s: 
      merged.append(x) 
    return merged 


fp1 = open('1.txt' , 'r'); 

set1 = set(line.strip() for line in fp1); 

l1 =[] 
for x in set1: 
    x.split() 
    x = get_words_from_string(x) 
    l1.append(x) 

l1= merge(l1); 

out = [] 
out = [[word , l1.count(word)] for word in l1 if (1 > out.count(word))]

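The problem is that it throws an exception if the word is not there at its first occurrence. Is there a safe way to find out whether a list contains an item?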

Please try to work out why you think it isn't working, then edit your question to include your findings. – 2011-12-17 02:58:40

What happens when you run this code, and what did you expect to happen? – Blender 2011-12-17 03:00:34

Answer

And here is a solution that needs no imports:

>>> f = open('1.txt', 'r') 
>>> words = f.read().split() 
>>> word_counter = {} 
>>> for word in words: 
...     word_counter[word] = word_counter.get(word, 0) + 1 
... 
>>> word_counter 
{'a': 1, 'aba': 1, 'dd': 2, 'aaa': 1, 'ddd': 1}

word_counter is a dict holding the frequency of every word. If you want it as a list of lists, you can use a list comprehension:

>>> word_counter_as_list = [ [k, v] for k, v in word_counter.items() ] 
>>> word_counter_as_list 
[['a', 1], ['aba', 1], ['dd', 2], ['aaa', 1], ['ddd', 1]]
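
The same idea can be wrapped in a small function. This is only a sketch, assuming Python 3 and using the file contents from the question as an in-memory string; the names count_words and sample are made up for the illustration.

```python
def count_words(text):
    """Return a dict mapping each whitespace-separated word to its count."""
    counts = {}
    for word in text.split():
        # dict.get returns 0 for a word seen for the first time
        counts[word] = counts.get(word, 0) + 1
    return counts


# Contents of 1.txt as given in the question (assumed, not read from disk)
sample = "a aba aaa\ndd ddd dd\n"

counts = count_words(sample)

# Convert the dict into a list of [word, count] lists
pairs = [[word, n] for word, n in counts.items()]
print(pairs)
```

Because each word is a dict key, it appears exactly once in pairs, so no extra filter is needed in the comprehension.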

Answer

This line

out = [[word , l1.count(word)] for word in l1 if (not(-1<l1.index(word)))]

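says: "build a list of [word, count] lists for each word such that -1 is not less than the index of the word in l1". But -1 is always less than the index of any word in l1, because an index is never negative (it starts at 0). So the condition filters out every result.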

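If you remove the not, the line returns results again. But then the filter is completely pointless: the result of index is always greater than -1, so nothing gets filtered. That is, unless word is not in l1 at all, in which case index raises an exception!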
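
To make that concrete, here is a small sketch (Python 3 assumed; the list words stands in for l1 after merging, and the other variable names are made up). It shows that index never returns a negative number, that the quoted filter therefore rejects everything, and that the in operator is the safe way to test membership.

```python
words = ['a', 'aba', 'aaa', 'dd', 'ddd', 'dd']

# list.index returns the position of the first match, always >= 0
positions = [words.index(w) for w in words]   # [0, 1, 2, 3, 4, 3]

# The quoted condition is therefore false for every word
filtered = [w for w in words if not (-1 < words.index(w))]   # []

# 'in' answers "is it there?" without raising
has_dd = 'dd' in words   # True
has_zz = 'zz' in words   # False

# index raises ValueError for a missing item
try:
    words.index('zz')
    raised = False
except ValueError:
    raised = True
```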

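Looking at your code more closely, you have written an absurdly overcomplicated program. There is a 3-line program that does what you want. Why do you build a set of the lines and then iterate over it? Why are you using regular expressions? This is such a simple problem that I don't feel right just showing you the best way. But here are some hints: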

>>> fp1 = open('1.txt' , 'r'); 
>>> s = fp1.read() 
>>> s 
'a aba aaa\ndd ddd dd\n' 
>>> s.split() 
['a', 'aba', 'aaa', 'dd', 'ddd', 'dd'] 
>>> set(s.split()) 
set(['a', 'aba', 'dd', 'aaa', 'ddd'])
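
One possible way to combine these hints into a list comprehension is sketched below. It assumes Python 3, uses the file contents as an in-memory string, and sorts the set only to make the output order predictable; it is not necessarily the three-line program the answer has in mind.

```python
# Contents of 1.txt as shown above (assumed, not read from disk)
text = "a aba aaa\ndd ddd dd\n"
words = text.split()

# Iterate over the unique words, but count in the full word list
out = [[word, words.count(word)] for word in sorted(set(words))]
print(out)   # [['a', 1], ['aaa', 1], ['aba', 1], ['dd', 2], ['ddd', 1]]
```

Note that words.count scans the whole list for every unique word, which is fine for a small file but grows quadratically with the number of distinct words.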

Answer

from collections import Counter 

with open("1.txt") as f: 
    words = f.read().split() 

c = Counter(words) 

print [[word,count] for word, count in c.iteritems()]
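
The code above is Python 2. For readers on Python 3, a roughly equivalent sketch follows: print becomes a function and iteritems becomes items. The in-memory string text stands in for the file so the example runs without 1.txt.

```python
from collections import Counter

# Contents of 1.txt from the question (assumed, not read from disk)
text = "a aba aaa\ndd ddd dd\n"

c = Counter(text.split())

# Python 3: items() replaces iteritems(), print is a function
pairs = [[word, count] for word, count in c.items()]
print(pairs)

# Counter can also return the most frequent entries directly
most = c.most_common(1)
print(most)   # [('dd', 2)]
```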

Answer

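fp1 = open('1.txt', 'r')
# Split into a list of words so that count matches whole words,
# not substrings of the raw text
words = fp1.read().split()
fp1.close()
set1 = set(words)
for it in set1:
    print it, "count = ", words.count(it)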
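
One thing to watch with count: on a string it counts substring occurrences, while on a list it counts whole elements. The sketch below (Python 3 assumed, variable names made up, file contents given as an in-memory string) shows the difference on the question's input.

```python
# Contents of 1.txt from the question (assumed, not read from disk)
text = "a aba aaa\ndd ddd dd\n"
words = text.split()

# str.count counts non-overlapping substring matches anywhere in the text
substring_a = text.count('a')     # 6: one in "a", two in "aba", three in "aaa"
substring_dd = text.count('dd')   # 3: "dd", the first two letters of "ddd", "dd"

# list.count compares whole elements
word_a = words.count('a')         # 1
word_dd = words.count('dd')       # 2
```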
