Python: intersection of lists/sets

Question:

def boolean_search_and(self, text):
    results = []
    and_tokens = self.tokenize(text)
    tokencount = len(and_tokens)

    term1 = and_tokens[0]
    print ' term 1:', term1

    term2 = and_tokens[1]
    print ' term 2:', term2

    #for term in and_tokens:
    if term1 in self._inverted_index.keys():
        resultlist1 = self._inverted_index[term1]
        print resultlist1
    if term2 in self._inverted_index.keys():
        resultlist2 = self._inverted_index[term2]
        print resultlist2
    # intersection of two sets cast into a list
    results = list(set(resultlist1) & set(resultlist2))
    print 'results:', results

    return str(results)

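This code works great for two tokens, e.g. text = "Hello World", so tokens = ['hello', 'world']. I want to generalize it to multiple tokens, so that the text can be a sentence or even a whole text file.
self._inverted_index is a dictionary that stores the tokens as keys; each value is the list of DocIDs in which that key/token occurs.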

hello -> [1,2,5,6]
world -> [1,3,5,7,8]
Result:
hello AND world -> [1,5]
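
For readers following along, here is a minimal sketch of how an index of this shape might be built. It is written for Python 3, and everything in it is an assumption made for illustration: the function name build_inverted_index, the naive whitespace tokenizer, and the sample documents are not taken from the code above.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased token to the sorted list of DocIDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenizer (an assumption; the real tokenize() is not shown).
        for token in text.lower().split():
            index[token].add(doc_id)
    # Convert the sets to sorted lists so the values look like the lists above.
    return {token: sorted(ids) for token, ids in index.items()}

# Hypothetical sample documents keyed by DocID.
docs = {1: "hello world", 2: "hello there", 3: "world news"}
index = build_inverted_index(docs)
print(index["hello"])  # [1, 2]
print(index["world"])  # [1, 3]
```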

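The result I want to achieve is, say, (((hello AND computer) AND science) AND world).

I am trying to make this work for multiple words, not just two. I have only just started working in Python, so I am unaware of many of the features it has to offer.

Any ideas?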

Answer

"I want to generalize it to multiple tokens"

def boolean_search_and_multi(self, text):
    and_tokens = self.tokenize(text)
    results = set(self._inverted_index[and_tokens[0]])
    for tok in and_tokens[1:]:
        results.intersection_update(self._inverted_index[tok])
    return list(results)
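
Note that the loop above raises KeyError when a token is missing from the index, and IndexError when the text yields no tokens. Below is a hedged, standalone sketch (Python 3) of one way to handle both cases. The function name search_and and the posting list for 'computer' are made up for illustration, and treating an unknown token as matching no documents is an assumption about the desired behaviour.

```python
def search_and(inverted_index, tokens):
    """Return the sorted DocIDs that contain every token in `tokens`."""
    if not tokens:
        return []
    # Assumption: a token absent from the index matches no documents.
    results = set(inverted_index.get(tokens[0], ()))
    for tok in tokens[1:]:
        if not results:
            break  # the intersection is already empty, nothing more to do
        results.intersection_update(inverted_index.get(tok, ()))
    return sorted(results)

index = {
    "hello": [1, 2, 5, 6],
    "world": [1, 3, 5, 7, 8],
    "computer": [1, 5, 6],  # hypothetical posting list
}
print(search_and(index, ["hello", "world"]))              # [1, 5]
print(search_and(index, ["hello", "computer", "world"]))  # [1, 5]
print(search_and(index, ["hello", "missing"]))            # []
```

Returning a sorted list also makes the output order deterministic, which a plain list(results) does not guarantee.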

Thanks, Python guru! – csguy11 2010-09-13 04:34:39

@csguy, you're welcome! :-) – 2010-09-13 05:05:01

Answer

Would the built-in set type work for you?

$ python 
Python 2.6.5 (r265:79063, Jun 12 2010, 17:07:01) 
[GCC 4.3.4 20090804 (release) 1] on cygwin 
Type "help", "copyright", "credits" or "license" for more information. 
>>> hello = set([1,2,5,6]) 
>>> world = set([1,3,5,7,8]) 
>>> hello & world 
set([1, 5])
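
The same operator extends to more than two terms. As a rough sketch in Python 3 syntax (the session above is Python 2.6), set.intersection accepts any number of arguments, and functools.reduce gives the equivalent left-to-right fold ((a & b) & c). The third posting set here is hypothetical.

```python
from functools import reduce

postings = [
    {1, 2, 5, 6},     # hello
    {1, 3, 5, 7, 8},  # world
    {1, 4, 5},        # hypothetical third term
]

# set.intersection takes any number of iterables in one call.
all_at_once = postings[0].intersection(*postings[1:])

# Equivalent left fold: ((hello & world) & third).
folded = reduce(lambda a, b: a & b, postings)

print(sorted(all_at_once))  # [1, 5]
print(sorted(folded))       # [1, 5]
```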
