How to loop correctly when comparing strings from two files

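Problem description:

I can't get a sentiment analysis of tweets (file 1, standard Twitter JSON responses) to work against a word list (file 2, tab-delimited, two columns) that assigns each word a sentiment (positive or negative).

The problem: the outer loop runs only once and then the script ends. I loop through file 1, and nested inside that loop I loop through file 2, trying to compare the words and keep a running sum of the sentiment for each tweet.

So I have: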

import json

def get_sentiments(tweet_file, sentiment_file):
    sent_score = 0
    for line in tweet_file:
        document = json.loads(line)
        tweets = document.get('text')

        if tweets != None:
            tweet = str(tweets.encode('utf-8'))
            #print tweet

            for z in sentiment_file:
                line = z.split('\t')
                word = line[0].strip()
                score = int(line[1].rstrip('\n').strip())
                #print score

                if word in tweet:
                    print "+++++++++++++++++++++++++++++++++++++++"
                    print word, tweet
                    sent_score += score

            print "====", sent_score, "====="

    #PROBLEM, IT'S ONLY DOING THIS FOR THE FIRST TWEET

file1 = open('tweetsfile.txt')
file2 = open('sentimentfile.txt')

get_sentiments(file1, file2)

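I've spent the better part of a day trying to figure out why it prints all the tweets when the nested for loop over file2 is absent, but with that loop in place it processes only the first tweet and then exits.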

Answer

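The reason it only works once is that the inner for loop has already reached the end of the sentiment file, so it stops because there are no more lines to read.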

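In other words, the first time the loop runs, it steps through the entire file. After that there are no more lines to read (it has reached the end of the file), so it doesn't loop again, and only the first tweet is actually scored.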
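To see the exhaustion on its own, here is a minimal sketch. It assumes Python 3 and uses io.StringIO as a stand-in for an open file; the three sample rows are made up for illustration.

```python
import io

# Stand-in for an open sentiment file (hypothetical sample data).
sentiment_file = io.StringIO("good\t3\nbad\t-3\nok\t1\n")

# The first pass consumes every line.
first_pass = [line for line in sentiment_file]

# The second pass starts at end-of-file, so it yields nothing.
second_pass = [line for line in sentiment_file]

print(len(first_pass))
print(len(second_pass))
```

A file object returned by open() behaves the same way, which is why the nested loop only has lines to read for the first tweet.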

So one way to solve this is to "rewind" the file, which you can do with the file object's seek method.
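A rough sketch of the rewind approach, assuming Python 3. The helper name count_matches and the sample data are hypothetical, and io.StringIO again stands in for a real file:

```python
import io

def count_matches(tweets, sentiment_file):
    """Return, for each tweet, how many sentiment words occur in it."""
    counts = []
    for tweet in tweets:
        sentiment_file.seek(0)  # rewind so the inner loop sees every line again
        hits = 0
        for row in sentiment_file:
            word = row.split('\t')[0].strip()
            if word in tweet:  # substring test, as in the question's code
                hits += 1
        counts.append(hits)
    return counts

sentiment_file = io.StringIO("good\t3\nbad\t-3\n")
tweets = ["a good day", "bad and good", "nothing here"]
result = count_matches(tweets, sentiment_file)
print(result)
```

This works, but it rereads and reparses the whole sentiment file once per tweet.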

If your files aren't very large, another approach is to read them entirely into a list or similar structure and then loop over that.
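As an illustration only (Python 3, made-up data, io.StringIO standing in for the sentiment file), reading the word list into a list once might look like this:

```python
import io

sentiment_file = io.StringIO("good\t3\nbad\t-3\n")

# Read the file once; unlike a file object, a list can be looped over repeatedly.
pairs = []
for row in sentiment_file:
    word, score = row.split('\t')
    pairs.append((word.strip(), int(score)))

tweets = ["a good day", "bad and good"]
totals = []
for tweet in tweets:
    # The inner loop now runs over the in-memory list for every tweet.
    totals.append(sum(score for word, score in pairs if word in tweet))

print(totals)
```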

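However, since your sentiment score is a simple lookup, the best approach is to build a dictionary of sentiment scores and then look up each word of the tweet in that dictionary to compute the tweet's overall sentiment: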

import csv
import json

scores = {}  # empty dictionary to store scores for each word

with open('sentimentfile.txt') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        scores[row[0].strip()] = int(row[1].strip())

with open('tweetsfile.txt') as f:
    for line in f:
        tweet = json.loads(line)
        # Python 2: encode the unicode text to a byte string so words match the csv keys.
        # On Python 3, drop the .encode('utf-8') call.
        text = tweet.get('text', '').encode('utf-8')
        if text:
            total_sentiment = sum(scores.get(word, 0) for word in text.split())
            print("{}: {}".format(text, total_sentiment))

The with statement closes the file handles automatically. I am using the csv module to read the sentiment file (it works for tab-delimited files too).

This line does the calculation:

total_sentiment = sum(scores.get(word,0) for word in text.split())

It is a shorter way of writing this loop:

tweet_score = []
for word in text.split():
    if word in scores:
        tweet_score.append(scores[word])

total_score = sum(tweet_score)

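The dictionary's get method takes an optional second argument: a custom value to return when the key is not found. If you omit the second argument, it returns None. In my loop, I use it to return 0 when a word has no score.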
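A small sketch of that behaviour, using a made-up two-word scores dictionary:

```python
scores = {"good": 3, "bad": -3}  # hypothetical sample scores

print(scores.get("good"))    # key present: returns its value
print(scores.get("meh"))     # key missing, no default: returns None
print(scores.get("meh", 0))  # key missing, default given: returns 0

# With a default of 0, unknown words add nothing to the sum.
total = sum(scores.get(word, 0) for word in "a good but bad and good day".split())
print(total)
```

Without the default, sum would raise a TypeError as soon as it hit a None for an unscored word.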

I don't think there could be a better answer than this. Thanks. – roy 2013-05-06 13:42:31
