My code:

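import time

import pandas as pd
import tweepy

# Assumes api (a tweepy 3.x API object), q, a, b and i are defined earlier
data = tweepy.Cursor(api.search, q, since=a[i], until=b[i]).items()
# Keep each Status object's raw JSON dict (the original left tweet_data empty)
tweet_data = [tweet._json for tweet in data]
tweets = pd.DataFrame()
tweets['Tweet_ID'] = [tweet['id'] for tweet in tweet_data]
# Text is kept as unicode/str; encode only when writing it out
tweets['Tweet'] = [tweet['text'] for tweet in tweet_data]
tweets['Date'] = [time.strftime('%Y-%m-%d %H:%M:%S', time.strptime(tweet['created_at'], '%a %b %d %H:%M:%S +0000 %Y')) for tweet in tweet_data]
tweets['User'] = [tweet['user']['screen_name'] for tweet in tweet_data]
tweets['Follower_count'] = [tweet['user']['followers_count'] for tweet in tweet_data]
# Each cell holds the list of hashtag dicts ({'indices': [...], 'text': ...}) of one tweet
tweets['Hashtags'] = [tweet['entities']['hashtags'] for tweet in tweet_data]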

Current output:

df=pd.DataFrame({'Hashtags' : [{u'indices': [53, 65], u'text': u'Predictions'}, {u'indices': [67, 76], u'text': u'FreeTips'}, {u'indices': [78, 89], u'text': u'SoccerTips'}, {u'indices': [90, 103], u'text': u'FootballTips'}, {u'indices': [104, 110], u'text': u'Goals'}]})

Expected output:

df=pd.DataFrame({'Hashtags' : [u'Predictions', u'FreeTips', u'SoccerTips', u'FootballTips', u'Goals']})

I have tried several ways to flatten/reduce/access the nested dictionary that contains a list of dictionaries. Please help.
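For a single tweet, every element of the current output is a dict with an 'indices' key and a 'text' key, so the expected column can be built by pulling the 'text' value out of each dict. A minimal sketch, using the hashtag list from the current output as sample data (it keeps all five hashtags, including FreeTips):

```python
import pandas as pd

# Sample hashtag entities for one tweet, copied from the "current output"
hashtags = [
    {'indices': [53, 65], 'text': 'Predictions'},
    {'indices': [67, 76], 'text': 'FreeTips'},
    {'indices': [78, 89], 'text': 'SoccerTips'},
    {'indices': [90, 103], 'text': 'FootballTips'},
    {'indices': [104, 110], 'text': 'Goals'},
]

# Keep only the 'text' value of each hashtag dict
df = pd.DataFrame({'Hashtags': [h['text'] for h in hashtags]})
print(df['Hashtags'].tolist())
```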

Errors:

As @MSeifert suggested, I tried his approach. It produced the following error:

dt=tweet.entities.hashtags 
pd.io.json.json_normalize(dt, 'hashtags') 
pd.io.json.json_normalize(dt, 'hashtags')['text'].tolist() 

Traceback (most recent call last):

File "<ipython-input-166-be11241611d6>", line 1, in <module> 
dt=tweet.entities.hashtags 

AttributeError: 'dict' object has no attribute 'entities'
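The traceback says the object being used is a plain dict: the elements of tweet_data are the tweets' JSON dicts, and dicts support key lookup (tweet['entities']) rather than attribute access (tweet.entities). Note also that the record path 'hashtags' has to be applied to the entities dict, so dt should be tweet['entities'], not the hashtag list itself. A sketch, assuming pandas 1.0 or later (where the function is available as pd.json_normalize) and a made-up tweet dict with the same nesting:

```python
import pandas as pd

# Made-up tweet dict with the same nesting as the tweets' JSON
tweet = {
    'id': 1,
    'entities': {
        'hashtags': [
            {'indices': [0, 8], 'text': 'INDvPAK'},
            {'indices': [9, 15], 'text': 'Goals'},
        ]
    },
}

# Key access instead of attribute access: tweet['entities'], not tweet.entities
dt = tweet['entities']
hashtag_df = pd.json_normalize(dt, 'hashtags')  # one row per hashtag dict
texts = hashtag_df['text'].tolist()
print(texts)  # ['INDvPAK', 'Goals']
```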

I also tried this:

dx = tweets['Hashtags'] 
for key, value in dx.items(): 
    print key, value

and got the following error:

Traceback (most recent call last): 

File "<ipython-input-167-d66c278ec072>", line 2, in <module> 
    for key, value in dx.items(): 

File "C:\ANACONDA\lib\site-packages\pandas\core\generic.py", line 2740, in __getattr__ 
    return object.__getattribute__(self, name) 

AttributeError: 'Series' object has no attribute 'items'
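dx is a pandas Series whose cells are lists of hashtag dicts. In recent pandas releases Series.items() yields (index label, value) pairs; older pandas versions running on Python 2 appear to have exposed this only as iteritems(), which would explain the error above. A sketch on a made-up Series, assuming a recent pandas:

```python
import pandas as pd

# Made-up column: each cell is the list of hashtag dicts of one tweet
dx = pd.Series([
    [{'indices': [0, 8], 'text': 'INDvPAK'}],
    [],  # a tweet without hashtags
    [{'indices': [0, 11], 'text': 'SoccerTips'},
     {'indices': [12, 25], 'text': 'FootballTips'}],
])

# items() yields (index label, cell value) pairs; collect the texts per row
rows = {}
for key, value in dx.items():
    rows[key] = [h['text'] for h in value]
    print(key, rows[key])
```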

UPDATE:

I was able to access the text part of the nested hashtag dictionaries:

tweets['Hashtags'][1][1]['text'] 
Out[209]: u'INDvPAK'

Now I want to write a loop that collects all the hashtags in each row.

Answer

Here is the solution:

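After troubleshooting and trying various approaches for a long time, I finally figured out how to split the nested dictionaries. It is a fairly simple loop. I noticed that the text can be accessed with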

tweets['Hashtags'][1][1]['text'] 
Out[209]: u'INDvPAK'

This was a valuable insight: I learned that I don't need to write u'text' as the key to access the hashtag text; plain 'text' works.

Code:

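# Collect the hashtag texts of every row into a list of lists
ht = []
for s in range(len(tweets['Hashtags'])):
    hasht = []
    for t in range(len(tweets['Hashtags'][s])):
        hasht.append(tweets['Hashtags'][s][t]['text'])
    ht.append(hasht)
# zip(ht) wraps each row's list in a 1-tuple, e.g. ([u'SoccerTips', u'FootballTips'],);
# assign ht directly if plain lists are wanted
tweets['HT'] = list(zip(ht))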

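This is a simple nested for loop: the outer loop goes over the rows, and the inner loop goes over the list of dictionaries { "indices" : [...], "text" : ... } in each row and pulls out the value of the inner 'text' key. Finally, I used zip() to pack each row's (user's) hashtag list.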

OUTPUT:

([u'SoccerTips', u'FootballTips'],)

This can easily be split further.
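The same result can be written more compactly with a nested list comprehension, which avoids the index bookkeeping and the 1-tuples produced by zip(). A sketch on a made-up frame with the same shape as tweets['Hashtags']; joining with ', ' is just one optional way to get a single string per row that can be split again with str.split:

```python
import pandas as pd

# Made-up frame with the same shape as tweets['Hashtags']
tweets = pd.DataFrame({'Hashtags': [
    [{'indices': [0, 11], 'text': 'SoccerTips'},
     {'indices': [12, 25], 'text': 'FootballTips'}],
    [{'indices': [0, 8], 'text': 'INDvPAK'}],
    [],  # a tweet without hashtags
]})

# One plain list of hashtag texts per row (no zip() wrapping)
tweets['HT'] = [[h['text'] for h in tags] for tags in tweets['Hashtags']]

# Optional: a single comma-separated string per row, easy to split again
tweets['HT_str'] = tweets['HT'].apply(', '.join)
print(tweets[['HT', 'HT_str']])
```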

Answer

Instead of using the DataFrame constructor, you could use the json_normalize function:

>>> import pandas as pd 
>>> d = {'Hashtags' : 
...  [{u'indices': [53, 65], u'text': u'Predictions'}, 
...  {u'indices': [67, 76], u'text': u'FreeTips'}, 
...  {u'indices': [78, 89], u'text': u'SoccerTips'}, 
...  {u'indices': [90, 103], u'text': u'FootballTips'}, 
...  {u'indices': [104, 110], u'text': u'Goals'}]} 
>>> pd.io.json.json_normalize(d, 'Hashtags') 
      indices          text
0    [53, 65]   Predictions
1    [67, 76]      FreeTips
2    [78, 89]    SoccerTips
3   [90, 103]  FootballTips
4  [104, 110]         Goals

Then you can take just the 'text' column:

>>> pd.io.json.json_normalize(d, 'Hashtags')['text'].tolist() 
[u'Predictions', u'FreeTips', u'SoccerTips', u'FootballTips', u'Goals']
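In pandas 1.0 and later the same function is also available at top level as pd.json_normalize (the pd.io.json.json_normalize path was deprecated). Applied to the whole list of tweet dicts with a nested record_path, it produces one row per hashtag across all tweets, and meta=['id'] copies the ID of the tweet each hashtag came from. A sketch, with a made-up tweet_data list in the shape the question builds:

```python
import pandas as pd

# Made-up list of tweet dicts (same nesting as tweet_data in the question)
tweet_data = [
    {'id': 1, 'entities': {'hashtags': [
        {'indices': [53, 65], 'text': 'Predictions'},
        {'indices': [67, 76], 'text': 'FreeTips'},
    ]}},
    {'id': 2, 'entities': {'hashtags': []}},  # no hashtags: contributes no rows
    {'id': 3, 'entities': {'hashtags': [
        {'indices': [0, 7], 'text': 'Goals'},
    ]}},
]

# Walk tweet -> 'entities' -> 'hashtags'; copy each tweet's 'id' onto its rows
flat = pd.json_normalize(tweet_data, record_path=['entities', 'hashtags'], meta=['id'])
print(flat[['id', 'text']])

# Regroup into one list of hashtag texts per tweet id
per_tweet = flat.groupby('id')['text'].apply(list)
```

Tweets without hashtags simply drop out of flat, so they are also missing from per_tweet.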

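Hi @MSeifert, it gives me the following error, which I have been trying to solve for hours. Please forgive me, I am completely new to Python and programming. 'dt = tweet.entities.hashtags --- pd.io.json.json_normalize(dt, 'hashtags') --- pd.io.json.json_normalize(dt, 'hashtags')['text'].tolist() --- Traceback (most recent call last): File "", line 1, in dt = tweet.entities.hashtags AttributeError: 'dict' object has no attribute 'entities'' – lightyagami96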

Could you add that to your question (with correct indentation)? It's hard to read these code snippets without indentation. :) – MSeifert

Just edit your question :) – MSeifert

How can I extract only the text of the hashtags using tweepy?
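If you work with tweepy's Status objects directly instead of their JSON dicts, attribute access does work on the status itself; as far as I know, in tweepy 3.x (the version that provides api.search) status.entities is a plain dict, so the hashtags are read with key lookups from there on. A sketch; since it cannot run without Twitter credentials, it uses a hypothetical fake_status stand-in built with types.SimpleNamespace in place of a real tweepy Status:

```python
from types import SimpleNamespace


def hashtag_texts(status):
    """Return the hashtag texts of one tweepy-style Status object.

    Attribute access for the status (status.entities), then key access
    for the nested JSON dicts (['hashtags'], ['text']).
    """
    return [h['text'] for h in status.entities['hashtags']]


# Hypothetical stand-in for a tweepy Status (not a real tweepy object)
fake_status = SimpleNamespace(
    entities={'hashtags': [{'indices': [0, 8], 'text': 'INDvPAK'}]}
)
print(hashtag_texts(fake_status))  # ['INDvPAK']

# With a real cursor (not run here):
# texts = [hashtag_texts(status) for status in tweepy.Cursor(api.search, q).items()]
```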