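Retrieving Twitter data with Tweepy
Question:
I am using Python with the Tweepy library to retrieve Twitter data for a specific hashtag, but I need the tweets from a specific time period, for example from June 30, 2013 to December 30, 2013. How can I do that?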
#imports
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
#setting up the keys
consumer_key = '……………….'
consumer_secret = '……………..'
access_token = '……………….'
access_secret = '……………..'
class TweetListener(StreamListener):
    # A listener handles tweets as they are received from the stream.
    # This is a basic listener that just prints received tweets to standard output.
    def on_data(self, data):
        print(data)
        return True
    def on_error(self, status):
        print(status)
#printing all the tweets to the standard output
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
stream = Stream(auth, TweetListener())
t = u"#سوريا"
stream.filter(track=[t])
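Answer:
I am still investigating why I cannot get the same results using tweepy.Cursor(api.search, geocode=.., q=query, until=date); perhaps this is the reason. However, I was able to retrieve Twitter data between two dates with Tweepy.
First, I created a date generator that runs from the start date to the end date.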
def date_range(start, end):
    current = start
    while (end - current).days >= 0:
        yield current
        # Step size depends on your needs: one second here, but it could be a minute, hour or day
        current = current + datetime.timedelta(seconds=1)
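To illustrate the generator on its own, here is a minimal sketch of the same idea with the step passed in as a parameter, so that it yields one value per day instead of one per second. The `step` argument and the sample dates are assumptions added for this example; they are not part of the original answer.

```python
import datetime

def date_range(start, end, step=datetime.timedelta(days=1)):
    """Yield datetimes from start up to and including end, one step apart."""
    current = start
    while current <= end:
        yield current
        current += step

# Sample dates chosen for illustration only.
days = list(date_range(datetime.datetime(2013, 6, 30),
                       datetime.datetime(2013, 7, 3)))
for day in days:
    print(day.date())
```

With a one-second step, the range from 2013-06-30 to 2013-10-30 contains more than ten million values, so a coarser step may be worth considering if the generator is consumed once per tweet.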
Then I created a Listener, so that I can get the tweets created on a specific day by accessing status.created_at. Your code should look like this:
import tweepy
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json
import datetime
#Use your keys
consumer_key = '...'
consumer_secret = '...'
access_token = '...'
access_secret = '...'
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
def date_range(start, end):
    current = start
    while (end - current).days >= 0:
        yield current
        current = current + datetime.timedelta(seconds=1)
class TweetListener(StreamListener):
    def on_status(self, status):
        #api = tweepy.API(auth_handler=auth)
        #status.created_at += timedelta(hours=900)
        startDate = datetime.datetime(2013, 6, 30)
        stopDate = datetime.datetime(2013, 10, 30)
        # Note: this assigns every timestamp in the range to the incoming tweet and
        # prints it each time; it does not compare against the tweet's own creation time.
        for date in date_range(startDate, stopDate):
            status.created_at = date
            print("tweet " + str(status.created_at) + "\n")
            print(status.text + "\n")
            # You can dump your tweets into a JSON file, or load them into your database
stream = Stream(auth, TweetListener(), secure=True)
t = u"#Syria" # You can use different hashtags
stream.filter(track=[t])
Output:
I am only printing the dates as a check (I do not want to spam StackOverflow with political tweets).
tweet 2013-06-30 00:00:01
-------------------
tweet 2013-06-30 00:00:02
-------------------
tweet 2013-06-30 00:00:03
-------------------
tweet 2013-06-30 00:00:04
-------------------
tweet 2013-06-30 00:00:05
-------------------
tweet 2013-06-30 00:00:06
-------------------
tweet 2013-06-30 00:00:07
-------------------
tweet 2013-06-30 00:00:08
-------------------
tweet 2013-06-30 00:00:09
-------------------
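Note that the listener above assigns each generated timestamp to status.created_at instead of comparing against it, which is why the printed dates advance one second at a time. If the goal is to keep only tweets that were actually created inside the window, a comparison along the following lines may be closer to the intent. This is only a sketch: `in_window` and the `FakeStatus` stand-in are hypothetical names introduced here, and it assumes created_at is a naive datetime, as the answer's code treats it. The streaming endpoint delivers tweets as they are posted, so a window that lies in the past would match nothing from a live stream.

```python
import datetime
from collections import namedtuple

# Hypothetical stand-in for a tweepy Status object; only the two
# attributes used by the answer's listener are modelled.
FakeStatus = namedtuple("FakeStatus", ["created_at", "text"])

def in_window(status, start, stop):
    """Return True if the status was created between start and stop, inclusive."""
    return start <= status.created_at <= stop

start_date = datetime.datetime(2013, 6, 30)
stop_date = datetime.datetime(2013, 10, 30)

# Made-up sample statuses, for illustration only.
statuses = [
    FakeStatus(datetime.datetime(2013, 6, 29, 23, 59, 59), "too early"),
    FakeStatus(datetime.datetime(2013, 8, 15, 12, 0, 0), "inside the window"),
    FakeStatus(datetime.datetime(2013, 10, 30, 0, 0, 1), "too late"),
]

kept = [s.text for s in statuses if in_window(s, start_date, stop_date)]
print(kept)
```

Inside on_status, the same check could guard the print calls, so that a tweet is printed once and only when its own created_at falls in the window.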
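You can't get that data; see e.g. http://stackoverflow.com/a/1733360/3001761 – jonrsharpe 2014-11-02 16:07:44
But I ran the code continuously for two days and it retrieved data. All this metadata is for only three weeks? – Hana 2014-11-02 16:29:59
@Hana Were you able to solve this problem? – user3378649 2014-11-02 23:32:05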