I am using Tweepy to capture streaming tweets based on the hashtag #WorldCup, as shown in the code below. It works as expected.
    import tweepy
    from tweepy import Stream
    from tweepy.streaming import StreamListener

    # consumer_key, consumer_secret, access_token and access_token_secret
    # are assumed to be defined elsewhere.


    class StdOutListener(StreamListener):
        ''' Handles data received from the stream. '''

        def on_status(self, status):
            # Prints the text of the tweet
            print('Tweet text: ' + status.text)

            # There are many options in the status object,
            # hashtags can be very easily accessed.
            for hashtag in status.entities['hashtags']:
                print(hashtag['text'])

            return True

        def on_error(self, status_code):
            print('Got an error with status code: ' + str(status_code))
            return True  # To continue listening

        def on_timeout(self):
            print('Timeout...')
            return True  # To continue listening


    if __name__ == '__main__':
        listener = StdOutListener()
        auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)

        stream = Stream(auth, listener)
        # The `follow` argument had no value in the original snippet, so it is omitted.
        stream.filter(track=['#WorldCup'])
Because this is a hot hashtag right now, the stream quickly reaches the maximum number of tweets that Tweepy lets you get in one transaction. However, if I were to search on #StackOverflow, it might be much slower, so I'd like a way to kill the stream. I could do this based on several criteria, such as stopping after 100 tweets, after 3 minutes, or after a text output file has reached 150 lines. I do know that the socket timeout isn't the way to achieve this.
I have taken a look at this similar question:
However, it does not appear to use the streaming API. The data that it collects is also very messy, whereas this text output is clean.
Can anyone suggest a way to stop Tweepy (when using the stream in this method), based on some user input parameter, besides a keyboard interrupt?
I solved this, so I'm going to be one of those internet heroes that answers their own question.
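This is achieved by using class-level ("static") Python variables for the counter and for the stop value (e.g. stop after you grab 20 tweets). Returning False from on_status is what ends the stream. The example currently runs a geolocation search, but you could easily swap it for a hashtag search by calling getTweetsByHashtag, which is also defined in the code below.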
    #!/usr/bin/env python
    from tweepy import Stream, OAuthHandler
    from tweepy.streaming import StreamListener


    class Listener(StreamListener):
        tweet_counter = 0  # Static (class-level) variable
        stop_at = 0        # Static variable, set before the stream starts

        def login(self):
            # Fill in your own Twitter API credentials.
            CONSUMER_KEY = ''
            CONSUMER_SECRET = ''
            ACCESS_TOKEN = ''
            ACCESS_TOKEN_SECRET = ''

            auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
            auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
            return auth

        def on_status(self, status):
            Listener.tweet_counter += 1
            print(str(Listener.tweet_counter) + '. Screen name = "%s" Tweet = "%s"'
                  % (status.author.screen_name, status.text.replace('\n', ' ')))

            if Listener.tweet_counter < Listener.stop_at:
                return True  # Keep listening
            else:
                print('Max num reached = ' + str(Listener.tweet_counter))
                return False  # Stops the stream

        # Twitter expects the bounding box as longitude/latitude pairs,
        # south-west corner first.
        def getTweetsByGPS(self, stop_at_number, longitude_start, latitude_start,
                           longitude_finish, latitude_finish):
            try:
                Listener.stop_at = stop_at_number  # Set static variable
                auth = self.login()
                streaming_api = Stream(auth, Listener(), timeout=60)  # Socket timeout value
                streaming_api.filter(follow=None,
                                     locations=[longitude_start, latitude_start,
                                                longitude_finish, latitude_finish])
            except KeyboardInterrupt:
                print('Got keyboard interrupt')

        def getTweetsByHashtag(self, stop_at_number, hashtag):
            try:
                Listener.stop_at = stop_at_number  # Set static variable
                auth = self.login()
                streaming_api = Stream(auth, Listener(), timeout=60)  # Socket timeout value
                streaming_api.filter(track=[hashtag])
            except KeyboardInterrupt:
                print('Got keyboard interrupt')


    listener = Listener()
    listener.getTweetsByGPS(20, -84.395198, 33.746876, -84.385585, 33.841601)  # Atlanta area.
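The question also asked about stopping after a fixed amount of time. The same "return False from on_status" idea could probably cover that too. Below is a minimal sketch, not tested against a live stream: `StopCondition` is a hypothetical helper of my own (it is not part of Tweepy), and it keeps its state on an instance instead of in static variables. The demo at the bottom simulates incoming tweets with a plain list so it runs without credentials.

```python
import time


class StopCondition:
    """Hypothetical helper (not part of Tweepy): decides when to stop a stream."""

    def __init__(self, max_tweets=None, max_seconds=None, clock=time.monotonic):
        self.max_tweets = max_tweets      # Stop once this many tweets were seen
        self.max_seconds = max_seconds    # Stop once this much time has passed
        self._clock = clock               # Injectable so it can be faked in tests
        self._start = clock()
        self.count = 0

    def record(self):
        """Register one tweet; return True to keep listening, False to stop."""
        self.count += 1
        if self.max_tweets is not None and self.count >= self.max_tweets:
            return False
        if (self.max_seconds is not None
                and self._clock() - self._start >= self.max_seconds):
            return False
        return True


# In a Tweepy 3.x listener this would be wired in roughly like so (assumption):
#
#     class Listener(StreamListener):
#         def __init__(self, stop_condition):
#             super().__init__()
#             self.stop_condition = stop_condition
#
#         def on_status(self, status):
#             print(status.text)
#             return self.stop_condition.record()

if __name__ == '__main__':
    # Simulate a stream of incoming tweets without touching the network.
    stop = StopCondition(max_tweets=3)
    for text in ['first', 'second', 'third', 'fourth']:
        print(text)
        if not stop.record():
            break
    print('Stopped after', stop.count, 'tweets')
```

One caveat with this approach: the time limit is only checked when a tweet arrives, so on a quiet hashtag the stream would keep running until the next tweet comes in. Stopping after N lines in an output file could be handled the same way by counting the lines you write.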