
Stopping Tweepy stream after a duration parameter (# lines, seconds, #Tweets, etc)

I am using Tweepy to capture streaming tweets based on the hashtag #WorldCup, as shown in the code below. It works as expected.

    import tweepy
    from tweepy import Stream
    from tweepy.streaming import StreamListener

    # consumer_key, consumer_secret, access_token and access_token_secret
    # are assumed to be defined elsewhere.


    class StdOutListener(StreamListener):
        ''' Handles data received from the stream. '''

        def on_status(self, status):
            # Prints the text of the tweet
            print('Tweet text: ' + status.text)

            # There are many options in the status object,
            # hashtags can be very easily accessed.
            for hashtag in status.entities['hashtags']:
                print(hashtag['text'])

            return True  # To continue listening

        def on_error(self, status_code):
            print('Got an error with status code: ' + str(status_code))
            return True  # To continue listening

        def on_timeout(self):
            print('Timeout...')
            return True  # To continue listening


    if __name__ == '__main__':
        listener = StdOutListener()
        auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)

        stream = Stream(auth, listener)
        # User IDs are passed as strings
        stream.filter(follow=['38744894'], track=['#WorldCup'])

Because this is a hot hashtag right now, searches don't take long to reach the maximum number of tweets that Tweepy lets you get in one transaction. However, if I were to search on #StackOverflow, it might be much slower, so I'd like a way to kill the stream. I could do this on several criteria, such as stopping after 100 tweets, after 3 minutes, or after a text output file has reached 150 lines. I do know that the socket timeout isn't used to achieve this.

I have taken a look at this similar question:

Tweepy Streaming - Stop collecting tweets at x amount

However, it does not appear to use the streaming API. The data that it collects is also very messy, whereas this text output is clean.

Can anyone suggest a way to stop Tweepy (when using the stream in this method), based on some user input parameter, besides a keyboard interrupt?

Thanks

Answer1:

I solved this, so I'm going to be one of those internet heroes who answer their own question.

This is achieved by using class-level (static) Python variables for the counter and for the stop value (e.g. stop after you grab 20 tweets). Once the counter reaches the stop value, on_status() returns False, which tells Tweepy to close the stream. This is currently a geolocation search, but you could easily swap it for a hashtag search by using the getTweetsByHashtag() method.

    #!/usr/bin/env python
    from tweepy import (Stream, OAuthHandler)
    from tweepy.streaming import StreamListener


    class Listener(StreamListener):

        tweet_counter = 0  # Static variable, shared by all instances
        stop_at = 0        # Static variable, set before streaming starts

        def login(self):
            # Fill in your own credentials
            CONSUMER_KEY = ''
            CONSUMER_SECRET = ''
            ACCESS_TOKEN = ''
            ACCESS_TOKEN_SECRET = ''

            auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
            auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
            return auth

        def on_status(self, status):
            Listener.tweet_counter += 1
            print(str(Listener.tweet_counter) + '. Screen name = "%s" Tweet = "%s"'
                  % (status.author.screen_name, status.text.replace('\n', ' ')))

            if Listener.tweet_counter < Listener.stop_at:
                return True  # Keep listening
            else:
                print('Max num reached = ' + str(Listener.tweet_counter))
                return False  # Stops the stream

        # The bounding box is given as longitude/latitude pairs,
        # starting corner first.
        def getTweetsByGPS(self, stop_at_number, longitude_start, latitude_start,
                           longitude_finish, latitude_finish):
            try:
                Listener.stop_at = stop_at_number  # Set static variable
                auth = self.login()

                streaming_api = Stream(auth, Listener(), timeout=60)  # Socket timeout value
                streaming_api.filter(follow=None,
                                     locations=[longitude_start, latitude_start,
                                                longitude_finish, latitude_finish])
            except KeyboardInterrupt:
                print('Got keyboard interrupt')

        def getTweetsByHashtag(self, stop_at_number, hashtag):
            try:
                Listener.stop_at = stop_at_number  # Set static variable
                auth = self.login()

                streaming_api = Stream(auth, Listener(), timeout=60)  # Socket timeout value
                streaming_api.filter(track=[hashtag])
            except KeyboardInterrupt:
                print('Got keyboard interrupt')


    listener = Listener()
    listener.getTweetsByGPS(20, -84.395198, 33.746876, -84.385585, 33.841601)  # Atlanta area.
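
The question also mentions stopping after a number of seconds. The same "return False from on_status()" idea can cover that. Below is a minimal sketch that doesn't depend on Tweepy at all: StopCondition and FakeListener are hypothetical names made up for illustration, and FakeListener only stands in for a StreamListener subclass. The state lives on an instance rather than in static variables, so two listeners wouldn't share one counter.

```python
import time


class StopCondition:
    """Hypothetical helper: decides when a stream callback should stop."""

    def __init__(self, max_tweets=None, max_seconds=None, clock=time.monotonic):
        self.max_tweets = max_tweets    # Stop after this many tweets (None = no limit)
        self.max_seconds = max_seconds  # Stop after this many seconds (None = no limit)
        self._clock = clock             # Injectable, so the timing logic can be tested
        self._start = clock()
        self.count = 0

    def keep_going(self):
        """Record one tweet; return False once any limit has been reached."""
        self.count += 1
        if self.max_tweets is not None and self.count >= self.max_tweets:
            return False
        elapsed = self._clock() - self._start
        if self.max_seconds is not None and elapsed >= self.max_seconds:
            return False
        return True


class FakeListener:
    """Stand-in for a StreamListener subclass (assumption, not Tweepy code)."""

    def __init__(self, stop):
        self.stop = stop

    def on_status(self, status):
        print(status)
        # Returning False is what tells the stream to shut down
        return self.stop.keep_going()


if __name__ == '__main__':
    listener = FakeListener(StopCondition(max_tweets=3, max_seconds=180))
    for text in ['first', 'second', 'third', 'never printed']:
        if not listener.on_status(text):
            break
```

One caveat: the time limit is only checked when a tweet arrives, so on a quiet hashtag the stream keeps running until the next tweet shows up. Counting lines written to an output file would work the same way as counting tweets.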
