
How to call Twitter's Streaming/Filter Feed with urllib2/httplib?

Question:

Update:

I've marked this as unanswered again: I tried the solution in Nick's cogent answer and switched to Google's urlfetch:

    logging.debug("starting urlfetch for http://%s%s" % (self.host, self.url))
    result = urlfetch.fetch("http://%s%s" % (self.host, self.url),
                            payload=self.body, method="POST",
                            headers=self.headers, allow_truncated=True,
                            deadline=5)
    logging.debug("finished urlfetch")

Unfortunately, "finished urlfetch" is never printed. I can see the timeout happen in the logs (the request returns 200 after 5 seconds), but execution never seems to return.

<hr />

Hi All-

I'm attempting to play around with Twitter's <a href="http://apiwiki.twitter.com/Streaming-API-Documentation#Introduction" rel="nofollow">Streaming (aka firehose) API</a> with Google App Engine (I'm aware this probably isn't a great long-term play, since you can't keep the connection perpetually open with GAE), but so far I haven't had any luck getting my program to actually parse the results returned by Twitter.

Some code:

    logging.debug("firing up urllib2")
    req = urllib2.Request(url="http://%s%s" % (self.host, self.url),
                          data=self.body, headers=self.headers)
    logging.debug("called urlopen for %s %s, about to call urlopen" % (self.host, self.url))
    fobj = urllib2.urlopen(req)
    logging.debug("called urlopen")

When this executes, my debug output never shows the "called urlopen" line. I suspect Twitter keeps the connection open, and urllib2 doesn't return because the server never terminates the connection.
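For comparison, in a standard Python 3 runtime (not App Engine), `urllib.request.urlopen` returns as soon as the response headers arrive, so the usual way to handle a response that never ends is to consume the body incrementally rather than calling `read()` and waiting for EOF. The sketch below assumes the stream carries one JSON object per line, with blank keep-alive lines in between; `read_stream_for` is a hypothetical helper, and `io.BytesIO` stands in for a real response object.

```python
import io
import json
import time


def read_stream_for(fobj, budget_seconds, clock=time.monotonic):
    """Hypothetical helper: read newline-delimited JSON records from a
    file-like binary stream until budget_seconds elapse or the stream ends,
    instead of blocking until the server closes the connection."""
    deadline = clock() + budget_seconds
    records = []
    while clock() < deadline:
        line = fobj.readline()
        if not line:      # b"" means the server closed the connection
            break
        line = line.strip()
        if not line:      # assumed blank keep-alive line; skip it
            continue
        records.append(json.loads(line.decode("utf-8")))
    return records


# Usage with a fake stream in place of an HTTP response.
fake = io.BytesIO(b'{"text": "hello"}\r\n\r\n{"text": "world"}\r\n')
print([r["text"] for r in read_stream_for(fake, 5)])  # ['hello', 'world']
```

On a real socket, `readline()` blocks until a full line arrives, so the budget is only checked between records; passing a `timeout` to `urlopen` bounds each individual read. None of this changes App Engine's behaviour, where urlfetch buffers the whole response before returning.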

Wireshark shows the request being sent properly and a response returned with results.

I tried adding Connection: close to my request header, but that didn't yield a successful result.

Any ideas on how to get this to work?

Answer1:

urllib on App Engine is a thin wrapper around the <a href="http://code.google.com/appengine/docs/python/urlfetch/" rel="nofollow">urlfetch API</a>. You're right about what's happening: Twitter's streaming API never terminates its response, so it times out, and urlfetch throws an exception.

If you use urlfetch directly, you can set the timeout (up to 10 seconds), and set allow_truncated to True so you can get the partial result. The Twitter streaming API really isn't a good match for App Engine, though, because App Engine requests are limited to 30 seconds of execution time, and urlfetch requests can't send back results progressively, or take more than 10 seconds. Using Twitter's 'standard' API would be a better option.
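With `allow_truncated=True`, the body you get back is whatever arrived before the deadline, so the final record is likely cut off mid-object and would make a naive `json.loads` fail. A minimal Python 3 sketch of salvaging only the complete records, assuming the stream delivers one JSON object per line; `parse_truncated_stream` is a hypothetical name, and in the App Engine code above you would pass it the fetched body (e.g. `result.content`).

```python
import json


def parse_truncated_stream(body):
    """Hypothetical helper: parse newline-delimited JSON from a body that
    may have been cut off mid-record by a truncated fetch."""
    # Ignore undecodable bytes, e.g. a multibyte character split by the cut.
    text = body.decode("utf-8", "ignore")
    lines = text.split("\n")
    # The last element is either "" (body ended on a newline) or a partial,
    # truncated record; drop it in both cases.
    records = []
    for line in lines[:-1]:
        line = line.strip()   # also removes the "\r" of "\r\n" delimiters
        if line:              # skip blank keep-alive lines
            records.append(json.loads(line))
    return records


body = b'{"text": "first"}\r\n{"text": "second"}\r\n{"text": "thi'
print([r["text"] for r in parse_truncated_stream(body)])  # ['first', 'second']
```

This only recovers what arrived within one request; it does not work around the per-request time limits described above.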
