
Issue trying to use PhantomJS to process a web page

Question:

I'm trying to make a crawler for SEO purposes, and I can't get PhantomJS to even download this particular page: https://tablet.euroslots.com/home/

If I use cURL it works fine (but obviously doesn't process the javascript):

    ✓ 1344:0 /cherrytech/js-crawler root› curl https://tablet.euroslots.com/home/
    <!doctype html><!--[if lt IE 7]><html class="no-js lt-ie9 lt-ie8 lt-ie7"> ...

My PhantomJS script:

    var page = require('webpage').create();

    page.onResourceRequested = function (request) {
        console.log('Request ' + JSON.stringify(request, undefined, 4));
    };

    page.onResourceReceived = function (response) {
        console.log('Response (#' + response.id + ', stage "' + response.stage + '"): ' + JSON.stringify(response));
    };

    page.onResourceError = function (resourceError) {
        console.log('Unable to load resource (#' + resourceError.id + 'URL:' + resourceError.url + ')');
        console.log('Error code: ' + resourceError.errorCode + '. Description: ' + resourceError.errorString);
    };

    page.settings.userAgent = 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A406 Safari/8536.25';

    page.open('https://tablet.euroslots.com/home/', function () {
        console.log(page.content);
        phantom.exit();
    });

And this is the result of running it:

    ✓ 1347:0 /cherrytech/js-crawler root› phantomjs crawler.js
    Request {
        "headers": [
            {
                "name": "User-Agent",
                "value": "Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A406 Safari/8536.25"
            },
            {
                "name": "Accept",
                "value": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
            }
        ],
        "id": 1,
        "method": "GET",
        "time": "2014-09-16T16:02:24.426Z",
        "url": "https://tablet.euroslots.com/home/"
    }
    Unable to load resource (#1URL:https://tablet.euroslots.com/home/)
    Error code: 2. Description: Connection closed
    Response (#1, stage "end"): {"contentType":null,"headers":[],"id":1,"redirectURL":null,"stage":"end","status":null,"statusText":null,"time":"2014-09-16T16:02:24.763Z","url":"https://tablet.euroslots.com/home/"}
    <html><head></head><body></body></html>

Answer1:

Try calling phantomjs with --ssl-protocol=any

I had the same exact problem, with an external site that worked one week ago.

So I searched and found a related issue described in "Qt QNetworkReply connection closed" (https://stackoverflow.com/questions/15063824/qt-qnetworkreply-connection-closed). It led me to look into the Qt embedded in PhantomJS: it defaults to forcing new connections to use SSLv3, which is either too new for old sites or too old for new sites (but was quite a reasonable default at the time Qt 4.8.4 was released).

With "any", you tell PhantomJS to try all protocols, which should get you past the failure. It will try protocols more secure than SSLv3, but also less secure ones (SSLv3 sits in the middle of the range). So, if "any" works, you should then try to force a value more secure than SSLv3 instead of leaving "any" in place. In my case, specifying --ssl-protocol=tlsv1 worked.
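For a crawler that launches PhantomJS from another process, the "try a stricter value first, fall back to any" advice could be sketched roughly as below. This is only an illustration under several assumptions: it runs under Node.js (not inside PhantomJS), phantomjs is on the PATH, and the crawled script logs resource errors the way the script in the question does, so that the text "Connection closed" in the output signals a failed handshake. The helper names buildPhantomArgs, firstWorkingProtocol and runPhantom are hypothetical, not part of any library.

```javascript
// Sketch (Node.js): try --ssl-protocol values in order of preference.
const { spawnSync } = require('child_process');

// Values tried by this sketch. The order is this example's choice:
// the stricter "tlsv1" first, the permissive "any" as a last resort.
const PROTOCOLS = ['tlsv1', 'any'];

// PhantomJS options must come before the script name on the command line.
function buildPhantomArgs(script, protocol) {
  return ['--ssl-protocol=' + protocol, script];
}

// Hypothetical helper: returns the first protocol for which run(args)
// reports success, or null if none of them works.
function firstWorkingProtocol(script, run) {
  for (const protocol of PROTOCOLS) {
    if (run(buildPhantomArgs(script, protocol))) {
      return protocol;
    }
  }
  return null;
}

// Hypothetical runner: treats a run as successful when phantomjs exits
// with status 0 and its output contains no "Connection closed" error.
// Assumes the script prints resource errors via page.onResourceError.
function runPhantom(args) {
  const result = spawnSync('phantomjs', args, { encoding: 'utf8' });
  return result.status === 0 && !/Connection closed/.test(result.stdout || '');
}
```

A call such as firstWorkingProtocol('crawler.js', runPhantom) would then return 'tlsv1', 'any', or null depending on what the server accepts. If only 'any' works, the security caveats mentioned in this answer still apply.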

My guess is that the recent SSL issues (goto fail, Heartbleed, POODLE, and so on) made a whole lot of websites upgrade their servers, which now refuse SSLv3 connections. But if your server uses a protocol older than SSLv3, keep "any" (along with all the associated security risks).
