
How to get, in PHP, the entire HTML of a page that is partly loaded by jQuery

I've had this problem for days. I have to load the entire HTML of a page from PHP. The page has a jQuery function that is called once the whole page has loaded. This function loads more HTML into the page, so I need to get all of the HTML, including the part loaded with jQuery. I can tell whether I got the whole page by looking for a tag that is only loaded by jQuery (for example: an input tag with name XXX, an input tag with the multiple attribute, etc.).

So I tried:

    $html = file_get_contents("http://wwww.siteToScrape.com");
    if (strpos($html, 'multiple') !== false) {
        echo 'found';
    } else {
        echo 'not found';
    }

but the result is 'not found'.

Then I downloaded Simple HTML DOM and tried:

    include 'simple_html_dom.php';
    $html = file_get_html("http://wwww.siteToScrape.com");
    if (strpos($html, 'multiple') !== false) {
        echo 'found';
    } else {
        echo 'not found';
    }

but the result was still 'not found'.

So I thought I should use a PHP script that emulates a browser (so that it can run the jQuery too). I downloaded PHP Scriptable Web Browser and tried:

    require_once('browser.php');
    $browser = new SimpleBrowser();
    $p = $browser->get('http://wwww.siteToScrape.com');
    if (strpos($p, 'multiple') !== false) {
        echo 'found';
    } else {
        echo 'not found';
    }

but the result is still 'not found'. I don't know how to do it. Can someone help me? Thanks!

Answer1:

The problem is that you are trying to mix server and client.

PHP runs on the server. JavaScript (and therefore also jQuery) runs in the client browser.
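To make the split concrete, here is a minimal sketch. The markup, the `/fragment.html` URL, and the `hasMarker` helper are made up for illustration and are not taken from the site in the question. The first string stands in for what `file_get_contents()` returns: the script is there only as text, so the markup it would load is missing. The second string stands in for what a browser's DOM would hold after running that script.

```php
<?php
// Hypothetical stand-in for what file_get_contents() returns: the raw server
// response. The <script> is plain text here; nothing has executed it.
$serverHtml = <<<'HTML'
<html>
  <body>
    <div id="slot"></div>
    <script>
      $(function () { $('#slot').load('/fragment.html'); });
    </script>
  </body>
</html>
HTML;

// Assumed result after a browser ran the script and filled the placeholder.
$renderedHtml = str_replace(
    '<div id="slot"></div>',
    '<div id="slot"><input type="file" name="XXX" multiple></div>',
    $serverHtml
);

// Hypothetical helper: true when the marker text occurs anywhere in the HTML.
function hasMarker(string $html, string $marker): bool
{
    return strpos($html, $marker) !== false;
}

echo hasMarker($serverHtml, 'multiple') ? 'found' : 'not found', "\n";   // not found
echo hasMarker($renderedHtml, 'multiple') ? 'found' : 'not found', "\n"; // found
```

All three attempts in the question only ever see the first string, which is why each of them prints 'not found'.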

There's no easy way to run the JavaScript using PHP. As far as I know, it's not even possible. Other languages, such as Java, might be able to do what you are trying to do.

You should look at another way to do this.

This is also why web crawlers that do not execute JavaScript are never affected by what you do with it. It is worth keeping in mind when developing: your dynamically loaded content will not be indexed by those crawlers at all.

Answer2:

As far as I know, this is not possible with PHP alone. JavaScript runs on the client rather than the server, so it would not be possible without some sort of browser-emulator environment.

Edit: You could put JavaScript in the web page itself that reads the innerHTML of the whole page after it has been fully generated and then sends it to your server with an Ajax call. You would have to stay within the limits of the same-origin policy (which doesn't allow Ajax calls to domains other than the one the host page came from).
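On the server side, the receiving end of that Ajax call could look roughly like the sketch below. It assumes you control the page and that its script posts the HTML as the raw request body. The `checkSnapshot` name and the default marker are hypothetical.

```php
<?php
// Hypothetical receiver for the rendered HTML that the page's own script
// would POST back (for example the innerHTML of the document element).
function checkSnapshot(string $postedHtml, string $marker = 'multiple'): string
{
    return strpos($postedHtml, $marker) !== false ? 'found' : 'not found';
}

// Only answer when served over HTTP; under the CLI there is no request body.
if (PHP_SAPI !== 'cli') {
    // php://input exposes the raw request body as sent by the browser.
    echo checkSnapshot((string) file_get_contents('php://input'));
}
```

Because the browser has already run the jQuery code before posting, the string PHP receives here includes the dynamically loaded markup.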

Answer3:

As the others have said, jQuery is JavaScript, and it is typically executed by the client (the web browser) rather than the server.

PHP, being a server-side language, has no JavaScript interpreter.

The easiest way that I know of to run JavaScript from PHP is via web-testing tools, which often integrate a headless browser. You could check out Mink, which has a back end for the Zombie node.js headless browser.

There's also the PhantomJS headless browser with various PHP interfaces like this one, which I found with a quick Google search.

In the more resource-intensive arena, there's also Selenium, which has PHP interfaces as well.
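As a rough illustration of the headless-browser idea, the sketch below shells out from PHP to headless Chrome. This is a different tool from the ones named above, chosen only because it can print the rendered DOM from the command line with `--dump-dom`. The binary name `google-chrome`, both function names, and the 5000 ms `--virtual-time-budget` (meant to give scripts time to finish) are assumptions. Check them against your own installation and Chrome version. The `2>/dev/null` redirect assumes a Unix-like shell.

```php
<?php
// Hypothetical helper: builds the shell command for one headless Chrome run.
// The binary name and the time budget are assumptions, not fixed values.
function buildDumpDomCommand(
    string $url,
    string $binary = 'google-chrome',
    int $budgetMs = 5000
): string {
    return implode(' ', [
        escapeshellarg($binary),
        '--headless',
        '--disable-gpu',
        '--virtual-time-budget=' . $budgetMs, // let page scripts run before dumping
        '--dump-dom',                         // print the rendered DOM to stdout
        escapeshellarg($url),                 // quoted so the URL cannot inject shell syntax
    ]);
}

// Hypothetical helper: returns the rendered HTML, or null if nothing came back
// (for example when the browser binary is not installed).
function fetchRenderedHtml(string $url, string $binary = 'google-chrome'): ?string
{
    $output = shell_exec(buildDumpDomCommand($url, $binary) . ' 2>/dev/null');
    return is_string($output) && $output !== '' ? $output : null;
}

// Usage, mirroring the check from the question:
// $html = fetchRenderedHtml('http://wwww.siteToScrape.com');
// echo ($html !== null && strpos($html, 'multiple') !== false) ? 'found' : 'not found';
```

Starting a browser for every request is slow, so for repeated scraping a long-running Selenium or similar session is probably a better fit than this one-shot command.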
