How to get, in PHP, the entire HTML of a page that is partly loaded by jQuery

I've had this problem for days. I have to load the entire HTML of a page from PHP. On this page there is a jQuery function that is called once the whole page has loaded. This function loads more HTML into the page, so I need to get all of the HTML, including the part loaded by jQuery. I can tell whether I got the whole page by looking for a tag that is only loaded by jQuery (for example: an input tag with name XXX, an input tag with the multiple attribute, etc.).

So I tried:

    $html = file_get_contents("http://wwww.siteToScrape.com");
    if (strpos($html, 'multiple') !== false) {
        echo 'found';
    } else {
        echo 'not found';
    }

but the result is 'not found'.

Then I downloaded Simple HTML DOM and tried:

    include 'simple_html_dom.php';
    $html = file_get_html("http://wwww.siteToScrape.com");
    if (strpos($html, 'multiple') !== false) {
        echo 'found';
    } else {
        echo 'not found';
    }

but the result is still 'not found'.

So I thought I should get a PHP script that emulates a browser (so that it can run jQuery too). I downloaded PHP Scriptable Web Browser and tried:

    require_once('browser.php');
    $browser = new SimpleBrowser();
    $p = $browser->get('http://wwww.siteToScrape.com');
    if (strpos($p, 'multiple') !== false) {
        echo 'found';
    } else {
        echo 'not found';
    }

but the result is, once again, 'not found'. I don't know how to do it. Can someone help me? Thanks!

Answer1:

The problem is that you are trying to mix server and client.

PHP runs on the server. JavaScript (and therefore also jQuery) runs in the client's browser.
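To make the split concrete, here is a minimal sketch of the check from the question, written against two hypothetical HTML strings: one as the server sends it, and one as the browser would hold it after the script has run. The function name `hasMultipleInput` and the sample markup are assumptions for illustration, and the sketch assumes PHP's DOM extension is available. It looks for an actual `input` element with a `multiple` attribute rather than the bare word "multiple", which `strpos` would also match inside ordinary text.

```php
<?php
// Returns true when the given HTML contains an <input> with a "multiple" attribute.
function hasMultipleInput(string $html): bool
{
    if (trim($html) === '') {
        return false; // nothing to parse
    }

    // Collect parser warnings internally instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    return $xpath->query('//input[@multiple]')->length > 0;
}

// Hypothetical sample markup, not taken from the real site.
// What file_get_contents() would receive: the script is present but has not run.
$asDownloaded = '<html><body><div id="box"></div><script>/* jQuery fills #box later */</script></body></html>';
// What the browser holds after the script has run.
$afterScripts = '<html><body><div id="box"><input type="file" name="XXX" multiple></div></body></html>';
```

Calling `hasMultipleInput($asDownloaded)` gives false and `hasMultipleInput($afterScripts)` gives true. Every PHP-only approach in the question only ever sees the first kind of string, which is why each attempt prints 'not found'.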

There's no easy way to run JavaScript from PHP. As far as I know, it's not even possible. Other languages, such as Java, might be able to do what you are trying to do.

You should look at another way to do this.

This is also the reason why a web crawler that doesn't execute JavaScript is never affected by what you do with it. That's a good thing to keep in mind when developing: your dynamically loaded content will not be indexed by such crawlers at all.

Answer2:

As far as I know, this is not possible "with only PHP". JavaScript runs on the client rather than on the server, so it would not be possible without some sort of browser-emulation environment.

Edit: You could put JavaScript in the web page itself that grabs the innerHTML of the whole page once it has been fully generated, and then uses an Ajax call to send that to your server. You would have to stay within the limits of the same-origin policy (which doesn't allow Ajax calls to domains other than the one the host page came from).
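The server side of that idea could look roughly like the sketch below. It assumes you control the page, that a script in the page POSTs the rendered markup as the raw request body, and that this file lives on the same origin as the page. The function name `saveRenderedHtml` and the output file `rendered.html` are hypothetical names chosen for illustration.

```php
<?php
// Stores the posted markup in $targetFile; returns false for empty input or write failure.
function saveRenderedHtml(string $html, string $targetFile): bool
{
    if (trim($html) === '') {
        return false;
    }
    return file_put_contents($targetFile, $html) !== false;
}

// Only act when called over HTTP with a POST request (skipped on the command line).
if (PHP_SAPI !== 'cli' && ($_SERVER['REQUEST_METHOD'] ?? '') === 'POST') {
    $body = (string) file_get_contents('php://input'); // raw request body
    $ok = saveRenderedHtml($body, __DIR__ . '/rendered.html');
    http_response_code($ok ? 204 : 400);
}
```

A real endpoint would also want some form of authentication and a size limit, since as written it saves whatever it is sent.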

Answer3:

Like the others have said, jQuery is JavaScript, and is typically executed by the client (web browser) rather than the server.

PHP, being a server-side language, has no JavaScript interpreter.

The easiest way that I know of to run JavaScript from PHP is via web-testing tools, which often integrate a headless browser. You could check out Mink, which has a back-end for the Zombie Node.js headless browser.

There's also the PhantomJS headless browser with various PHP interfaces like this one, which I found with a quick Google search.

In the more resource-intensive arena, there's also Selenium, which has PHP interfaces as well.
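As a rough sketch of the headless-browser route, the code below shells out from PHP to a headless Chrome or Chromium and reads back the DOM after scripts have run. This is a different tool from the ones named above, swapped in because it needs no extra PHP library. It assumes such a browser is installed on the server and that `shell_exec` is enabled; the function names and the binary name passed in are assumptions for illustration.

```php
<?php
// Builds the shell command; both the binary and the URL are escaped for the shell.
function buildDumpDomCommand(string $browserBinary, string $url): string
{
    return escapeshellarg($browserBinary)
        . ' --headless --disable-gpu --dump-dom '
        . escapeshellarg($url);
}

// Runs the browser and returns the serialized DOM, or null if nothing came back.
function fetchRenderedHtml(string $browserBinary, string $url): ?string
{
    $output = shell_exec(buildDumpDomCommand($browserBinary, $url));
    return (is_string($output) && $output !== '') ? $output : null;
}
```

For example, `fetchRenderedHtml('chromium', 'http://wwww.siteToScrape.com')` would return the markup the browser ends up with, which can then be searched for the `multiple` input as in the question. Content that arrives well after the page's load event may still be missing and could need extra waiting.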
