Retrieve the contents of a div from an external site (PHP, XPath)?

I am trying to retrieve and echo the contents of a div from an external site using PHP and XPath.

This is an excerpt from the page, showing the relevant code:

<html xml:lang="en" lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head><title>Handbags - Clutches - Kara Ross New York</title></head>
<body>
<div id="Container">
  <div id="AjaxLoading">...</div>
  ...
  <div id="Wrapper">
    <div class="productlist-page">
      ...
      <div class="Content Wide " id="LayoutColumn1">
        ...
        <div align="center">
          <div class="Block CategoryContent Moveable Panel" id="CategoryContent">
            <form name="frmCompare" id="frmCompare">
              <table><tr><td valign="top">...</td>
                <td valign="top">
                  <ul class="ProductList ">
                    <li class="Odd">
                      <div class="ProductImage QuickView" data-product="261">
                        <a href="http://www.kararossny.com/electra-clutch-in-oil-spill-lizard-and-hologram-with-gunmetal-hardware-and-hematite/">
                          <img src="http://cdn2.bigcommerce.com/n-arxsrf/t0qdc/products/261/images/1382/electra_oil_spill__08182.1402652812.500.375.jpg?c=2" alt="Kara Ross Electra Clutch in Oil Spill Lizard and Hologram with Gunmetal Hardware and Hematite Gemstone on Closure"/>
                        </a>
                      </div>
                      <div class="ProductDetails">...</div>
                      <div class="ProductPriceRating">...</div>
                      <div class="ProductCompareButton" style="display:none">...</div>
                      <div class="ProductActionAdd" style="display:none;">...</div>
                    </li>
                  </ul>
                </td>
                <td valign="top" align="center">...</td>
              </tr>
              </table>
              <div class="product-nav btm"> ... </div>
            </form>
            ...

This is my code so far:

$url = 'http://www.kararossny.com/clutches/?sort=featured&page=1';
$dom = new DOMDocument;
@$dom->loadHTMLFile($url);
$xpath = new DOMXpath($dom);
$elements = $xpath->query('//div[class="ProductImage QuickView"]');
foreach($elements[0] as $child) {
    echo $child . "\n";
}

My desired output for the page linked would be:

<a href="http://www.kararossny.com/electra-clutch-in-oil-spill-lizard-and-hologram-with-gunmetal-hardware-and-hematite/">
  <img src="http://cdn2.bigcommerce.com/n-arxsrf/t0qdc/products/261/images/1382/electra_oil_spill__08182.1402652812.500.375.jpg?c=2" alt="Kara Ross Electra Clutch in Oil Spill Lizard and Hologram with Gunmetal Hardware and Hematite Gemstone on Closure"/>
</a>

Any idea what I am doing wrong? I think my xpath might be wrong, but I am not sure.

Thanks!

Answer1:

There are three likely reasons why you cannot select the content you want.

1 - To test the class attribute in your XPath predicate you need to use the attribute axis: prefix the attribute name with either attribute:: or the abbreviated @ sign. Without it, class is read as the name of a child element. So you should use

@class

to select the class attribute.

2 - An XPath expression is made of one or more steps. Each step defines a context that limits the scope of the next step. The last step contains the set you are selecting. Since your last step is a div, you are actually selecting a div, and not an a. You should use the following expression to select the a node and its contents:

//div[@class="ProductImage QuickView"]/a
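
To make points 1 and 2 concrete, here is a minimal sketch that runs the three variants of the expression against a small inline snippet. The markup and the example.com URL are hypothetical stand-ins for the real page, not taken from it.

```php
<?php
// Hypothetical inline markup standing in for the remote page.
$html = '<div class="ProductImage QuickView" data-product="261">'
      . '<a href="http://example.com/item/"><img src="item.jpg" alt="Item"/></a>'
      . '</div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Without @, "class" is read as a child element named <class>, so nothing matches.
$noAxis = $xpath->query('//div[class="ProductImage QuickView"]');
// With @, the predicate tests the class attribute and the div is selected.
$divs = $xpath->query('//div[@class="ProductImage QuickView"]');
// A final /a step moves the selection from the div to its <a> child.
$links = $xpath->query('//div[@class="ProductImage QuickView"]/a');

echo $noAxis->length, "\n";            // 0
echo $divs->item(0)->nodeName, "\n";   // div
echo $links->item(0)->nodeName, "\n";  // a
```

The first query is valid XPath, which is why it fails silently with an empty result rather than raising an error.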


3 - Finally, your page has a default namespace declaration:

xmlns="http://www.w3.org/1999/xhtml"

That will require you to either register it or ignore it by selecting your elements with wildcards (matching on * instead of on element names). Most XPath APIs do not automatically apply a default namespace, and when a selector is not qualified with a prefix, it matches only elements that belong to no namespace. That means that if you try to select a <div> using the expression //div, you may get an empty set. If you are not selecting anything, try ignoring namespaces like this:

//*[local-name()='div'][@class="ProductImage QuickView"]/*[local-name()='a']
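
A rough sketch of both options follows, assuming the document is parsed as XML with loadXML() so that the default namespace is actually applied. The prefix x and the inline markup are arbitrary choices made for illustration. As far as I know, loadHTML() and loadHTMLFile() use libxml's HTML parser, which does not put elements in a namespace, so the plain //div form usually works there; treat that as an assumption to verify against your own page.

```php
<?php
// Hypothetical XHTML snippet with the same default namespace declaration.
$xml = '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
     . '<div class="ProductImage QuickView"><a href="http://example.com/item/">Item</a></div>'
     . '</body></html>';

$dom = new DOMDocument();
$dom->loadXML($xml);               // XML parser: elements land in the XHTML namespace
$xpath = new DOMXPath($dom);

// Unprefixed names match only elements in no namespace, so this finds nothing.
$plain = $xpath->query('//div[@class="ProductImage QuickView"]/a');

// Option 1: register a prefix (the name "x" is arbitrary) and qualify each step.
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$prefixed = $xpath->query('//x:div[@class="ProductImage QuickView"]/x:a');

// Option 2: ignore namespaces by matching on the local name only.
$wildcard = $xpath->query(
    "//*[local-name()='div'][@class=\"ProductImage QuickView\"]/*[local-name()='a']"
);

echo $plain->length, ' ', $prefixed->length, ' ', $wildcard->length, "\n"; // 0 1 1
```

Attributes without a prefix are never in the default namespace, which is why @class needs no prefix in either option.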

    

Answer2:

You forgot the @ before class and the /a step at the end of your query, which you need because you are targeting the link. After that, use saveHTML() to get its markup. Consider this example:

$url = 'http://www.kararossny.com/clutches/?sort=featured&page=1';
$dom = new DOMDocument();
@$dom->loadHTMLFile($url);
$xpath = new DOMXpath($dom);
$elements = $xpath->query('//div[@class="ProductImage QuickView"]/a');
$link = $dom->saveHTML($elements->item(0));
echo $link;
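
The example above prints only the first match. If you want every product link on the page, a possible variation is to loop over the node list. This sketch also swaps the exact @class comparison for a token-based test, which is my own suggestion rather than part of the answer: it still matches when the classes appear in a different order or with extra whitespace. The inline markup and URLs are hypothetical.

```php
<?php
// Hypothetical list with two products; note the class order differs in the second one.
$html = '<ul class="ProductList ">'
      . '<li class="Odd"><div class="ProductImage QuickView" data-product="1">'
      . '<a href="http://example.com/a/">A</a></div></li>'
      . '<li class="Even"><div class="QuickView ProductImage" data-product="2">'
      . '<a href="http://example.com/b/">B</a></div></li>'
      . '</ul>';

libxml_use_internal_errors(true);   // preferable to @ for silencing parser warnings
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Match any div whose class list contains the token "ProductImage".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " ProductImage ")]/a';

$hrefs = [];
foreach ($xpath->query($query) as $a) {
    $hrefs[] = $a->getAttribute('href');   // collect the link target
    echo $dom->saveHTML($a), "\n";         // print the markup of each <a>
}
```

An exact comparison such as @class="ProductImage QuickView" would have skipped the second item here, because the attribute value is compared as a whole string.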

Answer3:

Yes, your XPath is a bit off.

In XPath, to filter an element by its attribute value you have to put @ at the beginning of the attribute name. So your XPath should have been as follows:

//div[@class="ProductImage QuickView"]
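
This expression selects the div itself. To echo only what is inside it, as in the desired output, one option is to serialize each child node of the matched div. A minimal sketch, using hypothetical inline markup in place of the remote page:

```php
<?php
// Hypothetical inline markup standing in for the remote page.
$html = '<div class="ProductImage QuickView" data-product="261">'
      . '<a href="http://example.com/item/"><img src="item.jpg" alt="Item"></a>'
      . '</div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// item(0) returns null when nothing matched, so check before using it.
$div = $xpath->query('//div[@class="ProductImage QuickView"]')->item(0);

$inner = '';
if ($div !== null) {
    foreach ($div->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node.
        $inner .= $dom->saveHTML($child);
    }
}
echo $inner, "\n";
```

Echoing a DOM node directly, as in the question's loop, does not work because the node objects cannot be converted to strings; serializing through saveHTML() avoids that.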
