Get the available XPaths of an HTML page?

Question:

I've taken and adapted <a href="http://bytes.com/topic/xml/answers/889001-all-available-xpath-xml-document" rel="nofollow">this</a> code, which retrieves the XPath expressions of an XML document.

I would like to do the same with an HTML page, retrieving its available XPaths (maybe from an HtmlDocument?). Is this possible?

Note: I can accept either a native solution or one that uses the <a href="http://htmlagilitypack.codeplex.com/" rel="nofollow">HtmlAgilityPack</a> library.

This is the XML method:

''' <summary>
''' Gets all the XPath expressions of an XML Document.
''' </summary>
''' <param name="Document">Indicates the XML document.</param>
''' <returns>List(Of System.String).</returns>
Public Function GetXPaths(ByVal Document As Xml.XmlDocument) As List(Of String)

    Dim XPathList As New List(Of String)
    Dim XPath As String = String.Empty

    For Each Child As Xml.XmlNode In Document.ChildNodes
        If Child.NodeType = Xml.XmlNodeType.Element Then
            GetXPaths(Child, XPathList, XPath)
        End If
    Next ' child

    Return XPathList

End Function

''' <summary>
''' Gets all the XPath expressions of an XML Node.
''' </summary>
''' <param name="Node">Indicates the XML node.</param>
''' <param name="XPathList">Indicates the XPath list, passed by reference, as a <see cref="List(Of String)"/>.</param>
''' <param name="XPath">Indicates the current XPath.</param>
Private Sub GetXPaths(ByVal Node As Xml.XmlNode,
                      ByRef XPathList As List(Of String),
                      Optional ByVal XPath As String = Nothing)

    XPath &= "/" & Node.Name

    If Not XPathList.Contains(XPath) Then
        XPathList.Add(XPath)
    End If

    For Each Child As Xml.XmlNode In Node.ChildNodes
        If Child.NodeType = Xml.XmlNodeType.Element Then
            GetXPaths(Child, XPathList, XPath)
        End If
    Next ' child

End Sub

Answer1:

As far as I can see, HtmlAgilityPack has a class structure very similar to XmlDocument's, so I believe you can easily adapt your current solution to work with HtmlDocument, something like this:

Public Function GetXPaths(ByVal Document As HtmlDocument) As List(Of String)

    Dim XPathList As New List(Of String)
    Dim XPath As String = String.Empty

    For Each Child As HtmlNode In Document.DocumentNode.ChildNodes
        If Child.NodeType = HtmlNodeType.Element Then
            GetXPaths(Child, XPathList, XPath)
        End If
    Next ' child

    Return XPathList

End Function

Private Sub GetXPaths(ByVal Node As HtmlNode,
                      ByRef XPathList As List(Of String),
                      Optional ByVal XPath As String = Nothing)

    XPath &= "/" & Node.Name

    If Not XPathList.Contains(XPath) Then
        XPathList.Add(XPath)
    End If

    For Each Child As HtmlNode In Node.ChildNodes
        If Child.NodeType = HtmlNodeType.Element Then
            GetXPaths(Child, XPathList, XPath)
        End If
    Next ' child

End Sub

This worked fine when I tested it with HTML that is XML-compliant, but I can't guarantee how well it will work against malformed HTML documents.
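
For what it's worth, here is a minimal sketch of one way to see how HtmlAgilityPack handles sloppy markup, and to get a similar list without the recursive helper. It assumes the HtmlAgilityPack assembly is referenced. The module name XPathDemo and the sample markup are made up for illustration. It relies on two library members, HtmlDocument.ParseErrors and HtmlNode.XPath. The latter returns an indexed path for each node, so the [n] predicates are stripped with a regular expression to get paths of the same shape as GetXPaths produces. Treat it as a starting point rather than a tested solution.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports HtmlAgilityPack

Module XPathDemo ' hypothetical name, for illustration only

    Sub Main()

        ' Made-up sample markup; the second <p> is deliberately left unclosed.
        Dim html As String = "<html><body><div><p>one</p><p>two</div></body></html>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' The parser collects the problems it noticed instead of throwing.
        ' The collection may be empty, depending on how lenient the parser is.
        For Each parseError As HtmlParseError In doc.ParseErrors
            Console.WriteLine("Parse issue at line {0}: {1}", parseError.Line, parseError.Reason)
        Next

        ' HtmlNode.XPath returns an indexed path that points at each element.
        Dim indexedPaths As New List(Of String)
        For Each node As HtmlNode In doc.DocumentNode.Descendants()
            If node.NodeType = HtmlNodeType.Element Then
                indexedPaths.Add(node.XPath)
            End If
        Next

        ' Strip the [n] predicates and drop duplicates to get generic paths.
        Dim genericPaths As New List(Of String)
        For Each indexedPath As String In indexedPaths
            Dim stripped As String = Regex.Replace(indexedPath, "\[\d+\]", String.Empty)
            If Not genericPaths.Contains(stripped) Then
                genericPaths.Add(stripped)
            End If
        Next

        For Each genericPath As String In genericPaths
            Console.WriteLine(genericPath)
        Next

    End Sub

End Module
```

Keeping the indexed paths around can be useful too, since each one should select exactly one node when passed back to doc.DocumentNode.SelectSingleNode.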
