33563

XPath search through HTML tags

Question:

The following HTML shows the 3rd search (search for "Practice Guidelines Professional") does not work as the text "Practice Guidelines" is placed between the <strong></strong> tag... Is it possible to achieve XPath search to bypass HTML tags between the texts?

<html> <head> <meta http-equiv="X-UA-Compatible" content="chrome=1"> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <meta name="apple-mobile-web-app-capable" content="yes"> <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no" /> <title>Xpath Test</title> <script type="text/javascript"> var search = function (button) { var searchTxt = button.value; var xpath = '//*[contains(translate(text(),\'ABCDEFGHIJKLMNOPQRSTUVWXYZ\', \'abcdefghijklmnopqrstuvwxyz\'), \'' + searchTxt.toLowerCase() + '\')]'; var nodes = null; if (document.evaluate) { nodes = document.evaluate( xpath, document, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null ); var node = nodes.iterateNext (); if (node) { while (node) { var nodeXpath = createXPathFromElement(node); alert("Found!!!\n xpath: " + nodeXpath); node = nodes.iterateNext (); } } else { alert("No results found."); } } else { alert("Your browser does not support the evaluate method!"); } } function createXPathFromElement(elm) { var allNodes = document.getElementsByTagName('*'); for (segs = []; elm && elm.nodeType == 1; elm = elm.parentNode) { for (i = 1, sib = elm.previousSibling; sib; sib = sib.previousSibling) { if (sib.localName == elm.localName) i++; }; segs.unshift(elm.localName.toLowerCase() + '[' + i + ']'); }; return segs.length ? '//' + segs.join('//') : null; } </script> </head> <body> <table> <tr> <td>

&#160;

<h3><strong><span class="color-1">PRINCIPLES OF PATIENT CARE</span></strong></h3><p class="noindent"><strong><span class="color-1">Evidence-Based Medicine</span></strong> Evidence-based medicine refers to the concept that clinical decisions are formally supported by data, preferably data that are derived from prospectively designed, randomized, controlled clinical trials. This is in sharp contrast to anecdotal experience, which may often be biased. Unless they are attuned to the importance of using larger, more objective studies for making decisions, even the most experienced physicians can be influenced by recent encounters with selected patients. Evidence-based medicine has become an increasingly important part of the routine practice of medicine and has led to the publication of a number of practice guidelines.

&#160;

<p class="noindent"><strong><span class="color-1">Practice Guidelines</span></strong> Professional organizations and government agencies are developing formal clinical-practice guidelines to aid physicians and other caregivers in making diagnostic and therapeutic decisions that are evidence-based, cost-effective, and most appropriate to a particular patient and clinical situation. As the evidence base of medicine increases, guidelines can provide a useful framework for managing patients with particular diagnoses or symptoms. They can protect patients—particularly those with inadequate health care benefits—from receiving substandard care. Guidelines can also protect conscientious caregivers from inappropriate charges of malpractice and society from the excessive costs associated with the overuse of medical resources. There are, however, caveats associated with clinical practice guidelines since they tend to oversimplify the complexities of medicine. Furthermore, groups with differing perspectives may develop divergent recommendations regarding issues as basic as the need for periodic sigmoidoscopy in middle-aged persons. Finally, guidelines do not—and cannot be expected to—account for the uniqueness of each individual and his or her illness. The physician’s challenge is to integrate into clinical practice the useful recommendations offered by experts without accepting them blindly or being inappropriately constrained by them.

</td> </tr> <tr> <td> Search for <input type="button" onclick="search(this)" value="Practice Guidelines" /> Success. </td> </tr> <tr> <td> Search for <input type="button" onclick="search(this)" value="Professional" /> Success. </td> </tr> <tr> <td> Search for <input type="button" onclick="search(this)" value="Practice Guidelines Professional" /> No result since text 'Practice Guidelines' is inside &lt;strong&gt;&lt;strong/&gt;, and 'Professional' is out side. </td> </tr> </table> </body> </html>

Answer1:

Yes. You can use <a href="http://www.w3.org/TR/xpath/#function-normalize-space" rel="nofollow">normalize-space()</a> for this. <em>(line-wrapped for the sake of readability)</em>

var xpath = " //*[ contains( translate( normalize-space(.), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz' ), normalize-space('" + searchTxt.toLowerCase() + "') ) ] ";

Note that this will find the <em>entire hierarchy</em> of nodes that contain the respective text, up to and including the <html> element. You would be interested in the "most deeply nested" node, I guess.

Also note that you might want to remove single quotes from searchTxt as they would break the XPath expression.

Answer2:

There is probably also a recursive method that looks at the textContent/innerHTML of a root element to see if a match is found. If so, go down firstChild nodes that match and so on, following children until the match fails, at which point the element containing the text has been found.

Should also work for multiple matches with the text.

XPath might be faster, but perhaps not as wildely supported (though it seems to be on all modern browsers).

Recommend

  • nextSibling doesn't work when working with PHP DOMDocument
  • Comma-formated numbers in jQuery
  • How to convert XMLList to XML in FLEX
  • Perl: Pulling pairs of values from an array
  • XML SAX Parser not working - NullPointerException [closed]
  • Get the full path of a node, after get it with an XPath query in JavaScript
  • How to use xmlreader to read this xml
  • RSpec and ActiveModel
  • Get id that starts with some string, inside a jQuery loop
  • Click button with javascript
  • Play Framework nested form errors missing
  • Get css animation property with jQuery
  • How can I have equal heights for inner elements of flexbox grid/boxes/cards without using jQuery?
  • iOS - MKOverlayView custom view rect fills works, but line draws do not
  • Showing spinner for Rails 3 UJS gets Type error
  • changes in jquery 1.4.2 breaking the code?
  • text-align justify, cannot override
  • Angular - routerLinkActive and queryParams handling
  • Is it possible to get the word under the mouse cursor in a ``?
  • Alternative To body {overflow:scroll;} That Will Prevent Page Jostling/Wriggling?
  • Play WS (2.2.1): post/put large request
  • Django: Count of Group Elements
  • Insert into database using onclick function
  • How to check if every primary key value is being referenced as foreign key in another table
  • How to handle AllServersUnavailable Exception
  • Web-crawler for facebook in python
  • How to get next/previous record number?
  • Warning: Can't call setState (or forceUpdate) on an unmounted component
  • retrieve vertices with no linked edge in arangodb
  • FormattedException instead of throw new Exception(string.Format(…)) in .NET
  • Change div Background jquery
  • apache spark aggregate function using min value
  • unknown Exception android
  • Django query for large number of relationships
  • Why is Django giving me: 'first_name' is an invalid keyword argument for this function?
  • Observable and ngFor in Angular 2
  • How to Embed XSL into XML
  • How can I use `wmic` in a Windows PE script?
  • Unable to use reactive element in my shiny app
  • How to push additional view controllers onto NavigationController but keep the TabBar?