
preg_match: easiest way to match text inside HTML tags [duplicate]

Possible Duplicate: Best methods to parse HTML with PHP

For example, I have HTML code like this:

<table width="100%" border="0" cellspacing="0" cellpadding="0" class="rowData">
  <tr align="center" class="fnt-vrdana-mavi" >
    <td style="font-size:11px" colspan=3><b>Text text text</b>:3</td>
  </tr>
  <tr class="header" align="center">
    <td height="18" colspan="3">Text text text</td>
  </tr>
  <tr align="center" class="fnt-vrdana" bgcolor="#eff3f4" height="18">
    <td width="32%" height="17"><b>1</b></td>
    <td width="34%"><b>0</b></td>
    <td width="34%"><b>2</b></td>
  </tr>
  <tr align="center" class="fnt-vrdana-mavi">
    <td height="17">2.90</td>
    <td>3.20</td>
    <td>1.85</td>
  </tr>
</table>

What is the best regular expression to match all the data inside the <td> tags?

Answer1:

I normally suggest using an XPath expression when you need to express exactly what you're looking for in an HTML document. XPath can give you the actual value of a node, whereas a regex cannot parse the HTML/XML any further, and XPath expressions are much more fine-grained. For example, this output contains the text value of each cell, without any of the tags nested inside it:

array(8) {
  [0]=>
  string(16) "Text text text:3"
  [1]=>
  string(14) "Text text text"
  [2]=>
  string(1) "1"
  [3]=>
  string(1) "0"
  [4]=>
  string(1) "2"
  [5]=>
  string(4) "2.90"
  [6]=>
  string(4) "3.20"
  [7]=>
  string(4) "1.85"
}

Code:

$html = <<<EOD
<table width="100%" border="0" cellspacing="0" cellpadding="0" class="rowData">
  <tr align="center" class="fnt-vrdana-mavi" >
    <td style="font-size:11px" colspan=3><b>Text text text</b>:3</td>
  </tr>
  <tr class="header" align="center">
    <td height="18" colspan="3">Text text text</td>
  </tr>
  <tr align="center" class="fnt-vrdana" bgcolor="#eff3f4" height="18">
    <td width="32%" height="17"><b>1</b></td>
    <td width="34%"><b>0</b></td>
    <td width="34%"><b>2</b></td>
  </tr>
  <tr align="center" class="fnt-vrdana-mavi">
    <td height="17">2.90</td>
    <td>3.20</td>
    <td>1.85</td>
  </tr>
</table>
EOD;

// create DOMDocument to operate xpath on
$doc = new DOMDocument;
$doc->loadHTML($html);

// create DOMXPath
$xpath = new DOMXPath($doc);

// perform the XPath query
$nodes = $xpath->query('//td');

// process nodes to return their actual value
$values = array();
foreach ($nodes as $node) {
    $values[] = $node->nodeValue;
}

var_dump($values);
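To illustrate the "more fine-grained" point, here is a minimal sketch that narrows the query to the cells of one kind of row. The trimmed-down sample table and the choice of the fnt-vrdana-mavi class are only assumptions for illustration; the predicate compares the whole class attribute, so it would not match a row carrying several classes.

```php
<?php
// Assumed sample: a trimmed-down version of the table from the question.
$html = <<<EOD
<table class="rowData">
  <tr class="header" align="center">
    <td height="18" colspan="3">Text text text</td>
  </tr>
  <tr align="center" class="fnt-vrdana-mavi">
    <td height="17">2.90</td>
    <td>3.20</td>
    <td>1.85</td>
  </tr>
</table>
EOD;

$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select only the <td> children of rows whose class attribute
// is exactly "fnt-vrdana-mavi"; the header row is skipped.
$values = array();
foreach ($xpath->query('//tr[@class="fnt-vrdana-mavi"]/td') as $node) {
    $values[] = trim($node->nodeValue);
}

var_dump($values);
```

The same idea extends to other XPath predicates, such as position or other attributes, without changing the surrounding PHP.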

Answer2:

/<td.*?>(.*?)<\/td>/ would capture everything between each <td> and </td>.

To capture what is inside the opening <td> tag itself (its attributes), use /<td([^>]*)>/ or /<td(.*?)>/
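As a rough sketch of how the first pattern might be used, the snippet below runs it through preg_match_all on two rows from the question's table. The s modifier and the strip_tags clean-up are additions of mine, not part of the answer, and the variable names $inner and $text are just illustrative.

```php
<?php
// Assumed sample input: two rows taken from the table in the question.
$html = <<<EOD
<tr align="center" class="fnt-vrdana-mavi">
  <td style="font-size:11px" colspan=3><b>Text text text</b>:3</td>
</tr>
<tr align="center" class="fnt-vrdana-mavi">
  <td height="17">2.90</td> <td>3.20</td> <td>1.85</td>
</tr>
EOD;

// The "s" modifier lets "." match newlines, in case a cell spans several lines.
preg_match_all('/<td.*?>(.*?)<\/td>/s', $html, $matches);

// $matches[1] holds the first capture group: the raw inner HTML of each cell.
$inner = $matches[1];

// Optional clean-up: drop nested tags such as <b> to keep only the text.
$text = array_map('strip_tags', $inner);

var_dump($inner, $text);
```

Note that the captured group still contains any nested markup, which is why the extra strip_tags step is needed to get the same plain text values that nodeValue returns in the XPath approach.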
