Possible to get HtmlNode's position & length within original input?

Consider the following HTML fragment (_ is used for whitespace):

<head> ... <link ... ___/> <!-- ... --> ... </head>

I'm using Html Agility Pack (HAP) to read HTML files/fragments and to strip out links. What I want to do is find the LINK (and some other) elements and then replace them with whitespace, like so:

<head> ... ____________ <!-- ... --> ... </head>

The parsing part seems to work so far: I get the nodes I'm looking for. However, HAP tries to fix the HTML content, while I need everything to stay exactly the same except for the changes I make. HAP also seems to have quite a few bugs when writing back content it previously read, so my plan is to let HAP parse the input and then go back to the original input myself and replace the content I don't want.

The problem is that HtmlNode doesn't seem to have an input length property. It has StreamPosition, which seems to indicate where reading of the node's content started within the input, but I couldn't find a length property that would tell me how many characters were consumed to build the node.

I tried using the OuterHtml property but, unfortunately, HAP tries to fix the LINK by removing the ___/ part (a LINK element is not supposed to be closed). Because of this, OuterHtml.Length returns the wrong length.

Is there a way in HAP to get this information?

Answer1:

I ended up modifying the code of HtmlAgilityPack to expose a new property that returns the private _outerlength field of HtmlNode.

public virtual int OuterLength { get { return ( _outerlength ); } }

This seems to be working fine so far.

Answer2:

If you want to achieve the same result without recompiling HAP, then use reflection to access the private variable.

I usually wouldn't recommend reflection to access private variables, but I recently had the exact same situation as this and used reflection, because I was unable to use a recompiled version of the assembly. To do this, create a static variable that holds the field info object (to avoid recreating it on every use):

// Requires: using System.Reflection; using HtmlAgilityPack;
private static readonly FieldInfo HtmlNodeOuterLengthFieldInfo =
    typeof(HtmlNode).GetField("_outerlength", BindingFlags.NonPublic | BindingFlags.Instance);

Then whenever you want to access the true length of the original outer HTML:

var match = htmlDocument.DocumentNode.SelectSingleNode("xpath");
var htmlLength = (int)HtmlNodeOuterLengthFieldInfo.GetValue(match);
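Once you have a start offset and a length, the replacement step from the question can be done on the original string without involving HAP at all. The following is only a minimal sketch: `SpanBlanker` and `BlankOut` are hypothetical names, and it assumes that `StreamPosition` is a character offset into the same string that was parsed, which may not hold if the document was loaded from a byte stream with a multi-byte encoding.

```csharp
using System;
using System.Text;

static class SpanBlanker
{
    // Hypothetical helper: overwrites `length` characters of `input`,
    // starting at `start`, with spaces. Line breaks inside the span are
    // kept, so the result has the same length and the same line layout.
    public static string BlankOut(string input, int start, int length)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (start < 0 || length < 0 || start > input.Length - length)
            throw new ArgumentOutOfRangeException(nameof(start));

        var sb = new StringBuilder(input);
        for (int i = start; i < start + length; i++)
        {
            if (sb[i] != '\r' && sb[i] != '\n')
                sb[i] = ' ';
        }
        return sb.ToString();
    }
}

static class Program
{
    static void Main()
    {
        // Stand-in values: in real use, start would come from
        // match.StreamPosition and length from the reflected _outerlength.
        string original = "<a><link /><b>";
        string cleaned = SpanBlanker.BlankOut(original, 3, 8);
        Console.WriteLine("[" + cleaned + "]"); // prints "[<a>        <b>]"
    }
}
```

Because the output has exactly the same length as the input, the offsets of all other nodes stay valid, so several nodes can be blanked out in any order.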
