Load dynamically generated HTML Code in WebClient

Question:

I am using WebClient.DownloadString to scrape a web page. Unfortunately, DownloadString returns the page source without the CSS and JS updates that Internet Explorer applies while the page loads.

So I was wondering how I can use WebClient to load the whole page the same way Internet Explorer or the WebBrowser control does (with the CSS and JS code injections)?

Answer1:

<blockquote>

So I was wondering how I can use WebClient to load the whole page the same way Internet Explorer or the WebBrowser control does?

</blockquote>

You can't do that. The WebClient class is used to download a <strong>SINGLE</strong> resource using the HTTP protocol. It doesn't understand the concept of HTML. If you need to download the resources that the HTML references, you will have to use an HTML parser (such as <a href="http://htmlagilitypack.codeplex.com/" rel="nofollow">HTML Agility Pack</a>, for example) and, for each CSS and JavaScript file you encounter in the downloaded HTML page, send another HTTP request with the WebClient to retrieve it.
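A minimal sketch of that approach, assuming the HtmlAgilityPack NuGet package is installed. The names `PageResources`, `FindResourceUrls` and `DownloadWithResources` are made up for illustration and are not part of any library. It only finds stylesheets and scripts that are referenced directly in the downloaded markup:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper class, for illustration only.
static class PageResources
{
    // Collects absolute URLs of the external stylesheets and scripts in the markup.
    public static List<Uri> FindResourceUrls(string html, Uri pageUri)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var urls = new List<Uri>();
        AddUrls(doc, "//link[@rel='stylesheet'][@href]", "href", pageUri, urls);
        AddUrls(doc, "//script[@src]", "src", pageUri, urls);
        return urls;
    }

    static void AddUrls(HtmlDocument doc, string xpath, string attribute,
                        Uri pageUri, List<Uri> urls)
    {
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null) return;

        foreach (var node in nodes)
        {
            string value = node.GetAttributeValue(attribute, "");
            Uri absolute;
            // Resolve relative references such as "/css/site.css" against the page URL.
            if (value.Length > 0 && Uri.TryCreate(pageUri, value, out absolute))
                urls.Add(absolute);
        }
    }

    // Downloads the page, then each referenced resource, one HTTP request each.
    public static Dictionary<Uri, string> DownloadWithResources(Uri pageUri)
    {
        var result = new Dictionary<Uri, string>();
        using (var client = new WebClient())
        {
            string html = client.DownloadString(pageUri);
            result[pageUri] = html;

            foreach (Uri url in FindResourceUrls(html, pageUri))
            {
                if (!result.ContainsKey(url))
                    result[url] = client.DownloadString(url);
            }
        }
        return result;
    }
}
```

You would call it with something like `PageResources.DownloadWithResources(new Uri("http://example.com/"))`, where the URL is a placeholder. The result maps each URL to the text that was downloaded from it; nothing is executed or applied to the page.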

But bear in mind that, depending on the web page you are trying to scrape, things might get more complicated. For example, the page could contain JavaScript that in turn dynamically references and includes other static resources such as JavaScript or CSS files. Since a WebClient doesn't execute JavaScript, it might never know about them.

Answer2:

The best solution for you is the HTML Agility Pack (<a href="https://htmlagilitypack.codeplex.com/" rel="nofollow">https://htmlagilitypack.codeplex.com/</a>). It will download the content of the web page for you, but I'm not sure whether you can get the CSS and JavaScript code using this tool.
