
JSoup extract table as CSV from finance website

Question:

My problem is the following: I want to use JSoup to extract a table from an HTML page downloaded from a website and save it as a CSV file. (The data is historical stock prices.)

Here is the website: http://www.finanzen.ch/kurse/historisch/Actelion/VIRTX/12.6.2013_17.9.2013

The page is in German, which I hope is not a problem. I want to extract the table with all the numbers.

I have got the following code so far:

```java
Document doc = Jsoup.connect("http://www.finanzen.ch/kurse/historisch/Actelion/VIRTX/12.6.2013_17.9.2013").get();
for (Element table : doc.select("table.Historische Kurse Actelion Ltd.*")) {
    for (Element row : table.select("tr")) {
        Elements tds = row.select("td");
        if (tds.size() > 6) {
            System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
        }
    }
}
```

I took this code from another Stack Overflow question. The problem is that I know nothing about JSoup and I'm quite new to Java programming. I would greatly appreciate your help.
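In jsoup's CSS-style selector syntax, `table.Historische` means a `<table>` element with the class `Historische`, and each space starts a descendant selector, so `"table.Historische Kurse Actelion Ltd.*"` looks for elements named `Kurse`, `Actelion` and so on rather than for the table's visible title. One way to find a table by its title is to match the heading text with the `:containsOwn(...)` pseudo-selector and then step to the next sibling element. The sketch below assumes the title is an `<h2>` immediately followed by the `<table>`; that markup is hypothetical (the question does not show the page's HTML), and the class name `FindTableByHeading` is made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FindTableByHeading {

    // Returns the <table> immediately following the first <h2> whose own text
    // contains headingText (matched case-insensitively), or null if there is none.
    // headingText must not contain parentheses, since it is pasted into the selector.
    static Element tableAfterHeading(Document doc, String headingText) {
        Element heading = doc.select("h2:containsOwn(" + headingText + ")").first();
        if (heading == null) {
            return null;
        }
        Element next = heading.nextElementSibling();
        return (next != null && "table".equalsIgnoreCase(next.tagName())) ? next : null;
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for the real page, whose structure may differ.
        String html = "<h2>Historische Kurse Actelion Ltd.</h2>"
                + "<table><tr><th>Datum</th><th>Schluss</th></tr>"
                + "<tr><td>17.09.2013</td><td>61.50</td></tr></table>";
        Document doc = Jsoup.parse(html);
        Element table = tableAfterHeading(doc, "Historische Kurse");
        if (table == null) {
            System.out.println("Table not found");
            return;
        }
        for (Element row : table.select("tr")) {
            System.out.println(row.select("th, td").text()); // cells joined by spaces
        }
    }
}
```

On the real page, pass the `Document` returned by `Jsoup.connect(url).get()` instead, and replace `h2` with whatever element actually holds the title.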

Answer 1:

Try this:

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test {
    public static void main(String[] args) {
        String url = "http://www.finanzen.ch/kurse/historisch/Actelion/VIRTX/12.6.2013_17.9.2013";
        try {
            Document doc = Jsoup.connect(url).get();
            // Assumes the table is the fourth child element (child(3), zero-based) of the
            // main content column; this breaks if the page layout changes.
            Element table = doc
                    .select("div.mainwrapper div.main_background div.main_left")
                    .get(0).child(3);
            Elements rows = table.select("tr");

            // Header row: all <th> texts separated by spaces
            StringBuilder header = new StringBuilder();
            for (Element th : rows.select("th")) {
                header.append(th.text()).append(' ');
            }
            System.out.println(header.toString().trim());

            for (Element row : rows) {
                Elements tds = row.select("td");
                for (Element td : tds) {
                    System.out.println(td.text()); // prints each cell on its own line
                }
                System.out.println(tds.text()); // prints all cells of the row on one line
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```
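The answer prints the cells but does not produce the CSV file the question asks for. A minimal sketch of that last step, assuming the table has already been located (here it is simply the first `<table>` in a hypothetical HTML string): each `<tr>` becomes one line, `<th>` and `<td>` texts are joined with commas, and any cell containing a comma, double quote or line break is wrapped in double quotes (RFC 4180 style), which matters for German-style numbers such as `61,50`. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TableToCsv {

    // Quotes a cell when it contains a comma, a double quote or a line break;
    // embedded double quotes are doubled.
    static String escape(String cell) {
        boolean needsQuotes = cell.contains(",") || cell.contains("\"")
                || cell.contains("\n") || cell.contains("\r");
        return needsQuotes ? "\"" + cell.replace("\"", "\"\"") + "\"" : cell;
    }

    // Turns each <tr> of the table into one CSV line; header (th) and
    // data (td) cells are both included, in document order.
    static String tableToCsv(Element table) {
        StringBuilder csv = new StringBuilder();
        for (Element row : table.select("tr")) {
            List<String> cells = new ArrayList<>();
            for (Element cell : row.select("th, td")) {
                cells.add(escape(cell.text()));
            }
            if (!cells.isEmpty()) { // skip rows without any cells
                csv.append(String.join(",", cells)).append('\n');
            }
        }
        return csv.toString();
    }

    // Parses an HTML string and converts its first <table>; returns null if there is none.
    static String firstTableAsCsv(String html) {
        Document doc = Jsoup.parse(html);
        Element table = doc.select("table").first();
        return table == null ? null : tableToCsv(table);
    }

    public static void main(String[] args) {
        // Hypothetical sample markup and values, not taken from the real page.
        String html = "<table><tr><th>Datum</th><th>Schluss</th></tr>"
                + "<tr><td>17.09.2013</td><td>61,50</td></tr></table>";
        System.out.print(firstTableAsCsv(html));
    }
}
```

For the live page, pass the `Element` located as in the answer to `tableToCsv` and save the result, for example with `Files.write(Paths.get("kurse.csv"), csv.getBytes(StandardCharsets.UTF_8))` (the file name is illustrative). Because `select("tr")` searches all descendants, rows of any nested tables would be mixed in.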
