Not iterating over the list in web scraping

From a link, I am trying to create two lists: one of countries and one of currencies. However, I'm stuck: my code gives me only the first country name instead of iterating over all the countries. Any help with fixing this would be appreciated. Thanks in advance.

Here is my attempt:

    from bs4 import BeautifulSoup
    import urllib.request

    url = "http://www.worldatlas.com/aatlas/infopage/currency.htm"
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36'}
    req = urllib.request.Request(url, headers=headers)
    resp = urllib.request.urlopen(req)
    html = resp.read()

    soup = BeautifulSoup(html, "html.parser")
    attr = {"class" : "miscTxt"}
    countries = soup.find_all("div", attrs=attr)
    countries_list = [tr.td.string for tr in countries]

    for country in countries_list:
        print(country)

Answer1:

Try this script. It should give you the country names along with their corresponding currencies. You don't need to send custom headers for this site.

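    from bs4 import BeautifulSoup
    import urllib.request

    url = "http://www.worldatlas.com/aatlas/infopage/currency.htm"
    resp = urllib.request.urlopen(urllib.request.Request(url)).read()
    soup = BeautifulSoup(resp, "lxml")  # the "lxml" parser requires the lxml package

    # Walk every table row and read the first cell and its next sibling.
    for item in soup.select("table tr"):
        try:
            country = item.select("td")[0].text.strip()
        except IndexError:
            # Row has no <td> cells (for example, a header row).
            country = ""
        try:
            currency = item.select("td")[0].find_next_sibling().text.strip()
        except (IndexError, AttributeError):
            # No first cell, or the first cell has no sibling (None has no .text).
            currency = ""
        print(country, currency)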

Partial Output:

    Afghanistan afghani
    Algeria dinar
    Andorra euro
    Argentina peso
    Australia dollar
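For context on why the original attempt printed only one name: `find_all("div", ...)` returns the matching `div` elements, and `.td` on a tag returns just the first `<td>` found inside it, so the comprehension yields one value per `div` rather than one per row. The sketch below illustrates this offline. `SAMPLE_HTML` is a hypothetical snippet assumed to resemble the page's structure (one `div.miscTxt` wrapping a table); the real markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, assumed to resemble the page's structure:
# a single div.miscTxt wrapping a table with one row per country.
SAMPLE_HTML = """
<div class="miscTxt">
  <table>
    <tr><td>Afghanistan</td><td>afghani</td></tr>
    <tr><td>Algeria</td><td>dinar</td></tr>
    <tr><td>Andorra</td><td>euro</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
divs = soup.find_all("div", attrs={"class": "miscTxt"})

# Approach from the question: one item per matching div, and .td returns
# only the first <td> inside that div.
first_only = [div.td.string for div in divs]

# Row-wise approach: walk every <tr> inside the div and read both cells.
countries_list, currency_list = [], []
for row in divs[0].find_all("tr"):
    cells = row.find_all("td")
    if len(cells) >= 2:  # skip header or malformed rows
        countries_list.append(cells[0].get_text(strip=True))
        currency_list.append(cells[1].get_text(strip=True))

print(first_only)
print(countries_list)
print(currency_list)
```

With this sample, `first_only` holds a single name while the row-wise loop fills both lists, which are the two lists the question asks for.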

Answer2:

You can also use a single list comprehension to build a list of tuples like [(country, currency)], and then split the tuples into two lists with map and zip:

    temp_list = [
        (t[0].text.strip(), t[1].text.strip())
        for t in (t.find_all('td') for t in countries[0].find_all('tr'))
        if t
    ]
    countries_list, currency_list = map(list, zip(*temp_list))

The full code:

    from bs4 import BeautifulSoup
    import urllib.request

    req = urllib.request.Request("http://www.worldatlas.com/aatlas/infopage/currency.htm")
    soup = BeautifulSoup(urllib.request.urlopen(req).read(), "html.parser")
    countries = soup.find_all("div", attrs = {"class" : "miscTxt"})

    temp_list = [
        (t[0].text.strip(), t[1].text.strip())
        for t in (t.find_all('td') for t in countries[0].find_all('tr'))
        if t
    ]
    countries_list, currency_list = map(list, zip(*temp_list))

    print(countries_list)
    print(currency_list)
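The `map(list, zip(*temp_list))` step can be tried in isolation, without any network access. In this minimal sketch, `pairs` is a hypothetical stand-in for the scraped tuples, not data taken from the live page.

```python
# Hypothetical sample pairs standing in for the scraped (country, currency) tuples.
pairs = [("Afghanistan", "afghani"), ("Algeria", "dinar"), ("Andorra", "euro")]

# zip(*pairs) regroups the first items together and the second items together;
# map(list, ...) turns each of the two resulting tuples into a list.
countries_list, currency_list = map(list, zip(*pairs))

print(countries_list)  # ['Afghanistan', 'Algeria', 'Andorra']
print(currency_list)   # ['afghani', 'dinar', 'euro']
```

One caveat: if no rows are matched, the list of pairs is empty and the two-name unpacking raises `ValueError`, so a check for an empty list before unpacking may be worthwhile.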
