
Wikipedia with Python

I have this very simple Python code to read XML from the Wikipedia API:

    import urllib
    from xml.dom import minidom

    usock = urllib.urlopen("http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500")
    xmldoc=minidom.parse(usock)
    usock.close()
    print xmldoc.toxml()

But this code fails with this error:

    Traceback (most recent call last):
      File "/home/user/workspace/wikipediafoundations/src/list.py", line 5, in <module>
        xmldoc=minidom.parse(usock)
      File "/usr/lib/python2.6/xml/dom/minidom.py", line 1918, in parse
        return expatbuilder.parse(file)
      File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 928, in parse
        result = builder.parseFile(file)
      File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 207, in parseFile
        parser.Parse(buffer, 0)
    xml.parsers.expat.ExpatError: syntax error: line 1, column 62

I have no clue what is wrong, as I am just learning Python. Is there a way to get an error with more detail? Does anyone know the solution? Also, please recommend a better language to do this in.

Thank You, Venkat Rao

Answer1:

The URL you're requesting returns an HTML representation of the XML, not the XML itself:

http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500

So the XML parser fails on the HTML. You can see this by pasting the above URL into a browser. Try adding format=xml at the end:

http://en.wikipedia.org/w/api.php?action=query&titles=Fractal&prop=links&pllimit=500&format=xml

as documented on the linked page.
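To make the fix concrete, here is a minimal sketch rather than a definitive implementation. It is written for Python 3, where urllib.urlopen from the question no longer exists and urllib.request.urlopen takes its place. The helper names (build_query_url, link_titles, fetch_links), the User-Agent string, and the hand-written SAMPLE_RESPONSE are all assumptions made for illustration; in particular, the sample assumes each link comes back as a pl element with a title attribute. The sketch also wraps the parse so that a non-XML reply reports the start of the body, which gives more detail than the bare ExpatError.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen
from xml.dom import minidom
from xml.parsers.expat import ExpatError

API_URL = "http://en.wikipedia.org/w/api.php"

# Hand-written sample (an assumption about the response shape, not a
# captured reply): each link is taken to be a <pl> element with a title.
SAMPLE_RESPONSE = (
    '<?xml version="1.0"?>'
    '<api><query><pages><page title="Fractal"><links>'
    '<pl ns="0" title="Chaos theory" />'
    '<pl ns="0" title="Mandelbrot set" />'
    '</links></page></pages></query></api>'
)


def build_query_url(title, limit=500):
    """Build the query URL from the question, with format=xml appended."""
    params = [
        ("action", "query"),
        ("titles", title),
        ("prop", "links"),
        ("pllimit", limit),
        ("format", "xml"),  # without this the API replies with an HTML page
    ]
    return API_URL + "?" + urlencode(params)


def link_titles(xml_text):
    """Parse a response body and return the title of every <pl> element."""
    try:
        doc = minidom.parseString(xml_text)
    except ExpatError as err:
        # Include the start of the body so an HTML reply is easy to spot.
        raise ValueError(
            "response is not XML (%s); body starts with: %r"
            % (err, xml_text[:80])
        )
    return [pl.getAttribute("title") for pl in doc.getElementsByTagName("pl")]


def fetch_links(title):
    """Request the links for one page (network access required)."""
    # Hypothetical User-Agent value; substitute one identifying your script.
    request = Request(
        build_query_url(title),
        headers={"User-Agent": "example-script/0.1"},
    )
    with urlopen(request) as response:
        return link_titles(response.read())
```

Calling fetch_links("Fractal") would perform the actual request. Running link_titles(SAMPLE_RESPONSE) exercises the parsing step without touching the network.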
