
Using minidom to parse XML

Question:

Hi, I'm having trouble understanding the minidom module for Python.

I have XML that looks like this:

    <Show>
      <name>Dexter</name>
      <totalseasons>7</totalseasons>
      <Episodelist>
        <Season no="1">
          <episode>
            <epnum>1</epnum>
            <seasonnum>01</seasonnum>
            <prodnum>101</prodnum>
            <airdate>2006-10-01</airdate>
            <link>http://www.tvrage.com/Dexter/episodes/408409</link>
            <title>Dexter</title>
          </episode>
          <episode>
            <epnum>2</epnum>
            <seasonnum>02</seasonnum>
            <prodnum>102</prodnum>
            <airdate>2006-10-08</airdate>
            <link>http://www.tvrage.com/Dexter/episodes/408410</link>
            <title>Crocodile</title>
          </episode>
          <episode>
            <epnum>3</epnum>
            <seasonnum>03</seasonnum>
            <prodnum>103</prodnum>
            <airdate>2006-10-15</airdate>
            <link>http://www.tvrage.com/Dexter/episodes/408411</link>
            <title>Popping Cherry</title>
          </episode>

A more readable version: http://services.tvrage.com/feeds/episode_list.php?sid=7926

And this is my Python code trying to read from it:

    xml = minidom.parse(urlopen("http://services.tvrage.com/feeds/episode_list.php?sid=7926"))
    for episode in xml.getElementsByTagName('episode'):
        for node in episode.attributes['title']:
            print node.data

I can't get the actual episode data out; I want all the data from each episode. I've tried different variants, but I can't get it to work. Mostly I get a <DOM Element: asdasd> back. I only care about the data inside each episode.

Thanks for the help

Answer1:

Each episode element has child elements, including a title element. Your code, however, is looking for attributes instead.

To get text out of a minidom element, you need a helper function:

    def getText(nodelist):
        rc = []
        for node in nodelist:
            if node.nodeType == node.TEXT_NODE:
                rc.append(node.data)
        return ''.join(rc)

And then you can more easily print all the titles:

    for episode in xml.getElementsByTagName('episode'):
        for title in episode.getElementsByTagName('title'):
            # getText expects a list of nodes, so pass the element's children.
            print(getText(title.childNodes))
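To make the attribute versus child-element distinction concrete, here is a minimal sketch on a trimmed, inline copy of the feed. The `SAMPLE` string and the variable names are assumptions for illustration only. In this feed, `no` on `<Season>` is an attribute, while `title` is a child element of `<episode>`.

```python
from xml.dom import minidom

# Trimmed, well-formed sample modeled on the feed in the question
# (assumed for illustration; the real feed has more fields).
SAMPLE = """<Show>
  <name>Dexter</name>
  <Episodelist>
    <Season no="1">
      <episode><epnum>1</epnum><title>Dexter</title></episode>
      <episode><epnum>2</epnum><title>Crocodile</title></episode>
    </Season>
  </Episodelist>
</Show>"""

doc = minidom.parseString(SAMPLE)

# 'no' is an attribute of <Season>, so getAttribute() is the right call.
season = doc.getElementsByTagName('Season')[0]
season_no = season.getAttribute('no')

# 'title' is a child element: look it up by tag name, then read its text node.
titles = [t.firstChild.data for t in season.getElementsByTagName('title')]

print(season_no, titles)
```

Calling `getAttribute('title')` on an episode here would just return an empty string, because no such attribute exists.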

Answer2:

title is not an attribute, it's a tag (an element). An attribute is something like src in <img src="foo.jpg" />.

    >>> parsed = parseString(s)
    >>> titles = [n.firstChild.data for n in parsed.getElementsByTagName('title')]
    >>> titles
    [u'Dexter', u'Crocodile', u'Popping Cherry']

You can extend the above to fetch other details. lxml (http://codespeak.net/lxml/) is better suited for this, though; as the snippet above shows, minidom is not that friendly.
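One possible way to extend this to the other details is to walk each episode's child elements and collect them into a dictionary. This is only a sketch: the `SAMPLE` string and the helper name `episode_to_dict` are hypothetical, and it assumes every child of `<episode>` holds plain text.

```python
from xml.dom import minidom

# Small inline sample modeled on the feed (assumed for illustration).
SAMPLE = """<Episodelist>
  <Season no="1">
    <episode>
      <epnum>1</epnum>
      <airdate>2006-10-01</airdate>
      <title>Dexter</title>
    </episode>
    <episode>
      <epnum>2</epnum>
      <airdate>2006-10-08</airdate>
      <title>Crocodile</title>
    </episode>
  </Season>
</Episodelist>"""


def episode_to_dict(episode):
    """Map each child element's tag name to its text content (hypothetical helper)."""
    details = {}
    for child in episode.childNodes:
        # Skip non-element nodes, such as the whitespace between child elements.
        if child.nodeType != child.ELEMENT_NODE:
            continue
        first = child.firstChild
        has_text = first is not None and first.nodeType == first.TEXT_NODE
        details[child.tagName] = first.data if has_text else ''
    return details


doc = minidom.parseString(SAMPLE)
episodes = [episode_to_dict(e) for e in doc.getElementsByTagName('episode')]

for ep in episodes:
    print(ep['epnum'], ep['airdate'], ep['title'])
```

The element-type check matters because minidom keeps the indentation between tags as text nodes, so iterating `childNodes` directly yields more than just the elements.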

Answer3:

Thanks to Martijn Pieters, who pointed me to the ElementTree API, I solved this problem:

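    import xml.etree.ElementTree as ET
    from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib2 import urlopen

    tree = ET.parse(urlopen("http://services.tvrage.com/feeds/episode_list.php?sid=7296"))
    print('xml fetched..')
    for episode in tree.iter('episode'):
        print(episode.find('title').text)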
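The same approach can probably be stretched to cover more than the title. The sketch below runs on an inline sample instead of the live feed; `SAMPLE` and `rows` are names assumed for illustration. It pairs each episode with its season number, read from the `no` attribute.

```python
import xml.etree.ElementTree as ET

# Small inline sample modeled on the feed (assumed for illustration).
SAMPLE = """<Show>
  <name>Dexter</name>
  <Episodelist>
    <Season no="1">
      <episode><epnum>1</epnum><title>Dexter</title></episode>
      <episode><epnum>2</epnum><title>Crocodile</title></episode>
    </Season>
  </Episodelist>
</Show>"""

root = ET.fromstring(SAMPLE)

rows = []
for season in root.iter('Season'):
    # Attributes are read with get(); child element text with findtext().
    season_no = season.get('no')
    for episode in season.findall('episode'):
        rows.append((season_no, episode.findtext('epnum'), episode.findtext('title')))

for row in rows:
    print(row)
```

`findtext()` returns `None` when the child element is missing, which avoids the `AttributeError` that `episode.find('title').text` would raise in that case.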

Thanks
