
How to efficiently parse large XML files on iOS in order to store the content in SQLite?

Question:

I'm using libxml2's DOM parser in my iPhone app to parse an XML file with a hundred thousand lines.

I store the retrieved content in an SQLite database. However, this process takes several minutes to complete, <em>far too long to be user-friendly</em>. My feeling is that most of the time is spent writing the data into the database.

I'm now looking for any hints on how to make this process more efficient.

Answer1:

Try profiling your code with Instruments. Check which parts of your own code are taking the most time, and see whether you can take a different approach (or post the slow code for suggestions). If possible, use <a href="http://developer.apple.com/library/ios/#documentation/General/Conceptual/ConcurrencyProgrammingGuide/OperationObjects/OperationObjects.html%23//apple_ref/doc/uid/TP40008091-CH101-SW1" rel="nofollow">NSOperation / NSOperationQueue</a> to run the import in the background and provide progress feedback to the user.
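The background-import-with-progress idea can be sketched as follows. This is a minimal illustration in Python, where a plain `threading.Thread` stands in for an NSOperation; the names `import_rows`, `start_import`, and the `items(name, value)` table are hypothetical, chosen only for the example.

```python
import sqlite3
import threading
from typing import Callable, Sequence, Tuple

Row = Tuple[str, str]
ProgressCallback = Callable[[int, int], None]  # (rows_done, rows_total)


def import_rows(db_path: str, rows: Sequence[Row],
                on_progress: ProgressCallback, report_every: int = 1000) -> None:
    """Insert rows into a hypothetical items(name, value) table, reporting progress."""
    conn = sqlite3.connect(db_path)  # the worker thread opens its own connection
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, value TEXT)")
        total = len(rows)
        with conn:  # a single transaction, committed when the block exits
            for done, row in enumerate(rows, start=1):
                conn.execute("INSERT INTO items (name, value) VALUES (?, ?)", row)
                # Report every `report_every` rows, plus once at the very end.
                if done % report_every == 0 or done == total:
                    on_progress(done, total)
    finally:
        conn.close()


def start_import(db_path: str, rows: Sequence[Row],
                 on_progress: ProgressCallback) -> threading.Thread:
    """Run import_rows on a background thread so the caller stays responsive."""
    worker = threading.Thread(target=import_rows,
                              args=(db_path, rows, on_progress), daemon=True)
    worker.start()
    return worker
```

Note that the progress callback runs on the worker thread; in an iOS app you would hop back to the main thread before updating any UIKit view.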


Answer2:

You might also have a look at this question: <a href="https://stackoverflow.com/questions/1711631" rel="nofollow">"How do I improve the performance of SQLite?"</a>
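A widely used technique for speeding up bulk inserts is to wrap all of them in a single transaction and reuse one parameterized INSERT statement, instead of committing after every row. The sketch below shows both patterns side by side in Python's `sqlite3` module; the `items(name, value)` table is hypothetical. With the SQLite C API on iOS, the equivalent is to run `BEGIN TRANSACTION` via `sqlite3_exec`, prepare the INSERT once with `sqlite3_prepare_v2`, call `sqlite3_bind_*`, `sqlite3_step` and `sqlite3_reset` for each row, and run `COMMIT` at the end.

```python
import sqlite3
from typing import Iterable, Tuple

INSERT_SQL = "INSERT INTO items (name, value) VALUES (?, ?)"


def ensure_table(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, value TEXT)")


def insert_one_by_one(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str]]) -> None:
    """Slow pattern, for comparison: every row is committed in its own transaction."""
    for row in rows:
        with conn:  # commit after each single INSERT
            conn.execute(INSERT_SQL, row)


def bulk_insert(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str]]) -> int:
    """Faster pattern: one transaction and one reused parameterized statement."""
    ensure_table(conn)
    with conn:  # BEGIN ... COMMIT around the whole batch; rolled back on error
        conn.executemany(INSERT_SQL, rows)
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

The gap between the two is largest with an on-disk database, because each commit forces SQLite to finish writing its journal before continuing; with an in-memory database the difference is much smaller.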

Answer3:

Definitely agree with JNozzi: profile your code first to see where the largest performance bottlenecks are. It's not clear whether the XML DOM tree parsing, the DOM tree traversal, or the SQLite insertions are causing the biggest problem.
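A rough way to separate those three costs in code is to time each phase on its own. The sketch below is in Python and uses `xml.etree.ElementTree.fromstring` as a stand-in DOM parser (it is not libxml2); the `<item name="...">value</item>` layout and the `items` table are assumptions for illustration. On a device, Instruments' Time Profiler gives a similar breakdown without changing any code.

```python
import sqlite3
import time
import xml.etree.ElementTree as ET
from typing import Dict


def profile_import(xml_text: str, conn: sqlite3.Connection) -> Dict[str, float]:
    """Return wall-clock seconds spent in parse, traverse and insert phases."""
    timings: Dict[str, float] = {}

    start = time.perf_counter()
    root = ET.fromstring(xml_text)  # phase 1: build the whole tree in memory
    timings["parse"] = time.perf_counter() - start

    start = time.perf_counter()
    # phase 2: walk the tree and pull out (name, value) pairs
    rows = [(item.get("name", ""), item.text or "") for item in root.iter("item")]
    timings["traverse"] = time.perf_counter() - start

    start = time.perf_counter()
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, value TEXT)")
    with conn:  # phase 3: insert everything in one transaction
        conn.executemany("INSERT INTO items (name, value) VALUES (?, ?)", rows)
    timings["insert"] = time.perf_counter() - start

    return timings
```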

If DOM tree parsing (or even traversal) is the issue, you should seriously consider switching to one of libxml2's more efficient XML parsing interfaces: xmlTextReader (the XML pull API) or SAX (the XML push API). I recommend xmlTextReader:

<a href="http://xmlsoft.org/xmlreader.html" rel="nofollow">http://xmlsoft.org/xmlreader.html</a>
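A pull parser hands you one node at a time instead of building the whole tree, so records can be streamed straight into the database while memory stays roughly flat. The sketch below illustrates that pattern with Python's `xml.etree.ElementTree.iterparse`, a comparable streaming interface (it is not libxml2); in C, the same loop would be built around libxml2's `xmlReaderForFile` and `xmlTextReaderRead`. The `<item name="...">` layout, the `items` table, and the helper names are hypothetical.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple


def stream_items(source: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (name, value) for each <item> without keeping the whole tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "item":
            yield elem.get("name", ""), elem.text or ""
            root.clear()  # drop already-processed children so memory stays flat


def load(source: IO[bytes], conn: sqlite3.Connection) -> int:
    """Stream every <item> into SQLite inside one transaction; return row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, value TEXT)")
    with conn:
        conn.executemany("INSERT INTO items (name, value) VALUES (?, ?)",
                         stream_items(source))
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


if __name__ == "__main__":
    demo = b'<items><item name="a">1</item><item name="b">2</item></items>'
    print(load(io.BytesIO(demo), sqlite3.connect(":memory:")))  # 2
```

Because `executemany` consumes the generator lazily, parsing and inserting are interleaved and the full list of records never exists in memory at once.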
