
Large RAM usage when parsing XML using Libxml2

Question:

I'm downloading an XML file from an API with URLSessionDataTask. The XML looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<ResultList id="12345678-0" platforms="A;B;C;D;E">
    <Book id="1111111111" author="Author A" title="Title A" price="9.95" ... />
    <Book id="1111111112" author="Author B" title="Title B" price="2.00" ... />
    <Book id="1111111113" author="Author C" title="Title C" price="5.00" ... />
    <ResultInfo bookcount="3" />
</ResultList>

Sometimes the XML may have thousands of books. I'm parsing the XML with the SAX parser from Libxml2. While parsing, I create a Book object and set its values from the XML like so:

private func startElementSAX(_ ctx: UnsafeMutableRawPointer?,
                             name: UnsafePointer<xmlChar>?,
                             prefix: UnsafePointer<xmlChar>?,
                             URI: UnsafePointer<xmlChar>?,
                             nb_namespaces: CInt,
                             namespaces: UnsafeMutablePointer<UnsafePointer<xmlChar>?>?,
                             nb_attributes: CInt,
                             nb_defaulted: CInt,
                             attributes: UnsafeMutablePointer<UnsafePointer<xmlChar>?>?) {
    let elementName = String(cString: name!)
    switch elementName {
    case "Book":
        let book = buildBook(nb_attributes: nb_attributes, attributes: attributes)
        parser.delegate?.onBook(book: book)
    default:
        break
    }
}

func buildBook(nb_attributes: CInt, attributes: UnsafeMutablePointer<UnsafePointer<xmlChar>?>?) -> Book {
    let fields = 5 /* (localname/prefix/URI/value/end) */
    let book = Book()
    for i in 0..<Int(nb_attributes) {
        if let localname = attributes?[i * fields + 0],
            //let prefix = attributes?[i * fields + 1],
            //let URI = attributes?[i * fields + 2],
            let value_start = attributes?[i * fields + 3]//,
            /*let value_end = attributes?[i * fields + 4]*/ {
            let localnameString = String(cString: localname)
            let string_start = String(cString: value_start)
            //let string_end = String(cString: value_end)
            if let end = string_start.characters.index(of: "\"") {
                let value = string_start.substring(to: end)
                book.setValue(value, forKey: localnameString)
            } else {
                book.setValue(string_start, forKey: localnameString)
            }
        }
    }
    return book
}

In the UITableViewController the onBook(book: Book) delegate method appends the book object to an array and updates the UITableView. So far so good.

The problem is that parsing uses so much RAM that the device becomes slow. With ~500 books in the XML, it takes >500 MB of RAM, and I don't know why. When I look up the memory usage in Instruments, I see all the allocated memory in the category _HeapBufferStorage<_StringBufferIVars, UInt16>:

<a href="https://i.stack.imgur.com/Wz8cH.png" rel="nofollow"><img alt="Instruments" src="https://i.stack.imgur.com/Wz8cH.png" /></a>

There are multiple entries larger than 100 KB:

<a href="https://i.stack.imgur.com/eQBKH.png" rel="nofollow"><img alt="HeapBufferStorage entries" src="https://i.stack.imgur.com/eQBKH.png" /></a>

The Event History lists the method buildBook():

<a href="https://i.stack.imgur.com/DfZMs.png" rel="nofollow"><img alt="Event History" src="https://i.stack.imgur.com/DfZMs.png" /></a>

When I use the XMLParser from Foundation with the initializer XMLParser(contentsOf: URL), which first downloads the whole XML and then parses it, RAM usage is normal no matter how many books there are. But I want to show the books in the UITableView as soon as possible. I just want something like Android's XmlPullParser for iOS.

Answer1:

I'm using libxml2 (due to <a href="https://stackoverflow.com/questions/44680734/parsing-xml-with-entities-in-swift-with-xmlparser" rel="nofollow">this</a> issue) and have code like this:

xmlParseChunk(ctxt, data, Int32(read), 0)

Changing the call to this reduces the amount of memory consumed considerably:

autoreleasepool {
    xmlParseChunk(ctxt, data, Int32(read), 0)
}

If you're using the push parser as above, this will likely fix your problem. If not, wrapping your delegate call in an autoreleasepool call may help.

The reason is that many intermediate objects are created and added to an autorelease pool, and they are not released until that pool drains. See <a href="https://stackoverflow.com/questions/25860942/is-it-necessary-to-use-autoreleasepool-in-a-swift-program" rel="nofollow">this</a> post for more details.
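As a rough sketch of where the pool goes when data arrives in chunks, the loop below drains a pool once per chunk. It assumes an Apple platform (autoreleasepool comes from the Objective-C runtime), and processChunk is a hypothetical stand-in for the xmlParseChunk call, used here only so the example runs without libxml2:

```swift
import Foundation

// Hypothetical stand-in for xmlParseChunk(ctxt, data, Int32(read), 0).
// It only creates temporary strings per chunk, much like SAX callbacks do.
func processChunk(_ chunk: Data) -> Int {
    let text = String(data: chunk, encoding: .utf8) ?? ""
    // Count "<Book" occurrences in this chunk.
    return text.components(separatedBy: "<Book").count - 1
}

func countBooks(in chunks: [Data]) -> Int {
    var total = 0
    for chunk in chunks {
        // Temporaries autoreleased while handling this chunk are freed
        // when the closure returns, not when the whole download finishes.
        autoreleasepool {
            total += processChunk(chunk)
        }
    }
    return total
}

let chunks = [
    Data("<Book id=\"1\" /><Book id=\"2\" />".utf8),
    Data("<Book id=\"3\" />".utf8),
]
print(countBooks(in: chunks))
```

The same placement applies to a URLSessionDataDelegate callback: wrap the body that feeds each received Data to the parser.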

An alternative is to reduce the number of objects added to the autorelease pool by changing your code in other ways. I found, for example, that I was creating extra strings by trimming whitespace in places where I could avoid it.

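Additionally, although this is not related to your problem, the start and end pointers of each attribute delimit the value, so their difference gives you the length of the string, and you should be using that. The value is not NUL-terminated, which is why String(cString:) on the start pointer runs past the closing quote.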

For example:

let valStart = UnsafeMutableRawPointer(mutating: attributes!
    .advanced(by: 3 + Int(i * 5)).pointee)
let valEnd = UnsafeMutableRawPointer(mutating: attributes!
    .advanced(by: 4 + Int(i * 5)).pointee)
let valData = Data(bytesNoCopy: valStart!,
                   count: valEnd! - valStart!,
                   deallocator: .none)
let attrValue = String(data: valData, encoding: String.Encoding.utf8)
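A variant of the same idea, sketched without libxml2 so it can be run on its own: attributeValue is a hypothetical helper, UInt8 stands in for xmlChar (which libxml2 defines as unsigned char), and the byte array simulates the parser buffer that the value pointers point into.

```swift
import Foundation

// Hypothetical helper: decodes exactly the bytes in [start, end) as UTF-8.
// UInt8 stands in for libxml2's xmlChar.
func attributeValue(start: UnsafePointer<UInt8>, end: UnsafePointer<UInt8>) -> String {
    let length = start.distance(to: end)
    let bytes = UnsafeBufferPointer(start: start, count: length)
    // Copies the bytes into a new String; invalid UTF-8 is repaired
    // with U+FFFD instead of producing nil.
    return String(decoding: bytes, as: UTF8.self)
}

// Simulated parser buffer: the value is followed by the rest of the document.
let buffer = Array("9.95\" title=\"Title A\" />".utf8)
let price = buffer.withUnsafeBufferPointer { raw -> String in
    let start = raw.baseAddress!
    // The value "9.95" is 4 bytes long, so the end pointer is start + 4.
    return attributeValue(start: start, end: start + 4)
}
print(price) // prints "9.95"
```

Because only the bytes between the two pointers are decoded, the resulting string is as long as the value itself, regardless of how much data follows it in the buffer.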
