
How to extract a big list of characters from an XML file in Java

Question:

I have a big XML file and I do not wish to parse it. I just want to store every single character between <information>...</information>, which are tags inside the XML file.

How can I do this?

Answer1:

If the problem is that the data you're trying to extract will fit in memory, but the entire XML file won't, then use a streaming parser such as <a href="http://www.extreme.indiana.edu/xgws/xsoap/xpp/" rel="nofollow">XPP</a>.
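
As a rough sketch of the streaming approach, here is a version that uses StAX (javax.xml.stream, which ships with the JDK) rather than XPP itself, since the two follow the same pull-parsing idea. The class name InformationStreamExtractor and the method extract are made up for illustration, and the sketch assumes you want the concatenated text of each <information> element, including text inside any child elements.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: a StAX stand-in for the XPP parser mentioned above.
public class InformationStreamExtractor {

    /** Returns the text of every <information> element, in document order. */
    public static List<String> extract(Reader xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent text and CDATA chunks into single CHARACTERS events.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);

        List<String> results = new ArrayList<>();
        StringBuilder current = null; // non-null while inside <information>
        int depth = 0;                // nesting depth inside <information>
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (current != null) {
                            depth++; // a child element inside <information>
                        } else if ("information".equals(reader.getLocalName())) {
                            current = new StringBuilder();
                            depth = 1;
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        if (current != null) {
                            current.append(reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (current != null && --depth == 0) {
                            results.add(current.toString());
                            current = null;
                        }
                        break;
                    default:
                        break; // comments, processing instructions, etc. are skipped
                }
            }
        } finally {
            reader.close();
        }
        return results;
    }
}
```

For a real file you would pass something like Files.newBufferedReader(path) as the Reader. Only the extracted text is held in memory, not the whole document.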

Answer2:

You can't accurately find the characters in the <information> element without parsing the file. You could do something that works 99% of the time, but it would break when someone does something you didn't expect, like putting whitespace in the start tag, or having a commented-out <information> element, or putting part of the <information> element in an external entity.

Bite the bullet. If it's XML, you need an XML parser to read it.
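
To make the failure mode concrete, here is a small sketch comparing a plain string search with a real parser on input that has a commented-out <information> element and whitespace in the start tag. The class NaiveVsParser and both method names are invented for this illustration, and the DOM parser is used only because it keeps the example short, not because it suits a big file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: shows where a string search goes wrong.
public class NaiveVsParser {

    /** Plain substring search: knows nothing about comments or tag whitespace. */
    public static String naive(String xml) {
        String open = "<information>";
        int start = xml.indexOf(open);
        if (start < 0) {
            return null;
        }
        int end = xml.indexOf("</information>", start);
        if (end < 0) {
            return null;
        }
        return xml.substring(start + open.length(), end);
    }

    /** Real XML parse: returns the text of the first <information> element. */
    public static String parsed(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("information");
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><!-- <information>old</information> -->"
                   + "<information >new</information></root>";
        System.out.println(naive(xml));  // picks up the commented-out element
        System.out.println(parsed(xml)); // picks up the live element
    }
}
```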

Answer3:

You may want to explain why you don't want to parse it as that would help in suggesting other solutions.

That being said, if you can construct an XPath for that node, you can always get that information with XPath. See <a href="http://www.ibm.com/developerworks/library/x-javaxpathapi/index.html" rel="nofollow">this tutorial</a>.
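
A minimal sketch of the XPath route with the JDK's javax.xml.xpath API might look like this. It assumes the expression //information is what you need (adjust it to your document's structure), and the class name InformationXPath is just a placeholder. Note that this approach builds the whole document in memory before evaluating the expression.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates //information against an XML string.
public class InformationXPath {

    /** Returns the text content of every <information> element, in document order. */
    public static List<String> extract(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                "//information",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);

        List<String> results = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            results.add(nodes.item(i).getTextContent());
        }
        return results;
    }
}
```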

<strong>UPDATE</strong>

Given the new information, this isn't the solution you want. If you want to treat the XML as a string, reading it into a StringBuilder (the faster, unsynchronized counterpart of StringBuffer) is your best bet. If you're having trouble using StringBuffer, please post the code you tried and the error messages. Its maximum size is bounded by java.lang.Integer.MAX_VALUE, which is 2147483647 characters.
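
For reference, reading the content into a StringBuilder could be sketched like this. The class name XmlToString, the 8192-character buffer, and the UTF-8 charset are assumptions for the example; use whatever encoding your file actually declares.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: loads character data into a StringBuilder.
public class XmlToString {

    /** Drains the reader into a StringBuilder, one buffer at a time. */
    public static StringBuilder readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192]; // buffer size is an arbitrary choice
        int n;
        while ((n = in.read(buffer)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb;
    }

    /** Reads a whole file, assuming it is encoded as UTF-8. */
    public static StringBuilder readFile(Path path) throws IOException {
        try (Reader in = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return readAll(in);
        }
    }
}
```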

Answer4:

Considering that you do not want to use a parser and are only interested in extracting all the characters between two tags, I'd suggest reading the XML content as a string and using a simple regular expression match to extract the portion between the two tags.
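
Something along these lines, as a sketch only: it assumes the start tag is written exactly as <information> with no attributes or whitespace, that the elements are not nested, and that none of them sit inside comments or CDATA. The class name InformationRegex is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: regex extraction under the assumptions stated above.
public class InformationRegex {

    // DOTALL lets '.' match line breaks; '.*?' is reluctant, so each match
    // stops at the nearest closing tag instead of the last one in the input.
    private static final Pattern INFORMATION =
            Pattern.compile("<information>(.*?)</information>", Pattern.DOTALL);

    /** Returns the raw characters between each pair of tags, in order. */
    public static List<String> extract(CharSequence xml) {
        List<String> results = new ArrayList<>();
        Matcher matcher = INFORMATION.matcher(xml);
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }
}
```

Because the method takes a CharSequence, a StringBuilder holding the file content can be passed in directly. The captured text is raw, so entities such as &amp;amp; are not decoded.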
