I have a big XML file that I do not wish to parse. I just want to store every single character between
<information>...</information>, which are tags inside the XML file.
How can I do this?
Answer1:
If the problem is that the data you're trying to extract will fit in memory but the entire XML file won't, use a streaming parser such as <a href="http://www.extreme.indiana.edu/xgws/xsoap/xpp/" rel="nofollow">XPP</a>.
Answer2:
You can't accurately find the characters in the <information> element without parsing the file. You could do something that works 99% of the time, but it would break when someone does something you didn't expect, like putting whitespace in the start tag, or having a commented-out <information> element, or putting part of the <information> element in an external entity.
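To illustrate the parser route, here is a minimal sketch using StAX, the streaming API that ships with the JDK (used here as a stand-in for a pull parser; it is not the XPP library mentioned earlier in the thread). The class name `InformationStreamer` and the method `extract` are made up for this example, and the sketch assumes each <information> element contains only text, with no child elements. The sample input includes a commented-out element and whitespace in the start tag, two of the cases listed above.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class InformationStreamer {

    // Hypothetical helper: collects the text of every <information> element.
    // The input is read incrementally, so the whole file is never held in memory.
    public static List<String> extract(Reader xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        List<String> results = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "information".equals(reader.getLocalName())) {
                    // Assumption: text-only content. getElementText() throws
                    // an XMLStreamException if it meets a child element.
                    results.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // The commented-out element is skipped; the space in the start tag is harmless.
        String sample = "<root><!-- <information>old</information> -->"
                + "<information >new</information></root>";
        System.out.println(extract(new StringReader(sample)));
    }
}
```

For a real file you would pass a reader or input stream opened on that file instead of the `StringReader` used in `main`.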
Bite the bullet. If it's XML, you need an XML parser to read it.
Answer3:
You may want to explain why you don't want to parse it, as that would help others suggest alternative solutions.
That being said, if you can construct an XPath for that node, you can always get that information with XPath. See <a href="http://www.ibm.com/developerworks/library/x-javaxpathapi/index.html" rel="nofollow">this tutorial</a>.
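As a rough sketch of the XPath approach, the following uses the `javax.xml.xpath` API from the JDK. The class name `InformationXPath`, the method `extract`, and the expression `//information` are assumptions made for illustration, since the real document structure isn't shown. Note that the standard implementation builds the whole document in memory before evaluating the expression, so this is unlikely to suit a very large file.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class InformationXPath {

    // Hypothetical helper: returns the text of the first <information>
    // element found anywhere in the document, or "" if there is none.
    public static String extract(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // "//information" is an assumed path; adjust it to the real structure.
        return xpath.evaluate("//information", new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String sample = "<root><other>skip</other><information>keep me</information></root>";
        System.out.println(extract(sample));
    }
}
```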
Given the new information, this isn't the solution you want. If you want to treat the XML as a string, reading it into a StringBuilder (the faster, non-thread-safe version of StringBuffer) is your best bet. If you're having trouble using StringBuffer, please post the code you tried and the error messages. Its maximum size is java.lang.Integer.MAX_VALUE, which is 2147483647.
Considering that you do not want to use a parser and are only interested in extracting all characters between two tags, I'd suggest reading the XML content into a string and using a simple regular expression to extract the portion between the two tags.
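A minimal sketch of that regular-expression approach might look like the following. The class name `InformationRegex` and the method `extract` are invented for this example, and it assumes the start tag is written exactly as <information> with no attributes or extra whitespace. It will also match elements inside comments, so the caveats raised in the earlier answer still apply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InformationRegex {

    // DOTALL lets '.' match line breaks; the reluctant '*?' stops each match
    // at the nearest closing tag instead of the last one in the input.
    private static final Pattern INFO =
            Pattern.compile("<information>(.*?)</information>", Pattern.DOTALL);

    // Hypothetical helper: returns the raw characters between each tag pair.
    public static List<String> extract(String xml) {
        List<String> results = new ArrayList<>();
        Matcher matcher = INFO.matcher(xml);
        while (matcher.find()) {
            results.add(matcher.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String sample = "<root><information>first\nblock</information>"
                + "<information>second</information></root>";
        System.out.println(extract(sample));
    }
}
```

This requires the whole file to fit in a single string, so it only helps when the file is large but still within the limits mentioned above.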