
Question:
Our XML feed gives us encoded UTF-8 characters inside an ISO-8859-1 file. This is being fed into the database. So the text is ISO-8859-1 encoded and contains the following:
&#37329;&#34701;&#24066;&#22330;
Is there a way to convert that into a normal Java string? Similar to:
String str = fromHtmlUtf8("&#37329;&#34701;&#24066;&#22330;");
The resulting str should contain normal UTF-8 characters: Chinese in this case, but the content can be quite mixed.
Thanks.
Answer1: You can use StringEscapeUtils from Apache Commons Lang: http://commons.apache.org/lang/api-2.6/org/apache/commons/lang/StringEscapeUtils.html
Next time, please search before asking; see "How to convert from HTML to UTF-8 in java": https://stackoverflow.com/questions/2825985/how-to-convert-from-html-to-utf-8-in-java/2826064#2826064
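For reference, the Commons Lang 2.6 method that does this is StringEscapeUtils.unescapeHtml(String). If pulling in the library is not an option, the core of the job can be sketched with the JDK alone. The sketch below assumes the feed carries the characters as numeric character references (decimal such as &#37329; or hexadecimal such as &#x91D1;); the class name NumericEntityDecoder is made up for illustration, and fromHtmlUtf8 is just the name used in the question. Named entities like &amp; are deliberately left untouched, so this is not a full replacement for the library call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library: decodes numeric character
// references only and leaves everything else in the input as-is.
final class NumericEntityDecoder {

    // Group 1: hexadecimal digits (&#x91D1;), group 2: decimal digits (&#37329;).
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String fromHtmlUtf8(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuilder out = new StringBuilder(input.length());
        int last = 0;
        while (m.find()) {
            // Copy the literal text between the previous match and this one.
            out.append(input, last, m.start());
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                // Out-of-range reference: keep the original text unchanged.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = fromHtmlUtf8("&#37329;&#34701;&#24066;&#22330;");
        // Compare against the expected code points rather than printing,
        // so the check does not depend on the console encoding.
        System.out.println(decoded.equals("\u91D1\u878D\u5E02\u573A"));
    }
}
```

The result is an ordinary Java String. Whether it survives the trip into the database then depends on the connection and column encoding, which this sketch does not address.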
Answer2: If you need a small library for this, you can use HTMLEntities:
http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=htmlentities