Several months ago, I wrote a Java API that uses JNI to wrap a C API. The C API used char strings, and I used GetStringUTFChars to create the C strings from the Java Strings.
I neglected to think through the problems that might arise with non-ASCII characters.
Since then, the creator of the C API has added wide-character equivalents of each of his C functions, which take or return wchar_t strings. I would like to update my Java API to use these wide-character functions and so overcome the problem I have with non-ASCII characters.
Having studied the JNI documentation, I am a little confused about the relative merits of GetStringChars versus GetStringRegion.
I am aware that the size of wchar_t differs between Windows and Linux, and I am not sure of the most efficient way to create the C strings (and to convert them back to Java strings afterwards).
This is the code I have at the moment which I think creates a string with two bytes per character:
int len;
jchar *Src;
len = (*env)->GetStringLength(env, jSrc);
printf("Length of jSrc is %d\n", len);
Src = (jchar *)malloc((len + 1)*sizeof(jchar));
(*env)->GetStringRegion(env, jSrc, 0, len, Src);
Src[len] = '\0';
However, this will need modifying when the size of wchar_t differs from that of jchar.

Answer 1:
Isn't the C API creator willing to take a step back and reimplement with <strong>UTF-8</strong>? :) Your work would essentially disappear, needing only
jchar is typedefed to unsigned short and is equivalent to the JVM char, which is a <strong>UTF-16</strong> code unit. So on Windows, where wchar_t is also 2 bytes of <strong>UTF-16</strong>, you can get away with the code you presented: just copy the raw bytes around and allocate accordingly. Don't forget to free the buffer once you're finished with the C API call. Complement this with NewString for the conversion back to a jstring.
The only other wchar_t size I am aware of is 4 bytes (most prominently on Linux), holding <strong>UTF-32</strong>. And here comes the problem: <em>UTF-32 is not just UTF-16 somehow padded to 4 bytes.</em> Characters outside the Basic Multilingual Plane are stored in UTF-16 as surrogate pairs, which must be combined into a single 32-bit code point, so allocating double the amount of memory is just the beginning. There is a real conversion to do, <a href="http://gears.googlecode.com/svn/trunk/third_party/convert_utf/ConvertUTF.c" rel="nofollow">like this one, which seems to be sufficiently free to use</a>.
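To make the surrogate-pair point concrete, here is a minimal hand-written sketch of both directions. It is not the linked ConvertUTF.c, and the function names utf16_to_utf32 and utf32_to_utf16 are hypothetical. It assumes uint16_t stands in for jchar (the buffer GetStringRegion fills) and uint32_t for a 4-byte wchar_t holding UTF-32, as on glibc; unpaired surrogates are replaced with U+FFFD, which is one policy choice among several.

```c
#include <stdint.h>
#include <stdlib.h>

/* Decode len UTF-16 units (e.g. a jchar buffer filled by GetStringRegion)
   into a newly malloc'd, 0-terminated UTF-32 buffer. Unpaired surrogates
   become U+FFFD. *out_len receives the number of code points written.
   The caller must free() the result. */
uint32_t *utf16_to_utf32(const uint16_t *src, size_t len, size_t *out_len)
{
    /* At most one code point per UTF-16 unit, plus the terminator. */
    uint32_t *dst = malloc((len + 1) * sizeof *dst);
    if (dst == NULL) return NULL;

    size_t k = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t u = src[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < len &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            /* High + low surrogate pair -> one supplementary code point. */
            dst[k++] = 0x10000 + ((u - 0xD800) << 10) + (src[i + 1] - 0xDC00u);
            i++; /* the low surrogate has been consumed as well */
        } else if (u >= 0xD800 && u <= 0xDFFF) {
            dst[k++] = 0xFFFD; /* unpaired surrogate */
        } else {
            dst[k++] = u; /* BMP character: the value is unchanged */
        }
    }
    dst[k] = 0;
    if (out_len) *out_len = k;
    return dst;
}

/* Encode len UTF-32 code points as UTF-16 into dst, which must have room
   for 2 * len units. Returns the number of units written, i.e. the length
   you would pass to NewString. Invalid code points become U+FFFD. */
size_t utf32_to_utf16(const uint32_t *src, size_t len, uint16_t *dst)
{
    size_t k = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t c = src[i];
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) c = 0xFFFD;
        if (c >= 0x10000) {
            /* Split a supplementary code point into a surrogate pair. */
            c -= 0x10000;
            dst[k++] = (uint16_t)(0xD800 + (c >> 10));
            dst[k++] = (uint16_t)(0xDC00 + (c & 0x3FF));
        } else {
            dst[k++] = (uint16_t)c;
        }
    }
    return k;
}
```

For example, the four units {0x0041, 0xD83D, 0xDE00, 0x00E9} decode to three code points, {0x41, 0x1F600, 0xE9}, so a naive unit-by-unit widening would be wrong exactly for the emoji in the middle.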
But if you are not that concerned about performance and are willing to give up the plain memory copy on Windows, I suggest going from jstring to UTF-8 (which JNI provides natively with documented functionality via GetStringUTFChars, strictly as "modified UTF-8", in which U+0000 takes two bytes and supplementary characters appear as separately encoded surrogates) and then from UTF-8 to UTF-16 or UTF-32 depending on sizeof(wchar_t). That way there are no assumptions about which byte order and UTF encoding each platform uses. You seem to care about this: I see that you are checking sizeof(jchar), which is 2 for most of the visible universe :)
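A minimal sketch of the UTF-8 to wchar_t step, assuming the input is the NUL-terminated buffer returned by GetStringUTFChars. The names utf8_to_wide and decode_utf8 are hypothetical and this is hand-written, not a library call. It accepts both standard UTF-8 and the surrogate form of JNI's modified UTF-8, emits UTF-16 or UTF-32 depending on sizeof(wchar_t), and only substitutes U+FFFD for obviously broken bytes (it does not reject overlong forms). An embedded U+0000 (0xC0 0x80) would end the resulting wide string early.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Decode one UTF-8 (or modified UTF-8) sequence at *p and advance *p.
   May return a lone surrogate for modified UTF-8 input; returns U+FFFD for
   invalid lead bytes, truncated sequences or values above U+10FFFF. */
static uint32_t decode_utf8(const unsigned char **p)
{
    const unsigned char *s = *p;
    uint32_t cp;
    int extra;

    if (s[0] < 0x80)                      { cp = s[0];        extra = 0; }
    else if (s[0] >= 0xC0 && s[0] < 0xE0) { cp = s[0] & 0x1F; extra = 1; }
    else if (s[0] >= 0xE0 && s[0] < 0xF0) { cp = s[0] & 0x0F; extra = 2; }
    else if (s[0] >= 0xF0 && s[0] < 0xF8) { cp = s[0] & 0x07; extra = 3; }
    else { *p = s + 1; return 0xFFFD; }   /* stray continuation or bad lead */

    for (int i = 1; i <= extra; i++) {
        /* Stops at the NUL terminator too, since 0 is not a continuation byte. */
        if ((s[i] & 0xC0) != 0x80) { *p = s + i; return 0xFFFD; }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *p = s + 1 + extra;
    return cp > 0x10FFFF ? 0xFFFD : cp;
}

/* Convert NUL-terminated UTF-8 to a newly malloc'd wchar_t string: UTF-16
   when wchar_t is 2 bytes, UTF-32 when it is 4. Caller must free(). */
wchar_t *utf8_to_wide(const char *utf8)
{
    size_t n = strlen(utf8);
    /* Every sequence of k input bytes yields at most k output units. */
    wchar_t *out = malloc((n + 1) * sizeof *out);
    if (out == NULL) return NULL;

    const unsigned char *p = (const unsigned char *)utf8;
    size_t k = 0;
    while (*p != '\0') {
        uint32_t cp = decode_utf8(&p);
        if (sizeof(wchar_t) == 2) {
            if (cp >= 0x10000) { /* supplementary: emit a surrogate pair */
                cp -= 0x10000;
                out[k++] = (wchar_t)(0xD800 + (cp >> 10));
                out[k++] = (wchar_t)(0xDC00 + (cp & 0x3FF));
            } else {
                out[k++] = (wchar_t)cp; /* modified-UTF-8 surrogates pass through */
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF && k > 0 &&
                   (uint32_t)out[k - 1] >= 0xD800 && (uint32_t)out[k - 1] <= 0xDBFF) {
            /* Modified UTF-8: join the preceding high surrogate with this low one. */
            out[k - 1] = (wchar_t)(0x10000 + (((uint32_t)out[k - 1] - 0xD800) << 10)
                                   + (cp - 0xDC00));
        } else {
            out[k++] = (wchar_t)cp;
        }
    }
    out[k] = L'\0';
    return out;
}
```

In JNI code the input would come from GetStringUTFChars and be released with ReleaseStringUTFChars once converted; the wide buffer is then freed after the C API call.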