How to convert java strings to wide character strings using JNI

Question:

Several months ago, I wrote a Java API that uses JNI to wrap a C API. The C API used char strings, and I used GetStringUTFChars to create the C strings from the Java Strings.

I neglected to think through the problems that might arise with non-ASCII characters.

Since then the creator of the C API has created wide character equivalents to each of his C functions that require or return wchar_t strings. I would like to update my Java API to use these wide character functions and overcome the issue I have with non-ASCII characters.

Having studied the JNI documentation, I am a little confused by the relative merits of using the GetStringChars or GetStringRegion methods.

I am aware that the size of a wchar_t character varies between Windows and Linux and am not sure of the most efficient way to create the C strings (and convert them back to Java strings afterwards).

This is the code I have at the moment, which I think creates a string with two bytes per character:

    int len;
    jchar *Src;

    /* Length of the Java string in UTF-16 code units */
    len = (*env)->GetStringLength(env, jSrc);
    printf("Length of jSrc is %d\n", len);

    /* One extra jchar for the terminator */
    Src = (jchar *)malloc((len + 1)*sizeof(jchar));
    (*env)->GetStringRegion(env, jSrc, 0, len, Src);
    Src[len] = '\0';

However, this will need modifying when the size of a wchar_t differs from jchar.

Answer1:

Isn't the C API creator willing to take a step back and reimplement with <strong>UTF-8</strong>? :) Your work would essentially disappear, needing only GetStringUTFChars/NewStringUTF.

jchar is typedefed to unsigned short and is equivalent to the JVM char, which is a <strong>UTF-16</strong> code unit. So on Windows, where wchar_t is 2 bytes and also <strong>UTF-16</strong>, you can get by with the code you presented. Just copy the raw bytes around and allocate accordingly. Don't forget to free the buffer after you're finished with the C API call. Complement it with NewString for the conversion back to jstring.

The only other wchar_t size I am aware of is 4 bytes (most prominently on Linux), which is <strong>UTF-32</strong>. And here comes the problem: <em>UTF-32 is not just UTF-16 somehow padded to 4 bytes.</em> Allocating double the amount of memory is just the beginning. There is a substantial conversion to do, <a href="http://gears.googlecode.com/svn/trunk/third_party/convert_utf/ConvertUTF.c" rel="nofollow">like this one which seems to be sufficiently free</a>.
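
To make the surrogate-pair point concrete, here is a minimal sketch of that conversion, not a replacement for a vetted library such as the one linked above. The function name utf16_to_wcs is hypothetical, uint16_t stands in for jchar so the snippet compiles without jni.h, and replacing unpaired surrogates with U+FFFD is an assumption about the desired error handling.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts len UTF-16 code units into a newly
 * allocated, NUL-terminated wchar_t string. uint16_t stands in for jchar.
 * Returns NULL if allocation fails; the caller frees the result. */
wchar_t *utf16_to_wcs(const uint16_t *src, size_t len)
{
    /* Worst case is one wchar_t per code unit, plus the terminator. */
    wchar_t *dst = malloc((len + 1) * sizeof(wchar_t));
    size_t i = 0, out = 0;

    if (dst == NULL)
        return NULL;

    while (i < len) {
        uint32_t c = src[i++];

        /* With a 2-byte wchar_t the units are copied unchanged.
         * With a wider wchar_t, surrogates have to be combined. */
        if (sizeof(wchar_t) > 2 && c >= 0xD800 && c <= 0xDFFF) {
            if (c <= 0xDBFF && i < len &&
                src[i] >= 0xDC00 && src[i] <= 0xDFFF) {
                /* High surrogate followed by low surrogate: one code point. */
                c = 0x10000 + ((c - 0xD800) << 10) + (src[i++] - 0xDC00);
            } else {
                /* Unpaired surrogate: substitute U+FFFD (an assumption). */
                c = 0xFFFD;
            }
        }
        dst[out++] = (wchar_t)c;
    }
    dst[out] = L'\0';
    return dst;
}
```

In a JNI function the input would be the jchar buffer filled by GetStringRegion (or returned by GetStringChars) together with the length from GetStringLength, and the result would be freed after the C API call.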

But if you are not after performance that much and are willing to give up the plain memory copying on Windows, I suggest going from jstring to UTF-8 (which is what JNI provides natively with documented functionality; strictly speaking it is modified UTF-8, which encodes U+0000 and characters outside the BMP differently from standard UTF-8, so the decoder has to accept that form) and then from UTF-8 to UTF-16 or UTF-32 depending on sizeof(wchar_t). That way there are no assumptions about the byte order and UTF encoding each platform gives you. You seem to care about this; I see that you are checking sizeof(jchar), which is 2 for most of the visible universe :)
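
For the return trip to Java, NewString expects UTF-16 code units and a length, so a 4-byte wchar_t result has to be split back into surrogate pairs. The following is a rough sketch under the same caveats: wcs_to_utf16 is a hypothetical name, uint16_t stands in for jchar, and mapping invalid values to U+FFFD is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts a NUL-terminated wchar_t string into newly
 * allocated UTF-16 code units and stores the unit count in *out_len.
 * The result is not NUL-terminated, because NewString takes a length.
 * Returns NULL if allocation fails; the caller frees the result. */
uint16_t *wcs_to_utf16(const wchar_t *src, size_t *out_len)
{
    size_t n = wcslen(src);
    /* Worst case: every wchar_t becomes a surrogate pair. The extra
     * element only avoids a zero-byte request for an empty string. */
    uint16_t *dst = malloc((2 * n + 1) * sizeof(uint16_t));
    size_t out = 0;

    if (dst == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        uint32_t c = (uint32_t)src[i];

        if (c < 0x10000) {
            /* A 2-byte wchar_t is already UTF-16, so surrogates pass
             * through; in UTF-32 a surrogate value is invalid. */
            if (sizeof(wchar_t) > 2 && c >= 0xD800 && c <= 0xDFFF)
                c = 0xFFFD;
            dst[out++] = (uint16_t)c;
        } else if (c <= 0x10FFFF) {
            /* Supplementary code point: split into a surrogate pair. */
            c -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 + (c >> 10));
            dst[out++] = (uint16_t)(0xDC00 + (c & 0x3FF));
        } else {
            /* Outside the Unicode range: substitute U+FFFD (an assumption). */
            dst[out++] = 0xFFFD;
        }
    }
    *out_len = out;
    return dst;
}
```

The buffer and length can then be handed to NewString (cast to const jchar * and jsize) before the buffer is freed.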
