Checking if a string is UTF-8 compatible for MySQL

Question:

We have an older MySQL DB that only supports the UTF-8 charset. Is there a way in Java to detect whether a given string will be UTF-8 compatible?

Answer1:

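    // Requires: import java.nio.charset.StandardCharsets;
    public static boolean isUTF8MB4(String s) {
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            // Encode the whole code point; half of a surrogate pair would encode as '?'.
            int bytes = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
            if (bytes > 3) {
                return true;
            }
            i += Character.charCount(codePoint);
        }
        return false;
    }

The above implementation checks the encoded size directly, but otherwise:

    public static boolean isUTF8MB4(String s) {
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            // Code points above U+FFFF are the ones that need 4 bytes in UTF-8.
            if (Character.isSupplementaryCodePoint(codePoint)) {
                return true;
            }
            i += Character.charCount(codePoint);
        }
        return false;
    }

which gives the same answer without encoding anything.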
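To see what such a check reports on concrete input, here is a minimal sketch. The class name `Utf8Mb4Demo`, its method names, and the sample strings are made up for illustration. It assumes the goal is to flag any character that takes 4 bytes in UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class Utf8Mb4Demo {
    // Number of UTF-8 bytes needed to encode a single code point.
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    // True if any code point in s needs 4 bytes in UTF-8.
    static boolean needsFourBytes(String s) {
        return s.codePoints().anyMatch(cp -> utf8Length(cp) > 3);
    }

    public static void main(String[] args) {
        // ASCII, Latin-1 accent, euro sign (3 bytes), emoji (surrogate pair, 4 bytes)
        String[] samples = { "abc", "caf\u00e9", "\u20ac", "\uD83D\uDE00" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + needsFourBytes(sample));
        }
    }
}
```

Only the last sample, which is stored as a surrogate pair in a Java String, is reported as needing 4 bytes.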

Answer2:

Every String is UTF-8 compatible. Just set encoding in the database and the MySQL driver correctly and you're set.

The only gotcha is that the length in bytes of the UTF-8 encoded string may be larger than what .length() says. <a href="https://stackoverflow.com/a/8512877/1648987" rel="nofollow">Here's a Java implementation of a function to measure how many bytes a string will take after encoding to UTF-8.</a>
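As a minimal sketch of that gotcha, the byte length can also be obtained by simply encoding the string and counting the bytes. This is not the linked implementation; it is the simplest approach, and it allocates a temporary byte array. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, shown only to illustrate the length difference.
class Utf8Length {
    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "na\u00efve \u20ac"; // contains a 2-byte and a 3-byte character
        System.out.println("chars: " + s.length());
        System.out.println("bytes: " + utf8ByteLength(s));
    }
}
```

This matters when a column's size limit is enforced in bytes rather than characters.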

EDIT: Saqib pointed out that older MySQL doesn't actually support full UTF-8, only its BMP subset. You can check whether a string contains code points outside the BMP with string.length()==string.codePointCount(0,string.length()) ("true" means "all code points are in the BMP") and remove them with string.replaceAll("[^\u0000-\uffff]", "")
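The two expressions from the edit can be wrapped into small helpers, sketched below. The class name `BmpFilter` and its method names are assumptions for illustration. The regex is written with doubled backslashes so that the regex engine, not the Java compiler, interprets the \u escapes; the effect is the same.

```java
// Hypothetical helper class wrapping the two expressions above.
class BmpFilter {
    // True when every code point is in the BMP.
    // Assumes well-formed input: an unpaired surrogate still counts as one code point.
    static boolean isBmpOnly(String s) {
        return s.length() == s.codePointCount(0, s.length());
    }

    // Removes every code point above U+FFFF.
    static String stripNonBmp(String s) {
        return s.replaceAll("[^\\u0000-\\uffff]", "");
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', an emoji outside the BMP, 'b'
        System.out.println(isBmpOnly(s));
        System.out.println(stripNonBmp(s));
    }
}
```

Silently stripping characters changes the user's data, so it may be preferable to reject such input instead.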

Answer3:

MySQL <a href="https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html" rel="nofollow">defines</a>:

<blockquote>

The character set named utf8 uses a maximum of three bytes per character and contains only BMP characters.

</blockquote>

Therefore this function should work:

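    private boolean isValidUTF8(final String string) {
        for (int i = 0; i < string.length(); i++) {
            final char c = string.charAt(i);
            // A char is always <= U+FFFF, so isBmpCodePoint(c) can never be false;
            // characters outside the BMP show up as surrogate chars instead.
            if (Character.isSurrogate(c)) {
                return false;
            }
        }
        return true;
    }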
