How to convert a string to unicode/byte string in Python 3?

Question:

I know this works:

a = u"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print(a)  # 方法，删除存储在

But suppose I have a string from a JSON file that does not start with "u" (a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"). I know how to do it in Python 2: print unicode(a, encoding='unicode_escape') prints 方法，删除存储在. How do I do the same in Python 3?

Similarly, if it's a byte string loaded from a file, how do I convert it?

print("好的".encode("utf-8"))  # b'\xe5\xa5\xbd\xe7\x9a\x84'
# how to convert this?
b = '\xe5\xa5\xbd\xe7\x9a\x84'  # 好的

Answer1:

If I understand correctly, the file contains the literal text \u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728 (so it's plain ASCII, with backslash escapes that describe the Unicode ordinals the same way a Python str literal would). If so, there are two ways to handle this:

<ol><li>Read the file in binary mode, then call mystr = mybytes.decode('unicode-escape') to convert the bytes to str, interpreting the escapes.</li> <li>Read the file in text mode and use the codecs module for the "text -> text" conversion, e.g. decodedstr = codecs.decode(encodedstr, 'unicode-escape'). Bytes-to-bytes and text-to-text codecs are now supported only by the codecs module functions; bytes.decode is purely for bytes to text and str.encode is purely for text to bytes. Calling str.encode or unicode.decode in Py2 was usually a mistake, and removing those dangerous methods makes it easier to see which direction a conversion is supposed to go.</li> </ol>
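The two approaches above can be sketched roughly as follows. This is a minimal illustration, not a definitive recipe: the variable names are made up for the example, and in-memory values stand in for what open(path, 'rb').read() or open(path).read() would return. It assumes the escaped text is pure ASCII, since unicode-escape treats any non-ASCII input as Latin-1 and would mangle it. The last part shows the plain bytes-to-text case from the question, assuming the bytes are UTF-8.

```python
import codecs

# Stand-in for the file contents: literal backslash-u escapes, pure ASCII.
# (A raw string keeps the backslashes instead of interpreting them.)
escaped_text = r"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

# Way 1: data read in binary mode is bytes, so bytes.decode applies.
escaped_bytes = escaped_text.encode("ascii")  # simulates a binary-mode read
from_bytes = escaped_bytes.decode("unicode-escape")

# Way 2: data read in text mode is str, so use the codecs module function.
from_text = codecs.decode(escaped_text, "unicode-escape")

print(from_bytes)  # 方法，删除存储在
print(from_text)   # 方法，删除存储在

# Byte string case from the question, assuming the bytes are UTF-8 encoded.
utf8_bytes = b"\xe5\xa5\xbd\xe7\x9a\x84"
decoded = utf8_bytes.decode("utf-8")
print(decoded)  # 好的
```

If the text really comes from a JSON document, parsing it with the json module should already turn \uXXXX escapes into the corresponding characters, so a separate unicode-escape step may not be needed in that case.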
