
Unicode encoding for Polish characters in Python

Question:

I have a Polish artist name as follows:

Żółte słonie

In my dataset (json file), it has been encoded as:

\u017b\u00f3\u0142te S\u0142onie

I read the JSON, do some pre-processing, and write the output to a text file. The write fails with the following error:

UnicodeEncodeError: 'charmap' codec can't encode character u'\u017b' in position 0: character maps to <undefined>

I looked up the Unicode encoding for Polish characters online, and the encoding looks fine to me. Since I have never worked with anything other than Latin characters before, I wanted to confirm this with the SO community. If the encoding is right, why is Python not handling it?

Thanks, TM

Answer1:

I ran a simple test with Python 2.7, and it seems that json changes the type of the object from str to unicode. So you have to encode() such a string before writing it to a text file.

#!/usr/bin/env python
# -*- coding: utf8 -*-
import json

s = 'Żółte słonie'
print(type(s))
print(repr(s))

sd = json.dumps(s)
print(repr(sd))

s2 = json.loads(sd)
print(type(s2))
print(repr(s2))

f = open('out.txt', 'w')
try:
    f.write(s2)
except UnicodeEncodeError:
    print('UnicodeEncodeError, encoding data...')
    f.write(s2.encode('UTF8'))
    print('data encoded and saved')
f.close()
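As a complement to the test above, here is a minimal sketch of an alternative that avoids the manual encode() call by opening the output file with an explicit encoding through io.open, which exists in both Python 2.7 and Python 3. It also illustrates one plausible cause of the 'charmap' error. This part is an assumption, since the question does not name the platform: the default output encoding may be a Western European code page such as cp1252, which has no mapping for U+017B. The output path is hypothetical and chosen only for the demo.

```python
# -*- coding: utf-8 -*-
import io
import json
import os
import tempfile

# The escaped form as it appears in the JSON dataset from the question.
raw = '"\\u017b\\u00f3\\u0142te S\\u0142onie"'
name = json.loads(raw)  # a text string holding the real Polish characters

# Assumption: the failing default encoding is a code page like cp1252.
# cp1252 has no entry for U+017B, so encoding to it fails.
try:
    name.encode('cp1252')
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# An explicit encoding makes the write independent of the platform default.
# Hypothetical output location, used only for this demo.
out_path = os.path.join(tempfile.gettempdir(), 'artist_name_demo.txt')
with io.open(out_path, 'w', encoding='utf-8') as f:
    f.write(name + u'\n')

# Read the file back with the same encoding to confirm the round trip.
with io.open(out_path, encoding='utf-8') as f:
    round_trip = f.read().strip()
```

With this approach the file object does the encoding, so the pre-processing code can keep working with text strings and never needs to call encode() itself.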
