I have a Polish artist name as follows:
In my dataset (a JSON file), it has been encoded as:
I read the JSON, do some pre-processing, and write the output to a text file. I get the following error:
UnicodeEncodeError: 'charmap' codec can't encode character u'\u017b' in position 0: character maps to <undefined>
I looked up the Unicode encoding for Polish characters online, and the encoding looks fine to me. Since I have never worked with anything other than Latin characters before, I wanted to confirm this with the SO community. If the encoding is right, why is Python not handling it?
I ran a simple test with Python 2.7, and it seems that json changes the type of the object from str to unicode. So you have to encode() such a string before writing it to a text file.
#!/usr/bin/env python
# -*- coding: utf8 -*-
import json

# In Python 2.7 a plain literal is a byte string (str)
s = 'Żółte słonie'
print(type(s))
print(repr(s))

sd = json.dumps(s)
print(repr(sd))

# json.loads() returns unicode, not str
s2 = json.loads(sd)
print(type(s2))
print(repr(s2))

f = open('out.txt', 'w')
try:
    # fails in Python 2.7: non-ASCII unicode cannot be implicitly encoded
    f.write(s2)
except UnicodeEncodeError:
    print('UnicodeEncodeError, encoding data...')
    f.write(s2.encode('UTF8'))
    print('data encoded and saved')
f.close()
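As an alternative to calling encode() by hand, here is a minimal sketch that opens the file with an explicit encoding via io.open, which should behave the same on Python 2.7 and Python 3. It also checks which codecs can represent the name. The helper names can_encode and write_text, the sample JSON string, and the temporary output path are made up for illustration. The guess that a single-byte Windows code page such as cp1252 is behind the 'charmap' message is an assumption, since the question does not show the code that writes the file.

```python
# -*- coding: utf-8 -*-
import io
import json
import os
import tempfile

# Sample JSON text (illustrative) holding the escaped form of the name.
# json.loads() returns text: unicode on Python 2.7, str on Python 3.
name = json.loads('"\\u017b\\u00f3\\u0142te s\\u0142onie"')


def can_encode(text, codec):
    """Return True if every character of text exists in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def write_text(path, text):
    """Write text as UTF-8, independent of the platform default encoding."""
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)


# Hypothetical output location, used only for this sketch.
out_path = os.path.join(tempfile.gettempdir(), 'out.txt')
write_text(out_path, name)

# The JSON escapes are valid; only some target codecs lack the characters.
# cp1252 is an assumed example of a 'charmap' codec without U+017B.
print(can_encode(name, 'utf-8'))
print(can_encode(name, 'cp1252'))
```

If the sketch holds, the data in the JSON file is fine, and the error comes from the codec used at write time rather than from the escaped characters themselves.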