
Special national characters won't .split() in Python

Question:

I have trouble in Python, when reading special national characters from a text file.

    with open("../Data/DKsnak.txt") as f:
        content = f.readlines()
    str1 = content[0]
    print "string:",str1
    lst1 = str1.split()
    print "list:",lst1

The output is as follows:

    string: Udtræk fra observatør på årstal
    list: ['Udtr\xc3\xa6k', 'fra', 'observat\xc3\xb8r', 'p\xc3\xa5', '\xc3\xa5rstal']

The first line is as expected, including the special Danish characters. But they don't survive being split into a list. I have tried various tricks with codecs and unicode, but can't find the magic bullet.

Can anyone please suggest how I get these words into lists, so I can work with them as such?

Best regards Martin

Running: Python 2.7.5 (default, Feb 19 2014, 13:47:28) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2

Answer1:

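Your code is fine. Printing a list shows the repr() of each element, and in Python 2 that escapes the non-ASCII bytes (for example \xc3\xa6 for æ). The strings themselves are unchanged. If you print each element on its own, you still get the original text: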

    s = 'Udtræk fra observatør på årstal'
    s = s.split()
    for i in s:
        print i

    [OUTPUT]  # all fine
    Udtræk
    fra
    observatør
    på
    årstal
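To make the point above concrete, here is a minimal sketch of why the list looks "broken" while the words are actually intact. It assumes the file is UTF-8 encoded (the \xc3\xa6 escapes in the question suggest so) and uses a hard-coded byte string in place of the file; the names raw, words, shown_in_list and decoded are made up for this illustration.

```python
# -*- coding: utf-8 -*-

# The UTF-8 bytes that a Python 2 str read from the file would hold
# (assumption: the file is UTF-8 encoded).
raw = u"Udtræk fra observatør på årstal".encode("utf-8")

# Splitting on whitespace leaves the bytes of each word untouched.
words = raw.split()

# Printing a list shows the repr() of each element, and the repr of a
# byte string escapes every non-ASCII byte.
shown_in_list = repr(words[0])

# Decoding each word gives back the readable text.
decoded = [w.decode("utf-8") for w in words]

print(shown_in_list)
print(u" ".join(decoded))
```

The escaped form is only how the element is displayed inside the list. The data behind it is the same word, which is why printing the elements one by one looks fine.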

Answer2:

From https://docs.python.org/2.7/howto/unicode.html:

    import codecs
    f = codecs.open('unicode.rst', encoding='utf-8')

Reading from a file opened this way returns unicode objects instead of byte strings, so you get unicode and can split it.
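Applied to the question's code, this could look roughly like the sketch below. It uses io.open, which accepts an encoding argument in the same way as codecs.open, and it assumes the file is UTF-8. A temporary file stands in for ../Data/DKsnak.txt so the example runs on its own; path and line are names invented here.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Stand-in for ../Data/DKsnak.txt so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "DKsnak.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"Udtræk fra observatør på årstal\n")

# io.open decodes while reading, so every line is unicode text
# rather than raw bytes (assumption: the file is UTF-8 encoded).
with io.open(path, encoding="utf-8") as f:
    content = f.readlines()

# split() on decoded text yields a list of unicode words.
lst1 = content[0].split()

# Join the words again to display them on one line.
line = u" ".join(lst1)
print(line)
```

On Python 2, printing lst1 directly would still show escapes such as u'Udtr\xe6k', because a list is always displayed through repr(). Printing the individual words, or a joined string as above, shows the Danish characters.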

Answer3:

Using the for loop as mentioned before, if you want them on the same line:

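    string = ''
    # iterate over the indexes; len(list1) alone is an int and cannot be looped over
    for i in range(len(list1)):
        string += list1[i] + ' '
    print(string)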
