UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position

When I try to extract some patterns from a tagged text in NLTK, I get the error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 79: ordinal not in range(128). I did not get this error at first; it started appearing only after I installed some packages.

This is the code:

    # -*- coding: utf-8 -*-
    import codecs
    import sys
    import re
    import sys
    import nltk
    from nltk.corpus import *

    k = nltk.corpus.brown.tagged_words('myfile')
    for (w1,t1), (w2,t2) in nltk.bigrams(k):
        if t1 == 'NN' and t2 == 'AJ':
            print w1, w2

This is the entire output of the code:

    Traceback (most recent call last):
      File "/home/fathi/egfe.py", line 12, in <module>
        for (w1,t1), (w2,t2) in nltk.bigrams(k):
      File "/usr/local/lib/python2.7/dist-packages/nltk/util.py", line 442, in bigrams
        for item in ngrams(sequence, 2, **kwargs):
      File "/usr/local/lib/python2.7/dist-packages/nltk/util.py", line 419, in ngrams
        history.append(next(sequence))
      File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/util.py", line 291, in iterate_from
        tokens = self.read_block(self._stream)
      File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/tagged.py", line 241, in read_block
        for para_str in self._para_block_reader(stream):
      File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/util.py", line 564, in read_blankline_block
        line = stream.readline()
      File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 1095, in readline
        new_chars = self._read(readsize)
      File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 1322, in _read
        chars, bytes_decoded = self._incr_decode(bytes)
      File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 1352, in _incr_decode
        return self.decode(bytes, 'strict')
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 79: ordinal not in range(128)


The problem is that the installed NLTK version is not compatible with your Python version, so you need to install an older version of NLTK.
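
To check this on your own machine, here is a minimal sketch that prints the Python and NLTK versions in use and reproduces the same kind of decode failure outside of NLTK. The sample bytes and the helper name `try_decode` are made up for illustration, and it assumes the offending text is UTF-8 (0xc3 is a common first byte of a two-byte UTF-8 character such as "é"), which the traceback alone does not prove.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys

# Hypothetical sample (not taken from 'myfile'): "café" encoded as UTF-8.
# The byte 0xc3 is the first byte of the encoded "é".
raw = b"caf\xc3\xa9"


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as err:
        return None, err


# The ascii codec rejects any byte >= 0x80, as in the traceback.
ascii_text, ascii_err = try_decode(raw, "ascii")
# Under the UTF-8 assumption the same bytes decode cleanly.
utf8_text, utf8_err = try_decode(raw, "utf-8")

print("Python version:", sys.version.split()[0])
print("ascii error:", ascii_err)
print("utf-8 decodes without error:", utf8_err is None)

# Report the NLTK version if it is importable from this interpreter.
try:
    import nltk
    print("NLTK version:", nltk.__version__)
except ImportError:
    print("NLTK is not installed for this interpreter")
```

If the printed NLTK version is one that was pulled in by the packages you installed recently, that would be consistent with the explanation above; compare it against the Python versions that release claims to support before downgrading.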


