'charmap' codec can't encode character '\xae' While Scraping a Webpage


I am web scraping with Python using BeautifulSoup, and I am getting this error:

'charmap' codec can't encode character '\xae' in position 69: character maps to <undefined>

when scraping a webpage.

This is my Python code:

hotel = BeautifulSoup(state.)
print (hotel.select("div.details.cf span.hotel-name a"))
# Tried: print (hotel.select("div.details.cf span.hotel-name a")).encode('utf-8')


This problem usually comes up in Python 2 when you call .encode() on a byte string that is already encoded: Python first tries to decode it implicitly as ASCII, which fails on any non-ASCII byte. So you might try decoding it first, as in:

html = urllib.urlopen(link).read()
unicode_str = html.decode(<source encoding>)
encoded_str = unicode_str.encode("utf8")
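For readers on Python 3, the same decode-then-encode pattern might look like the sketch below. It is only an illustration: the byte string stands in for whatever `.read()` returns, and "windows-1252" is an assumed source encoding, not something known about the page in the question.

```python
# Python 3 sketch: these bytes stand in for what .read() would return.
raw = b"Caf\xe9 Royal\xae"  # assumed to be Windows-1252 encoded

# Decode with the source encoding to get a text (str) object...
text = raw.decode("windows-1252")

# ...then encode that text into the encoding you actually want.
utf8_bytes = text.encode("utf-8")

# ascii() escapes non-ASCII characters, so printing is safe on any console.
print(ascii(text))
print(utf8_bytes)
```

Printing the escaped form or the bytes object, rather than the text itself, sidesteps consoles whose code page cannot represent characters such as ®.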

As an example:

html = '\xae'
encoded_str = html.encode("utf8")

Fails with

UnicodeDecodeError: 'ascii' codec can't decode byte 0xae in position 0: ordinal not in range(128)


html = '\xae'
decoded_str = html.decode("windows-1252")
encoded_str = decoded_str.encode("utf8")
print encoded_str
®

This version succeeds without error. Note that "windows-1252" is only an example. I got it from chardet, which reported just 0.5 confidence that it was right (not surprising for a one-character string). Replace it with whatever encoding actually applies to the byte string returned from .urlopen().read() for the content you retrieved.
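When the source encoding is not known for certain, one possible approach is to try a short list of candidates in order. The sketch below is Python 3 and is not part of any library: `decode_page`, its `declared` parameter (for example, a charset taken from the page's headers) and the fallback list are all hypothetical names chosen for illustration.

```python
def decode_page(raw, declared=None, fallbacks=("utf-8", "windows-1252")):
    """Return (text, encoding_used) for a byte string of uncertain encoding."""
    # Try the declared encoding first, if one was given, then the fallbacks.
    candidates = ([declared] if declared else []) + list(fallbacks)
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: try the next candidate.
            continue
    # Last resort: decode as UTF-8 and mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8"


text, used = decode_page(b"Hotel\xae")
print(used)
print(text.encode("utf-8"))
```

Strict decoding is tried first so that a wrong guess fails loudly instead of silently producing garbled text. A successful decode still does not prove the guess was right, so a charset declared by the server or the page itself is the better source when one is available.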


