I am using Python 2.6.1 and there is a problem related to UTF-8 with my code. This problem is reproducible with this code:
    # -*- coding: utf-8 -*-
    import os, sys
    import string, time
    import codecs, re

    bDATA = '"Domestic Lombardis","Eddie Marson","Christian De Banolé","John Hawks"'
    print (bDATA)
    fileObj = codecs.open("btvresp1.txt", "r", "utf-8")
    data = fileObj.read()
    print (data)

The first print of bDATA works fine. However, if the same data is in a file, Python complains like this:
    cat btvresp2.txt
    "Domestic Lombardis","Eddie Marson","Christian De Banol?","John Hawks"

    Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
    [GCC 4.2.1 (Apple Inc. build 5646)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> # -*- coding: utf-8 -*-
    ...
    >>> import os, sys
    >>> import string, time
    >>> import codecs, re
    >>> bDATA = '"Domestic Lombardis","Eddie Marson","Christian De Banolé","John Hawks"'
    >>> print (bDATA)
    "Domestic Lombardis","Eddie Marson","Christian De Banolé","John Hawks"
    >>> fileObj = codecs.open("btvresp2.txt", "r", "utf-8")
    >>> data = fileObj.read()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/codecs.py", line 666, in read
        return self.reader.read(size)
      File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/codecs.py", line 472, in read
        newchars, decodedbytes = self.decode(data, self.errors)
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 55-57: invalid data

I am not sure why reading the same data from a file is a problem. Why does this error arise?
It seems the content of your file is not encoded in UTF-8. Are you sure you did not save it in some other encoding? When you cat the file, the terminal displays ? in place of é, which also points to an encoding problem in the file, assuming your terminal uses UTF-8.
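To make the failure concrete, here is a minimal sketch of what probably happens. It assumes the file was saved as Latin-1 (ISO-8859-1), which is only a guess because the question does not show the actual bytes, and it uses a made-up sample word rather than the real file content.

```python
# -*- coding: utf-8 -*-
# Assumption: the file was saved as Latin-1. The question does not confirm this.
text = u"caf\u00e9"  # made-up sample text containing an e with acute accent

utf8_bytes = text.encode("utf-8")      # the accented letter becomes 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # the accented letter becomes 0xE9

# Bytes that really are UTF-8 decode cleanly.
assert utf8_bytes.decode("utf-8") == text

# Latin-1 bytes are not valid UTF-8, so the UTF-8 decoder raises.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decoding with the encoding the bytes were written in recovers the text.
recovered = latin1_bytes.decode("latin-1")

print(decode_failed)
print(recovered == text)
```

If the guess about the encoding is right, passing that encoding name to codecs.open instead of "utf-8" should read the file without an error. Re-saving the file as UTF-8 is the other option.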
Also, you have two files, btvresp1.txt and btvresp2.txt. Are you using the right one?
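One way to check each file is to read its raw bytes and see whether they decode as UTF-8. The sketch below is only an illustration: first_utf8_error is a hypothetical helper name, and the two demo files are created in a temporary directory so the example does not depend on the real btvresp files.

```python
import os
import shutil
import tempfile


def first_utf8_error(path):
    """Return None if the file is valid UTF-8, else (offset, offending bytes)."""
    with open(path, "rb") as handle:
        raw = handle.read()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, raw[exc.start:exc.end]
    return None


# Demo with two made-up files: one saved as UTF-8, one as Latin-1.
demo_dir = tempfile.mkdtemp()
samples = {
    "good.txt": u"caf\u00e9".encode("utf-8"),
    "bad.txt": u"caf\u00e9".encode("latin-1"),
}
results = {}
for name, payload in samples.items():
    path = os.path.join(demo_dir, name)
    with open(path, "wb") as handle:
        handle.write(payload)
    results[name] = first_utf8_error(path)

shutil.rmtree(demo_dir)  # remove the temporary demo files

for name in sorted(results):
    print(name, results[name])
```

Calling the helper on btvresp1.txt and btvresp2.txt in turn would show which of the two files, if either, contains bytes that are not valid UTF-8, and at which offset.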