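In Python 3
Problem description:
I am reading a UTF-8 encoded file in which each line contains a bytes literal written out as text. The content of the file looks like this: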
cd232704-a46f-3d9d-97f6-67edb897d65f b'this Friday, Gerda Scheuers will be excited \xe2\x80\x94 but she\xe2\x80\x99s most excited about the merchandise the movie will bring.'
Here is my code:
with open(file, 'r') as f_in:
    for line in f_in:
        tokens = line.split('\t')
        print(tokens[1])
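I want to get the correctly decoded text: "this Friday, Gerda Scheuers will be excited — but she’s most excited about the merchandise the movie will bring."
Decoding a bytes literal that is written directly in code works:
print(b'\xe2\x80\x94'.decode('utf-8'))  # decode UTF-8 bytes to a str
But I can't get bytes out of the file this way. If I open the file in binary mode, I need to decode the line in order to split it.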
Answer
You can use ast.literal_eval to convert the text of the bytes literal into a bytes object:
>>> import ast
>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'")
b'excited \xe2\x80\x94 but she\xe2\x80\x99s'
Then decode it to get a str object:
>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'").decode('utf-8')
'excited — but she’s'
Applied to the loop from the question:
import ast

with open(file, 'r', encoding='utf-8') as f_in:
    for line in f_in:
        tokens = line.split('\t')
        # if len(tokens) < 2:
        #     continue
        bytes_part = ast.literal_eval(tokens[1])  # text of the literal -> bytes
        s = bytes_part.decode('utf-8')  # decode the bytes to get a str
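For completeness, here is a minimal end-to-end sketch of the same approach that can be run as is. It assumes the two columns are separated by a tab, as the split('\t') in the question suggests. The helper names decode_bytes_literal and read_decoded, and the temporary sample file, are made up for illustration and are not part of the original code.

```python
import ast
import os
import tempfile


def decode_bytes_literal(text):
    """Convert the text of a bytes literal (as stored in the file) into a str."""
    value = ast.literal_eval(text.strip())
    if not isinstance(value, bytes):
        # literal_eval also accepts str, int, list, ...; only bytes make sense here.
        raise ValueError(f"expected a bytes literal, got {type(value).__name__}")
    return value.decode("utf-8")


def read_decoded(path):
    """Yield (id, text) pairs from a tab-separated file of id and bytes literal."""
    with open(path, "r", encoding="utf-8") as f_in:
        for line in f_in:
            tokens = line.rstrip("\n").split("\t")
            if len(tokens) < 2:
                continue  # skip lines that have no second column
            yield tokens[0], decode_bytes_literal(tokens[1])


# Build a small sample file shaped like the one in the question (assumed tab separator).
sample = (
    "cd232704-a46f-3d9d-97f6-67edb897d65f\t"
    + r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'"
    + "\n"
)
with tempfile.NamedTemporaryFile(
    "w", suffix=".tsv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)

rows = list(read_decoded(tmp.name))
os.remove(tmp.name)
print(rows[0][1])
```

Note that ast.literal_eval raises ValueError or SyntaxError when a column does not contain a valid Python literal, so a file with damaged lines may need a try/except around the call to decode_bytes_literal.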