How can I change this function to be compatible with both Python 2 and Python 3? I'm running into string, unicode, and other issues

Question:

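I have a function that is meant to make some text safe for use in a filename or URL. I'm trying to change it so that it works in both Python 2 and Python 3. In my attempts I have confused myself with bytes and would welcome some guidance. I'm running into errors like: sequence item 1: expected a bytes-like object, str found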

import re
import sys
import unicodedata

def slugify(
    text=None,
    filename=True,
    URL=False,
    return_str=True
    ):

    if sys.version_info >= (3, 0):

        # insert magic here (the Python 3 branch is the missing part)
        pass

    else:

        # Python 2 only: make sure we are working with a unicode object
        if type(text) is not unicode:
            text = unicode(text, "utf-8")
        if filename and not URL:
            text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore")
            text = unicode(re.sub(r"[^\w\s-]", "", text).strip())
            text = unicode(re.sub(r"[\s]+", "_", text))
        elif URL:
            text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore")
            text = unicode(re.sub(r"[^\w\s-]", "", text).strip().lower())
            text = unicode(re.sub(r"[-\s]+", "-", text))
        if return_str:
            text = str(text)

    return text

Answer

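It looks like your main problem is figuring out how to convert text to unicode and back to bytes when you aren't sure what the original type was. In fact, if you're careful, you can do this without any version checks at all.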

if isinstance(s, bytes): 
    s = s.decode('utf8')

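should be enough to convert something to unicode in either Python 2 or 3 (assuming 2.6+ and 3.2+, as is usual). This works because bytes exists in Python 2 as an alias for str. The explicit utf8 argument is only required in Python 2, where the default codec is ASCII, but there is no harm in providing it in Python 3 as well. To convert back to a byte string, you just do the opposite.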

if not isinstance(s, bytes): 
    s = s.encode('utf8')
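
As a rough sketch, the two snippets above could be wrapped in a pair of small helpers so the conversion happens in one place. The names to_text and to_bytes and the encoding parameter are hypothetical, chosen here for illustration; they are not part of the original answer.

```python
def to_text(s, encoding="utf-8"):
    """Return s as text (unicode on Python 2, str on Python 3)."""
    # bytes is an alias for str on Python 2, so this check works on both.
    if isinstance(s, bytes):
        return s.decode(encoding)
    return s


def to_bytes(s, encoding="utf-8"):
    """Return s as a byte string (str on Python 2, bytes on Python 3)."""
    if not isinstance(s, bytes):
        return s.encode(encoding)
    return s
```

Both helpers leave a value untouched when it already has the requested type, so calling them more than once on the same value is harmless.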

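Of course, I'd recommend that you think hard about why you aren't sure what type your strings are in the first place. It's better to keep the two types distinct than to write "weak" APIs that accept either. Python 3 simply encourages you to keep them separate.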
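
One way to apply this to the function in the question, offered only as a sketch: decode once at the boundary, work on text everywhere else, and decode the ASCII-stripped result straight back to text so that re.sub never mixes bytes and str on Python 3. This version assumes Python 2.6+ or 3.3+, and it always strips to ASCII, which differs slightly from the original when both filename and URL are False.

```python
import re
import unicodedata


def slugify(text, filename=True, URL=False, return_str=True):
    """Make text safe for a filename or URL (sketch, not the asker's final code)."""
    # Decode once at the boundary; everything below works on text only.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    # Decompose accented characters, drop anything non-ASCII, then go
    # straight back to text so the regexes never see a bytes object.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    if URL:
        text = re.sub(r"[^\w\s-]", "", text).strip().lower()
        text = re.sub(r"[-\s]+", "-", text)
    elif filename:
        text = re.sub(r"[^\w\s-]", "", text).strip()
        text = re.sub(r"\s+", "_", text)
    # str is a byte string on Python 2 and text on Python 3; the value is
    # ASCII-only at this point, so the conversion is safe on both.
    return str(text) if return_str else text
```

For example, slugify(u"Héllo Wörld!") gives "Hello_World", and slugify(b"Hello  World", URL=True) gives "hello-world".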

'not isinstance(s, bytes)' will always be true. Did you mean to assign 's.decode('utf-8')' to something other than 's' in the first snippet? – jwodder

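@jwodder The idea is that there would be other code between the snippets. I was just showing how to convert something to unicode or to bytes when you don't know its existing type. – Antimony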
