Beautiful Soup replaces some symbols in URLs with other symbols

Problem description:

I am parsing a web page with Beautiful Soup and trying to retrieve all the links that sit inside h3 tags:

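import requests
from bs4 import BeautifulSoup

page = requests.get("https://www....")  # URL elided in the original post
soup = BeautifulSoup(page.text, "html.parser")
links = []
# Collect the href of the first <a> inside each <h3>
for item in soup.find_all('h3'):
    links.append(item.a['href'])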

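However, the links I get differ from the links that are actually in the page. For example, where the page contains the link http://www.estense.com/?p=116872, Beautiful Soup returns http://www.estense.com/%3Fp%3D116872, with '?' replaced by '%3F' and '=' replaced by '%3D'. Why does this happen?

Thanks.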

It is URL-escaping (percent-encoding) the unquoted URL. I can't reproduce the problem, though. Which version of Python are you using?
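To illustrate what URL escaping does to the link from the question, here is a minimal sketch using urllib.parse.quote. The safe=":/" argument is an assumption chosen so that only '?' and '=' get encoded; it is not a claim about what the site or Beautiful Soup does internally.

```python
from urllib import parse

original = "http://www.estense.com/?p=116872"  # link as it appears in the page

# Percent-encode everything except ':' and '/'. This safe set is an
# illustrative assumption, not the confirmed cause of the behaviour.
escaped = parse.quote(original, safe=":/")
print(escaped)  # http://www.estense.com/%3Fp%3D116872
```

'%3F' and '%3D' are simply the hexadecimal ASCII codes of '?' and '=', so the escaped string carries the same characters in encoded form.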

I am using Python 3.5.3. – user1767774

Answer

You can use urllib.parse to decode the link:

from urllib import parse 
parse.unquote(item.a['href'])
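As a self-contained sketch of the same idea, the snippet below decodes a whole list of links at once. The contents of the links list are hypothetical sample values standing in for what the loop in the question might collect.

```python
from urllib import parse

# Hypothetical sample hrefs: one percent-encoded, one already plain.
links = [
    "http://www.estense.com/%3Fp%3D116872",
    "http://www.estense.com/?p=116872",
]

# unquote() turns %XX sequences back into characters and leaves
# strings without any %XX sequences unchanged.
decoded = [parse.unquote(href) for href in links]
print(decoded)
```

Note that this treats the symptom rather than the cause: if a link legitimately contains an escaped character (for example an encoded '&' inside a query value), decoding it may change what the URL means.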

Thanks, but could you explain where the problem comes from? – user1767774

The cause may come from ' chad
