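Unable to scrape a page with BeautifulSoup

Problem description:

I want to extract the "Download CSV" link from this page: https://patents.google.com/?assignee=intel but I cannot get it with BeautifulSoup.

Here is my code: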

import requests 
from bs4 import BeautifulSoup 
page = requests.get("https://patents.google.com/?assignee=intel") 
soup = BeautifulSoup(page.content, 'html.parser') 
soup.find_all('a', class_='style-scope search-results') 
soup.find_all('a', class_='style-scope')

But the last two lines return empty lists. What am I missing here?

Even this returns nothing:

soup.find(id="resultsLayout")

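Answer

This happens because those elements are generated by JavaScript. requests only downloads the initial HTML and does not run any scripts, so the elements are not in the content that BeautifulSoup parses. You can use Selenium to drive a real browser and get the full rendered page source.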
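
To see why the requests version comes back empty, here is a minimal sketch using a made-up HTML snippet (an assumption for illustration, not the actual Google Patents markup). The link only exists after the script runs, and BeautifulSoup parses markup without executing scripts:

```python
from bs4 import BeautifulSoup

# Hypothetical page: the <a> element is created by JavaScript at runtime,
# so it is absent from the HTML that requests would download.
static_html = """
<html>
  <body>
    <div id="app"></div>
    <script>
      var a = document.createElement('a');
      a.className = 'style-scope';
      a.href = '/download.csv';
      document.getElementById('app').appendChild(a);
    </script>
  </body>
</html>
"""

soup = BeautifulSoup(static_html, "html.parser")

# BeautifulSoup never runs the script, so no <a> tag exists in the parse tree.
links = soup.find_all("a", class_="style-scope")
print(links)  # []

# The container that is present in the raw HTML is still found.
container = soup.find(id="app")
print(container)  # <div id="app"></div>
```

A quick way to check a real page is to look at page.text from requests: if the element you want is not in that text, no parser will find it.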

Below is a version of your code edited to use Selenium.

from bs4 import BeautifulSoup
from selenium import webdriver

# Open the page in a real browser so its JavaScript runs
browser = webdriver.Chrome()
browser.get('https://patents.google.com/?assignee=intel')
# page_source is the DOM serialized after the browser has loaded the page
page = browser.page_source
browser.quit()
soup = BeautifulSoup(page, 'html.parser')
soup.find_all('a', class_='style-scope search-results')
soup.find_all('a', class_='style-scope')
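
If the results are still empty, the page may not have finished rendering when page_source was read. The following is a sketch of one way to handle that with an explicit wait, not a tested solution for this site. The helper names fetch_rendered_html and extract_links are hypothetical, and it assumes the element with id "resultsLayout" from the question shows up in the rendered DOM:

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_for_id, timeout=15):
    """Load url in Chrome and return the page source once wait_for_id exists."""
    # Imported lazily so extract_links below can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Chrome()
    try:
        browser.get(url)
        # Block until the element is in the DOM, or raise TimeoutException.
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.ID, wait_for_id))
        )
        return browser.page_source
    finally:
        # Always close the browser, even if the wait times out.
        browser.quit()


def extract_links(html, class_name):
    """Return the href of every <a> tag that carries class_name."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", class_=class_name, href=True)]


# Example usage (opens a real Chrome window, so it is left commented out):
# html = fetch_rendered_html("https://patents.google.com/?assignee=intel",
#                            "resultsLayout")
# print(extract_links(html, "style-scope"))
```

Passing a single class name such as "style-scope" to class_ matches any tag whose class list contains it, whereas a string with two classes only matches that exact class attribute value.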

Let me know if you need any clarification. Thanks!
