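Python: Page Navigator Maximum Value Scraper - only the last value is output
Problem description:
This is a program I wrote to extract the maximum page number from each category section in a list. I cannot get all of the values; I only get the value for the last item in the list. What do I need to change to get the output for every item?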
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# List of link suffixes appended to the base URL
links = ['Link_1/', 'Link_2/', 'Link_3/']

# Function to find the largest number in the page navigation section.
# The element just before 'Next →' holds the upper limit.
def page_no():
    bs = soup(page_html, "html.parser")
    max_page = bs.find('a', {'class': 'next page-numbers'}).findPrevious().text
    print(max_page)

# URL loop
for url in links:
    my_urls = 'http://example.com/category/{}/'.format(url)
# opening up connection, grabbing the page
uClient = uReq(my_urls)
page_html = uClient.read()
uClient.close()
page_no()
Example page navigation: 1 2 3 … 15 Next →
Thanks in advance.
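Answer
You need to pass page_html into the function as an argument and indent the last four lines so that they run inside the loop, once per link. It is also better to return max_page so that you can use the value outside the function.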
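To illustrate why the indentation matters, here is a minimal sketch that needs no network access. The helper `fake_fetch` and the two result lists are hypothetical names introduced only for this illustration; they stand in for the download step. Statements placed after the loop body run once, with the loop variable holding its final value, while statements inside the body run once per link.

```python
# Hypothetical stand-in for downloading a page; not part of the original code.
def fake_fetch(url):
    return "<html for {}>".format(url)

links = ['Link_1/', 'Link_2/', 'Link_3/']

# Fetch placed AFTER the loop body: it runs once, using the last URL only.
processed_outside = []
for link in links:
    url = 'http://example.com/category/{}/'.format(link)
processed_outside.append(fake_fetch(url))

# Fetch placed INSIDE the loop body: it runs once for every link.
processed_inside = []
for link in links:
    url = 'http://example.com/category/{}/'.format(link)
    processed_inside.append(fake_fetch(url))

print(len(processed_outside))  # 1
print(len(processed_inside))   # 3
```

The same reasoning applies to the download and parse steps in the question: anything that should happen for every link has to be part of the loop body.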
def page_no(page_html):
    bs = soup(page_html, "html.parser")
    max_page = bs.find('a', {'class': 'next page-numbers'}).findPrevious().text
    return max_page

# URL loop
for url in links:
    my_urls = 'http://example.com/category/{}/'.format(url)
    # opening up connection, grabbing the page
    uClient = uReq(my_urls)
    page_html = uClient.read()
    uClient.close()
    max_page = page_no(page_html)
    print(max_page)
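As a possible variation, the parsing step can be tried on an inline HTML string before pointing it at a live site. The markup in `SAMPLE_HTML` is made up and only assumed to resemble the "1 2 3 … 15 Next →" navigation from the question; the real page may be structured differently. The function name `max_page_number` and the fallback value of 1 for pages without a "next" link are likewise assumptions of this sketch, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Made-up markup, assumed to resemble the navigation shown in the question.
SAMPLE_HTML = """
<div class="nav-links">
  <span class="page-numbers current">1</span>
  <a class="page-numbers" href="/page/2/">2</a>
  <a class="page-numbers" href="/page/3/">3</a>
  <span class="page-numbers dots">…</span>
  <a class="page-numbers" href="/page/15/">15</a>
  <a class="next page-numbers" href="/page/2/">Next →</a>
</div>
"""

def max_page_number(page_html):
    """Return the highest page number as an int, or 1 if no 'next' link exists."""
    bs = BeautifulSoup(page_html, "html.parser")
    next_link = bs.find("a", class_="next page-numbers")
    if next_link is None:
        # Assumption: a category without a 'next' link has a single page.
        return 1
    # find_previous is the snake_case name for findPrevious. Filtering on the
    # "page-numbers" class skips any unrelated tags that precede the link.
    previous = next_link.find_previous(class_="page-numbers")
    if previous is None:
        return 1
    return int(previous.get_text(strip=True))

print(max_page_number(SAMPLE_HTML))  # 15
```

Returning an integer rather than text makes the value directly usable in something like `range(1, max_page + 1)` when iterating over the pages of a category.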
Please provide the real URL you are parsing.