Scraping table data with Python - BeautifulSoup

Problem description:

I can't figure out how to scrape only the first table cell of each row instead of both cells.

<tr> 
<td>WheelDust 
</td> 
<td>A large puff of barely visible brown dust 
</td></tr>

I only want WheelDust, but instead I get both WheelDust and "A large puff of barely visible brown dust".

import requests 
from bs4 import BeautifulSoup 


r = requests.get("https://wiki.garrysmod.com/page/Effects") 

soup = BeautifulSoup(r.content, "html.parser") 

for td in soup.findAll("table"):
    # print(td)
    for a in td.findAll("tr"):
        print(a.text)

If you don't want to keep iterating after the first match, you can use soup.find instead of soup.find_all. You can also use 'break' once you have found 'WheelDust'. – Landmaster
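The difference between the two calls mentioned in this comment can be sketched on a small inline document. The HTML below is a hypothetical sample modelled on the row in the question (the second row is made up), not the actual wiki page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modelled on the row shown in the question;
# the second row is invented for illustration.
SAMPLE_HTML = """
<table>
<tr><td>WheelDust</td><td>A large puff of barely visible brown dust</td></tr>
<tr><td>ExampleEffect</td><td>Placeholder description</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

first_cell = soup.find("td")      # the first matching tag, or None if there is none
all_cells = soup.find_all("td")   # every matching tag, in document order

print(first_cell.get_text(strip=True))
print(len(all_cells))
```

Note that soup.find("td") stops at the very first cell in the whole document, so on its own it does not give one cell per row.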

Yes, but it's a table, so I want to find everything in the first column –

Why not call a.find('td') once you are inside the tr? – Landmaster

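Answer

I'm still not sure what you are asking, but I believe you are saying you want only the first <td> of each row, correct? If so, wouldn't this work? I would test it myself, but it says I don't have access to the site.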

import requests 
from bs4 import BeautifulSoup 


r = requests.get("https://wiki.garrysmod.com/page/Effects") 

soup = BeautifulSoup(r.content, "html.parser") 

for td in soup.findAll("table"):
    # print(td)
    for a in td.findAll("tr"):
        print(a.find('td'))

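Oh, that's what I was looking for. I didn't think of doing it that way. Thanks. When I add the text attribute, though, it doesn't return the text to me; I still get the tags –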
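On the .text follow-up: a minimal sketch of extracting just the text of the first cell in each row. It assumes, without having checked the real page, that the table has a header row made of <th> cells; for such a row, row.find("td") returns None, so the row has to be skipped before asking for text. The sample HTML and the function name first_column_text are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a header row with <th> cells (an assumption about
# the real page) plus the data row quoted in the question.
SAMPLE_HTML = """
<table>
<tr><th>Name</th><th>Description</th></tr>
<tr>
<td>WheelDust
</td>
<td>A large puff of barely visible brown dust
</td></tr>
</table>
"""

def first_column_text(html):
    """Return the stripped text of the first <td> in every row that has one."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for row in soup.find_all("tr"):
        cell = row.find("td")
        if cell is None:
            # Rows without any <td> (e.g. header rows) are skipped.
            continue
        names.append(cell.get_text(strip=True))
    return names

print(first_column_text(SAMPLE_HTML))
```

get_text(strip=True) also removes the trailing whitespace and newline that the cells in the question's markup contain.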

Yup! That sounds reasonable. Once you've solved your problem, feel free to tick the check mark so the question is marked as resolved. – Landmaster

Answer

Try this. It will give you all the data from that table.

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://wiki.garrysmod.com/page/Effects").text, "html.parser")

# Changing the index number will give you whichever table you like
table = soup.findAll('table', attrs={'class':'wikitable'})[0]
list_of_rows = [[t_data.text for t_data in item.findAll('td')]
                for item in table.findAll('tr')]

for data in list_of_rows: 
    print(data)
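Since the question asks only for the first column, the list built above could be reduced further. The sketch below uses a hypothetical stand-in for list_of_rows (the real contents depend on the page) and assumes that header rows made of <th> cells show up as empty lists.

```python
# Hypothetical stand-in for the list_of_rows produced by the answer's code.
list_of_rows = [
    [],  # assumed: a header row with only <th> cells yields no <td> text
    ["WheelDust \n", "A large puff of barely visible brown dust \n"],
]

# Keep the first cell of every non-empty row, with surrounding whitespace removed.
first_column = [row[0].strip() for row in list_of_rows if row]

print(first_column)
```

The "if row" filter avoids an IndexError on rows that contain no <td> cells.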
