Iterating over HTML with BeautifulSoup

Question:

I want to use BeautifulSoup to go through an HTML file and find the tag whose content is "Preferred Name". This is the tag I'm looking for (it is part of the file I want to search):

<td nowrap class="label"> 
    Preferred Name 
    <span class="slot_labels"></span> 
    </td>

I tried this (doc is the name of the HTML file):

soup = BeautifulSoup(doc)
tags = soup.fetch('td')
for tag in tags:
    if tag.contents[0] == 'Preferred Name':
        return tag

This code doesn't work. Can someone help?

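Answer

The contents include surrounding whitespace, so the comparison with 'Preferred Name' never matches. Strip the text first: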

soup = BeautifulSoup(doc)
tags = soup.fetch('td')
for tag in tags:
    if tag.contents[0] and tag.contents[0].strip() == 'Preferred Name':
        return tag

It works! But I had to put the "if" inside a "try ... except", because contents[0] of some tags is NoneType... thanks! – 2013-03-01 00:52:32

Edited accordingly... but without the try ... except. – isedev 2013-03-01 00:54:05
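
For anyone hitting the NoneType problem from the comments: here is a minimal sketch of the same search, assuming the current bs4 package (where find_all is used rather than the older fetch) and the built-in "html.parser". The function name find_label_cell and the sample HTML string are made up for illustration; they are not from the question. The idea is to call strip() only when the first child is a text node, so cells that are empty or that start with another tag are skipped without needing try ... except.

```python
from bs4 import BeautifulSoup, NavigableString

# Sample markup (hypothetical): the cell from the question plus two cells
# that would trip up a naive contents[0].strip() call.
HTML = """
<table><tr>
<td nowrap class="label">
    Preferred Name
    <span class="slot_labels"></span>
</td>
<td><span>other</span></td>
<td></td>
</tr></table>
"""


def find_label_cell(html, label):
    """Return the first <td> whose leading text, once stripped, equals label."""
    soup = BeautifulSoup(html, "html.parser")
    for td in soup.find_all("td"):
        if not td.contents:
            # Empty cell: there is no contents[0] to inspect.
            continue
        first = td.contents[0]
        # Only text nodes are strings; a child Tag is skipped here.
        if isinstance(first, NavigableString) and first.strip() == label:
            return td
    return None


cell = find_label_cell(HTML, "Preferred Name")
```

Wrapping the loop in a function also gives the return statement somewhere to live, and a None result tells the caller that no matching cell was found.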
