How to count HTML tags in Python

Question:

How can I count the number of start and end tags in an HTML file?

ya.html

<div class="side-article txt-article"> 
<p> 
    <strong> 
    </strong> 
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres"> 
    </a> 
    <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan"> 
    </a> 
</p> 
<p> 
    <br> 
</p> 
<p> 
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres"> 
    </a> 
</p> 
<p> 
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres"> 
    </a> 
    <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan"> 
    </a> 
</p> 
<br>

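My code

from bs4 import BeautifulSoup

# find_all() with no arguments returns every element in the document,
# so each element is counted once, whether or not it has an end tag.
with open('ya.html') as f:
    soup = BeautifulSoup(f, "html.parser")
num_apperances_of_tag = len(soup.find_all())

print(num_apperances_of_tag)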


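Output:

But this is not what I want: my code counts <p> </p> as a single item, whereas I want the start tags and end tags counted separately.

How can I count the number of start and end tags in the HTML? So the output would be:

Thanks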

Answer

I suggest using an HTML parser for this:

from html.parser import HTMLParser  # Python 3; on Python 2: from HTMLParser import HTMLParser

number_of_starttags = 0
number_of_endtags = 0

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # called once per opening tag, e.g. <p> or <br>
        global number_of_starttags
        number_of_starttags += 1

    def handle_endtag(self, tag):
        # called once per closing tag, e.g. </p>
        global number_of_endtags
        number_of_endtags += 1

# instantiate the parser and feed it some HTML
parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head><body><h1>Parse me!</h1></body></html>')

print(number_of_starttags, number_of_endtags)

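It did not work for me. I get UnboundLocalError: local variable 'number_of_starttags' referenced before assignment.

Right, that is because the counters are assigned inside the class methods. Declare the variables as global there and it will work.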
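If you would rather avoid the global declarations altogether, one option is to keep the counters on the parser instance. The following is only a minimal sketch, assuming Python 3; the names TagCounter and count_tags are made up for illustration and are not part of any library.

```python
from html.parser import HTMLParser  # Python 3 standard library


class TagCounter(HTMLParser):
    """Counts start tags and end tags separately (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        # Instance attributes replace the module-level globals.
        self.start_tags = 0
        self.end_tags = 0

    def handle_starttag(self, tag, attrs):
        self.start_tags += 1

    def handle_endtag(self, tag):
        self.end_tags += 1


def count_tags(html):
    """Return (number of start tags, number of end tags) for an HTML string."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return counter.start_tags, counter.end_tags


# <div> is never closed and <br> has no end tag, so the two counts differ.
sample = '<div><p><a href="#"></a><br></p>'
print(count_tags(sample))  # (4, 2)
```

For the file in the question, something like count_tags(open('ya.html').read()) should work the same way. Note that a tag written in the self-closing form, such as <br/>, is reported through handle_startendtag, whose default implementation calls both handlers, so it adds one to each count.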
