Using BeautifulSoup to extract data from HTML content - HTML parsing

Question:

My script, which uses the BeautifulSoup library, outputs the following:

<meta content="Free" itemprop="price" />

and

<div class="content" itemprop="datePublished">November 4, 2013</div>

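I want to pull the values Free and November 4, 2013 out of this output. Would a regular expression help, or does BeautifulSoup have an attribute that pulls them out directly? Here is the code I am using: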

    from BeautifulSoup import BeautifulSoup
    import urllib
    import re

    pageFile = urllib.urlopen("https://play.google.com/store/apps/details?id=com.ea.game.fifa14_na")
    pageHtml = pageFile.read()
    pageFile.close()

    soup = BeautifulSoup("".join(pageHtml))
    item = soup.find("meta", {"itemprop":"price"})

    print item
    items = soup.find("div",{"itemprop":"datePublished"})

    print items

Answer:

Sure! For the cases above, you can access the values like this:

    # Python 2, BeautifulSoup 3
    from BeautifulSoup import BeautifulSoup
    import urllib

    pageFile = urllib.urlopen("https://play.google.com/store/apps/details?id=com.ea.game.fifa14_na")
    pageHtml = pageFile.read()
    pageFile.close()

    soup = BeautifulSoup("".join(pageHtml))
    item = soup.find("meta", {"itemprop":"price"})  # <meta content="Free" itemprop="price" />
    print item['content']  # read the content attribute of the tag
    items = soup.find("div",{"itemprop":"datePublished"})
    print items.string  # read the text inside the tag

There is no need for a regular expression. Just read through the documentation; it will help.
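
For anyone on a newer setup, here is a rough sketch of the same lookup. It assumes Python 3 and the third-party beautifulsoup4 package (imported as bs4), neither of which is used in the original post, and it parses the two snippets from the question inline instead of fetching the live page:

```python
# Sketch only: assumes Python 3 and the third-party beautifulsoup4 package.
from bs4 import BeautifulSoup

# The two snippets from the question, inlined so no network access is needed.
html = """
<meta content="Free" itemprop="price" />
<div class="content" itemprop="datePublished">November 4, 2013</div>
"""

soup = BeautifulSoup(html, "html.parser")

# An attribute value is read with dictionary-style indexing on the tag.
price = soup.find("meta", {"itemprop": "price"})["content"]

# The text of a tag that has a single text child is available as .string.
published = soup.find("div", {"itemprop": "datePublished"}).string

print(price)
print(published)
```

On these two snippets this should print Free and then November 4, 2013. On the live page, find() returns None if the markup has changed, so it is worth checking the result before indexing into it.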

Why are you doing soup = BeautifulSoup("".join(pageHtml))? – User
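
The comment seems to be pointing at a redundant call: in the Python 2 code above, read() already returns a single string. As a small illustration, written for Python 3 and using a made-up stand-in string in place of the downloaded page, joining a string's characters on an empty separator just rebuilds the same string:

```python
# Hypothetical stand-in for the downloaded page; not the real page contents.
page_html = '<meta content="Free" itemprop="price" />'

# Iterating a str yields its characters, so joining them on "" rebuilds it.
rejoined = "".join(page_html)

print(rejoined == page_html)
```

So passing pageHtml straight to BeautifulSoup should give the same result as passing the joined version.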
