Python: Extracting data values from a website

Question:

What is the simplest way to extract data values from a website so that I can use them? Right now I get the page source with this simple approach:

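from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen

with urlopen("WEBSITE URL") as usock:  # closes the connection automatically
    data = usock.read().decode("utf-8")  # bytes -> str; assumes a UTF-8 page
print(data)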

When we fetch the page and print its data, there are two lines we are interested in:

<input type="hidden" name="SECRETCODE" value="l53DLeOfj1" /> 
<input type="hidden" name="NotSoSecretCode" value="Nr4MNjyK" />


If I know the names of the values I am looking for, what is the best way to get those values into my own variables so that I can work with them further?

Answer

BeautifulSoup is the simplest solution for what you need.

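from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

html = '''
<input type="hidden" name="SECRETCODE" value="l53DLeOfj1" />
<input type="hidden" name="NotSoSecretCode" value="Nr4MNjyK" />
'''
soup = BeautifulSoup(html, "html.parser")
# find() returns the matching Tag; index it like a dict to read an attribute
print(soup.find("input", {"name": "SECRETCODE"})["value"])
print(soup.find("input", {"name": "NotSoSecretCode"})["value"])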
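A minimal sketch, not part of the original answer, of putting every hidden field into variables at once: collect them into a dict keyed by the `name` attribute. It assumes beautifulsoup4 is installed and skips inputs without a `name`; the variable names `hidden`, `secret_code` and `not_so_secret` are illustrative.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

html = '''
<input type="hidden" name="SECRETCODE" value="l53DLeOfj1" />
<input type="hidden" name="NotSoSecretCode" value="Nr4MNjyK" />
'''

soup = BeautifulSoup(html, "html.parser")

# Map each hidden input's name to its value (empty string if value is missing)
hidden = {
    tag["name"]: tag.get("value", "")
    for tag in soup.find_all("input", type="hidden")
    if tag.has_attr("name")
}

secret_code = hidden["SECRETCODE"]
not_so_secret = hidden["NotSoSecretCode"]
print(secret_code, not_so_secret)
```

A dict is convenient when a form has many hidden fields that all need to be sent back in a later request.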

You could also do this with regular expressions, which is tedious, if you have a lot of time on your hands!
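For comparison, a regex-only sketch using the standard library `re` module. `extract_hidden_value` is a hypothetical helper, and the pattern assumes double-quoted attributes with `name` appearing before `value` inside one tag, which is exactly why this approach is brittle on real-world HTML.

```python
import re

html = '''
<input type="hidden" name="SECRETCODE" value="l53DLeOfj1" />
<input type="hidden" name="NotSoSecretCode" value="Nr4MNjyK" />
'''

def extract_hidden_value(page, name):
    """Hypothetical helper: return the value of <input name="..."> or None.

    Assumes double quotes and that name="..." comes before value="..."
    within the same tag; [^>]* keeps the match from crossing tag boundaries.
    """
    pattern = r'<input[^>]*\bname="' + re.escape(name) + r'"[^>]*\bvalue="([^"]*)"'
    match = re.search(pattern, page)
    return match.group(1) if match else None

print(extract_hidden_value(html, "SECRETCODE"))
print(extract_hidden_value(html, "NotSoSecretCode"))
```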

Answer

lxml is the most powerful tool for extracting data from XML/HTML files. It lets you select elements and attributes with XPath.
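A minimal lxml sketch, assuming lxml is installed: the XPath expression `//input[@name=$name]/@value` selects the `value` attribute of the input whose `name` matches. `hidden_value` is a hypothetical helper; passing the name as an XPath variable (lxml accepts these as keyword arguments to `xpath()`) avoids quoting problems.

```python
from lxml import html  # third-party: pip install lxml

page = '''
<input type="hidden" name="SECRETCODE" value="l53DLeOfj1" />
<input type="hidden" name="NotSoSecretCode" value="Nr4MNjyK" />
'''

tree = html.fromstring(page)

def hidden_value(tree, name):
    """Hypothetical helper: value attribute of the <input> with this name, or None."""
    # $name is bound from the keyword argument, so no manual escaping is needed
    values = tree.xpath('//input[@name=$name]/@value', name=name)
    return str(values[0]) if values else None

secret_code = hidden_value(tree, "SECRETCODE")
not_so_secret = hidden_value(tree, "NotSoSecretCode")
print(secret_code, not_so_secret)
```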

Answer

If you can use pyparsing, then:

from urllib.request import urlopen  # Python 3 replacement for urllib2
from pyparsing import Literal, Suppress, removeQuotes, dblQuotedString

def cleanQuotedString(name):
    # copy() so the parse action is not attached to the shared dblQuotedString
    return dblQuotedString.copy().setParseAction(removeQuotes).setResultsName(name)

def extractTokens(inputStream):
    head = Suppress(Literal('<input'))
    tail = Suppress(Literal('/>'))
    equalSign = Suppress(Literal('='))
    # attributes must appear in exactly this order: type, name, value
    typekey = Suppress(Literal('type')) + equalSign + cleanQuotedString('type')
    namekey = Suppress(Literal('name')) + equalSign + cleanQuotedString('name')
    valueKey = Suppress(Literal('value')) + equalSign + cleanQuotedString('value')

    grammar = head + typekey + namekey + valueKey + tail

    # scanString yields (tokens, start, end) for every match in the text
    return grammar.scanString(inputStream)

with urlopen("WEBSITE URL") as usock:
    page = usock.read().decode("utf-8")  # assumes a UTF-8 page
for item, _, _ in extractTokens(page):
    print("Element with type =", item.type, ", name =", item.name, ", value =", item.value)
