Cannot extract a value from HTML with lxml
Problem description:
I have this piece of HTML:
<span class="ng-binding"> <b>Total:</b> 68.71€ (459 items) </span>
From this, I want to extract 68.71€ (459 items).
So far I have tried this code, using the XPath of the span shown above, copied straight from Chrome:
import urllib.request
from lxml import html
import os

ids = ["ftpstorage1-730",
       "ftpstorage2-730",
       "ftpstorage3-730"]

for id in ids:
    url = 'http://steam.tools/itemvalue/#/'+id
    with urllib.request.urlopen(url) as response:
        site = response.read()
    tree = html.fromstring(site)
    data = tree.xpath('//*[@id="container"]/div[5]/span[1]/text()')
    print(data)
In theory this should work, but it doesn't. All I get in data is:
[" {{(items | filter:dupesFilter | filter:typeFilter | filter:filterText | sumByKey:'price':'count':
e}}\n\t\t\t\t({{items | filter:dupesFilter | filter:typeFilter | filter:filterText | sumByKey:'count
[" {{(items | filter:dupesFilter | filter:typeFilter | filter:filterText | sumByKey:'price':'count':
e}}\n\t\t\t\t({{items | filter:dupesFilter | filter:typeFilter | filter:filterText | sumByKey:'count
[" {{(items | filter:dupesFilter | filter:typeFilter | filter:filterText | sumByKey:'price':'count':
e}}\n\t\t\t\t({{items | filter:dupesFilter | filter:typeFilter | filter:filterText | sumByKey:'count
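Any idea what I am doing wrong?
Is it related to the number being generated dynamically rather than being static?
If so, how can I extract the number?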
Answer
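What you see printed on the console is the unrendered HTML with AngularJS binding placeholders. You need a real browser to execute the JavaScript and let Angular put the real values into the placeholders.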
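As a rough illustration of the point above, the following sketch checks whether a scraped text node still contains AngularJS-style `{{ ... }}` placeholders, which is a sign the page was fetched without JavaScript being executed. The helper name `has_unrendered_bindings` and the simplified sample strings are assumptions made for this example; they are not part of the original code.

```python
import re

# AngularJS interpolation markers look like {{ expression }}.
BINDING = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def has_unrendered_bindings(text):
    """Return True if text still contains {{ ... }} template placeholders."""
    return bool(BINDING.search(text))


# Simplified, hypothetical stand-in for what lxml returned in the question.
raw = " {{ items | sumByKey:'price':'count' }} ({{ items | sumByKey:'count' }} items) "
# What a browser shows after Angular has filled in the values.
rendered = " 68.71€ (459 items) "

print(has_unrendered_bindings(raw))       # True
print(has_unrendered_bindings(rendered))  # False
```

A check like this can be run on the result of `tree.xpath(...)` to fail early instead of silently printing template text.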
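Alternatively, if you dig into how the total price is retrieved and calculated, you can solve this without a real browser. Make a GET request to the http://item-value10.appspot.com/ParseInv endpoint, providing the id and app parameters, parse the JSON response, and calculate the total, taking into account each item's price and count: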
import requests

template_url = "http://item-value10.appspot.com/ParseInv"
ids = ["ftpstorage1-730", "ftpstorage2-730", "ftpstorage3-730"]

for id in ids:
    with requests.Session() as session:
        # Visit the page first, as a browser would.
        session.get('http://steam.tools/itemvalue/#/' + id)

        # "ftpstorage1-730" -> storage "ftpstorage1", app "730"
        storage, app = id.split("-")
        url = template_url.format(storage=storage, app=app)
        response = session.get(url, params={
            "id": storage,
            "app": app
        }, headers={
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36",
            "Referer": "http://steam.tools/itemvalue/"
        })
        data = response.json()

        # Total value: price times count, summed over all items.
        total = sum(float(item["price"]) * int(item["count"]) for item in data["items"])
        print(total)
Prints:
20.439999999999998
78.16
0
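The first value printed above shows a binary floating-point artifact. A minimal sketch of one way to avoid it, assuming the JSON has the shape the code above relies on (an "items" list whose entries carry "price" and "count"), is to sum with `decimal.Decimal`. The helper name `inventory_total` and the sample payload are hypothetical and are not data returned by the endpoint.

```python
from decimal import Decimal


def inventory_total(items):
    """Sum price * count over all items using exact decimal arithmetic."""
    return sum(
        # str() lets this accept prices given as strings or as numbers.
        (Decimal(str(item["price"])) * int(item["count"]) for item in items),
        Decimal("0"),
    )


# Hypothetical payload, shaped like the response the answer's code expects.
sample = {"items": [
    {"price": "0.10", "count": "3"},
    {"price": 20.14, "count": 1},
]}

total = inventory_total(sample["items"])
print(total)  # 20.44
```

Converting through `str()` rather than passing a float straight to `Decimal` keeps the short decimal form of the price instead of its full binary expansion.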
The data is not in the source; it is generated dynamically, so the total has to be obtained via `requests`. –
@PadraicCunningham bonus points. :) – alecxe
@alecxe, you beat me to it! –