Scraping web page table data with Python + pandas

This walkthrough uses a standards notice page from 工标网 (csres.com) as the example: http://www.csres.com/notice/50655.html

First request the page, then use XPath to locate the table element.

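import requests
from lxml import etree

res = requests.get('http://www.csres.com/notice/50655.html')
res_elements = etree.HTML(res.text)  # parse the response body into an element tree
table = res_elements.xpath('//table[@id="table1"]')  # list of matching <table> elements
table = etree.tostring(table[0], encoding='utf-8').decode()  # serialize the first match back to an HTML string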
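The request-and-locate step can also be wrapped in small helpers that fail more gracefully. The following is only a sketch: the function names `fetch_html` and `extract_table_html`, the 10-second timeout, and the inline sample page are assumptions made for illustration, not part of the original code.

```python
import requests
from lxml import etree


def fetch_html(url, timeout=10):
    """Download a page and return its text, raising on HTTP errors."""
    res = requests.get(url, timeout=timeout)
    res.raise_for_status()
    return res.text


def extract_table_html(html, table_id="table1"):
    """Return the serialized <table> with the given id, or None if absent."""
    root = etree.HTML(html)
    # XPath variable ($tid) avoids building the expression by string formatting.
    matches = root.xpath('//table[@id=$tid]', tid=table_id)
    if not matches:
        return None
    # encoding="unicode" returns a str directly, so no .decode() is needed.
    return etree.tostring(matches[0], encoding="unicode", with_tail=False)


# Tiny inline page (made-up data) so the helper can be tried without network access.
sample = (
    '<html><body><table id="table1">'
    '<tr><td>No</td></tr><tr><td>1</td></tr>'
    '</table></body></html>'
)
table_html = extract_table_html(sample)
```

For the real page, `extract_table_html(fetch_html('http://www.csres.com/notice/50655.html'))` would play the role of the `table` variable above; checking for `None` avoids an `IndexError` if the page layout changes.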

Call pandas' read_html function to parse the table data.

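from io import StringIO
import pandas as pd
df = pd.read_html(StringIO(table), header=0)[0]  # wrap in StringIO: passing a literal HTML string is deprecated in recent pandas
results = df.to_dict(orient='records')  # convert to a list of dicts, one per row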
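To see what `read_html` with `header=0` produces without touching the network, here is a minimal sketch on a hand-written table. The HTML and its values are made up for illustration and do not come from the example page.

```python
from io import StringIO

import pandas as pd

# Made-up table: the first row holds the column names, as header=0 expects.
html = """
<table id="table1">
  <tr><td>Standard No.</td><td>Title</td></tr>
  <tr><td>AAA 1-2020</td><td>First example</td></tr>
  <tr><td>AAA 2-2021</td><td>Second example</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found; take the first.
df = pd.read_html(StringIO(html), header=0)[0]

# One dict per data row, keyed by column name.
records = df.to_dict(orient="records")
```

Here `header=0` promotes the first row to column names, so `records` holds two dicts keyed by `Standard No.` and `Title`.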

Save the result as a CSV file.

df.to_csv("std.csv", index=False)
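A quick way to check the saved file is to read it back. The sketch below uses made-up data and a temporary directory. The choice of `encoding="utf-8-sig"` is an assumption on my part: it adds a byte-order mark, which tends to help spreadsheet programs display Chinese text correctly, and the original code does not use it.

```python
import os
import tempfile

import pandas as pd

# Made-up frame with Chinese column names, standing in for the scraped table.
df = pd.DataFrame({"标准编号": ["AAA 1-2020"], "标准名称": ["示例标准"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "std.csv")
    # index=False keeps the row index out of the file, as in the original call.
    df.to_csv(path, index=False, encoding="utf-8-sig")
    # Read the file back with the same encoding to confirm nothing was lost.
    restored = pd.read_csv(path, encoding="utf-8-sig")
```

If the round trip worked, `restored` has the same columns and rows as `df`.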

The final result is shown in the figure.


Dependencies: requests, pandas, lxml
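Before running the scraper, it can help to confirm that these packages are importable. This is a minimal sketch using only the standard library; the suggested `pip install` line is the usual way to install them, but your environment may use a different package manager.

```python
from importlib.util import find_spec

required = ["requests", "pandas", "lxml"]

# find_spec returns None when a top-level package cannot be located.
missing = [name for name in required if find_spec(name) is None]

if missing:
    print("Missing packages, try: pip install " + " ".join(missing))
else:
    print("All dependencies are available.")
```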
