Python: Scraping Face Images from Baidu
1 Source code
1.1 Compact version
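```python
import re
import urllib.request  # urlretrieve lives in urllib.request; "import urllib" alone does not guarantee it is loaded

import requests


def getData():
    URLs = []
    # pn is the paging offset and each page holds 30 results, so fetch pn=0 and pn=30
    # (the original loop incremented before requesting and skipped the first page)
    for i in range(0, 60, 30):
        print(i)
        # queryWord and word are the URL-encoded search term "人脸" (face)
        url = "https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj&ct=201326592&is=&fp=result&queryWord=%E4%BA%BA%E8%84%B8&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=0&ic=0&hd=0&latest=0&copyright=0&word=%E4%BA%BA%E8%84%B8&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&fr=&expermode=&selected_tags=&pn={}".format(i)
        response = requests.get(url).text
        # Pull every thumbnail URL out of the JSON response text
        URLs += re.findall('"thumbURL":"(.*?)",', response, re.S)
    return URLs


def downLoadImage(URLs):
    for i, URL in enumerate(URLs):
        print("No." + str(i + 1) + ".jpg is downloading...")
        # Save as face1.jpg, face2.jpg, ... in the current directory
        urllib.request.urlretrieve(URL, "face" + str(i + 1) + ".jpg")


if __name__ == "__main__":
    downLoadImage(getData())
```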
1.2 Annotated version (GitHub)
2 Results
Figure: Test results
Figure: Status messages
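3 Analysis
Open Baidu, search for 人脸 (face), and switch to the Images tab. Then right-click the page and choose "Inspect Element" (Firefox) or "Inspect" (Chrome).
Make the selections shown in Figure 3.1 (Firefox is used as the example). The request of interest is the XHR call to https://image.baidu.com/search/acjson, which is the URL used in the code above.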
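Once the acjson request appears in the developer tools, its query string can be rebuilt from a dictionary instead of being kept as one long literal. The following is a minimal sketch. `build_search_url` is a hypothetical helper. The parameter subset is an assumption: it keeps the non-empty parameters from the URL in `getData()` and drops the empty ones, and whether Baidu still accepts the request without them is untested.

```python
from urllib.parse import urlencode

BASE = "https://image.baidu.com/search/acjson"


def build_search_url(keyword, pn):
    """Hypothetical helper: rebuild the acjson request URL for one result page."""
    # Assumption: only the non-empty parameters of the original URL are kept
    params = {
        "tn": "resultjson_com",
        "ipn": "rj",
        "ct": "201326592",
        "fp": "result",
        "queryWord": keyword,
        "cl": "2",
        "lm": "-1",
        "ie": "utf-8",
        "oe": "utf-8",
        "st": "-1",
        "ic": "0",
        "hd": "0",
        "latest": "0",
        "copyright": "0",
        "word": keyword,
        "face": "0",
        "istype": "2",
        "nc": "1",
        "pn": str(pn),  # paging offset: 0, 30, 60, ...
    }
    # urlencode percent-encodes non-ASCII values as UTF-8,
    # so "人脸" becomes %E4%BA%BA%E8%84%B8, matching the hand-written URL
    return BASE + "?" + urlencode(params)


print(build_search_url("人脸", 30))
```

Keeping the parameters in a dictionary makes the paging offset and the search term easy to change without editing a long string by hand.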
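4 Summary
- Work backwards: start by analyzing the request URLs to find the real URL that returns the image data;
- To fetch images in bulk, process the response further: extract the image URLs with a regular expression and collect them in a list;
- The key step is finding the paging parameter (pn in the request URL);
- Note: if the XHR tab shows no requests, scroll down the page and the data requests will appear.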
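The steps above (paging, regex extraction, batch collection) can be exercised offline. The following is a minimal sketch. It assumes 30 results per page, as the step of 30 in `getData()` suggests. `page_offsets`, `extract_thumb_urls` and `collect_urls` are hypothetical helper names. The sample response is a made-up fragment, not real Baidu output. `collect_urls` takes any `fetch` callable, so the same code can be fed `requests.get(url).text` in real use or a stub in testing.

```python
import re

# Same pattern as in getData(): the value between "thumbURL":" and the next ",
THUMB_RE = re.compile(r'"thumbURL":"(.*?)",', re.S)


def page_offsets(pages, per_page=30):
    """Values for the pn parameter: 0, 30, 60, ... (assumes 30 results per page)."""
    return [p * per_page for p in range(pages)]


def extract_thumb_urls(text):
    """Return every thumbURL value found in one acjson response body."""
    return THUMB_RE.findall(text)


def collect_urls(fetch, pages):
    """fetch is a hypothetical hook mapping a pn offset to a response body."""
    urls = []
    for pn in page_offsets(pages):
        urls += extract_thumb_urls(fetch(pn))
    return urls


# Made-up response fragment for illustration only; real responses are much longer
sample = ('{"data":[{"thumbURL":"https://example.com/a.jpg","middleURL":"x"},'
          '{"thumbURL":"https://example.com/b.jpg","middleURL":"y"}]}')

print(page_offsets(3))              # [0, 30, 60]
print(extract_thumb_urls(sample))   # ['https://example.com/a.jpg', 'https://example.com/b.jpg']
print(len(collect_urls(lambda pn: sample, 2)))  # 4
```

Passing the fetch step in as a parameter keeps the paging and parsing logic testable without touching the network.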