Project-3: Downloading images from Baidu Tieba with Python

'''
<img class="BDE_Image" src="xx.jpg" pic_ext="jpeg" width="510" height="510">
<img class="BDE_Image" src="xx.jpg" pic_ext="jpeg" changedsize="true" width="560" height="560">
https://imgsa.baidu.com/forum/w%3D580/sign=490304174234970a47731027a5cbd1c0/59310a55b319ebc45695baa18126cffc1e17161d.jpg

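1. Send a request and get the HTML of the response
    url: https://tieba.baidu.com/p/3205263090
2. From the HTML in step 1, extract the src of img tags like the following
    <img class="BDE_Image" src="xx.jpg" pic_ext="jpeg" width="510" height="510">
3. Use each src extracted in step 2 as a URL, send a request, and get the image bytes from the response
4. Save the bytes to a local file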
'''

'''
import re

content = '<div id="post_content_54896721026" class="d_post_content j_d_post_content ">            <img class="BDE_Smiley" width="30" height="30" changedsize="false" src="https://gsp0.baidu.com/5aAHeD3nKhI2p27j8IqW0jdnxx1xbK/tb/editor/images/client/image_emoticon40.png">我也想给他一个拥抱啊可是他退的好远<img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=490304174234970a47731027a5cbd1c0/59310a55b319ebc45695baa18126cffc1e17161d.jpg" pic_ext="jpeg" width="510" height="510"></div>'
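# Regular expression: capture the .jpg URL of a src attribute that is followed by pic_ext="jpeg"
reg = r'src="([^"\s]+\.jpg)" pic_ext="jpeg"'
# Compile the regular expression
imgre = re.compile(reg)
# findall returns a list with the captured group of every match
imglist = imgre.findall(content)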
print(imglist)
'''
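The regular expression above only matches when src is immediately followed by pic_ext="jpeg" and the URL ends in .jpg. As a possible alternative, here is a minimal sketch that selects img tags by their BDE_Image class using the standard library's html.parser module. The names PostImageParser and get_img_srcs are hypothetical and not part of the original script.

```python
from html.parser import HTMLParser


class PostImageParser(HTMLParser):
    """Collect the src of every <img> tag whose class list contains BDE_Image."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attr_map.get('class') or '').split()
        if 'BDE_Image' in classes and attr_map.get('src'):
            self.srcs.append(attr_map['src'])


def get_img_srcs(html):
    """Return the src values of all BDE_Image tags in an HTML string."""
    parser = PostImageParser()
    parser.feed(html)
    parser.close()
    return parser.srcs
```

Because the tag is selected by class rather than by attribute order, emoticon images (class BDE_Smiley) are skipped, and images with another extension would still be collected.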

import urllib.request
import re

# Fetch the raw page source (returned as bytes)
def getHtml(url):
    with urllib.request.urlopen(url) as response:
        return response.read()

# Extract the image URLs from the decoded HTML
def getImg(html):
    reg = r'src="([^"\s]+\.jpg)" pic_ext="jpeg"'
    imgre = re.compile(reg)
    return imgre.findall(html)

# Pass the URL of any post to fetch its HTML
html = getHtml('https://tieba.baidu.com/p/3205263090')
# Decode the bytes as UTF-8
html = html.decode('utf-8')
# print(html)
# Extract the image URLs
imgList = getImg(html)
# print(imgList)
imgName = 0
for imgPath in imgList:
    # Save the images as 0.jpg, 1.jpg, ... in the current directory
    with open('%s.jpg' % imgName, 'wb') as f:
        f.write(urllib.request.urlopen(imgPath).read())
    imgName += 1
    print('Downloaded image %s' % imgName)
print('All images on this page have been downloaded')
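The download loop above stops at the first failed request and writes into the current directory. The following is a rough sketch of a more defensive variant, not the original author's method. The names fetch_bytes and save_images, the HEADERS value, the 10-second timeout, and the images directory are all assumptions chosen for illustration.

```python
import os
import urllib.error
import urllib.request

# Assumed header value; some servers reject urllib's default User-Agent.
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def fetch_bytes(url, timeout=10):
    """Return the raw response body for url."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def save_images(img_urls, out_dir='images', fetch=fetch_bytes):
    """Save each URL into out_dir as 1.jpg, 2.jpg, ... and return the saved paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, img_url in enumerate(img_urls, start=1):
        try:
            data = fetch(img_url)
        except (urllib.error.URLError, OSError) as exc:
            # Skip a failed image instead of aborting the whole run.
            print('Skipping %s: %s' % (img_url, exc))
            continue
        path = os.path.join(out_dir, '%d.jpg' % index)
        with open(path, 'wb') as f:
            f.write(data)
        saved.append(path)
    return saved
```

With the script above, this could be called as save_images(imgList). The fetch parameter exists only so the function can be exercised without network access.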