Python 3.x file write error: TypeError: write() argument must be str, not bytes
Background
I was writing a Baidu Tieba crawler in Python 3.x, using the PyCharm editor.
The code is as follows:
import urllib.request
import urllib.parse


def loadPage(url):
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER "}
    # Build a request object
    request = urllib.request.Request(url, headers=headers)
    response = urllib.request.urlopen(request)
    # print(response.read())
    return response.read()  # read() returns bytes, not str


def writePage(html, filename):
    """
    Purpose: write the HTML content to a local file
    html: content of the server's response
    """
    # Write the file ("w" is text mode; this is where the TypeError is raised)
    with open(filename, "w") as f:
        f.write(html)
    print("_" * 30)


def tiebaSpider(url, beginPage, endPage):
    """
    Purpose: crawler scheduler; builds and processes the URL of each page
    url: leading part of the tieba URL
    beginPage: first page
    endPage: last page
    """
    for page in range(beginPage, endPage + 1):
        pn = (page - 1) * 50
        filename = "第" + str(page) + "页.html"
        # The original concatenated "&pn" without "=", which yields an invalid query parameter
        html = url + "&pn=" + str(pn)
        # print(html)
        html = loadPage(html)
        writePage(html, filename)
        # print(html.decode("utf-8"))


if __name__ == "__main__":
    kw = input("Enter the name of the tieba to crawl: ")
    beginPage = int(input("Enter the first page: "))
    endPage = int(input("Enter the last page: "))
    url = "http://tieba.baidu.com/f?"
    key = urllib.parse.urlencode({"kw": kw})
    fullurl = url + key
    tiebaSpider(fullurl, beginPage, endPage)
Running the program raises the error from the title: TypeError: write() argument must be str, not bytes
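The cause is a type mismatch: response.read() returns a bytes object, while a file opened in text mode ("w") only accepts str. Most of the material I found online discusses this error in the context of pickle, which also writes bytes and therefore needs a file opened in binary mode; the same reasoning applies here. The fix is to open the file in binary mode: in the writePage function, change with open(filename,"w") as f: to with open(filename,"wb") as f: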
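The mismatch can be reproduced without touching the network. The following is a minimal sketch, not the crawler itself: the html variable is a hypothetical stand-in for what response.read() returns, and the UTF-8 encoding is an assumption (the commented-out html.decode("utf-8") in the code above suggests it, but a real page may use a different charset). It shows the failing text-mode write and two ways around it: writing the bytes in binary mode, or decoding them first and writing text.

```python
import os
import tempfile

# Hypothetical stand-in for response.read(), which returns bytes.
# UTF-8 is assumed here; a real page may declare another charset.
html = "<html>贴吧</html>".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.html")

    # Text mode ("w") accepts only str, so writing bytes raises TypeError.
    try:
        with open(path, "w") as f:
            f.write(html)
        error_message = None
    except TypeError as exc:
        error_message = str(exc)

    # Option 1: open in binary mode and write the bytes unchanged.
    with open(path, "wb") as f:
        f.write(html)
    with open(path, "rb") as f:
        binary_result = f.read()

    # Option 2: decode to str and write in text mode with an explicit encoding.
    with open(path, "w", encoding="utf-8") as f:
        f.write(html.decode("utf-8"))
    with open(path, "rb") as f:
        text_result = f.read()

print(error_message)
```

Binary mode is the simpler choice for a crawler that only saves pages, because it stores exactly what the server sent and needs no knowledge of the encoding. Decoding is only worth it if the program also has to process the page as text.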
After switching writePage to "wb", the crawler runs without the TypeError.