The simplest web scraping in Python

1. Install two Python packages:

$ sudo pip install requests
$ sudo pip install beautifulsoup4
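
Before moving on, it can help to confirm that both packages are importable. The check below is a minimal sketch that assumes Python 3; `missing_packages` is a hypothetical helper written for this example, not part of either library.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "bs4" is the import name of the beautifulsoup4 package.
missing = missing_packages(["requests", "bs4"])
if missing:
    print("Not installed: " + ", ".join(missing))
else:
    print("requests and bs4 are both importable")
```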

2. Create test.py

# coding=utf-8

import requests
from bs4 import BeautifulSoup


# Fetch the page and return its HTML as text
def get_html(url):
    response = requests.get(url)
    response.encoding = 'utf-8'
    return response.text


# Return the text of the page's <title> element
def get_title(html):
    soup = BeautifulSoup(html, 'html.parser')
    title_content = soup.select('title')[0].get_text()
    return title_content


# Print the text of every <p> element
def print_p(html):
    soup = BeautifulSoup(html, 'html.parser')
    for p in soup.select('p'):
        print(p.get_text())


url = "http://www.cityu.edu.hk/"
html = get_html(url)
title_content = get_title(html)

print(title_content)
print_p(html)
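
The script assumes the request always succeeds. A slightly more defensive fetch might look like the sketch below, assuming Python 3. The name `fetch_html` and the 10-second timeout are choices made for this example, not part of the original script.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page text, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # raise on 4xx/5xx status codes
    except requests.exceptions.RequestException as exc:
        print("Request failed: {}".format(exc))
        return None
    response.encoding = 'utf-8'  # same encoding assumption as the original script
    return response.text
```

A caller would then check for None before parsing, for example `html = fetch_html("http://www.cityu.edu.hk/")` followed by `if html is not None: ...`.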

3. Go to the folder containing test.py, then run:

$ python test.py

4. Output

The script prints the page title, followed by the text of every <p> element on the page.
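
To see the shape of the output without making a network request, the same parsing can be tried on a small inline document. This is only a sketch: the sample HTML is made up for illustration, and the guard on `soup.title` is an addition that avoids an error when a page has no title.

```python
from bs4 import BeautifulSoup

# Made-up sample page, standing in for the HTML returned by requests
sample_html = """
<html>
  <head><title>Example page</title></head>
  <body>
    <p>First paragraph.</p>
    <p>Second <b>paragraph</b>.</p>
  </body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# soup.title is None when the page has no <title>, so guard against that
title = soup.title.get_text() if soup.title else ""
paragraphs = [p.get_text() for p in soup.select("p")]

print(title)
for text in paragraphs:
    print(text)
# prints:
# Example page
# First paragraph.
# Second paragraph.
```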
