Getting Started with Nokogiri
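Nokogiri is a Ruby library for parsing HTML and XML, offering DOM, SAX, and other parsing interfaces. It can search documents using CSS3 selectors or XPath. Its features include: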
- XML/HTML DOM parser which handles broken HTML
- XML/HTML SAX parser
- XML/HTML Push parser
- XPath 1.0 support for document searching
- CSS3 selector support for document searching
- XML/HTML builder
- XSLT transformer
Install it with:
gem install nokogiri
Here is a code example taken from the official website:
require 'nokogiri'
require 'open-uri'
# Fetch and parse HTML document
# URI.open is used here because, on Ruby 3.0+, Kernel#open no longer opens URLs via open-uri
doc = Nokogiri::HTML(URI.open('http://www.nokogiri.org/tutorials/installing_nokogiri.html'))
puts "### Search for nodes by css"
doc.css('nav ul.menu li a', 'article h2').each do |link|
  puts link.content
end
puts "### Search for nodes by xpath"
doc.xpath('//nav//ul//li/a', '//article//h2').each do |link|
  puts link.content
end
puts "### Or mix and match."
doc.search('nav ul.menu li a', '//article//h2').each do |link|
  puts link.content
end
The output is as follows: