Ruby: parsing an HttpResponse with Nokogiri
Hi, I am having trouble parsing an HttpResponse object with Nokogiri.
I use this function to fetch a website; it prints the link it is fetching as it goes:
def fetch(uri_str, limit = 10)
  # You should choose better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0
  url = URI.parse(URI.encode(uri_str.strip))
  puts url
  #get path
  req = Net::HTTP::Get.new(url.path,headers)
  #start TCP/IP
  response = Net::HTTP.start(url.host,url.port) { |http|
    http.request(req)
  }
  case response
  when Net::HTTPSuccess
  then #print final redirect to a file
    puts "this is location" + uri_str
    puts "this is the host #{url.host}"
    puts "this is the path #{url.path}"
    return response
  # if you get a 302 response
  when Net::HTTPRedirection
  then
    puts "this is redirect" + response['location']
    return fetch(response['location'],aFile, limit - 1)
  else
    response.error!
  end
end
html = fetch("http://www.somewebsite.com/hahaha/")
puts html
noko = Nokogiri::HTML(html)
The output is a whole lot of gibberish, and Nokogiri complains that "node_set must be a Nokogiri::XML::NodeSet".
If anyone could help, it would be much appreciated.
First things first: your fetch method returns a Net::HTTPResponse object, not just the body. You should pass the body to Nokogiri.
response = fetch("http://www.somewebsite.com/hahaha/")
puts response.body
noko = Nokogiri::HTML(response.body)
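To make the difference between the response object and its body concrete, here is a minimal sketch that needs no external site. It assumes a throwaway local server built on TCPServer (not part of the original script) that serves one hypothetical page, then fetches it with Net::HTTP and inspects what comes back.

```ruby
require 'net/http'
require 'socket'

# Throwaway local server so the example runs without touching the network.
server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS pick a free port
port = server.addr[1]
page = '<html><body><h1>hello</h1></body></html>' # hypothetical page

server_thread = Thread.new do
  client = server.accept
  # Read and discard the request line and headers, up to the blank line.
  loop do
    line = client.gets
    break if line.nil? || line == "\r\n"
  end
  client.write "HTTP/1.1 200 OK\r\n" \
               "Content-Type: text/html\r\n" \
               "Content-Length: #{page.bytesize}\r\n" \
               "Connection: close\r\n\r\n#{page}"
  client.close
end

# Passing nil as the proxy address ignores any proxy set in the environment.
http = Net::HTTP.new('127.0.0.1', port, nil)
response = http.get('/')
server_thread.join
server.close

puts response.class              # a Net::HTTPResponse subclass, not the page
puts response.is_a?(String)      # false: this object is not HTML text
puts response.body.is_a?(String) # true: the body is what a parser expects
puts response.body
```

The string in response.body is what should be handed to Nokogiri::HTML; the response object itself only wraps the status, headers, and body.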
I have updated your script so that it runs (below). Several things were undefined.
require 'nokogiri'
require 'net/http'
require 'uri'

def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  # URI.encode was removed in Ruby 3.0, so the string is parsed directly;
  # this assumes uri_str is already percent-encoded.
  url = URI.parse(uri_str.strip)
  puts url

  # Build the GET request. headers was undefined in the original script.
  # request_uri keeps the query string and is never empty, unlike path.
  headers = {}
  req = Net::HTTP::Get.new(url.request_uri, headers)

  # Open the connection (with TLS for https URLs) and send the request.
  response = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == 'https') { |http|
    http.request(req)
  }

  case response
  when Net::HTTPSuccess
    # Final response: print where we ended up and return it.
    puts "this is location " + uri_str
    puts "this is the host #{url.host}"
    puts "this is the path #{url.path}"
    return response
  when Net::HTTPRedirection
    # On a 3xx response, follow the Location header.
    puts "this is redirect " + response['location']
    return fetch(response['location'], limit - 1)
  else
    response.error!
  end
end

response = fetch("http://www.google.com/")
puts response
noko = Nokogiri::HTML(response.body)
puts noko
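This script runs without errors and prints the content. The Nokogiri error you are seeing is probably caused by the content you receive. A common problem I have run into with Nokogiri is character encoding. Without the exact error, it is impossible to say what is happening.
I recommend looking at the following StackOverflow questions:
ruby 1.9: invalid byte sequence in UTF-8 (especially this answer)
How to convert a Net::HTTP response to a certain encoding in Ruby 1.9.1?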
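On the encoding point, here is a minimal sketch of normalizing a response body to UTF-8 before handing it to Nokogiri. The helper name body_as_utf8 is hypothetical (it is not part of Net::HTTP or Nokogiri), and the sketch assumes the server declares its charset in the Content-Type header, falling back to UTF-8 when it does not.

```ruby
# Hypothetical helper: returns the body as valid UTF-8, using the charset
# declared in a Content-Type header such as "text/html; charset=ISO-8859-1".
def body_as_utf8(body, content_type, fallback = 'UTF-8')
  charset = content_type.to_s[/charset=["']?([^\s;"']+)/i, 1] || fallback
  encoding = begin
    Encoding.find(charset)
  rescue ArgumentError
    # Unknown charset name: use the fallback instead of failing.
    Encoding.find(fallback)
  end

  # Net::HTTP hands back raw bytes; label them with the declared encoding,
  # transcode to UTF-8, and replace anything that cannot be converted.
  labeled = body.dup.force_encoding(encoding)
  utf8 = labeled.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
  utf8.scrub('?')
end

raw = "caf\xE9".b # "café" in ISO-8859-1, as raw bytes
text = body_as_utf8(raw, 'text/html; charset=ISO-8859-1')
puts text
puts text.encoding
puts text.valid_encoding?
```

With a real response this could be called as body_as_utf8(response.body, response['content-type']), and the result passed to Nokogiri::HTML. Pages that declare their charset only in a meta tag are not covered by this sketch.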
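Thanks, but Nokogiri still gives me this error. – 2012-07-05 14:18:16
Thank you very much, Mr. Simard, I will look into character encoding. – 2012-07-05 16:02:14
How can I see more detailed debug messages? The only error Nokogiri gives me is this: node_set must be a Nokogiri::XML::NodeSet – 2012-07-05 17:54:59
You should use Mechanize instead of this hot mess. It takes care of redirects and handles your encoding. – pguardiario 2012-07-05 23:15:58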