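How do I ignore HTML tags in XML?
Question:
I am using Ruby 1.8.7 and receive XML content as a string in an API response. I want to parse this response so that I get the values without the HTML tags: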
<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<response>\n <data>\n <publisher_share_percent>0.0</publisher_share_percent>\n <detailed_description>&lt;b&gt;this is the testing detailed&lt;/b&gt; </detailed_description>\n <title>Only £5.00. food (Regular £50.00/90% discount)</title>\n </data>\n <request_id>ed96dd50-3127-012f-3e93-042b2b8686e6</request_id>\n <message>The resource has been created successfully.</message>\n <status>201</status>\n</response>\n
Answer
You can use CGI::unescapeHTML to turn escaped entities back into the characters they stand for:
require 'cgi'
CGI::unescapeHTML("Usage: foo &quot;bar&quot; &lt;baz&gt;")
# => "Usage: foo \"bar\" <baz>"
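Note that unescaping only decodes the entities; the tags themselves are still in the string afterwards. A minimal sketch of going one step further and dropping them is shown here. The helper name strip_escaped_html is hypothetical (it is not part of CGI), and the regex is a naive assumption that only suits simple markup such as the <b> tag in the question:

```ruby
require 'cgi'

# Hypothetical helper, not part of the CGI library: decode the entities,
# then remove anything that looks like a tag. The regex is deliberately
# simple and will not cope with malformed or unusual HTML.
def strip_escaped_html(text)
  CGI.unescapeHTML(text).gsub(/<[^>]*>/, '').strip
end

escaped = "&lt;b&gt;this is the testing detailed&lt;/b&gt; "
plain = strip_escaped_html(escaped)
puts plain
```

For anything beyond trivial markup, handing the unescaped string to a real HTML parser is likely the safer choice.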
Answer
If you treat the XML as what it is, XML, and parse it with an XML parser, the job becomes much easier:
require 'nokogiri'
xml = <<EOT
<?xml version="1.0" encoding="UTF-8"?>
<response>
<data>
<publisher_share_percent>0.0</publisher_share_percent>
<detailed_description>&lt;b&gt;this is the testing detailed&lt;/b&gt; </detailed_description>
<title>Only £5.00. food (Regular £50.00/90% discount)</title>
</data>
<request_id>ed96dd50-3127-012f-3e93-042b2b8686e6</request_id>
<message>The resource has been created successfully.</message>
<status>201</status>
</response>
EOT
doc = Nokogiri::XML(xml)
puts doc.at('detailed_description').text
puts doc.at('title').text
Saving that to a file and running it outputs:
ruby ~/Desktop/test2.rb
<b>this is the testing detailed</b>
Only £5.00. food (Regular £50.00/90% discount)
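The parser decodes the entities, so the description comes back with its <b> tags as literal text. If the tags should be removed as well, one option is to post-process the extracted string. The sketch below swaps in REXML, which shipped with Ruby 1.8.7's standard library, so that it runs without installing a gem; it is not the Nokogiri code above. The XML is a trimmed copy of the response, and the strip_tags helper and its regex are assumptions for illustration:

```ruby
require 'rexml/document'

# Trimmed copy of the API response from the question.
xml = <<EOT
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <data>
    <detailed_description>&lt;b&gt;this is the testing detailed&lt;/b&gt; </detailed_description>
  </data>
  <message>The resource has been created successfully.</message>
  <status>201</status>
</response>
EOT

# Hypothetical helper: remove anything that looks like an HTML tag.
# A naive regex, adequate only for simple markup.
def strip_tags(text)
  text.gsub(/<[^>]*>/, '').strip
end

doc = REXML::Document.new(xml)

# Element#text returns the node's text with entities already decoded.
description = REXML::XPath.first(doc, '//detailed_description').text
message = REXML::XPath.first(doc, '//message').text

puts strip_tags(description)
puts strip_tags(message)
```

With Nokogiri the same helper can be applied to the string returned by doc.at('detailed_description').text.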