How can I extract URLs from JSON data?
I have a text file from which I want to extract the URLs and write them to a new text file. How can I do this in Perl? Here is a sample of the file:
{"type":"TabGroupsManager:GroupData","id":65,"name":"XML Schema Editor","image":"http://www.altova.com/favicon.ico","disableAutoRename":false,"titleList":"XML Schema Editor\u000aAltova XMLSpy Code Generation\u000aOnline Video Demos\u000aScheduled Data Exchange Case Study\u000aXML Editor\u000aAltova XMLSpy 2011\u000aXML Schema Management Tool\u000a","tabs":["{\"entries\":[{\"url\":\"http://www.altova.com/xmlspy/xml-schema-editor.html\",\"title\":\"XML Schema Editor\",\"ID\":1442422751,\"referrer\":\"http://www.altova.com/xmlspy/xml-editing.html\",\"scroll\":\"0,0\",\"formdata\":{\"#q\":\"\"}}],\"index\":1,\"attributes\":{\"image\":\"http://www.altova.com/favicon.ico\"},\"extData\":{\"TabGroupsManagerGroupId\":\"65\",\"TabGroupsManagerGroupName\":\"XML Schema Editor\"},\"_formDataSaved\":true}","{\"entries\":[{\"url\":\"http://www.altova.com/xmlspy/xml-code-generation.html\",\"title\":\"Altova XMLSpy Code Generation\",\"ID\":1442423118,\"referrer\":\"http://www.google.com/search?hl=en&client=firefox-a&hs=GR1&rls=org.mozilla%3Aen-GB%3Aofficial&q=altova+derive+schema+from+xml&aq=f&aqi=m1&aql=&oq=&gs_rfai=\",\"scroll\":\"0,0\",\"formdata\":{\"#q\":\"\"}}],\"index\":1,\"attributes\":{\"image\":\"http://www.altova.com/favicon.ico\"},\"extData\":{\"TabGroupsManagerGroupId\":\"65\",\"TabGroupsManagerGroupName\":\"XML Schema Editor\"},\"_formDataSaved\":true}","{\"entries\":[{\"url\":\"http://www.altova.com/videos.asp?type=0&video=xmlspy\",\"title\":\"Online Video Demos\",\"ID\":1442423184,\"referrer\":\"http://www.altova.com/xmlspy/xml-code-generation.html\",\"scroll\":\"0,0\",\"formdata\":{\"#q\":\"\"}}],\"index\":1,\"attributes\":{\"image\":\"http://www.altova.com/favicon.ico\"},\"extData\":{\"TabGroupsManagerGroupId\":\"65\",\"TabGroupsManagerGroupName\":\"XML Schema Editor\"},\"_formDataSaved\":true}","{\"entries\":[{\"url\":\"http://www.altova.com/solutions/exchange_ratecasestudy.html\",\"title\":\"Scheduled Data Exchange Case Study\",\"ID\":2618,\"formdata\":{\"#q\":\"\"},\"scroll\":\"0,1369\"}],\"index\":1,\"attributes\":{\"image\":\"http://www.altova.com/favicon.ico\"},\"extData\":{\"TabGroupsManagerGroupId\":\"65\",\"TabGroupsManagerGroupName\":\"XML Schema Editor\"},\"_formDataSaved\":true}","{\"entries\":[{\"url\":\"http://www.altova.com/xml-editor/\",\"title\":\"XML Editor\",\"ID\":2620,\"formdata\":{\"#q\":\"\"},\"scroll\":\"0,0\"}],\"index\":1,\"attributes\":{\"image\":\"http://www.altova.com/favicon.ico\"},\"extData\":{\"TabGroupsManagerGroupId\":\"65\",\"TabGroupsManagerGroupName\":\"XML Schema Editor\"},\"_formDataSaved\":true}","{\"entries\":[{\"url\":\"http://manual.altova.com/XMLSpy/spystandard/index.html?xmlschemasstd.htm\",\"title\":\"Altova XMLSpy 2011\",\"ID\":2622,\"children\":[{\"url\":\"http://manual.altova.com/XMLSpy/spystandard/xmlspy_content_dyn.html\",\"title\":\"Altova XMLSpy 2011\",\"ID\":2623,\"referrer\":\"http://manual.altova.com/XMLSpy/spystandard/index.html?xmlschemasstd.htm\",\"scroll\":\"0,0\"},{\"url\":\"http://manual.altova.com/XMLSpy/spystandard/xmlschemasstd.htm\",\"title\":\"XML Schemas\",\"ID\":2624,\"referrer\":\"http://manual.altova.com/XMLSpy/spystandard/index.html?xmlschemasstd.htm\",\"scroll\":\"0,260\"}],\"scroll\":\"0,0\"}],\"index\":1,\"attributes\":{},\"extData\":{\"TabGroupsManagerGroupId\":\"65\",\"TabGroupsManagerGroupName\":\"XML Schema Editor\"},\"_formDataSaved\":true}","{\"entries\":[{\"url\":\"http://www.altova.com/schemaagent.html\",\"title\":\"XML Schema Management Tool\",\"ID\":2626,\"formdata\":{\"#q\":\"\"},\"scroll\":\"0,171\"}],\"index\":1,\"attributes\":{\"image\":\"http://www.altova.com/favicon.ico\"},\"extData\":{\"TabGroupsManagerGroupId\":\"65\",\"TabGroupsManagerGroupName\":\"XML Schema Editor\"},\"_formDataSaved\":true}"]}
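From this, I want to create a text file like:
http://www.altova.com/xmlspy/xml-schema-editor.html
http://www.altova.com/xmlspy/xml-code-generation.html
Since this appears to be a JSON file rather than a plain text file, use one of the JSON modules on CPAN. It is a little complicated, because the data seems to have been encoded as JSON, stored as strings inside a larger object, and then that object was encoded as JSON as well. So you have to parse the file, pull out those strings, parse each of them as JSON in turn, and only then extract the URIs from them.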
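A minimal sketch of that two-level decode, using JSON::PP (bundled with Perl since 5.14) rather than any specific module the answer had in mind. It assumes the file holds a single object shaped like the sample: a "tabs" array of JSON-encoded strings, each with an "entries" array whose items carry a "url" and may nest further entries under "children". The sub names extract_urls and urls_from_entry, and the file names in the usage line, are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Collect the "url" of one history entry plus, recursively,
# the URLs of any entries nested under its "children" key.
sub urls_from_entry {
    my ($entry) = @_;
    my @urls;
    push @urls, $entry->{url} if defined $entry->{url};
    push @urls, urls_from_entry($_) for @{ $entry->{children} || [] };
    return @urls;
}

# Decode the outer object, then decode each element of "tabs",
# which is itself a JSON document stored as a string.
sub extract_urls {
    my ($json_text) = @_;
    my $group = decode_json($json_text);   # outer layer: UTF-8 bytes from the file
    my $inner = JSON::PP->new;             # inner layer: already-decoded character strings
    my @urls;
    for my $tab_json (@{ $group->{tabs} || [] }) {
        my $tab = $inner->decode($tab_json);
        push @urls, urls_from_entry($_) for @{ $tab->{entries} || [] };
    }
    return @urls;
}

# Usage (hypothetical file names): perl extract_urls.pl tabgroups.json > urls.txt
if (@ARGV) {
    local $/;                  # slurp mode: read the whole file at once
    my $text = <>;
    print "$_\n" for extract_urls($text);
}
```

Only "url" keys are collected, so "referrer" and "image" values are skipped; on the sample above it should print the same nine addresses as the shell pipeline, without the surrounding quotes. With no file argument the script does nothing, which keeps the subs easy to reuse from other code.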
If Perl is not a requirement:
$ sed 's|\\||g' file| awk -vRS='"url":' -F"," '{print $1}' | grep -E "http|ftp"
"http://www.altova.com/xmlspy/xml-schema-editor.html"
"http://www.altova.com/xmlspy/xml-code-generation.html"
"http://www.altova.com/videos.asp?type=0&video=xmlspy"
"http://www.altova.com/solutions/exchange_ratecasestudy.html"
"http://www.altova.com/xml-editor/"
"http://manual.altova.com/XMLSpy/spystandard/index.html?xmlschemasstd.htm"
"http://manual.altova.com/XMLSpy/spystandard/xmlspy_content_dyn.html"
"http://manual.altova.com/XMLSpy/spystandard/xmlschemasstd.htm"
"http://www.altova.com/schemaagent.html"
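Tried this on a Windows machine with MSYS installed and got the error: '\\' is not recognized as an internal or external command, operable program or batch file. – iceman 2010-10-12 18:36:28
On Windows, change all the single quotes to double quotes and escape every double quote inside the awk program, because cmd.exe is terrible. – ghostdog74 2010-10-12 23:29:14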
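Another way around the cmd.exe quoting trouble is to put the same text-matching idea as the sed/awk pipeline into a Perl script file, where there is no shell quoting to fight with. This is a rough sketch, not a JSON parser: it assumes every wanted value appears as "url":"..." once the backslashes are stripped, and that no URL contains a quote character. The sub name grep_urls is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Port of the sed/awk/grep pipeline: remove the backslashes that escape
# the inner JSON, then capture every http/https/ftp value that follows "url":.
# Returns the list of captured URLs (call it in list context).
sub grep_urls {
    my ($text) = @_;
    (my $clean = $text) =~ s/\\//g;    # same as sed 's|\\||g'
    return $clean =~ /"url":"((?:https?|ftp):[^"]*)"/g;
}

# Usage (hypothetical file names): perl grep_urls.pl tabgroups.json > urls.txt
if (@ARGV) {
    local $/;                          # slurp the whole file
    my $text = <>;
    print "$_\n" for grep_urls($text);
}
```

Unlike the pipeline, this prints the URLs without quotes and only keeps values that start with http, https or ftp. For anything beyond a quick one-off, the JSON-module approach is the more robust choice.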
This is JSON, so you can use one of the existing JSON Perl modules at http://search.cpan.org – Prix 2010-10-10 09:21:50
For any general-purpose URL extraction you should use the [`URI::Find`](http://p3rl.org/URI::Find) module. – daxim 2010-10-10 13:28:34