Fixing XML parsing errors

Problem description:

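While parsing an XML file I get parse errors such as [Fatal Error] :293:24: Invalid byte 2 of 2-byte UTF-8 sequence. My XML sample contains a character that shows up as xC3. It is a single character: pressing the Delete key once removes the whole xC3. (I wanted to paste the character here, but this editor displays it as a different character.)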

<?xml version="1.0" encoding="utf-8"?> 
<issue-info> 
<issue-meta> 
<date>January 24, 2013</date> 
<from>Chris Burton, John Wiley &amp; Sons, Ltd.</from> 
<journal>Greenhouse Gases: Science and Technology</journal> 
<typesetter>Anju Upadhaya</typesetter> 
<volume>3</volume> 
<issue>1</issue> 
<printer>Markono,</printer> 
<cover-date>February 2013</cover-date> 
<online-issn>2152-3878</online-issn> 
<print-issn>2152-3878</print-issn> 
<total-pages>FM &ndash; 4; TEXT &ndash; 95; EM &ndash; 1: TOTAL = 100</total-pages> 
<spl-instruction></spl-instruction> 
</issue-meta> 
<issue-item> 
<seq>1</seq> 
<ed-ref>OFC</ed-ref> 
<aid></aid> 
<author></author> 
<description>Update from GHG 2_1 cover</description> 
<start-page>1</start-page> 
<end-page>1</end-page> 
<artty></artty> 
<category>OFC (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online></pub-online> 
</issue-item> 
<issue-item> 
<seq>2</seq> 
<ed-ref>IFC</ed-ref> 
<aid></aid> 
<author>49379Ůpdf</author> 
<description>New GHG colour ADVERT</description> 
<start-page>2</start-page> 
<end-page>2</end-page> 
<artty></artty> 
<category>IFC (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online></pub-online> 
</issue-item> 
<issue-item> 
<seq>3</seq> 
<ed-ref>FM1</ed-ref> 
<aid></aid> 
<author></author> 
<description>Table of Contents</description> 
<start-page>1</start-page> 
<end-page>1</end-page> 
<artty></artty> 
<category>TOC (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online></pub-online> 
</issue-item> 
<issue-item> 
<seq>4</seq> 
<ed-ref>FM2</ed-ref> 
<aid></aid> 
<author></author> 
<description>Editorial Board</description> 
<start-page>2</start-page> 
<end-page>2</end-page> 
<artty></artty> 
<category>Editorial Board (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online></pub-online> 
</issue-item> 
<issue-item> 
<seq>5</seq> 
<ed-ref>FM3</ed-ref> 
<aid></aid> 
<author></author> 
<description>Aims and Scope</description> 
<start-page>3</start-page> 
<end-page>3</end-page> 
<artty></artty> 
<category>Aims and Scope (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online></pub-online> 
</issue-item> 
<issue-item> 
<seq>6</seq> 
<ed-ref>FM4</ed-ref> 
<aid></aid> 
<author></author> 
<description>Information Page</description> 
<start-page>4</start-page> 
<end-page>4</end-page> 
<artty></artty> 
<category>Information Page (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online></pub-online> 
</issue-item> 
<issue-item> 
<seq>7</seq> 
<ed-ref></ed-ref> 
<aid>GHG1333</aid> 
<author>PROD ED</author> 
<description></description> 
<start-page>1</start-page> 
<end-page>2</end-page> 
<artty>ED</artty> 
<category>Editorial (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online></pub-online> 
</issue-item> 
<issue-item> 
<seq>8</seq> 
<ed-ref></ed-ref> 
<aid>GHG1334</aid> 
<author>PROD ED</author> 
<description></description> 
<start-page>3</start-page> 
<end-page>4</end-page> 
<artty>XX</artty> 
<category>60 Second Interview (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online></pub-online> 
</issue-item> 
<issue-item> 
<seq>9</seq> 
<ed-ref></ed-ref> 
<aid>GHG1335</aid> 
<author></author> 
<description></description> 
<start-page>5</start-page> 
<end-page>7</end-page> 
<artty>XX</artty> 
<category>Feature (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online></pub-online> 
</issue-item> 
<issue-item> 
<seq>10</seq> 
<ed-ref>GHG-12-0029.R2</ed-ref> 
<aid>GHG1313</aid> 
<author>PAN, CLODIC, TOUBASSY</author> 
<description></description> 
<start-page>8</start-page> 
<end-page>20</end-page> 
<artty>XX</artty> 
<category>In the Field (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online>03 Jan 2013</pub-online> 
</issue-item> 
<issue-item> 
<seq>11</seq> 
<ed-ref>GHG-12-0023.R1</ed-ref> 
<aid>GHG1298</aid> 
<author>Peterson, O'Byrne, Endres, Peterson</author> 
<description></description> 
<start-page>21</start-page> 
<end-page>29</end-page> 
<artty>XX</artty> 
<category>Spotlight (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online>14 Sep 2012</pub-online> 
</issue-item> 
<issue-item> 
<seq>12</seq> 
<ed-ref>GHG-12-0033.R2</ed-ref> 
<aid>GHG1321</aid> 
<author>Begag, Krutka, Dong, Mihalcik, Rhine, Gould, Baldic, Nahass</author> 
<description></description> 
<start-page>30</start-page> 
<end-page>39</end-page> 
<artty>XX</artty> 
<category>Spotlight (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online></pub-online> 
</issue-item> 
<issue-item> 
<seq>13</seq> 
<ed-ref>GHG-12-0036.R1</ed-ref> 
<aid>GHG1331</aid> 
<author>Cunningham, Lauchnor, Eldring, Esposito, Mitchell, Gerlach, Phillips, Ebigbo, Spangler</author> 
<description></description> 
<start-page>40</start-page> 
<end-page>49</end-page> 
<artty>XX</artty> 
<category>Spotlight (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online></pub-online> 
</issue-item> 
<issue-item> 
<seq>14</seq> 
<ed-ref>GHG-12-0034.R1</ed-ref> 
<aid>GHG1328</aid> 
<author>Elliot, Buscheck, Celia</author> 
<description></description> 
<start-page>50</start-page> 
<end-page>65</end-page> 
<artty>XX</artty> 
<category>Modeling and Analysis (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online></pub-online> 
</issue-item> 
<issue-item> 
<seq>15</seq> 
<ed-ref>GHG-12-0021.R1</ed-ref> 
<aid>GHG1318</aid> 
<author>Mazzoldi, Picard, Sriram, Oldenburg</author> 
<description></description> 
<start-page>66</start-page> 
<end-page>83</end-page> 
<artty>XX</artty> 
<category>Modeling and Analysis (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online>03 Jan 2013</pub-online> 
</issue-item> 
<issue-item> 
<seq>16</seq> 
<ed-ref>GHG-12-0031.R1</ed-ref> 
<aid>GHG1308</aid> 
<author>Eccles, Pratson</author> 
<description></description> 
<start-page>84</start-page> 
<end-page>95</end-page> 
<artty>XX</artty> 
<category>Modeling and Analysis (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online>26 Oct 2012</pub-online> 
</issue-item> 
<issue-item> 
<seq>17</seq> 
<ed-ref>EM1</ed-ref> 
<aid></aid> 
<author>Join the SCI</author> 
<description>NEW COLOUR ADVERT</description> 
<start-page></start-page> 
<end-page></end-page> 
<artty></artty> 
<category>Society Ad (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online></pub-online> 
</issue-item> 
<issue-item> 
<seq>18</seq> 
<ed-ref>IBC</ed-ref> 
<aid></aid> 
<author>ONLINE OPEN</author> 
<description>COLOUR ADVERT</description> 
<start-page>1</start-page> 
<end-page>1</end-page> 
<artty></artty> 
<category>IBC (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online></pub-online> 
</issue-item> 
<issue-item> 
<seq>19</seq> 
<ed-ref>OBC</ed-ref> 
<aid></aid> 
<author>CCUS</author> 
<description>NEW COLOUR ADVERT</description> 
<start-page>2</start-page> 
<end-page>2</end-page> 
<artty></artty> 
<category>OBC (GHG)</category> 
<toc-category></toc-category> 
<reprint></reprint> 
<color>N</color> 
<color-charge>0</color-charge> 
<pub-online></pub-online> 
</issue-item> 
</issue-info> 

My Java code is below.

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    // Configure a plain, non-validating DOM parser.
    DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
    docBuilderFactory.setValidating(false);
    docBuilderFactory.setCoalescing(false);
    docBuilderFactory.setXIncludeAware(false);
    docBuilderFactory.setNamespaceAware(false);
    docBuilderFactory.setIgnoringComments(true);
    docBuilderFactory.setExpandEntityReferences(false);

    // rtfXmlIS is the InputStream the XML is read from.
    DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
    Document doc = docBuilder.parse(rtfXmlIS);
    doc.getDocumentElement().normalize();

How can I get rid of errors like these ([Fatal Error] :16:45: Invalid byte 1 of 1-byte UTF-8 sequence., [Fatal Error] :14:24: The entity "ndash" was referenced, but not declared.)?

Edit your post and include the first 20-30 lines of the input XML. – 2013-04-10 03:36:02

Hi @Jim, I have added a sample. Please take a look. – sairajgemini 2013-04-10 05:07:17

These are two different errors.

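The first occurs because the input is not actually UTF-8, even though the XML declaration says it is. You need to decode the input with its real encoding before passing it to the parser.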
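As a minimal sketch of decoding first: the stream can be wrapped in a Reader with an explicit charset, so the parser receives characters and does not apply the encoding named in the XML declaration. The class name `DecodeBeforeParse`, the `parse` helper, and the choice of ISO-8859-1 are assumptions for illustration; the real encoding of the file has to be determined from wherever it was produced.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DecodeBeforeParse {

    // Hypothetical helper: parses bytes that were written in sourceCharset,
    // whatever the XML declaration claims.
    static Document parse(InputStream in, Charset sourceCharset) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Decoding happens in the Reader; the parser only ever sees characters.
        Reader reader = new InputStreamReader(in, sourceCharset);
        return builder.parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        // A lone 0xC3 byte followed by 'p' is not valid UTF-8, so parsing the
        // raw stream fails. ISO-8859-1 is assumed here only for the demo.
        byte[] bytes = ("<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<author>49379\u00C3pdf</author>").getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

In the code from the question, this would mean passing `new InputSource(new InputStreamReader(rtfXmlIS, charset))` to `docBuilder.parse` instead of the raw stream.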

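The second is probably because the input is XHTML rather than plain XML. XML itself predefines only five entities (&amp;, &lt;, &gt;, &apos; and &quot;); &ndash; comes from the (X)HTML DTDs. If you want to run an XML parser on this input and have it resolve entities like &ndash;, you will need to supply a DTD that defines it and any other entities the input contains.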
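One way to supply such a definition, sketched under assumptions: since the sample has no DOCTYPE, a small internal DTD subset declaring `ndash` can be inserted right after the XML declaration before parsing. The class name `DeclareNdash` and both helper methods are hypothetical, and this covers only `ndash`; every other undeclared entity in the input would need its own `<!ENTITY ...>` line.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DeclareNdash {

    // Hypothetical helper: inserts a DOCTYPE with an internal subset that
    // declares &ndash; as U+2013. Assumes the input has no DOCTYPE of its own.
    static String withEntityDeclarations(String xml, String rootName) {
        String doctype = "<!DOCTYPE " + rootName + " [<!ENTITY ndash \"&#8211;\">]>";
        // The XML declaration, if present, must stay first in the document.
        int declEnd = xml.startsWith("<?xml") ? xml.indexOf("?>") + 2 : 0;
        return xml.substring(0, declEnd) + doctype + xml.substring(declEnd);
    }

    static Document parse(String xml, String rootName) throws Exception {
        // Default factory settings: entity references are expanded into text.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        String patched = withEntityDeclarations(xml, rootName);
        return builder.parse(new InputSource(new StringReader(patched)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
                + "<issue-info><total-pages>FM &ndash; 4</total-pages></issue-info>";
        Document doc = parse(xml, "issue-info");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Note that the question's code calls `setExpandEntityReferences(false)`, which asks the DOM builder to keep entity reference nodes instead of replacing them with text; the sketch leaves that setting at its default so the dash ends up in the text content.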

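Hi @Jim, could you explain in more detail how to parse the XML as UTF-8? Or how to make the parser skip that character? Please provide some pseudocode. – sairajgemini 2013-04-10 03:38:10

Edit your post to include a sample of the input XML. – 2013-04-10 04:11:34

Hi @Jim, I have added a sample. Please take a look. – sairajgemini 2013-04-10 05:19:25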