Error: XML content does not seem to be XML: 'NA' for XML data nested in a csv file

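Problem description:

I have some data frames in csv format in which some cells contain nested XML data (lists). I want to convert each XML list into a data frame, select two columns from each new data frame, and then bind these new data frames together. I can already do most of this by using lapply on the cells that contain data. The problem is that when I include cells containing <NA>, R returns an error. Error: XML content does not seem to be XML: 'NA'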

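Some overview:

  • The data frame is called "match"
  • I am working with the column called "goal", which contains the XML lists
  • The cells that work are rows [1729:1733] of that column
  • The cells containing < NA> are [1727:1728]
  • A note on < NA>:

The data does not actually have any space between the bracket and 'NA'; it does have the brackets, though, just like XML. I wrote it with a space here because when I wrote it without the space, it was rendered like an actual "not applicable". See here --->). Yes, I am a noob.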

Here are the libraries I am using:

library(XML)
library(plyr)
library(dplyr)

Here is the successful code I used:

var1 <- lapply(match$goal[1729:1733], as.character) 
var2 <- lapply(var1, xmlToList) 
var3 <- lapply(var2, ldply, data.frame) 
var4 <- lapply(var3, subset, select = c("player1", "id")) 

Here is what happens if I try to include the cells with < NA> values:

gar1 <- lapply(match$goal[1727:1733], as.character) 
gar2 <- lapply(gar1, xmlToList) 

Error: XML content does not seem to be XML: 'NA'

  1. How can I run this code so that it skips the < NA>s?

Here is the data. Sorry it is so messy:

match$goal[1727:1733] 

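And the output (data sample):
(Unfortunately it came out as one inconvenient line when I posted it, but I will keep tinkering to see if I can come up with a better way to display it.)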

[1] <NA> [2] <NA> [3] <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>22</elapsed><player2>38807</player2><subtype>header</subtype><player1>37799</player1><sortorder>5</sortorder><team>10261</team><id>378998</id><n>295</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>24</elapsed><player2>24154</player2><subtype>shot</subtype><player1>24148</player1><sortorder>4</sortorder><team>10260</team><id>379019</id><n>298</n><type>goal</type><goal_type>n</goal_type></value></goal> [4] <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>4</elapsed><player2>39297</player2><subtype>shot</subtype><player1>26181</player1><sortorder>2</sortorder><team>9825</team><id>375546</id><n>231</n><type>goal</type><goal_type>n</goal_type></value></goal> [5] <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>407</event_incident_typefk><elapsed>83</elapsed><player2>30889</player2><subtype>distance</subtype><player1>30853</player1><sortorder>0</sortorder><team>8650</team><id>378041</id><n>344</n><type>goal</type><goal_type>n</goal_type></value></goal> [6] 
<goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>4</elapsed><player2>36394</player2><subtype>shot</subtype><player1>23139</player1><sortorder>2</sortorder><team>8654</team><id>376060</id><n>244</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>10</elapsed><player2>37277</player2><subtype>shot</subtype><player1>23139</player1><sortorder>1</sortorder><team>8654</team><id>376165</id><n>251</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>411</event_incident_typefk><elapsed>47</elapsed><player2>34466</player2><subtype>volley</subtype><player1>127857</player1><sortorder>3</sortorder><team>8528</team><id>376929</id><n>294</n><type>goal</type><goal_type>n</goal_ty... <truncated> [7] 
<goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>47</elapsed><player2>23354</player2><subtype>header</subtype><player1>26165</player1><sortorder>2</sortorder><team>10252</team><id>378837</id><n>251</n><type>goal</type><goal_type>n</goal_type></value><value><comment>p</comment><stats><penalties>1</penalties></stats><event_incident_typefk>20</event_incident_typefk><elapsed>64</elapsed><player1>40198</player1><sortorder>0</sortorder><team>8456</team><id>378981</id><n>266</n><type>goal</type><goal_type>p</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>69</elapsed><player2>24658</player2><subtype>shot</subtype><player1>23264</player1><sortorder>1</sortorder><team>10252</team><id>379030</id><n>270</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</g... <truncated> 13225 Levels: <goal /> . 

========================= ===============================================

Data, attempt 2 (this was pasted from the Excel csv, which is why the NAs do not have any brackets):

rowID seasonyears match_date_ti matchid goal 
1727 2015/2016 9/27/2015 0:00 1979894 NA 
1728 2015/2016 9/26/2015 0:00 1979895 NA 
1729 2008/2009 8/17/2008 0:00 489042 <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>22</elapsed><player2>38807</player2><subtype>header</subtype><player1>37799</player1><sortorder>5</sortorder><team>10261</team><id>378998</id><n>295</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>24</elapsed><player2>24154</player2><subtype>shot</subtype><player1>24148</player1><sortorder>4</sortorder><team>10260</team><id>379019</id><n>298</n><type>goal</type><goal_type>n</goal_type></value></goal> 
1730 2008/2009 8/16/2008 0:00 489043 <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>4</elapsed><player2>39297</player2><subtype>shot</subtype><player1>26181</player1><sortorder>2</sortorder><team>9825</team><id>375546</id><n>231</n><type>goal</type><goal_type>n</goal_type></value></goal> 
1731 2008/2009 8/16/2008 0:00 489044 <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>407</event_incident_typefk><elapsed>83</elapsed><player2>30889</player2><subtype>distance</subtype><player1>30853</player1><sortorder>0</sortorder><team>8650</team><id>378041</id><n>344</n><type>goal</type><goal_type>n</goal_type></value></goal> 
1732 2008/2009 8/16/2008 0:00 489045 <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>4</elapsed><player2>36394</player2><subtype>shot</subtype><player1>23139</player1><sortorder>2</sortorder><team>8654</team><id>376060</id><n>244</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>10</elapsed><player2>37277</player2><subtype>shot</subtype><player1>23139</player1><sortorder>1</sortorder><team>8654</team><id>376165</id><n>251</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>411</event_incident_typefk><elapsed>47</elapsed><player2>34466</player2><subtype>volley</subtype><player1>127857</player1><sortorder>3</sortorder><team>8528</team><id>376929</id><n>294</n><type>goal</type><goal_type>n</goal_type></value></goal> 
1733 2008/2009 8/17/2008 0:00 489046 <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>47</elapsed><player2>23354</player2><subtype>header</subtype><player1>26165</player1><sortorder>2</sortorder><team>10252</team><id>378837</id><n>251</n><type>goal</type><goal_type>n</goal_type></value><value><comment>p</comment><stats><penalties>1</penalties></stats><event_incident_typefk>20</event_incident_typefk><elapsed>64</elapsed><player1>40198</player1><sortorder>0</sortorder><team>8456</team><id>378981</id><n>266</n><type>goal</type><goal_type>p</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>69</elapsed><player2>24658</player2><subtype>shot</subtype><player1>23264</player1><sortorder>1</sortorder><team>10252</team><id>379030</id><n>270</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>74</elapsed><player2>23782</player2><subtype>header</subtype><player1>23264</player1><sortorder>1</sortorder><team>10252</team><id>379074</id><n>283</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>76</elapsed><player2>23782</player2><subtype>shot</subtype><player1>23264</player1><sortorder>1</sortorder><team>10252</team><id>379095</id><n>277</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>90</elapsed><player2>45431</player2><subtype>shot</subtype><player1>46403</player1><sortorder>1</sortorder><team>8456</team><id>379250</id><n>281</n><type>goal</type><goal_type>n</goal_type></value></goal> 

I forgot to mention: I am using R to solve this problem. – UpsideDown

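Can you post all the `library` lines? Never assume packages unless you are using base R. Also, can you post a sample of the nested XML data? I can't quite understand what `< NA>` means, since spaces in XML names are invalid. – Parfait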

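Sorry. This is my first stackexchange post.

library(XML)
library(plyr)
library(dplyr)

Regarding < NA>: there is not actually any space in there. I wrote it with a space here because when I wrote it without the space (as it exists in the data), it did not show up on screen. I am not sure whether there is an "escape" character for this, but I could not find one.

I will post a sample of the data in a few minutes... – UpsideDown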

Consider either subsetting the column to its non-NA values or using tryCatch inside the lapply call.

Data

library(XML) 
library(plyr) 
library(dplyr) 

txt = 'rowID season_years match_date_ti matchid goal 
1727 "2015/2016" "9/27/2015 0:00" 1979894 "NA" 
1728 "2015/2016" "9/26/2015 0:00" 1979895 "NA" 
1729 "2008/2009" "8/17/2008 0:00" 489042 "<goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>22</elapsed><player2>38807</player2><subtype>header</subtype><player1>37799</player1><sortorder>5</sortorder><team>10261</team><id>378998</id><n>295</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>24</elapsed><player2>24154</player2><subtype>shot</subtype><player1>24148</player1><sortorder>4</sortorder><team>10260</team><id>379019</id><n>298</n><type>goal</type><goal_type>n</goal_type></value></goal>" 
1730 "2008/2009" "8/16/2008 0:00" 489043 "<goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>4</elapsed><player2>39297</player2><subtype>shot</subtype><player1>26181</player1><sortorder>2</sortorder><team>9825</team><id>375546</id><n>231</n><type>goal</type><goal_type>n</goal_type></value></goal>" 
1731 "2008/2009" "8/16/2008 0:00" 489044 "<goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>407</event_incident_typefk><elapsed>83</elapsed><player2>30889</player2><subtype>distance</subtype><player1>30853</player1><sortorder>0</sortorder><team>8650</team><id>378041</id><n>344</n><type>goal</type><goal_type>n</goal_type></value></goal>" 
1732 "2008/2009" "8/16/2008 0:00" 489045 "<goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>4</elapsed><player2>36394</player2><subtype>shot</subtype><player1>23139</player1><sortorder>2</sortorder><team>8654</team><id>376060</id><n>244</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>10</elapsed><player2>37277</player2><subtype>shot</subtype><player1>23139</player1><sortorder>1</sortorder><team>8654</team><id>376165</id><n>251</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>411</event_incident_typefk><elapsed>47</elapsed><player2>34466</player2><subtype>volley</subtype><player1>127857</player1><sortorder>3</sortorder><team>8528</team><id>376929</id><n>294</n><type>goal</type><goal_type>n</goal_type></value></goal>" 
1733 "2008/2009" "8/17/2008 0:00" 489046 "<goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>47</elapsed><player2>23354</player2><subtype>header</subtype><player1>26165</player1><sortorder>2</sortorder><team>10252</team><id>378837</id><n>251</n><type>goal</type><goal_type>n</goal_type></value><value><comment>p</comment><stats><penalties>1</penalties></stats><event_incident_typefk>20</event_incident_typefk><elapsed>64</elapsed><player1>40198</player1><sortorder>0</sortorder><team>8456</team><id>378981</id><n>266</n><type>goal</type><goal_type>p</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>69</elapsed><player2>24658</player2><subtype>shot</subtype><player1>23264</player1><sortorder>1</sortorder><team>10252</team><id>379030</id><n>270</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>74</elapsed><player2>23782</player2><subtype>header</subtype><player1>23264</player1><sortorder>1</sortorder><team>10252</team><id>379074</id><n>283</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>76</elapsed><player2>23782</player2><subtype>shot</subtype><player1>23264</player1><sortorder>1</sortorder><team>10252</team><id>379095</id><n>277</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>90</elapsed><player2>45431</player2><subtype>shot</subtype><player1>46403</player1><sortorder>1</sortorder><team>8456</team><id>379250</id><n>281</n><type>goal</type><goal_type>n</goal_type></value></go
al>"' 

match <- read.table(text=txt, sep="", header=TRUE, na.strings = "NA") 

Non-NA subset approach

var1 <- lapply(match[!is.na(match$goal), c("goal")], as.character) 
var2 <- lapply(var1, xmlToList) 
var3 <- lapply(var2, ldply, data.frame) 
var4 <- lapply(var3, subset, select = c("player1", "id")) 

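tryCatch approach (returns an empty <goal></goal> for NA values)

var2 <- lapply(match$goal, function(i) {
    # Fall back to an empty <goal></goal> when a cell cannot be parsed (e.g. NA)
    tryCatch({ return(xmlToList(as.character(i)))
    }, error = function(e) return(xmlToList('<goal></goal>')))
})

# Drop the empty data frames produced by the NA cells. The test is > 0 rather
# than > 1 so that matches with a single goal (one row) are kept.
var3 <- Filter(function(i) nrow(i) > 0, lapply(var2, ldply, data.frame))
var4 <- lapply(var3, subset, select = c("player1", "id"))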
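As a variation on the tryCatch approach, the fallback can return NULL instead of an empty <goal></goal>, which makes the unparseable cells easy to identify and drop before any data frames are built. This is only a sketch: `safe_xml_to_list` is a hypothetical helper name, and `cells` is made-up sample data rather than the real `match$goal` column.

```r
library(XML)

# Hypothetical helper: parse one cell, returning NULL when xmlToList() fails
safe_xml_to_list <- function(x) {
  tryCatch(xmlToList(as.character(x)), error = function(e) NULL)
}

# Made-up sample: a missing value, a non-XML string, and one valid XML cell
cells <- c(NA,
           "not xml",
           "<goal><value><player1>37799</player1><id>378998</id></value></goal>")

parsed <- lapply(cells, safe_xml_to_list)

# TRUE for the cells that parsed, FALSE for the ones that raised an error
ok <- !vapply(parsed, is.null, logical(1))

# Keep only the successfully parsed cells
parsed_ok <- parsed[ok]
```

Keeping the logical vector `ok` around also makes it possible to line the parsed results back up with the original rows, for example with `match$matchid[ok]` on the real data.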

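if/else approach

var2 <- lapply(match$goal, function(i) {
    # Parse real XML cells; substitute an empty <goal></goal> for NA cells
    if(!is.na(i))
     return(xmlToList(as.character(i)))
    else
     return(xmlToList('<goal></goal>'))
})

# Drop the empty data frames produced by the NA cells. The test is > 0 rather
# than > 1 so that matches with a single goal (one row) are kept.
var3 <- Filter(function(i) nrow(i) > 0, lapply(var2, ldply, data.frame))
var4 <- lapply(var3, subset, select = c("player1", "id"))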
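The question also asks for the per-match data frames to be bound together at the end. A minimal sketch of that last step follows, assuming every data frame in the list has the same two columns. The `goal` vector is made-up sample data trimmed to the two columns of interest, and `all_goals` is just an illustrative name.

```r
library(XML)
library(plyr)

# Made-up sample: one NA cell, one match with two goals, one with a single goal
goal <- c(NA,
          "<goal><value><player1>37799</player1><id>378998</id></value><value><player1>24148</player1><id>379019</id></value></goal>",
          "<goal><value><player1>26181</player1><id>375546</id></value></goal>")

# Same pipeline as the non-NA subset approach
parsed <- lapply(goal[!is.na(goal)], xmlToList)
frames <- lapply(parsed, ldply, data.frame)
picked <- lapply(frames, subset, select = c("player1", "id"))

# Stack the list of data frames into a single data frame, one row per goal
all_goals <- do.call(rbind, picked)
```

`dplyr::bind_rows(picked)` should give an equivalent result and is more forgiving if some data frames are missing a column.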

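Thanks @Parfait! This is awesome. I was able to get the non-NA subset approach working very easily. The other two approaches run fine, but when I run var4 ... – UpsideDown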

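Whoops! Not your problem, but mine. The last two approaches return empty data frames for the NAs, and these need to be filtered out. See the edit. Also, `tryCatch` helps with any other issues, such as non-XML strings, zero-length strings, etc. – Parfait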

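As a noob stackoverflow user, I am not sure of the proper etiquette. I just wanted to thank you again. I also wanted to let you know that the edits to the tryCatch() and if/else approaches work like a dream, even better than !is.na(), because they skip any cells that do not contain the columns I am extracting. @Parfait – UpsideDown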