Error: XML content does not seem to be XML: 'NA' for XML data nested in a csv file
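I have some data frames in csv format in which some cells have XML data (lists) nested inside them. I want to convert the XML lists to data frames, select two columns from each new data frame, and then bind these new data frames together. I can already do most of this by using lapply on the cells that contain data. The problem is that when I include the cells that contain <NA>, R returns an error.
Error: XML content does not seem to be XML: 'NA'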
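Some overview:
- The data frame is called "match"
- The column I'm working with is called "goal" and contains the XML lists
- The cells that work are rows [1729:1733] of that column
- The cells containing < NA> are [1727:1728]
- Note on < NA>:
The data doesn't actually have any space between the bracket and 'NA'; it does, however, have the angle brackets, just like XML. I wrote it with a space here because when I wrote it without the space, it behaved like an actual "not applicable" and didn't display (see here ---> ). Yes, I'm a noob.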
Here are the libraries I'm using:
library(XML)
library(plyr)
library(dplyr)
Here is the code that works:
var1 <- lapply(match$goal[1729:1733], as.character)
var2 <- lapply(var1, xmlToList)
var3 <- lapply(var2, ldply, data.frame)
var4 <- lapply(var3, subset, select = c("player1", "id"))
Here is what happens if I try to include the cells with < NA> values:
gar1 <- lapply(match$goal[1727:1733], as.character)
gar2 <- lapply(gar1, xmlToList)
Error: XML content does not seem to be XML: 'NA'
- How can I run this code so that it skips the < NA>s?
Here is the data. Sorry it is so messy:
match$goal[1727:1733]
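And the output (data sample):
(Unfortunately it came out as one inconvenient long line when I posted it, but I'll keep tinkering to see if I can come up with a better way to display it.)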
[1] <NA> [2] <NA> [3] <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>22</elapsed><player2>38807</player2><subtype>header</subtype><player1>37799</player1><sortorder>5</sortorder><team>10261</team><id>378998</id><n>295</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>24</elapsed><player2>24154</player2><subtype>shot</subtype><player1>24148</player1><sortorder>4</sortorder><team>10260</team><id>379019</id><n>298</n><type>goal</type><goal_type>n</goal_type></value></goal> [4] <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>4</elapsed><player2>39297</player2><subtype>shot</subtype><player1>26181</player1><sortorder>2</sortorder><team>9825</team><id>375546</id><n>231</n><type>goal</type><goal_type>n</goal_type></value></goal> [5] <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>407</event_incident_typefk><elapsed>83</elapsed><player2>30889</player2><subtype>distance</subtype><player1>30853</player1><sortorder>0</sortorder><team>8650</team><id>378041</id><n>344</n><type>goal</type><goal_type>n</goal_type></value></goal> [6] 
<goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>4</elapsed><player2>36394</player2><subtype>shot</subtype><player1>23139</player1><sortorder>2</sortorder><team>8654</team><id>376060</id><n>244</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>10</elapsed><player2>37277</player2><subtype>shot</subtype><player1>23139</player1><sortorder>1</sortorder><team>8654</team><id>376165</id><n>251</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>411</event_incident_typefk><elapsed>47</elapsed><player2>34466</player2><subtype>volley</subtype><player1>127857</player1><sortorder>3</sortorder><team>8528</team><id>376929</id><n>294</n><type>goal</type><goal_type>n</goal_ty... <truncated> [7] 
<goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>47</elapsed><player2>23354</player2><subtype>header</subtype><player1>26165</player1><sortorder>2</sortorder><team>10252</team><id>378837</id><n>251</n><type>goal</type><goal_type>n</goal_type></value><value><comment>p</comment><stats><penalties>1</penalties></stats><event_incident_typefk>20</event_incident_typefk><elapsed>64</elapsed><player1>40198</player1><sortorder>0</sortorder><team>8456</team><id>378981</id><n>266</n><type>goal</type><goal_type>p</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>69</elapsed><player2>24658</player2><subtype>shot</subtype><player1>23264</player1><sortorder>1</sortorder><team>10252</team><id>379030</id><n>270</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</g... <truncated> 13225 Levels: <goal /> .
Data, attempt 2 (this is pasted from the Excel csv, which is why the NAs don't have any angle brackets):
rowID seasonyears match_date_ti matchid goal
1727 2015/2016 9/27/2015 0:00 1979894 NA
1728 2015/2016 9/26/2015 0:00 1979895 NA
1729 2008/2009 8/17/2008 0:00 489042 <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>22</elapsed><player2>38807</player2><subtype>header</subtype><player1>37799</player1><sortorder>5</sortorder><team>10261</team><id>378998</id><n>295</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>24</elapsed><player2>24154</player2><subtype>shot</subtype><player1>24148</player1><sortorder>4</sortorder><team>10260</team><id>379019</id><n>298</n><type>goal</type><goal_type>n</goal_type></value></goal>
1730 2008/2009 8/16/2008 0:00 489043 <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>4</elapsed><player2>39297</player2><subtype>shot</subtype><player1>26181</player1><sortorder>2</sortorder><team>9825</team><id>375546</id><n>231</n><type>goal</type><goal_type>n</goal_type></value></goal>
1731 2008/2009 8/16/2008 0:00 489044 <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>407</event_incident_typefk><elapsed>83</elapsed><player2>30889</player2><subtype>distance</subtype><player1>30853</player1><sortorder>0</sortorder><team>8650</team><id>378041</id><n>344</n><type>goal</type><goal_type>n</goal_type></value></goal>
1732 2008/2009 8/16/2008 0:00 489045 <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>4</elapsed><player2>36394</player2><subtype>shot</subtype><player1>23139</player1><sortorder>2</sortorder><team>8654</team><id>376060</id><n>244</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>10</elapsed><player2>37277</player2><subtype>shot</subtype><player1>23139</player1><sortorder>1</sortorder><team>8654</team><id>376165</id><n>251</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>411</event_incident_typefk><elapsed>47</elapsed><player2>34466</player2><subtype>volley</subtype><player1>127857</player1><sortorder>3</sortorder><team>8528</team><id>376929</id><n>294</n><type>goal</type><goal_type>n</goal_type></value></goal>
1733 2008/2009 8/17/2008 0:00 489046 <goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>47</elapsed><player2>23354</player2><subtype>header</subtype><player1>26165</player1><sortorder>2</sortorder><team>10252</team><id>378837</id><n>251</n><type>goal</type><goal_type>n</goal_type></value><value><comment>p</comment><stats><penalties>1</penalties></stats><event_incident_typefk>20</event_incident_typefk><elapsed>64</elapsed><player1>40198</player1><sortorder>0</sortorder><team>8456</team><id>378981</id><n>266</n><type>goal</type><goal_type>p</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>69</elapsed><player2>24658</player2><subtype>shot</subtype><player1>23264</player1><sortorder>1</sortorder><team>10252</team><id>379030</id><n>270</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>74</elapsed><player2>23782</player2><subtype>header</subtype><player1>23264</player1><sortorder>1</sortorder><team>10252</team><id>379074</id><n>283</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>76</elapsed><player2>23782</player2><subtype>shot</subtype><player1>23264</player1><sortorder>1</sortorder><team>10252</team><id>379095</id><n>277</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>90</elapsed><player2>45431</player2><subtype>shot</subtype><player1>46403</player1><sortorder>1</sortorder><team>8456</team><id>379250</id><n>281</n><type>goal</type><goal_type>n</goal_type></value></goal>
Consider either subsetting the column to its non-NA values or using tryCatch inside the lapply call:
Data
library(XML)
library(plyr)
library(dplyr)
txt = 'rowID season_years match_date_ti matchid goal
1727 "2015/2016" "9/27/2015 0:00" 1979894 "NA"
1728 "2015/2016" "9/26/2015 0:00" 1979895 "NA"
1729 "2008/2009" "8/17/2008 0:00" 489042 "<goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>22</elapsed><player2>38807</player2><subtype>header</subtype><player1>37799</player1><sortorder>5</sortorder><team>10261</team><id>378998</id><n>295</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>24</elapsed><player2>24154</player2><subtype>shot</subtype><player1>24148</player1><sortorder>4</sortorder><team>10260</team><id>379019</id><n>298</n><type>goal</type><goal_type>n</goal_type></value></goal>"
1730 "2008/2009" "8/16/2008 0:00" 489043 "<goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>4</elapsed><player2>39297</player2><subtype>shot</subtype><player1>26181</player1><sortorder>2</sortorder><team>9825</team><id>375546</id><n>231</n><type>goal</type><goal_type>n</goal_type></value></goal>"
1731 "2008/2009" "8/16/2008 0:00" 489044 "<goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>407</event_incident_typefk><elapsed>83</elapsed><player2>30889</player2><subtype>distance</subtype><player1>30853</player1><sortorder>0</sortorder><team>8650</team><id>378041</id><n>344</n><type>goal</type><goal_type>n</goal_type></value></goal>"
1732 "2008/2009" "8/16/2008 0:00" 489045 "<goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>4</elapsed><player2>36394</player2><subtype>shot</subtype><player1>23139</player1><sortorder>2</sortorder><team>8654</team><id>376060</id><n>244</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>10</elapsed><player2>37277</player2><subtype>shot</subtype><player1>23139</player1><sortorder>1</sortorder><team>8654</team><id>376165</id><n>251</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>411</event_incident_typefk><elapsed>47</elapsed><player2>34466</player2><subtype>volley</subtype><player1>127857</player1><sortorder>3</sortorder><team>8528</team><id>376929</id><n>294</n><type>goal</type><goal_type>n</goal_type></value></goal>"
1733 "2008/2009" "8/17/2008 0:00" 489046 "<goal><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>47</elapsed><player2>23354</player2><subtype>header</subtype><player1>26165</player1><sortorder>2</sortorder><team>10252</team><id>378837</id><n>251</n><type>goal</type><goal_type>n</goal_type></value><value><comment>p</comment><stats><penalties>1</penalties></stats><event_incident_typefk>20</event_incident_typefk><elapsed>64</elapsed><player1>40198</player1><sortorder>0</sortorder><team>8456</team><id>378981</id><n>266</n><type>goal</type><goal_type>p</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>69</elapsed><player2>24658</player2><subtype>shot</subtype><player1>23264</player1><sortorder>1</sortorder><team>10252</team><id>379030</id><n>270</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>406</event_incident_typefk><elapsed>74</elapsed><player2>23782</player2><subtype>header</subtype><player1>23264</player1><sortorder>1</sortorder><team>10252</team><id>379074</id><n>283</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>76</elapsed><player2>23782</player2><subtype>shot</subtype><player1>23264</player1><sortorder>1</sortorder><team>10252</team><id>379095</id><n>277</n><type>goal</type><goal_type>n</goal_type></value><value><comment>n</comment><stats><goals>1</goals><shoton>1</shoton></stats><event_incident_typefk>393</event_incident_typefk><elapsed>90</elapsed><player2>45431</player2><subtype>shot</subtype><player1>46403</player1><sortorder>1</sortorder><team>8456</team><id>379250</id><n>281</n><type>goal</type><goal_type>n</goal_type></value></goal>"'
match <- read.table(text=txt, sep="", header=TRUE, na.strings = "NA")
Non-NA subset approach
var1 <- lapply(match[!is.na(match$goal), c("goal")], as.character)
var2 <- lapply(var1, xmlToList)
var3 <- lapply(var2, ldply, data.frame)
var4 <- lapply(var3, subset, select = c("player1", "id"))
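The question also asks to bind the per-match data frames together, which the steps above stop short of. A minimal sketch of one way to do that, assuming it is useful to tag each row with its match id first. The `goal` and `matchid` vectors below are shortened, hypothetical stand-ins for `match$goal` and `match$matchid`, and the names `keep`, `frames`, `tagged`, and `goals_all` are illustrative only:

```r
library(XML)   # xmlToList
library(plyr)  # ldply

# Hypothetical, shortened stand-ins for match$goal and match$matchid
goal <- c(
  NA,
  "<goal><value><player1>37799</player1><id>378998</id></value><value><player1>24148</player1><id>379019</id></value></goal>",
  "<goal><value><player1>26181</player1><id>375546</id></value></goal>"
)
matchid <- c(1979894, 489042, 489043)

# Skip the NA cells up front
keep <- !is.na(goal)

# One data frame per match, one row per <value> element
frames <- lapply(goal[keep], function(x) {
  df <- ldply(xmlToList(x), data.frame)
  df[, c("player1", "id"), drop = FALSE]
})

# Tag each data frame with its match id, then stack them into one
tagged    <- Map(function(df, id) cbind(matchid = id, df), frames, matchid[keep])
goals_all <- do.call(rbind, tagged)
```

Since dplyr is already loaded in the question, `dplyr::bind_rows(tagged)` would likely do the same job as `do.call(rbind, tagged)`.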
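tryCatch approach (returns an empty <goal></goal> for NA values)
var2 <- lapply(match$goal, function(i) {
  # Fall back to an empty <goal></goal> whenever a cell cannot be parsed
  tryCatch(xmlToList(as.character(i)),
           error = function(e) xmlToList('<goal></goal>'))
})
# Keep only non-empty data frames; nrow(i) > 1 would also drop matches with a single goal
var3 <- Filter(function(i) nrow(i) > 0, lapply(var2, ldply, data.frame))
var4 <- lapply(var3, subset, select = c("player1", "id"))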
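As a variation on the same idea, the error handler can return NULL instead of parsing a placeholder document, so that failed cells are easy to spot and drop before any data frames are built. This is only a sketch: `safe_xml_to_list` is a hypothetical helper name, and the `cells` vector is made-up input covering the kinds of values that seem likely in the column (NA, an empty string, non-XML text, an empty `<goal />`, and one real record):

```r
library(XML)

# Hypothetical helper: parse one cell; return NULL when the cell cannot be parsed
safe_xml_to_list <- function(x) {
  tryCatch(xmlToList(as.character(x)), error = function(e) NULL)
}

cells <- c(
  NA,                 # missing value
  "",                 # zero-length string
  "not xml at all",   # non-XML text
  "<goal />",         # valid XML, but no goals recorded
  "<goal><value><player1>26181</player1><id>375546</id></value></goal>"
)

parsed <- lapply(cells, safe_xml_to_list)

# lengths() is 0 for NULL and for empty results, so both are dropped
usable <- lengths(parsed) > 0
goals  <- parsed[usable]
```

The trade-off is that every failure is silently swallowed; logging `conditionMessage(e)` inside the handler would make unexpected parse errors visible.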
if/else approach
var2 <- lapply(match$goal, function(i) {
  if (!is.na(i)) {
    xmlToList(as.character(i))
  } else {
    # NA cells become an empty <goal></goal>
    xmlToList('<goal></goal>')
  }
})
# Keep only non-empty data frames; nrow(i) > 1 would also drop matches with a single goal
var3 <- Filter(function(i) nrow(i) > 0, lapply(var2, ldply, data.frame))
var4 <- lapply(var3, subset, select = c("player1", "id"))
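Filtering on row count works when the only problem cells are empty ones. If some cells parse fine but lack the columns being selected, a filter on column names may be more direct. A minimal sketch under that assumption; the `cells` vector is hypothetical (its third element is an invented record without `player1` or `id`), and `wanted`, `frames`, `has_wanted`, and `selected` are illustrative names:

```r
library(XML)
library(plyr)

wanted <- c("player1", "id")

# Hypothetical cells: NA, an empty goal, a record lacking the wanted
# columns, and a complete record
cells <- c(
  NA,
  "<goal />",
  "<goal><value><comment>n</comment><elapsed>83</elapsed></value></goal>",
  "<goal><value><player1>30853</player1><id>378041</id></value></goal>"
)

# NA cells are passed to ldply as NULL, which yields a zero-column data frame
frames <- lapply(cells, function(x) {
  parsed <- if (is.na(x)) NULL else xmlToList(x)
  ldply(parsed, data.frame)
})

# Keep only data frames that actually contain every wanted column
has_wanted <- vapply(frames, function(df) all(wanted %in% names(df)), logical(1))
selected   <- lapply(frames[has_wanted], function(df) df[, wanted, drop = FALSE])
```

This avoids the "undefined columns selected" error that `subset(..., select = ...)` would raise on a data frame missing one of the columns.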
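Thanks @Parfait! This is awesome. I was able to get the non-NA subset approach working very easily. The other two approaches run fine, but when I run var4 – UpsideDown
Oops! Not your problem, but mine. The last two approaches return empty data frames from the NAs, which need to be filtered out. See the edit. Also, 'tryCatch' helps with any other problems such as non-XML strings, zero-length strings, etc. – Parfait
As a noob stackoverflow user, I'm not sure of the proper etiquette. I just wanted to thank you again. I also want to let you know that the edits to the tryCatch() and if/else approaches work like a dream, even better than !is.na(), because they skip any cells that don't contain the columns I'm extracting. @Parfait – UpsideDown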
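I forgot to mention: I am using R for this problem. – UpsideDown
Can you post all the 'library' lines? Unless you are using base R, don't assume packages. Also, can you post a sample of the nested XML data? Since spaces are not valid in XML names, I can't fully make out what '' means. – Parfait
Sorry. This is my first stackexchange post.
library(XML)
library(plyr)
library(dplyr)
About that: there isn't actually any space in there. I wrote it with a space here because when I wrote it without the space (as it exists in the data), it didn't show up on screen. I'm not sure whether there is an "escape" character for this, but I couldn't find one.
I'll post a sample of the data in a few minutes... – UpsideDown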