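Comparing two XML Document trees with repeated nodes
Question:
I want to compare two XML files and record all the differences. The problem appears when nodes start to repeat. For these two files: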
<root>
<a/>
<a/>
<b/>
</root>
and:
<root>
<a/>
<b/>
</root>
my program currently records no differences. The (big and ugly) method is as follows:
private void searchDocumentTrees(Node nodeA, Node nodeB, ArrayList<String> differences) {
    if (nodeA.hasChildNodes() && !nodeB.hasChildNodes()) {
        // record A deeper at this node
        return;
    }
    else if (!nodeA.hasChildNodes() && nodeB.hasChildNodes()) {
        // record B deeper at this node
        return;
    }
    else if (!nodeA.hasChildNodes() && !nodeB.hasChildNodes()) {
        return;
    }
    NodeList childrenA = nodeA.getChildNodes();
    NodeList childrenB = nodeB.getChildNodes();
    // indexes of nodes present in both lists of children as
    // NodeList doesn't allow searching by value
    ArrayList<Integer> presentInBothIndexA = new ArrayList<>();
    ArrayList<Integer> presentInBothIndexB = new ArrayList<>();
    // check for nodes present in both trees, record those present only in A
    for (int indexA = 0; indexA < childrenA.getLength(); indexA++) {
        boolean isPresentInBoth = false;
        Node currentA = childrenA.item(indexA);
        if (currentA.getNodeType() == Node.ELEMENT_NODE) {
            for (int indexB = 0; indexB < childrenB.getLength(); indexB++) {
                Node currentB = childrenB.item(indexB);
                if (currentB.getNodeType() == Node.ELEMENT_NODE) {
                    // if the nodes match, record their indexes and break from inner loop
                    if (currentA.getNodeName().equals(currentB.getNodeName())) {
                        isPresentInBoth = true;
                        presentInBothIndexA.add(indexA);
                        presentInBothIndexB.add(indexB);
                        break;
                    }
                }
            }
            // if the flag has not been changed currentA is not present in childrenB
            if (!isPresentInBoth) {
                // record as present only in A
            }
        }
    }
    // record nodes present only in B
    for (...){
        /* same nested loop - this time the outer is iterating over B
           and matching nodes indexes are not recorded - record only B - A */
    }
    for (int indexBoth = 0, len = presentInBothIndexA.size(); indexBoth < len; indexBoth++) {
        Node currentA = childrenA.item(presentInBothIndexA.get(indexBoth));
        Node currentB = childrenB.item(presentInBothIndexB.get(indexBoth));
        searchDocumentTrees(currentA, currentB, differences);
    }
}
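My first idea was to replace the isPresentInBoth flag with a counter of occurrences in both files, but that would probably introduce a third loop and even more complexity. Do you have a better idea?
Answer
I found two solutions:
Solution 1
After trying various (inefficient) approaches, e.g. counting node occurrences and storing them in a hash table, I realized that I already had the structures that store the indexes of matching nodes. These are of course:
ArrayList<Integer> presentInBothIndexA = new ArrayList<>();
ArrayList<Integer> presentInBothIndexB = new ArrayList<>();
So, instead of just letting them hang around, I put them to work: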
// pseudo-code for simplification
for (nodeA in fileA) {
    for (nodeB in fileB) {
        // check all the aforementioned conditions
        if (presentInBothIndexB.contains(indexB))
            continue; // skip if it was already recorded
        // else, do all the other stuff - isPresentInBoth = true, and so on
    }
}
Now the second loop doesn't need an inner loop:
for (nodeB in B) {
    if (!presentInBothIndexB.contains(indexB))
        // record difference - we only need to look for the nodes that were skipped
        // by the first loop, i.e. not present in file A
}
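The two loops above can be sketched as runnable Java. This is only an illustration under some assumptions: the names ChildMatcher, compareChildren, parseRoot and matchedInB are made up for the example, a HashSet stands in for presentInBothIndexB to make the contains() check cheap, and only one level of children is compared (the recursion into matched pairs is left out).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildMatcher {

    // Pairs each element child of nodeA with the first not-yet-used element
    // child of nodeB that has the same name; unmatched children are recorded.
    static List<String> compareChildren(Node nodeA, Node nodeB) {
        List<String> differences = new ArrayList<>();
        NodeList childrenA = nodeA.getChildNodes();
        NodeList childrenB = nodeB.getChildNodes();
        // hypothetical stand-in for presentInBothIndexB
        Set<Integer> matchedInB = new HashSet<>();

        for (int indexA = 0; indexA < childrenA.getLength(); indexA++) {
            Node currentA = childrenA.item(indexA);
            if (currentA.getNodeType() != Node.ELEMENT_NODE) continue;
            boolean isPresentInBoth = false;
            for (int indexB = 0; indexB < childrenB.getLength(); indexB++) {
                Node currentB = childrenB.item(indexB);
                if (currentB.getNodeType() != Node.ELEMENT_NODE) continue;
                if (matchedInB.contains(indexB)) continue; // already paired, skip
                if (currentA.getNodeName().equals(currentB.getNodeName())) {
                    isPresentInBoth = true;
                    matchedInB.add(indexB);
                    break;
                }
            }
            if (!isPresentInBoth) {
                differences.add("only in A: <" + currentA.getNodeName() + "> at index " + indexA);
            }
        }

        // Second pass: no inner loop needed, anything never paired is only in B.
        for (int indexB = 0; indexB < childrenB.getLength(); indexB++) {
            Node currentB = childrenB.item(indexB);
            if (currentB.getNodeType() == Node.ELEMENT_NODE && !matchedInB.contains(indexB)) {
                differences.add("only in B: <" + currentB.getNodeName() + "> at index " + indexB);
            }
        }
        return differences;
    }

    // Helper for the example: parses an XML string and returns its root element.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Node a = parseRoot("<root><a/><a/><b/></root>");
        Node b = parseRoot("<root><a/><b/></root>");
        System.out.println(compareChildren(a, b)); // [only in A: <a> at index 1]
    }
}
```

With the two files from the question (written without whitespace, so that child indexes count elements only), the second <a/> in the first file no longer finds a free partner and is reported as a difference.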
This approach has its drawbacks, because it compares nodes in the order in which they appear in the file, so in this case:
<r>
<a/>
<a/>
<a><b/></a>
</r>
and:
<r>
<a/>
<a/>
</r>
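it will record that the number of nodes differs, but it will not search any deeper in the first file. This is because once two nodes have been recorded as the same, it doesn't look any further: the first two <a/> elements are paired up, so <a><b/></a> is reported only as an extra node and its child is never examined. This is a nuisance, but I guess we can live with such an assumption. However, there are also attributes and values to compare, and the whole thing gets messy and confusing, which brings me to:
Solution 2
Just use XMLUnit. Seriously.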
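For illustration, a minimal sketch of what that might look like, assuming XMLUnit 2.x (the org.xmlunit packages, xmlunit-core on the classpath); the answer does not say which version was used, and the class name XmlUnitDiffExample and the method compare are made up for the example.

```java
import org.xmlunit.builder.DiffBuilder;
import org.xmlunit.builder.Input;
import org.xmlunit.diff.Diff;
import org.xmlunit.diff.Difference;

public class XmlUnitDiffExample {

    // Builds a Diff between two XML strings, ignoring whitespace-only text.
    static Diff compare(String controlXml, String testXml) {
        return DiffBuilder.compare(Input.fromString(controlXml))
                .withTest(Input.fromString(testXml))
                .ignoreWhitespace()
                .build();
    }

    public static void main(String[] args) {
        Diff diff = compare("<root><a/><a/><b/></root>", "<root><a/><b/></root>");
        // Each Difference describes one mismatch, including where it was found.
        for (Difference difference : diff.getDifferences()) {
            System.out.println(difference);
        }
    }
}
```

The library also handles attributes and text values, which is the part that made the hand-written comparison messy.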