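Comparing two XMLDocument trees with duplicate nodes

Question:

I want to compare two XML files and record all the differences. The problem appears when nodes start to repeat. For these two files: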

<root> 
    <a/> 
    <a/> 
    <b/> 
</root>

and:

<root> 
    <a/> 
    <b/> 
</root>

my program currently records no differences. The (big and ugly) method is as follows:

private void searchDocumentTrees(Node nodeA, Node nodeB, ArrayList<String> differences) {
    if (nodeA.hasChildNodes() && !nodeB.hasChildNodes()) {
        // record A deeper at this node
        return;
    } else if (!nodeA.hasChildNodes() && nodeB.hasChildNodes()) {
        // record B deeper at this node
        return;
    } else if (!nodeA.hasChildNodes() && !nodeB.hasChildNodes()) {
        return;
    }
    NodeList childrenA = nodeA.getChildNodes();
    NodeList childrenB = nodeB.getChildNodes();

    // indexes of nodes present in both lists of children as
    // NodeList doesn't allow searching by value
    ArrayList<Integer> presentInBothIndexA = new ArrayList<>();
    ArrayList<Integer> presentInBothIndexB = new ArrayList<>();

    // check for nodes present in both trees, record those present only in A
    for (int indexA = 0; indexA < childrenA.getLength(); indexA++) {
        boolean isPresentInBoth = false;
        Node currentA = childrenA.item(indexA);
        if (currentA.getNodeType() == Node.ELEMENT_NODE) {

            for (int indexB = 0; indexB < childrenB.getLength(); indexB++) {
                Node currentB = childrenB.item(indexB);
                if (currentB.getNodeType() == Node.ELEMENT_NODE) {
                    // if the nodes match, record their indexes and break from inner loop
                    if (currentA.getNodeName().equals(currentB.getNodeName())) {
                        isPresentInBoth = true;
                        presentInBothIndexA.add(indexA);
                        presentInBothIndexB.add(indexB);
                        break;
                    }
                }
            }

            // if the flag has not been changed currentA is not present in childrenB
            if (!isPresentInBoth) {
                // record as present only in A
            }
        }
    }

    // record nodes present only in B
    for (...) {
        /* same nested loop - this time the outer is iterating over B
           and matching nodes indexes are not recorded - record only B - A */
    }

    // recurse into every pair of children that was matched above
    for (int indexBoth = 0, len = presentInBothIndexA.size(); indexBoth < len; indexBoth++) {
        Node currentA = childrenA.item(presentInBothIndexA.get(indexBoth));
        Node currentB = childrenB.item(presentInBothIndexB.get(indexBoth));
        searchDocumentTrees(currentA, currentB, differences);
    }
}

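My first idea was to replace the isPresentInBoth flag with counters of the occurrences in both files, but that would probably introduce a third loop and add even more complexity. Do you have a better idea?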
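For illustration, the counter idea could look roughly like the sketch below. It counts the element children of two nodes by name and reports every name whose counts differ. The class and method names (ChildNameCounter, countElementChildren, countDifferences, parse) and the wording of the messages are hypothetical and not part of the code above. The sketch only looks at one level of the tree, and it does not say which occurrences should be paired for the recursive step, which is the extra complexity the counter idea would still have to deal with.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildNameCounter {

    // Helper for the example only: parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Counts the direct element children of 'parent', grouped by node name.
    static Map<String, Integer> countElementChildren(Node parent) {
        Map<String, Integer> counts = new TreeMap<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                counts.merge(child.getNodeName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reports every child name that occurs a different number of times in A and B.
    static List<String> countDifferences(Node nodeA, Node nodeB) {
        Map<String, Integer> countsA = countElementChildren(nodeA);
        Map<String, Integer> countsB = countElementChildren(nodeB);
        TreeSet<String> names = new TreeSet<>(countsA.keySet());
        names.addAll(countsB.keySet());

        List<String> differences = new ArrayList<>();
        for (String name : names) {
            int inA = countsA.getOrDefault(name, 0);
            int inB = countsB.getOrDefault(name, 0);
            if (inA != inB) {
                differences.add("<" + name + ">: " + inA + " in A, " + inB + " in B");
            }
        }
        return differences;
    }
}
```

For the two root elements of the files above, countDifferences should return a single entry saying that <a> occurs twice in A and once in B.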

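Answer

I found two solutions:

Solution 1

After trying various (inefficient) approaches, e.g. counting the occurrences of nodes and storing them in a hash table, I realized that I already had structures that store the indexes of the matching nodes. These are, of course: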

ArrayList<Integer> presentInBothIndexA = new ArrayList<>();
ArrayList<Integer> presentInBothIndexB = new ArrayList<>();

So, instead of just letting them hang around, I put them to work:

// pseudo-code for simplification
for (nodeA in fileA) {
    for (nodeB in fileB) {
        // check all the aforementioned conditions
        if (presentInBothIndexB.contains(indexB))
            continue; // skip if it was already recorded
        // else, do all the other stuff - isPresentInBoth = true, and so on
    }
}

Now the second loop doesn't need an inner loop:

for (nodeB in B) {
    if (!presentInBothIndexB.contains(indexB)) {
        // record difference - we only need to look for the nodes that were skipped
        // by the first loop, i.e. not present in file A
    }
}
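Put together, the two loops might look like the following runnable sketch. It is only an illustration under a few assumptions: the names DuplicateAwareDiff, compare, parse and the path parameter are hypothetical, the difference messages are invented for the example, nodes are matched by element name only, and a HashSet holds the already matched B indexes where the pseudo-code uses an ArrayList. The separate hasChildNodes checks are left out, because a child that exists on one side only is reported by the loops anyway.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DuplicateAwareDiff {

    // Helper for the example only: parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Compares the element children of nodeA and nodeB by name, in document order.
    static void compare(Node nodeA, Node nodeB, String path, List<String> differences) {
        NodeList childrenA = nodeA.getChildNodes();
        NodeList childrenB = nodeB.getChildNodes();
        List<int[]> pairs = new ArrayList<>();    // matched {indexA, indexB} pairs
        Set<Integer> matchedB = new HashSet<>();  // B indexes that are already taken

        for (int a = 0; a < childrenA.getLength(); a++) {
            Node currentA = childrenA.item(a);
            if (currentA.getNodeType() != Node.ELEMENT_NODE) continue;
            boolean matched = false;
            for (int b = 0; b < childrenB.getLength(); b++) {
                Node currentB = childrenB.item(b);
                if (currentB.getNodeType() != Node.ELEMENT_NODE) continue;
                if (matchedB.contains(b)) continue; // skip if it was already recorded
                if (currentA.getNodeName().equals(currentB.getNodeName())) {
                    matchedB.add(b);
                    pairs.add(new int[] {a, b});
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                differences.add("only in A: " + path + "/" + currentA.getNodeName());
            }
        }

        // Single pass over B: whatever the first loop never matched exists only in B.
        for (int b = 0; b < childrenB.getLength(); b++) {
            Node currentB = childrenB.item(b);
            if (currentB.getNodeType() == Node.ELEMENT_NODE && !matchedB.contains(b)) {
                differences.add("only in B: " + path + "/" + currentB.getNodeName());
            }
        }

        // Recurse into every matched pair.
        for (int[] pair : pairs) {
            Node childA = childrenA.item(pair[0]);
            compare(childA, childrenB.item(pair[1]), path + "/" + childA.getNodeName(), differences);
        }
    }
}
```

For the two files from the question, calling compare on the two root elements with "/root" as the path should leave exactly one entry in the list: only in A: /root/a.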

This approach has a drawback: it compares the nodes in the order in which they appear in the files, so in this case:

<r> 
    <a/> 
    <a/> 
    <a><b/></a> 
</r>

and:

<r> 
    <a/> 
    <a/> 
</r>

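it will record that there is a different number of nodes, but it won't search deeper in the first file. That is because, once two nodes have been recorded as the same, it doesn't look any further for a better match. It's a nuisance, but I guess we can live with that assumption. Still, there are also attributes and values to compare, and the whole thing gets messy and confusing, which brings me to:

Solution 2

Just use XMLUnit. Seriously.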
