How to skip unwanted elements with XML::Twig?

Question:

I am trying to learn XML::Twig and fetch some data from an XML document.

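My XML contains 20k+ <ADN> elements. Each <ADN> element contains dozens of child elements, one of which is <GID>. I want to process only those ADN elements where GID == 1 (see the example XML in __DATA__).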

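The docs say:

Handlers are triggered in fixed order, sorted by their type (xpath expressions first, then regexps, then level), then by whether they specify a full path (starting at the root element) or not, then by number of steps in the expression, then number of predicates, then number of tests in predicates. Handlers where the last step does not specify a step (foo/bar/*) are triggered after other XPath handlers. Finally _all_ handlers are triggered last.

Important: once a handler has been triggered, if it returns 0 then no other handler is called, except the _all_ handler, which will be called anyway.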

My actual code:

use 5.014; 
use warnings; 
use XML::Twig; 
use Data::Dumper; 

my $cat = load_xml_catalog(); 
say Dumper $cat; 

sub load_xml_catalog { 
     my $hr; 
     my $current; 
     my $twig= XML::Twig->new(
     twig_roots => { 
      ADN => sub {  # process the <ADN> elements 
       $_->purge; # and purge when finishes with one 
      }, 
     }, 
     twig_handlers => { 
      'ADN/GID' => sub { 
       return 1 if $_->trimmed_text == 1; 
       return 0;  # skip the other handlers - if the GID != 1 
      }, 

      'ADN/ID' => sub { #remember the ID as a "key" into the '$hr' for the "current" ADN 
       $current = $_->trimmed_text; 
       $hr->{$current}{$_->tag} = $_->trimmed_text; 
      }, 

      #rules for the wanted data extracting & storing to $hr->{$current} 
      'ADN/Name' => sub { 
       $hr->{$current}{$_->tag} = $_->text; 
      }, 
     }, 
     ); 
     $twig->parse(\*DATA); 
    return $hr; 
} 
__DATA__ 
<ArrayOfADN> 
    <ADN> 
     <GID>1</GID> 
     <ID>1</ID> 
     <Name>name 1</Name> 
    </ADN> 
    <ADN> 
     <GID>2</GID> 
     <ID>20</ID> 
     <Name>should be skipped because GID != 1</Name> 
    </ADN> 
    <ADN> 
     <GID>1</GID> 
     <ID>1000</ID> 
     <Name>other name 1000</Name> 
    </ADN> 
</ArrayOfADN>

It outputs:

$VAR1 = { 
      '1000' => { 
        'ID' => '1000', 
        'Name' => 'other name 1000' 
        }, 
      '1' => { 
       'Name' => 'name 1', 
       'ID' => '1' 
       }, 
      '20' => { 
        'Name' => 'should be skipped because GID != 1', 
        'ID' => '20' 
       } 
     };

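So,

the handler for ADN/GID returns 0 when GID != 1.
Why are the other handlers still called?
The expected (wanted) output does not contain '20' => ... .
How do I correctly skip the unwanted nodes?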

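@toolic according to the docs, _Handlers are triggered in fixed order_, so maybe I don't understand how the order is determined - aka, how to solve the above problem... ;) – cajwine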

The problem is - handlers are triggered in fixed order _for a particular element_. They reset for each separate one. – Sobrique

Answer

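The "return zero" thing is a bit of a red herring in this context. If several handlers matched the same element, then one of them returning zero would suppress the remaining handlers for that element.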

That doesn't mean the parser won't still try to process subsequent nodes.

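I think you're getting confused - you have handlers for separate subelements of your <ADN> elements, and they trigger separately. That's by design. There is a precedence order for xpath expressions, but it only applies when several handlers match the same element. Yours are completely separate, so they all 'fire', because they trigger on different elements.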
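To make the point above concrete, here is a minimal sketch (not part of the original answer) that records which handlers run for a single <ADN> whose GID is 2. The @fired array and the inline one-element document are assumptions made up for this illustration. The GID handler returns 0, yet the ID and Name handlers still run, because each of them matches a different element.

```perl
use strict;
use warnings;
use XML::Twig;

my @fired;    # hypothetical trace: which handler ran, in order

my $twig = XML::Twig->new(
    twig_handlers => {
        # Returning 0 only suppresses other handlers matching this same <GID>
        'ADN/GID'  => sub { push @fired, 'GID';  return 0 },
        'ADN/ID'   => sub { push @fired, 'ID';   return 1 },
        'ADN/Name' => sub { push @fired, 'Name'; return 1 },
    },
);

# Inline sample document (assumed for the sketch)
$twig->parse('<ArrayOfADN><ADN><GID>2</GID><ID>20</ID><Name>n</Name></ADN></ArrayOfADN>');

print "@fired\n";
```

This matches what the question's output shows: the '20' entry is stored even though the GID handler returned 0.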

However, you might find it useful to know that XML::Twig accepts xpath expressions, so you can say explicitly:

#!/usr/bin/env perl 
use strict; 
use warnings; 

use XML::Twig; 
my $twig = XML::Twig->parse(\*DATA); 
$twig -> set_pretty_print('indented_a'); 

foreach my $ADN ($twig -> findnodes('//ADN/GID[string()="1"]/..')) { 
    $ADN -> print; 
}

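This also works in the twig_handlers syntax. I'd suggest that a handler is really only worthwhile if you need to preprocess your XML or you're memory constrained. With 20,000 nodes, you may be. (At which point purge is your friend.)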

#!/usr/bin/env perl 
use strict; 
use warnings; 

use XML::Twig; 
my $twig = XML::Twig->new(
    pretty_print => 'indented_a', 
    twig_handlers => { 
     '//ADN[string(GID)="1"]' => sub { $_->print } 
    } 
); 

$twig->parse(\*DATA); 


__DATA__ 
<ArrayOfADN> 
    <ADN> 
     <GID>1</GID> 
     <ID>1</ID> 
     <Name>name 1</Name> 
    </ADN> 
    <ADN> 
     <GID>2</GID> 
     <ID>20</ID> 
     <Name>should be skipped because GID != 1</Name> 
    </ADN> 
    <ADN> 
     <GID>1</GID> 
     <ID>1000</ID> 
     <Name>other name 1000</Name> 
    </ADN> 
</ArrayOfADN>

Although, I'd probably just do it like this instead:

#!/usr/bin/env perl 
use strict; 
use warnings; 

use XML::Twig; 

sub process_ADN { 
    my ($twig, $ADN) = @_; 
    return unless $ADN -> first_child_text('GID') == 1; 
    print "ADN with name:", $ADN -> first_child_text('Name')," Found\n"; 
} 


my $twig = XML::Twig->new(
    pretty_print => 'indented_a', 
    twig_handlers => { 
     'ADN' => \&process_ADN 
    } 
); 

$twig->parse(\*DATA); 


__DATA__ 
<ArrayOfADN> 
    <ADN> 
     <GID>1</GID> 
     <ID>1</ID> 
     <Name>name 1</Name> 
    </ADN> 
    <ADN> 
     <GID>2</GID> 
     <ID>20</ID> 
     <Name>should be skipped because GID != 1</Name> 
    </ADN> 
    <ADN> 
     <GID>1</GID> 
     <ID>1000</ID> 
     <Name>other name 1000</Name> 
    </ADN> 
</ArrayOfADN>
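
As a rough sketch of how the single 'ADN' handler could rebuild the hash that the question's load_xml_catalog was meant to return, the version below filters on GID inside the handler and purges after each <ADN> to keep memory use flat. It is not from the original answer: the %catalog hash, passing the XML in as a string, and the shortened sample document are assumptions for illustration, and it assumes <GID> holds exactly "1" with no surrounding whitespace.

```perl
use strict;
use warnings;
use XML::Twig;

# Shortened sample data (assumed for the sketch)
my $xml = <<'XML';
<ArrayOfADN>
  <ADN><GID>1</GID><ID>1</ID><Name>name 1</Name></ADN>
  <ADN><GID>2</GID><ID>20</ID><Name>skipped</Name></ADN>
  <ADN><GID>1</GID><ID>1000</ID><Name>other name 1000</Name></ADN>
</ArrayOfADN>
XML

sub load_xml_catalog {
    my ($xml) = @_;
    my %catalog;    # hypothetical result: ID => { ID => ..., Name => ... }

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Fires once per complete <ADN>, so all children are available
            ADN => sub {
                my ($t, $adn) = @_;
                if ($adn->first_child_text('GID') eq '1') {
                    my $id = $adn->first_child_text('ID');
                    $catalog{$id} = {
                        ID   => $id,
                        Name => $adn->first_child_text('Name'),
                    };
                }
                $t->purge;    # free everything parsed so far
            },
        },
    );
    $twig->parse($xml);
    return \%catalog;
}

my $cat = load_xml_catalog($xml);
print join(',', sort { $a <=> $b } keys %$cat), "\n";
```

Deciding inside the <ADN> handler avoids relying on handler return values at all: the element is only inspected once it is complete, and unwanted ones are simply not stored.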
