Perl regex: matching lines that contain multiple words

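Question:

I am trying to develop a reasonably fast full-text search. It will read an index, and ideally the matching should run in a single regular expression.

So I need a regex that matches a line only if certain words are all contained in it.

For example, given: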

my $txt="one two three four five\n". 
     "two three four\n". 
     "this is just a one two three test\n";

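Only the first and third lines should match, because the second line does not contain the word "one".

I could go through each line or use several regexes, but I need the solution to be fast.

The example from here: http://www.regular-expressions.info/completelines.html ("Finding lines containing or not containing certain words")

is exactly what I need. However, I cannot get it to work in Perl. I have tried a lot, but it never returns any result.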

my $txt="one two three four five\ntwo three four\nthis is just a one two three test\n"; 
my @matches=($txt=~/^(?=.*?\bone\b)(?=.*?\btwo\b)(?=.*?\bthree\b).*$/gi); 
print join("\n",@matches);

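gives no result.

In summary: I need a regex that matches lines containing several words and returns those whole lines.

Thanks in advance for your help! I have tried so much, but I just cannot get it to work.

Answer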

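By default, the ^ and $ metacharacters match only at the start and end of the input (in Perl, $ also matches just before a trailing newline). To make them match at the start and end of every line, enable the m (multi-line) flag: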

my $txt="one two three four five\ntwo three four\nthis is just a one two three test\n"; 
my @matches=($txt=~/^(?=.*?\bone\b)(?=.*?\btwo\b)(?=.*?\bthree\b).*$/gim); 
print join("\n",@matches);

which produces:

one two three four five 
this is just a one two three test
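To make the effect of the flag concrete, here is a minimal sketch that runs the same pattern with and without /m and counts the returned lines. The variable names @without_m and @with_m are only illustrative.

```perl
use strict;
use warnings;

my $txt = "one two three four five\n"
        . "two three four\n"
        . "this is just a one two three test\n";

# Same pattern twice; only the /m flag differs.
# Without /m, ^ can match only at offset 0, and $ only at the very end.
my @without_m = $txt =~ /^(?=.*?\bone\b)(?=.*?\btwo\b)(?=.*?\bthree\b).*$/gi;

# With /m, ^ and $ anchor at every line boundary.
my @with_m    = $txt =~ /^(?=.*?\bone\b)(?=.*?\btwo\b)(?=.*?\bthree\b).*$/gim;

printf "without /m: %d line(s)\n", scalar @without_m;   # prints 0
printf "with /m:    %d line(s)\n", scalar @with_m;      # prints 2
```

Without /m the first line cannot match, because . never crosses a newline and $ cannot match in the middle of the input, so the whole list comes back empty.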

However, if you really want a fast search, a regex with lots of look-aheads is not the way to go, if you ask me.
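One possible alternative to the look-ahead pattern, sketched here under assumptions: split the text into lines and test each line against one small precompiled pattern per word. The helper has_all_words and the list @required are hypothetical names, and whether this is actually faster for a given index would have to be measured.

```perl
use strict;
use warnings;

my $txt = "one two three four five\n"
        . "two three four\n"
        . "this is just a one two three test\n";

# Hypothetical word list; each word becomes its own precompiled pattern.
# \Q...\E escapes any regex metacharacters inside the word.
my @required = map { qr/\b\Q$_\E\b/i } qw(one two three);

# Hypothetical helper: true only if every pattern matches the line.
sub has_all_words {
    my ($line, @patterns) = @_;
    for my $re (@patterns) {
        return 0 unless $line =~ $re;   # stop at the first missing word
    }
    return 1;
}

# Keep the lines that contain all required words.
my @matches = grep { has_all_words($_, @required) } split /\n/, $txt;
print "$_\n" for @matches;
```

The loop returns as soon as one word is missing, so lines that fail early cost only a single short match.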

But that is not what was asked for... – ysth 2011-01-21 09:25:41

Ah, you are right. I did not read the question carefully. – 2011-01-21 09:26:32

Answer

Code:

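use 5.012;
use Benchmark qw(cmpthese);
use Data::Dump;
use once;    # CPAN module that provides the ONCE { ... } block

our $str = <<STR;
one thing
another two
three to go
no war
alone in the dark
war never changes
STR

our @words = qw(one war two);

# Compare four ways of collecting the lines that contain none of @words.
cmpthese(100000, {
    # One multi-line regex; /o compiles the interpolated pattern only once.
    'regexp with o' => sub {
        my @m;
        my $words = join '|', @words;
        @m = $str =~ /(?!.*?\b(?:$words)\b)^(.*)$/omg;
        ONCE { say 'regexp with o:'; dd @m }
    },
    # Same regex, but the alternation is rebuilt on every call.
    'regexp' => sub {
        my @m;
        @m = $str =~ /(?!.*?\b(?:@{ [ join '|', @words ] })\b)^(.*)$/mg;
        ONCE { say 'regexp:'; dd @m }
    },
    # Despite the label, this splits into lines and filters them with grep.
    'while' => sub {
        my @m;
        @m = grep $_ !~ /\b(?:@{ [ join '|', @words ] })\b/, (split /\n/, $str);
        ONCE { say 'while:'; dd @m }
    },
    # Line-by-line grep again, with the pattern compiled once via /o.
    'while with o' => sub {
        my @m;
        my $words = join '|', @words;
        @m = grep $_ !~ /\b(?:$words)\b/o, (split /\n/, $str);
        ONCE { say 'while with o:'; dd @m }
    }
});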

Result:

regexp: 
("three to go", "alone in the dark") 
regexp with o: 
("three to go", "alone in the dark") 
while: 
("three to go", "alone in the dark") 
while with o: 
("three to go", "alone in the dark") 
                 Rate regexp regexp with o while while with o
regexp        19736/s     --           -2%  -40%         -60%
regexp with o 20133/s     2%            --  -38%         -59%
while         32733/s    66%           63%    --         -33%
while with o  48948/s   148%          143%   50%           --

Conclusion

So the "while" variant is faster than the "regexp" variant.
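As a hedged follow-up to the line-by-line variant: instead of relying on the /o modifier, the pattern can be compiled once with qr// and reused, which is the form Perl's documentation generally points to for precompiled patterns. This is only a sketch; the names $alternation, $excluded and @kept are illustrative, and no timing is claimed for it.

```perl
use strict;
use warnings;

my $str = <<'STR';
one thing
another two
three to go
no war
alone in the dark
war never changes
STR

my @words = qw(one war two);

# Build the alternation once; quotemeta escapes regex metacharacters.
my $alternation = join '|', map { quotemeta($_) } @words;

# Compile the pattern a single time, outside the per-line filter.
my $excluded = qr/\b(?:$alternation)\b/;

# Keep only the lines that contain none of the words.
my @kept = grep { $_ !~ $excluded } split /\n/, $str;
print "$_\n" for @kept;
```

With the sample data this keeps "three to go" and "alone in the dark", the same lines the benchmark variants return; "alone" is kept because \b prevents "one" from matching inside a longer word.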
