Regex to match sloppy fractions / mixed numbers

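Problem description:

I have a series of text strings that contain mixed numbers (i.e. a whole part and a fractional part). The problem is that the text is full of human-coded sloppiness:

The whole part may or may not exist (e.g. "10")
The fractional part may or may not exist (e.g. "1/3")
The two parts may be separated by spaces and/or a hyphen (e.g. "10 1/3", "10-1/3", "10 - 1/3")
The fraction itself may or may not have spaces between the numbers and the slash (e.g. "1/3", "1 /3", "1 / 3")
There may be other text after the fraction that needs to be ignored

I need a regex that can parse these elements so that I can build a proper number out of this mess.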

I already have a regex solution that works really well, so I am sharing it here in the hope that it saves someone else a lot of work. – 2008-10-29 00:14:14

Which language and/or regex engine is this for? – 2008-10-30 04:21:33

Answer

Here is a regex that handles all of the data I have been able to throw at it:

(\d++(?! */))? *-? *(?:(\d+) */ *(\d+))?.*$

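This puts the digits into the following groups:

1. The whole part of the mixed number, if it exists
2. The numerator, if a fraction exists
3. The denominator, if a fraction exists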
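
As a rough illustration of how the three groups can be combined into a number, here is a minimal Python sketch. Python, the function name `parse_mixed`, the `Fraction` return type, and the stripping of leading whitespace are all assumptions made for this example and are not part of the answer. Possessive quantifiers (`\d++`) are only available in Python's `re` module from version 3.11, so the sketch swaps in a lookahead, `(?!\d* */)`, which is intended to have the same effect: part of a numerator is never taken as the whole number.

```python
import re
from fractions import Fraction

# Adapted from the regex above; (?!\d* */) stands in for the possessive \d++(?! */).
MIXED = re.compile(r"(\d+(?!\d* */))? *-? *(?:(\d+) */ *(\d+))?")

def parse_mixed(text):
    """Return the value as a Fraction, or None if no usable number is found."""
    # Every part of the pattern is optional, so match() always succeeds.
    whole, num, den = MIXED.match(text.strip()).groups()
    if whole is None and num is None:
        return None  # neither a whole part nor a fraction was found
    value = Fraction(int(whole)) if whole is not None else Fraction(0)
    if num is not None:
        if int(den) == 0:
            return None  # refuse to divide by zero
        value += Fraction(int(num), int(den))
    return value

print(parse_mixed("10 - 1/3"))    # 31/3
print(parse_mixed("2 1/2 cups"))  # 5/2
```

Trailing text is ignored simply because `match` does not require the pattern to reach the end of the string.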

Also, here is RegexBuddy's explanation of the elements (which helped me immensely while constructing it):

Match the regular expression below and capture its match into backreference number 1 «(\d++(?! */))?» 
    Between zero and one times, as many times as possible, giving back as needed (greedy) «?» 
    Match a single digit 0..9 «\d++» 
     Between one and unlimited times, as many times as possible, without giving back (possessive) «++» 
    Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?! */)» 
     Match the character “ ” literally « *» 
     Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
     Match the character “/” literally «/» 
Match the character “ ” literally « *» 
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
Match the character “-” literally «-?» 
    Between zero and one times, as many times as possible, giving back as needed (greedy) «?» 
Match the character “ ” literally « *» 
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
Match the regular expression below «(?:(\d+) */ *(\d+))?» 
    Between zero and one times, as many times as possible, giving back as needed (greedy) «?» 
    Match the regular expression below and capture its match into backreference number 2 «(\d+)» 
     Match a single digit 0..9 «\d+» 
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
    Match the character “ ” literally « *» 
     Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
    Match the character “/” literally «/» 
    Match the character “ ” literally « *» 
     Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
    Match the regular expression below and capture its match into backreference number 3 «(\d+)» 
     Match a single digit 0..9 «\d+» 
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
Match any single character that is not a line break character «.*» 
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
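
The "without giving back (possessive)" step is what keeps an input such as "12/3" from being split into a whole part of 1 and a fraction of 2/3. The Python sketch below tries to show the difference. It is only an illustration: the names `GREEDY` and `GUARDED` are made up, and because possessive quantifiers need Python 3.11 or later, the second pattern uses a `\d*` inside the lookahead as an assumed stand-in for `\d++`.

```python
import re

# Plain greedy \d+ : the engine may give back digits to satisfy the lookahead.
GREEDY = re.compile(r"(\d+(?! */))? *-? *(?:(\d+) */ *(\d+))?")

# Stand-in for the possessive version: the lookahead also rejects
# "more digits, then a slash", so backtracking cannot split the numerator.
GUARDED = re.compile(r"(\d+(?!\d* */))? *-? *(?:(\d+) */ *(\d+))?")

print(GREEDY.match("12/3").groups())   # ('1', '2', '3')   -> misread as 1 2/3
print(GUARDED.match("12/3").groups())  # (None, '12', '3') -> read as 12/3
```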

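Answer

I think it may be easier to handle the different cases (full mixed number, fraction only, whole number only) separately from each other. For example: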

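sub parse_mixed {
    my ($mixed) = @_;

    # whole part plus fraction, e.g. "10 1/3" or "10 - 1/3"
    if ($mixed =~ /^ *(\d+)[- ]+(\d+) *\/ *(\d+)(\D.*)?$/) {
        return $1 + $2 / $3;
    # fraction only, e.g. "1/3"
    } elsif ($mixed =~ /^ *(\d+) *\/ *(\d+)(\D.*)?$/) {
        return $1 / $2;
    # whole number only, e.g. "10"
    } elsif ($mixed =~ /^ *(\d+)(\D.*)?$/) {
        return $1;
    }
    return;    # nothing recognisable was found
}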

print parse_mixed("10"), "\n"; 
print parse_mixed("1/3"), "\n"; 
print parse_mixed("1 / 3"), "\n";
print parse_mixed("10 1/3"), "\n"; 
print parse_mixed("10-1/3"), "\n"; 
print parse_mixed("10 - 1/3"), "\n";

Answer

If you are using Perl 5.10, this is how I would write it.

 
m{
    ^
    \s*                     # skip leading spaces

    (?'whole'
        \d++
        (?! \s*[\/] )       # there should not be a slash immediately following a whole number
    )?                      # optional, so that a bare fraction like 1/3 still matches

    \s*

    (?:                     # the rest should fail or succeed as a group

        -?                  # ignore a possible hyphen (or minus sign)
        \s*

        (?'numerator'
            \d+
        )

        \s*
        [\/]
        \s*

        (?'denominator'
            \d+
        )
    )?
}x

You can then access the values through the %+ variable, like this:

$+{whole}; 
$+{numerator}; 
$+{denominator};
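
For readers who are not on Perl, the same named-group idea can be sketched in Python, where `(?P<name>...)` and `groupdict()` play roughly the role of `(?'name'...)` and `%+`. This is an assumed port for illustration only: the helper name `parts` is hypothetical, the whole part is made optional here so that a bare fraction matches, and the lookahead `(?! \d* \s* / )` is used in place of the possessive `\d++` so that the pattern does not depend on Python 3.11.

```python
import re

MIXED = re.compile(r"""
    ^\s*                                  # skip leading spaces
    (?P<whole> \d+ (?! \d* \s* / ) )?     # whole part, never followed by a slash
    \s* -? \s*                            # optional hyphen separator
    (?:                                   # numerator and denominator succeed or fail together
        (?P<numerator> \d+ ) \s* / \s* (?P<denominator> \d+ )
    )?
""", re.VERBOSE)

def parts(text):
    """Return the named groups as a dict; missing parts are None."""
    return MIXED.match(text).groupdict()

print(parts("10 - 1/3"))
# {'whole': '10', 'numerator': '1', 'denominator': '3'}
```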
