Regex to match sloppy fractions / mixed numbers

Question:

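I have a series of text that contains mixed numbers (i.e., a whole part and a fractional part). The problem is that the text is full of human-entered sloppiness: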

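  1. The whole part may or may not be present (e.g., "10")
  2. The fractional part may or may not be present (e.g., "1/3")
  3. The two parts may be separated by spaces and/or a hyphen (e.g., "10 1/3", "10-1/3", "10 - 1/3").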
  4. The fraction itself may or may not have spaces between the numbers and the slash (e.g., "1/3", "1 /3", "1 / 3").
  5. There may be other text after the fraction that needs to be ignored.

I need a regex that can parse these elements so that I can build a proper number out of this mess.

I already have a regex solution and it works really well, so I am sharing it here in the hope that it saves someone else a lot of work. – 2008-10-29 00:14:14

Which language and/or regex engine is this for? – 2008-10-30 04:21:33

Here is a regex that handles all of the data I could throw at it:

(\d++(?! */))? *-? *(?:(\d+) */ *(\d+))?.*$ 

This puts the digits into the following groups:

  1. The whole part of the mixed number, if it exists
  2. The numerator, if a fraction exists
  3. The denominator, if a fraction exists
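As a rough illustration of how those three groups could be turned into one value, here is a minimal Perl sketch. The answer does not say which engine it targets; Perl 5.10+ is assumed here because it supports the possessive `\d++`. The helper name `mixed_to_number` is hypothetical and not part of the original answer.

```perl
use strict;
use warnings;
use 5.010;    # possessive quantifiers and // need Perl 5.10 or later

# The regex from the answer, unchanged.
my $re = qr{(\d++(?! */))? *-? *(?:(\d+) */ *(\d+))?.*$};

# Hypothetical helper: combine the three groups into a single number.
# Returns nothing (undef in scalar context) when no number was captured.
sub mixed_to_number {
    my ($text) = @_;
    return unless $text =~ $re;
    my ($whole, $num, $den) = ($1, $2, $3);
    return unless defined $whole || defined $num;   # no digits captured
    return if defined $den && $den == 0;            # avoid dividing by zero
    my $value = defined $whole ? $whole : 0;
    $value += $num / $den if defined $num;
    return $value;
}

printf "%-12s => %s\n", $_, mixed_to_number($_) // 'no number'
    for '10', '1/3', '1 / 3', '10 1/3', '10-1/3', '10 - 1/3';
```

Note that every part of the regex is optional and it is not anchored at the start, so it also "matches" text that does not begin with a number, just with all three groups empty. The sketch treats that case as "no number".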

Also, here is the RegexBuddy explanation of the elements (it helped me immensely while constructing the regex):

Match the regular expression below and capture its match into backreference number 1 «(\d++(?! */))?» 
    Between zero and one times, as many times as possible, giving back as needed (greedy) «?» 
    Match a single digit 0..9 «\d++» 
     Between one and unlimited times, as many times as possible, without giving back (possessive) «++» 
    Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?! */)» 
     Match the character “ ” literally « *» 
     Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
     Match the character “/” literally «/» 
Match the character “ ” literally « *» 
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
Match the character “-” literally «-?» 
    Between zero and one times, as many times as possible, giving back as needed (greedy) «?» 
Match the character “ ” literally « *» 
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
Match the regular expression below «(?:(\d+) */ *(\d+))?» 
    Between zero and one times, as many times as possible, giving back as needed (greedy) «?» 
    Match the regular expression below and capture its match into backreference number 2 «(\d+)» 
     Match a single digit 0..9 «\d+» 
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
    Match the character “ ” literally « *» 
     Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
    Match the character “/” literally «/» 
    Match the character “ ” literally « *» 
     Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
    Match the regular expression below and capture its match into backreference number 3 «(\d+)» 
     Match a single digit 0..9 «\d+» 
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
Match any single character that is not a line break character «.*» 
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
Assert position at the end of the string (or before the line break at the end of the string, if any) «$» 
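The possessive `++` in the first group is what keeps a multi-digit numerator from being split. The following sketch (Perl 5.10+ assumed; the `groups` helper and the greedy variant are my own, for comparison only) runs both versions against "10/3":

```perl
use strict;
use warnings;
use 5.010;    # possessive quantifiers and say need Perl 5.10 or later

my $possessive = qr{(\d++(?! */))? *-? *(?:(\d+) */ *(\d+))?.*$};
my $greedy     = qr{(\d+(?! */))? *-? *(?:(\d+) */ *(\d+))?.*$};

# Hypothetical helper: show the three groups as a printable string.
sub groups {
    my ($re, $text) = @_;
    $text =~ $re or return 'no match';
    return join ', ', map { defined $_ ? $_ : 'undef' } ($1, $2, $3);
}

say 'possessive: ', groups($possessive, '10/3');   # undef, 10, 3
say 'greedy:     ', groups($greedy,     '10/3');   # 1, 0, 3
```

With the greedy `\d+`, the engine backtracks from "10" to "1" so that the lookahead passes, and the input is then misread as 1 and 0/3. The possessive version refuses to give digits back, so group 1 is skipped and the fraction is read as 10/3.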

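I think it might be easier to handle the different cases (fully mixed, fraction only, whole number only) separately from one another. For example: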

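sub parse_mixed {
    my ($mixed) = @_;

    # whole number plus fraction, e.g. "10 1/3", "10-1/3", "10 - 1/3"
    if ($mixed =~ /^ *(\d+)[- ]+(\d+) *\/ *(\d+)(\D.*)?$/) {
        return $1 + $2 / $3;
    # fraction only, e.g. "1/3"
    } elsif ($mixed =~ /^ *(\d+) *\/ *(\d+)(\D.*)?$/) {
        return $1 / $2;
    # whole number only, e.g. "10"
    } elsif ($mixed =~ /^ *(\d+)(\D.*)?$/) {
        return $1;
    }
    return;    # nothing recognisable at the start of the string
}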

print parse_mixed("10"), "\n"; 
print parse_mixed("1/3"), "\n"; 
print parse_mixed("1 / 3"), "\n"; 
print parse_mixed("10 1/3"), "\n"; 
print parse_mixed("10-1/3"), "\n"; 
print parse_mixed("10 - 1/3"), "\n"; 

If you are using Perl 5.10, this is how I would write it.

 
m{
    ^
    \s*                     # skip leading spaces

    (?'whole'
        \d++
        (?! \s*[\/] )       # there should not be a slash immediately following a whole number
    )?                      # optional, so a bare fraction such as "1/3" also matches

    \s*

    (?:                     # the rest should fail or succeed as a group

        -?                  # ignore a possible hyphen between the two parts
        \s*

        (?'numerator'
            \d+
        )

        \s*
        [\/]
        \s*

        (?'denominator'
            \d+
        )
    )?
}x

Then you can access the values through the %+ variable like this:

$+{whole}; 
$+{numerator}; 
$+{denominator};
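To show how the named captures might be combined into a number, here is a small sketch. It assumes Perl 5.10+, uses a compact copy of the pattern with the `whole` group optional so that a bare fraction also matches, and introduces a hypothetical helper `named_to_number` that is not part of the answer.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10 or later

my $mixed_re = qr{
    ^ \s*
    (?'whole' \d++ (?! \s* / ) )?    # optional in this sketch (my assumption)
    \s*
    (?: -? \s* (?'numerator' \d+ ) \s* / \s* (?'denominator' \d+ ) )?
}x;

# Hypothetical helper: combine the named captures into one number.
# Returns nothing (undef in scalar context) when no number is found.
sub named_to_number {
    my ($text) = @_;
    return unless $text =~ $mixed_re;
    my %part = %+;    # copy the captures before any other match runs
    return unless defined $part{whole} || defined $part{numerator};
    return if defined $part{denominator} && $part{denominator} == 0;
    my $value = $part{whole} // 0;
    $value += $part{numerator} / $part{denominator} if defined $part{numerator};
    return $value;
}

for my $text ('10', '1/3', ' 10 - 1 / 3 cups') {
    my $value = named_to_number($text);
    say "'$text' => ", $value // 'no number';
}
```

Only groups that actually took part in the match show up as keys in %+, which is why the helper checks with `defined` before using each part.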