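Regex to match sloppy fractions / mixed numbers
Question:
I have a series of texts containing mixed numbers (i.e., a whole part and a fractional part). The problem is that the text is full of human-coded sloppiness:
- The whole part may or may not exist (e.g., "10")
- The fractional part may or may not exist (e.g., "1/3")
- The two parts may be separated by spaces and/or a hyphen (e.g., "10 1/3", "10-1/3", "10 - 1/3").
- The fraction itself may or may not have spaces between the numbers and the slash (e.g., "1 /3", "1/ 3", "1 / 3").
- There may be other text after the fraction that needs to be ignored
I need a regex that can parse these elements so that I can build a proper number out of this mess.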
Answer
Here is a regex that handles all of the data I have been able to throw at it:
(\d++(?! */))? *-? *(?:(\d+) */ *(\d+))?.*$
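This puts the digits into the following groups:
- The whole part of the mixed number, if it exists
- The numerator, if a fraction exists
- The denominator, if a fraction exists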
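As a rough sketch of how the three groups could be turned into a number, here is a Python adaptation. The function name `to_fraction` is made up for illustration. The possessive `\d++` is replaced by `\d+(?!\d| */)`, which should behave the same way here and also works on regex engines without possessive quantifiers. Using `fractions.Fraction` to keep the result exact is an assumption about what "a proper number" should be.

```python
import re
from fractions import Fraction

# Adapted pattern: the extra \d inside the lookahead stops the engine from
# backtracking into the run of digits, which is what \d++ does in the original.
MIXED = re.compile(r"(\d+(?!\d| */))? *-? *(?:(\d+) */ *(\d+))?.*$")

def to_fraction(text):
    """Return the mixed number at the start of text as a Fraction, or None."""
    m = MIXED.match(text)
    if m is None:
        return None
    whole, num, den = m.groups()
    if whole is None and num is None:
        return None  # neither a whole part nor a fraction was found
    value = Fraction(int(whole)) if whole is not None else Fraction(0)
    if num is not None:
        value += Fraction(int(num), int(den))  # raises ZeroDivisionError if den is 0
    return value

for sample in ["10", "1/3", "10 1/3", "10-1/3", "10 - 1/3", "12/5 cups"]:
    print(repr(sample), "->", to_fraction(sample))
```

Note that the pattern is not anchored to a digit, so text that does not start with a number still matches with all three groups empty; the sketch returns `None` in that case.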
Also, here is the RegexBuddy explanation of the elements (which helped me immensely while constructing it):
Match the regular expression below and capture its match into backreference number 1 «(\d++(?! */))?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match a single digit 0..9 «\d++»
      Between one and unlimited times, as many times as possible, without giving back (possessive) «++»
   Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?! */)»
      Match the character “ ” literally « *»
         Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
      Match the character “/” literally «/»
Match the character “ ” literally « *»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “-” literally «-?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the character “ ” literally « *»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the regular expression below «(?:(\d+) */ *(\d+))?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match the regular expression below and capture its match into backreference number 2 «(\d+)»
      Match a single digit 0..9 «\d+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match the character “ ” literally « *»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   Match the character “/” literally «/»
   Match the character “ ” literally « *»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   Match the regular expression below and capture its match into backreference number 3 «(\d+)»
      Match a single digit 0..9 «\d+»
         Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match any single character that is not a line break character «.*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
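The "without giving back (possessive)" step matters more than it may look. The small Python sketch below compares a plain greedy `\d+` with a guarded variant that cannot give digits back. The guarded form `\d+(?!\d| */)` is a stand-in chosen for this illustration, not the original pattern, and is assumed to be equivalent to `\d++(?! */)` for this purpose.

```python
import re

TAIL = r" *-? *(?:(\d+) */ *(\d+))?.*$"

# Plain greedy \d+: when the lookahead fails, the engine backs off one digit.
greedy = re.compile(r"(\d+(?! */))?" + TAIL)
# Guarded \d+: the extra \d in the lookahead forbids stopping mid-number.
guarded = re.compile(r"(\d+(?!\d| */))?" + TAIL)

print(greedy.match("12/5").groups())   # ('1', '2', '5'), misread as 1 2/5
print(guarded.match("12/5").groups())  # (None, '12', '5'), read as 12/5
```

With the greedy version, a bare fraction with a multi-digit numerator is silently split into a whole part and a shorter numerator, which is exactly what the possessive quantifier prevents.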
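Answer
I think it may be easier to handle the different cases (full mixed number, fraction only, whole number only) separately from each other. For example: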
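sub parse_mixed {
    my ($mixed) = @_;
    if ($mixed =~ /^ *(\d+)[- ]+(\d+) *\/ *(\d+)(\D.*)?$/) {
        # whole number and fraction, e.g. "10 1/3"
        return $1 + $2 / $3;
    } elsif ($mixed =~ /^ *(\d+) *\/ *(\d+)(\D.*)?$/) {
        # fraction only, e.g. "1/3"
        return $1 / $2;
    } elsif ($mixed =~ /^ *(\d+)(\D.*)?$/) {
        # whole number only, e.g. "10"
        return $1;
    }
    return;    # no number found
}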
print parse_mixed("10"), "\n";
print parse_mixed("1/3"), "\n";
print parse_mixed("1 / 3"), "\n";
print parse_mixed("10 1/3"), "\n";
print parse_mixed("10-1/3"), "\n";
print parse_mixed("10 - 1/3"), "\n";
Answer
If you are using Perl 5.10, this is how I would write it.
m{
  ^
  \s*                    # skip leading spaces
  (?'whole'
    \d++
    (?! \s*[\/] )        # there should not be a slash immediately following a whole number
  )?                     # optional, so a bare fraction such as "1/3" still matches
  \s*
  (?:                    # the rest should fail or succeed as a group
    -?                   # optional hyphen between the two parts
    \s*
    (?'numerator' \d+ )
    \s* [\/] \s*
    (?'denominator' \d+ )
  )?
}x
Then you can access the values from the %+ variable like this:
$+{whole};
$+{numerator};
$+{denominator};
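For readers not on Perl, the same named-capture idea can be sketched in Python, where groups are written `(?P<name>...)` and `re.VERBOSE` plays the role of `/x`. This is an adaptation rather than a translation of the pattern above: the helper name `parts` is hypothetical, the whole part is made optional, and the possessive `\d++` is approximated by adding `\d` to the lookahead.

```python
import re

MIXED = re.compile(r"""
    ^ \s*                                 # skip leading spaces
    (?P<whole> \d+ (?! \d | \s*/ ) )?     # whole part, not followed by a slash
    \s*
    (?:                                   # the fraction matches or fails as a group
        -? \s*                            # optional hyphen between the parts
        (?P<numerator> \d+ ) \s* / \s*
        (?P<denominator> \d+ )
    )?
""", re.VERBOSE)

def parts(text):
    """Return a dict with 'whole', 'numerator' and 'denominator' (None if absent)."""
    return MIXED.match(text).groupdict()

print(parts("10 - 1/3"))
print(parts("1 / 3"))
print(parts("10"))
```

`groupdict()` plays the part of `%+` here: each name maps to the captured text, or to `None` when that part is missing.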
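I already had a regex solution that works really well, so I figured I would share it in the hope that it saves someone else a lot of work. – 2008-10-29 00:14:14
Which language and/or regex engine are you using it with? – 2008-10-30 04:21:33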