Regex to find tags (preg_replace_callback)

Question:

I need to find all occurrences of a WP-Plugin tag.

<wpg3>10|300|defaultTemplate|eyJhbGlnbiI6ImFsaWdubGVmdCJ9</wpg3>

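There are several possible versions of the tag (the pattern below accepts wpg2 and wpg3, each optionally followed by id), but the opening and closing tags do match. The groups are optional: there may be zero, one, two or three "|" characters, which separate the options.

My problem: if the search string contains only one tag, everything works as expected. But if I add a second tag to the string, the callback is called only once instead of once per tag. Something must be missing at the start or the end of the pattern. The regex only fails with multiple tags when the last parameter (features) is missing.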

$return = preg_replace_callback('/<wpg[23](?P<unused>id)?>(?P<uri_or_id>[^\|]*)[\|]?(?P<width>[^\|]*)[\|]?(?P<template>[^\|]*)[\|]?(?P<features>[^\|]*)<\/wpg[23](?P<unused2>id)?>/i', array($this, 'wpg3_content'), $content);

Taking the example above, I would like to get:

Array 
(
    [0] => 10|300|defaultTemplate|eyJhbGlnbiI6ImFsaWdubGVmdCJ9 
    [unused] => 
    [1] => 
    [uri_or_id] => 10 
    [2] => 10 
    [width] => 300 
    [3] => 300 
    [template] => defaultTemplate 
    [4] => defaultTemplate 
    [features] => eyJhbGlnbiI6ImFsaWdubGVmdCJ9 
    [5] => eyJhbGlnbiI6ImFsaWdubGVmdCJ9 
)
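The symptom described above can be reproduced outside PHP. The following is a minimal sketch, assuming the question's pattern behaves the same when ported to Python's `re` module (the named-group syntax is identical); `build_pattern` and the sample string are illustrative names, not part of the plugin. It shows that a field class of `[^|]` lets a field run across `</wpg3><wpg3>` into the next tag, while excluding `<` as well keeps each match inside one tag.

```python
import re

def build_pattern(value):
    """Compile the tag pattern; `value` is the character class used for one field."""
    return re.compile(
        rf'<wpg[23](?P<unused>id)?>'
        rf'(?P<uri_or_id>{value}*)\|?(?P<width>{value}*)\|?'
        rf'(?P<template>{value}*)\|?(?P<features>{value}*)'
        rf'</wpg[23](?P<unused2>id)?>',
        re.IGNORECASE,
    )

original = build_pattern(r'[^|]')    # as in the question: only "|" ends a field
tightened = build_pattern(r'[^|<]')  # assumption: field values never contain "<"

# Two tags, the first one without the trailing "features" field.
content = '<wpg3>10|300|defaultTemplate</wpg3><wpg3>25</wpg3>'

original_matches = [m.group(0) for m in original.finditer(content)]
tightened_matches = [m.group(0) for m in tightened.finditer(content)]

print(len(original_matches))   # one match covering both tags
print(tightened_matches)       # one match per tag
```

With the original class, the greedy `template` field absorbs the first closing tag and the second opening tag, so the callback would fire once for the whole string.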

Can there be empty values between the `|` separators? And by "tag", you mean ``, right? – 2011-02-11 13:43:20

Answer

You could run preg_match_all on the tags first; the $out array should then contain the tag names and their contents.

preg_match_all("/<([^>]*)?>/",$in, $out);

Then loop over the results.

If the tag is one you are interested in, then

explode("|", $out[2])

Or are you looking to do everything with your regex?
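The match-then-split idea can be sketched as follows. This is only an illustration in Python, assuming tags named wpg2 or wpg3 with an optional id suffix and bodies that never contain `<`; `TAG`, `FIELDS` and `parse_tags` are hypothetical names introduced for the example.

```python
import re

# Assumed tag shape: <wpg2> or <wpg3>, optionally with an "id" suffix.
# \1 forces the closing tag to repeat the opening tag's name.
TAG = re.compile(r'<(wpg[23](?:id)?)>([^<]*)</\1>', re.IGNORECASE)
FIELDS = ('uri_or_id', 'width', 'template', 'features')

def parse_tags(content):
    """Return one dict per tag, built by splitting the tag body on "|"."""
    parsed = []
    for match in TAG.finditer(content):
        values = match.group(2).split('|')
        # zip() stops at the shorter sequence, so missing trailing
        # fields are simply absent from the resulting dict.
        parsed.append(dict(zip(FIELDS, values)))
    return parsed

tags = parse_tags('<wpg3>10|300|defaultTemplate</wpg3><wpg2>25</wpg2>')
print(tags)
```

Keeping the regex this small moves the handling of optional fields out of the pattern and into ordinary code, which is easier to adjust when the tag format changes.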

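Answer

Once you answer my comment above, I may be able to give you something more precise. Here is what I have so far. I did it in Python because that is easier for me, but you get the idea.

Here is my regex: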

import re

regex = re.compile(r'''
    <(?P<tag>wpg[23])(?P<unused>id)?> 
    (?: 
     (?P<uri_or_id>[^\|<]+) 
     (?: 
      \|(?P<width>[^\|<]+) 
      (?: 
       \|(?P<template>[^\|<]+) 
       (?: 
       \|(?P<features>[^\|<]+) 
       )? 
      )? 
     )? 
    )?</(?P=tag)(?P<unused2>id)?>''', re.IGNORECASE|re.VERBOSE)

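The text of each option is mandatory, but the optional non-capturing groups make sure the options themselves really are optional. I also use the named backreference (?P=tag) to make sure the closing tag matches the opening tag. I tightened the character class from [^\|] to [^\|<] to guard against your multi-tag problem.

My test strings: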

# Your example 
>>> text 
'<wpg3>10|300|defaultTemplate|eyJhbGlnbiI6ImFsaWdubGVmdCJ9</wpg3>' 

# Options should be, well, optional 
>>> text2 
'<wpg3>10|300|defaultTemplate</wpg3>' 

# These two should fail if I understood properly  
>>> text3 
'<wpg3>10|300|defaultTemplate|</wpg3>' 
>>> text4 
'<wpg3>10|300||</wpg3>' 

# Now with more than one tag 
>>> text5 
'<wpg3>10|300|defaultTemplate|eyJhbGlnbiI6ImFsaWdubGVmdCJ9</wpg3><wpg3>25|35|hello|world</wpg3>' 
>>> text6 
'<wpg3>10|300|defaultTemplate|eyJhbGlnbiI6ImFsaWdubGVmdCJ9</wpg3><wpg2>25|35|hello|world</wpg2>' 

# This should fail because tags mismatch 
>>> text7 
'<wpg3>10|300|defaultTemplate|eyJhbGlnbiI6ImFsaWdubGVmdCJ9</wpg2>'

And here are the tests:

# Parses as expected 
>>> regex.match(text).groups() 
('wpg3', None, '10', '300', 'defaultTemplate', 'eyJhbGlnbiI6ImFsaWdubGVmdCJ9', None) 
>>> regex.match(text2).groups() 
('wpg3', None, '10', '300', 'defaultTemplate', None, None) 

# These two fail as expected 
>>> regex.match(text3) 
>>> regex.match(text4) 

# Multi-tags now 
>>> for m in regex.finditer(text5): 
... m.groups() 
... 
('wpg3', None, '10', '300', 'defaultTemplate', 'eyJhbGlnbiI6ImFsaWdubGVmdCJ9', None) 
('wpg3', None, '25', '35', 'hello', 'world', None) 
>>> for m in regex.finditer(text6): 
... m.groups() 
... 
('wpg3', None, '10', '300', 'defaultTemplate', 'eyJhbGlnbiI6ImFsaWdubGVmdCJ9', None) 
('wpg2', None, '25', '35', 'hello', 'world', None) 

# The last one fails (tag mismatch) 
>>> regex.match(text7)

Does this match what you need?
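To connect this back to the callback in the question, here is a hedged sketch of the Python counterpart of preg_replace_callback, using `regex.sub` with a function. The `wpg3_content` function below is a hypothetical stand-in for the plugin's callback, and the replacement text it returns is invented for the illustration.

```python
import re

# The verbose pattern from this answer, written as a raw string.
regex = re.compile(r'''
    <(?P<tag>wpg[23])(?P<unused>id)?>
    (?:
      (?P<uri_or_id>[^|<]+)
      (?:
        \|(?P<width>[^|<]+)
        (?:
          \|(?P<template>[^|<]+)
          (?:
            \|(?P<features>[^|<]+)
          )?
        )?
      )?
    )?</(?P=tag)(?P<unused2>id)?>''', re.IGNORECASE | re.VERBOSE)

calls = []

def wpg3_content(match):
    """Hypothetical callback: record the named groups, return replacement text."""
    fields = match.groupdict()
    calls.append(fields)
    return '[gallery {}]'.format(fields['uri_or_id'])

content = 'a <wpg3>10|300|defaultTemplate</wpg3> b <WPG2>25|35|hello|world</WPG2> c'
result = regex.sub(wpg3_content, content)

print(result)
print(len(calls))  # the callback runs once per tag
```

When porting the pattern to PHP, keep in mind that a pattern delimited by `/` must escape the `/` in the closing tag (or use another delimiter); an unescaped delimiter inside the pattern is a common cause of PHP's "Unknown modifier" warning.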

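How do I make a preg_replace string out of your pattern? It looks like you have what I need, but I get an "Unknown modifier" error. Maybe one more thing: I need to match the tags in both uppercase and lowercase at least. – digitaldonkey 2011-02-12 13:18:43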

FINALLY: $return = preg_replace_callback('/ ([^>]*)?/i', array($this, 'wpg3_content'), $content); and explode("|", $g3_tag[0]), following your first suggestion :D Thanks! – digitaldonkey 2011-02-12 13:36:11
