Replacing a specific string in a file with a regular expression (Python)

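Question:

I am using Stanford NER to tag a file, and I want to replace every "O" tag with "NONE". I have tried the code below, but it produces the wrong output: it replaces every "O" in the string, not just the tag. I am not familiar with regular expressions and don't know the correct regex for my problem. TIA.

Here is my code: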

import re

tagged_text = st.tag(per_word(input_file))
string_type = "\n".join(" ".join(line) for line in tagged_text)

for line in string_type:
    output_file.write(re.sub('O$', 'NONE', line))

Sample input:

Tropical O
Storm O
Jolina O
affects O
2,000 O
people O
MANILA LOCATION
, O
Philippines LOCATION
– O
Initial O
reports O
from O
the O

Output:

Tropical NONE 
Storm NONE 
Jolina NONE 
affects NONE 
2,000 NONE 
people NONE 
MANILA LNONECATINONEN 
, NONE 
Philippines LNONECATINONEN 
– NONE 
Initial NONE 
reports NONE 
from NONE 
the NONE

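What is string_type? It looks like you are looping over a string, which iterates over it character by character. – Psidom

@Psidom I converted tagged_text (tuples) into a string (string_type) and then read it line by line. –

In what case does it fail? (For example, I tried line = 'TrOpical O'; re.sub('O$', 'NONE', line) returns 'TrOpical NONE'.) – chakri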
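To illustrate the point raised in the comments, here is a minimal sketch of why the loop misbehaves. The two-line string_type value is a made-up stand-in for the real tagger output, and per_char and per_line are names chosen only for this example. Iterating over a str yields one character at a time, so each lone "O" is both the start and the end of the string passed to re.sub and always matches 'O$'.

```python
import re

# Hypothetical stand-in for string_type from the question.
string_type = "MANILA LOCATION\n, O"

# Looping over a str yields single characters, so every "O" matches 'O$'.
per_char = "".join(re.sub('O$', 'NONE', ch) for ch in string_type)
print(per_char)

# Splitting into lines first hands re.sub a whole line at a time,
# so only an "O" at the end of a line is replaced.
per_line = "\n".join(
    re.sub('O$', 'NONE', line) for line in string_type.split("\n")
)
print(per_line)
```

The first print reproduces the LNONECATINONEN symptom from the question; the second leaves LOCATION untouched and changes only the trailing O tag.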

Answer

You don't need to loop over string_type; calling re.sub directly on the whole string should work:

import re

s = """Tropical O
Storm O
Jolina O
affects O
2,000 O
people O
MANILA LOCATION
, O
Philippines LOCATION
– O
Initial O
reports O
from O
the O"""

print(re.sub(r"\bO(?=\n|$)", "NONE", s))

Gives:

Tropical NONE
Storm NONE
Jolina NONE
affects NONE
2,000 NONE
people NONE
MANILA LOCATION
, NONE
Philippines LOCATION
– NONE
Initial NONE
reports NONE
from NONE
the NONE

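Here \bO(?=\n|$) matches a single letter O that starts at a word boundary and is immediately followed by either a newline character \n or the end of the string $. The O in LOCATION is not matched, because it is neither preceded by a word boundary nor followed by a newline.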
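As a possible variation on the pattern above, the lookahead can be dropped by passing re.MULTILINE, which makes $ match at the end of every line. Since the tagger output in the question is described as tuples, the replacement can also be done without a regex by comparing the tag directly. The sketch below assumes tagged_text is a list of (word, tag) pairs; the sample values are hypothetical and stand in for the real st.tag(...) result.

```python
import re

# Hypothetical sample standing in for the tagger output in the question;
# assumed to be a list of (word, tag) pairs.
tagged_text = [
    ("Tropical", "O"),
    ("MANILA", "LOCATION"),
    (",", "O"),
    ("Ohio", "LOCATION"),
]

# Same join as in the question: one "word tag" pair per line.
text = "\n".join(" ".join(pair) for pair in tagged_text)

# With re.MULTILINE, $ matches at the end of each line, so a standalone
# O at the end of a line is replaced without needing a lookahead.
with_regex = re.sub(r"\bO$", "NONE", text, flags=re.MULTILINE)

# Regex-free alternative: swap the tag before joining, so the words
# themselves (e.g. "Ohio") are never inspected.
without_regex = "\n".join(
    f"{word} {'NONE' if tag == 'O' else tag}" for word, tag in tagged_text
)

print(with_regex)
print(without_regex)
```

Both approaches produce the same text here. The tuple-based version may be the safer choice when the tags are available separately, because it cannot accidentally touch a word that happens to end in O.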
