Get all the words between two specific words in Python

Question:

The pizza is so hot 
Today I bought an hot and tasty pizza

I need to extract all the words between pizza and the adjective hot, in Python. How can I do it?

This is an example of the output:

is so 
and tasty

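Please keep in mind that the attribute (pizza, for example) and the adjective (hot, for example) may each be a multi-token word.

This is what I tried: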

import re

# values, names and descrizione are defined elsewhere in the asker's script
attribute = re.search(values[0], descrizione, re.IGNORECASE)
value = re.search(names[0], descrizione, re.IGNORECASE)
if attribute:
    print(attribute.group())
    print(descrizione.find(attribute.group()))

if value:
    print(value.group())
    print(descrizione.find(value.group()))

Can pizza and hot appear more than once in a line? How should that be handled?

@BrendanAbel No, they can't :D

Do you need to find the words in many strings, or just in these two? – NBartley

Answer

I think a good solution is to use split together with the '|' character in a regular expression.

import re

strs = []
strs.append('The pizza is so hot')
strs.append('Today I bought a hot and tasty pizza')
item = 'pizza'
adj = 'hot'
rets = []

# The third positional argument of re.split is maxsplit, so pass the flag by keyword.
for str_ in strs:
    ret = re.split(item + '|' + adj, str_, flags=re.IGNORECASE)
    rets.append(ret[1].strip())

This works because, when we look at the two strings individually, each split gives a list of three elements.

ret = re.split(item + '|' + adj, strs[0], flags=re.IGNORECASE)
print(ret)
['The ', ' is so ', '']

ret = re.split(item + '|' + adj, strs[1], flags=re.IGNORECASE)
print(ret)
['Today I bought a ', ' and tasty ', '']

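Because we know that each of the two words can appear only once in a string, we can reliably take ret[1] as the result: the string should be split exactly twice, once when we find one word and once when we find the other. The OR character lets us split the string without knowing the order of the words in advance.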
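The split idea can be wrapped in a small helper. This is only a sketch, not part of the original answer: the name words_between is hypothetical, re.escape is added on the assumption that the two words should be matched literally (which also covers multi-token words), and the three-part check encodes the "each word appears once" assumption stated above.

```python
import re

def words_between(text, item, adj):
    """Return the text between item and adj, in whichever order they occur."""
    # Escape both words so they are matched literally, even if multi-token.
    pattern = re.escape(item) + '|' + re.escape(adj)
    parts = re.split(pattern, text, flags=re.IGNORECASE)
    # One occurrence of each word splits the string into exactly three parts.
    if len(parts) != 3:
        return None
    return parts[1].strip()

print(words_between('The pizza is so hot', 'pizza', 'hot'))
print(words_between('Today I bought an hot and tasty pizza', 'pizza', 'hot'))
```

Returning None when the split does not produce three parts is a design choice of this sketch; it avoids silently returning the wrong slice when a word is missing or repeated.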

Answer

A different approach, in which you can define the "from/to" pattern as needed.

>>> import regex  # third-party module; the branch reset group (?|...) is not available in re
>>> rgx = regex.compile(r'(?si)(?|{0}(.*?){1}|{1}(.*?){0})'.format('pizza', 'hot'))
>>> s1 = 'The pizza is so hot'
>>> s2 = 'Today I bought an hot and tasty pizza'
>>> for s in [s1, s2]:
...     m = rgx.findall(s)
...     for x in m:
...         print(x.strip())
...
is so
and tasty

Line 82, in rgx = re.compile(r'(?si)(?|{0}(.*?){1}|{1}(.*?){0})'.format(attribute.group(), value.group())) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 190, in compile return _compile(pattern, flags) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 242, in _compile raise error, v # invalid expression

I get this error... what's wrong?

You need to use the regex module, not re. – hwnd
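If installing the regex module is not an option, roughly the same result can be had with the standard re module by using two ordinary capture groups in place of the branch reset group. This is a sketch under that assumption, not the answerer's code; the function name between_either_order is hypothetical, and re.escape is added so the two words are treated as literal text.

```python
import re

def between_either_order(text, first, second):
    """Find the text between first and second, in either order, using only re."""
    a, b = re.escape(first), re.escape(second)
    # Two numbered groups stand in for the branch reset group (?|...).
    rgx = re.compile(r'(?si){0}(.*?){1}|{1}(.*?){0}'.format(a, b))
    results = []
    for m in rgx.finditer(text):
        # Only one of the two groups takes part in any given match.
        captured = m.group(1) if m.group(1) is not None else m.group(2)
        results.append(captured.strip())
    return results

for s in ['The pizza is so hot', 'Today I bought an hot and tasty pizza']:
    print(between_either_order(s, 'pizza', 'hot'))
```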

Answer

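import re

x = """The pizza is so hot
Today I bought an hot and tasty pizza
wow pizza and another pizza"""
# (?!\1) requires the closing word to differ from the opening word captured in group 1
print([j for i, j in re.findall(r"(pizza|hot)\s*(.*?)\s*(?!\1)(?:hot|pizza)", x)])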

Try this with re.findall.
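The same pattern can be built from variables instead of hard-coded words. The sketch below is an illustration rather than the answerer's code: the function name between_distinct is hypothetical, and re.escape is an added assumption so that the words are matched literally. Like the pattern above, it is case-sensitive.

```python
import re

def between_distinct(text, item, adj):
    """Return the text between an opening word and a different closing word."""
    a, b = re.escape(item), re.escape(adj)
    # Group 1 records the opening word; (?!\1) rejects a closing word equal to it.
    pattern = r'({0}|{1})\s*(.*?)\s*(?!\1)(?:{0}|{1})'.format(a, b)
    return [middle for _, middle in re.findall(pattern, text)]

x = """The pizza is so hot
Today I bought an hot and tasty pizza
wow pizza and another pizza"""
print(between_distinct(x, 'pizza', 'hot'))
```

The third line of x produces no result because both of its keywords are pizza, which is exactly the case the negative lookahead is there to exclude.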
