str.startswith using regex

Question:

I would like to understand why str.startswith() does not handle regular expressions:

col1 
0 country 
1 Country 

i.e.: df.col1.str.startswith('(C|c)ountry')

It returns all False values:

col1 
0 False 
1 False
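
For anyone who wants to reproduce this, here is a minimal sketch. The DataFrame is rebuilt from the two rows shown in the question, and the extra "(C|c)ountry club" string is made-up sample data, added only to show that the argument is compared as literal text rather than as a pattern.

```python
import pandas as pd

# Rebuild the two-row frame from the question.
df = pd.DataFrame({"col1": ["country", "Country"]})

# The argument is treated as a literal prefix, so neither row matches.
literal = df.col1.str.startswith("(C|c)ountry")
print(literal.tolist())  # [False, False]

# Hypothetical sample value: it really does begin with the literal
# characters "(C|c)ountry", so startswith reports True for it.
odd = pd.Series(["(C|c)ountry club"])
literal_hit = odd.str.startswith("(C|c)ountry")
print(literal_hit.tolist())  # [True]
```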

Are you sure 'startswith' accepts a string or a regular expression as its argument? – rock321987

'pandas.Series.str.startswith' does not accept regular expressions. –

@Paul H Yes, I deleted it almost immediately because I noticed the pandas tag – duncan

Answer

Series.str.startswith does not accept regular expressions. Use Series.str.match instead:

df.col1.str.match(r'(C|c)ountry', as_indexer=True)

Output:

0 True 
1 True 
Name: col1, dtype: bool
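
On recent pandas versions the as_indexer keyword has, to my knowledge, been removed, so the call above may raise a TypeError there. A rough sketch of the same idea without that keyword follows. The Series is made-up sample data with two extra rows, added to show that the pattern is anchored at the start of each string but does not have to consume the whole string.

```python
import pandas as pd

# Hypothetical sample data: the two values from the question plus
# one with trailing text and one with leading text.
s = pd.Series(["country", "Country", "countryside", "my country"])

# str.match applies the pattern at the beginning of each string,
# so trailing text is fine but leading text is not.
matched = s.str.match(r"(C|c)ountry")
print(matched.tolist())  # [True, True, True, False]
```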

This will fail if there is text after "country", since the entire expression must match. See my answer for something equivalent to 'startswith'. –

@MadPhysicist That is not correct, at least in v0.18.1. 'Series.str.match' relies on 're.match', which matches at the beginning of the string. –

Also, match is now a deprecated function: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.match.html#pandas.Series.str.match –

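Answer

Series.str.startswith does not accept regular expressions because it is intended to behave like str.startswith in vanilla Python, which does not accept regular expressions either. An alternative is to use a regex match (as explained in the docs):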

df.col1.str.contains('^[Cc]ountry')

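The character class [Cc] is probably a better way to match C or c than (C|c), unless of course you need to capture which letter was used. In that case you can use ([Cc]).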
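
To make both variants concrete, here is a small sketch. The Series is made-up sample data, and using Series.str.extract to pull out the captured letter is my own choice for illustration, not something this answer prescribes.

```python
import pandas as pd

# Hypothetical sample data; the last value does not start with "country".
s = pd.Series(["country", "Country", "my country"])

# "^" anchors the pattern at the start, so contains() behaves like a
# regex-aware startswith.
starts = s.str.contains(r"^[Cc]ountry")
print(starts.tolist())  # [True, True, False]

# With a capturing group, str.extract returns the letter that matched;
# rows that do not match come back as missing values.
first_letter = s.str.extract(r"^([Cc])ountry", expand=False)
print(first_letter.tolist()[:2])  # ['c', 'C']
```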

Thanks for the clarification @Mad Physicist, this is useful –