Python pandas loc error

Question:

I have a DataFrame df with an Age column, and I am trying to code the rows into age groups using 0s and 1s.

df:

User_ID | Age 
35435  22 
45345  36 
63456  18 
63523  55

I tried the following:

df['Age_GroupA'] = 0 
df['Age_GroupA'][(df['Age'] >= 1) & (df['Age'] <= 25)] = 1

but got this warning:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

To avoid it, I switched to .loc:

df['Age_GroupA'] = 0 
df['Age_GroupA'] = df.loc[(df['Age'] >= 1) & (df['Age'] <= 25)] = 1

However, this marks every row as 1.

This is what I get:

User_ID | Age | Age_GroupA 
35435  22  1 
45345  36  1 
63456  18  1 
63523  55  1

whereas this is the goal:

User_ID | Age | Age_GroupA 
35435  22  1 
45345  36  0 
63456  18  1 
63523  55  0

Thanks

You want df.loc[(df['Age_MDB_S'] >= 1) & (df['Age_MDB_S'] ... – EdChum

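This worked great, @EdChum; could you post it as an answer so that I can accept it? Thanks – jeangelj

@EdChum: come on, this is neither a question about the post nor an aside to it, so it shouldn't be a comment.. ;-) – DSM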

Answer

Due to peer pressure (@DSM), I feel obliged to break down your errors:

df['Age_GroupA'][(df['Age'] >= 1) & (df['Age'] <= 25)] = 1

This is chained indexing/assignment, which is what triggers the SettingWithCopyWarning.

So what you tried next:

df['Age_GroupA'] = df.loc[(df['Age'] >= 1) & (df['Age'] <= 25)] = 1

is not the correct form. When using loc you want:

df.loc[<boolean mask>, cols of interest] = some scalar or calculated value

Like this:

df.loc[(df['Age'] >= 1) & (df['Age'] <= 25), 'Age_GroupA'] = 1

You can also do this using np.where:

import numpy as np
df['Age_GroupA'] = np.where((df['Age'] >= 1) & (df['Age'] <= 25), 1, 0)

to do this in one line. There are many ways to do this.
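
As a self-contained sketch of the two approaches in this answer, here they are applied to the sample data from the question. The column name Age_GroupA_where is only illustrative and does not appear in the original post.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "User_ID": [35435, 45345, 63456, 63523],
    "Age": [22, 36, 18, 55],
})

# Boolean mask: True where 1 <= Age <= 25.
mask = (df["Age"] >= 1) & (df["Age"] <= 25)

# Option 1: default every row to 0, then set the masked rows with a
# single .loc call that names both the rows and the column.
df["Age_GroupA"] = 0
df.loc[mask, "Age_GroupA"] = 1

# Option 2: build the column in one step with np.where.
# (Age_GroupA_where is a hypothetical name used only for comparison.)
df["Age_GroupA_where"] = np.where(mask, 1, 0)

print(df)
```

Both columns should hold the same values; which form to use is mostly a matter of whether you are updating an existing column (loc) or creating one from scratch (np.where).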

Thank you, that's great; I think I get the loc logic now – jeangelj

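@jeangelj the subtle error here is that the mask in the middle part is what you assign 1 to, but the assignment also chains to the left-hand side, so all rows are assigned 1 – EdChum

I see; I definitely see the advantage of loc over np.where and the other methods, so thank you very much – jeangelj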
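
The chained assignment discussed in these comments can be reproduced with a small sketch. It assumes the sample data from the question; the names broken and fixed are hypothetical and only keep the two variants apart.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "User_ID": [35435, 45345, 63456, 63523],
    "Age": [22, 36, 18, 55],
})
df["Age_GroupA"] = 0
mask = (df["Age"] >= 1) & (df["Age"] <= 25)

# Python treats `a = b = value` as two assignments of the same value,
# applied to the targets from left to right.
broken = df.copy()
broken["Age_GroupA"] = broken.loc[mask] = 1
# This is equivalent to:
#   broken["Age_GroupA"] = 1   # the whole column becomes 1
#   broken.loc[mask] = 1       # no column given, so every column of the
#                              # matching rows is overwritten with 1

# Correct form: one target, with the row mask and the column label together.
fixed = df.copy()
fixed.loc[mask, "Age_GroupA"] = 1

print(broken)
print(fixed)
```

In this sketch the broken variant also overwrites User_ID and Age in the matching rows, because loc with only a row mask addresses all columns.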

Answer

You can convert the boolean mask to int: True becomes 1 and False becomes 0:

df['Age_GroupA'] = ((df['Age'] >= 1) & (df['Age'] <= 25)).astype(int)
print(df)
   User_ID  Age  Age_GroupA
0    35435   22           1
1    45345   36           0
2    63456   18           1
3    63523   55           0

Answer

This worked for me. jezrael has already explained it.

dataframe['Age_GroupA'] = ((dataframe['Age'] >= 1) & (dataframe['Age'] <= 25)).astype(int)
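
A possible shorthand for the same mask, not mentioned in the answers above, is Series.between, whose bounds are inclusive by default. This is a minimal sketch on the sample data from the question:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "User_ID": [35435, 45345, 63456, 63523],
    "Age": [22, 36, 18, 55],
})

# between(1, 25) is True where 1 <= Age <= 25 (both ends inclusive by default);
# astype(int) then maps True/False to 1/0.
df["Age_GroupA"] = df["Age"].between(1, 25).astype(int)

print(df)
```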
