Convert a column's data to enumerated dictionary key values

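Question:

Is there a better way (in the sense of the least code) to do the following: convert a column to enumerated numeric values? It should go something like this: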

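  1. Get the set of items in the column
  2. Make an enumerated dictionary of keys and values
  3. Swap the keys with the values
  4. Use the resulting value in a new column instead of the original data.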

Here is what I do today. Can anyone show a standard way to do this, so that I can avoid writing the function get_color_val?

import pandas as pd
cars = pd.DataFrame({"car_name": ["BMW","BMW","ACCURA","ACCURA","ACCURA","BMW","BMW","BMW"],"color":["RED","RED","RED","RED","GREEN","BLACK","BLUE","BLUE"]})

# Number the distinct colors, then invert the mapping to color -> number
color_dict = dict(enumerate(set(cars["color"])))
color_dict = dict((y,x) for x,y in color_dict.items())

def get_color_val(row):
    my_key = row["color"]
    my_value = color_dict.get(my_key)
    return my_value

cars["color_val"] = cars.apply(get_color_val, axis=1)
cars = cars.drop(columns="color")
print(cars)

Result

Before------------
  car_name  color
0      BMW    RED
1      BMW    RED
2   ACCURA    RED
3   ACCURA    RED
4   ACCURA  GREEN
5      BMW  BLACK
6      BMW   BLUE
7      BMW   BLUE


After------------
  car_name  color_val
0      BMW          3
1      BMW          3
2   ACCURA          3
3   ACCURA          3
4   ACCURA          2
5      BMW          1
6      BMW          0
7      BMW          0

I would use pd.factorize() in this case:

In [8]: cars['color_val'] = pd.factorize(cars.color)[0] 

In [9]: cars 
Out[9]: 
  car_name  color  color_val
0      BMW    RED          0
1      BMW    RED          0
2   ACCURA    RED          0
3   ACCURA    RED          0
4   ACCURA  GREEN          1
5      BMW  BLACK          2
6      BMW   BLUE          3
7      BMW   BLUE          3
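
For anyone who also needs the lookup dictionary from the question, here is a minimal sketch. It relies on pd.factorize() returning two things: the integer codes and the unique values in order of first appearance. The names codes, uniques, color_dict and decoded are only illustrative, and cars is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

cars = pd.DataFrame({
    "car_name": ["BMW", "BMW", "ACCURA", "ACCURA", "ACCURA", "BMW", "BMW", "BMW"],
    "color": ["RED", "RED", "RED", "RED", "GREEN", "BLACK", "BLUE", "BLUE"],
})

# factorize returns (codes, uniques); codes[i] is the position of
# cars["color"][i] within uniques
codes, uniques = pd.factorize(cars["color"])
cars["color_val"] = codes

# The color -> number dictionary from the question, rebuilt from uniques
color_dict = {color: code for code, color in enumerate(uniques)}

# Decoding: index uniques with the codes to recover the original labels
decoded = uniques[codes]

# Drop the original column, as in the question
result = cars.drop(columns="color")
print(result)
print(color_dict)
```

Unlike the set() approach in the question, the numbering here follows the order in which each color first appears, so it should be the same on every run.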

In one line?!?!?! Wow, thanks! – adhg

@adhg, yes, another reason to love pandas... ;) – MaxU