How do I boolean-slice an array of tuples in NumPy?
Problem description:
I have a .csv file like this:
vehicle,speed,datetime,x,y
61C22276,0.0,1.4926212E9,106.33695,11.12652
60C28912,0.0,1.4926212E9,106.84327166666667,10.90424
51D06538,0.0,1.4926212E9,106.7806,10.765768333333334
50LD08650,0.0,1.4926212E9,106.91705,10.746173333333333
50LD08519,41.0,1.4926212E9,106.95493,10.739623333333334
50LD07182,0.0,1.4926212E9,106.917225,10.746073333333333
I import this data into NumPy with
my_data = genfromtxt('data/2017-04-20.csv',names=True,delimiter=',')
The output is:
[(b'61C22276', 0., 1.49262120e+09, 106.33695 , 11.12652 )
(b'60C28912', 0., 1.49262120e+09, 106.84327167, 10.90424 )
(b'51D06538', 0., 1.49262120e+09, 106.7806 , 10.76576833) ...,
(b'61C18919', 0., 1.49265726e+09, 106.77865833, 11.03690667)
(b'61C18919', 0., 1.49265729e+09, 106.77865833, 11.03690667)
(b'61C18919', 0., 1.49265732e+09, 106.77865833, 11.036905 )]
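This is an array of tuples (because my data consists of multiple types).
How do I slice my_data based on the value of a column? (For example: list all rows for vehicle 61C2226.)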
Answer
What you have is a structured array. You can select 'rows' like this:
boolindex=my_data['vehicle']=='50LD08519'
selection=my_data[boolindex]
#array([('50LD08519', 0.0, 1492621184.0, 106.91705322265625, 10.746172904968262),
# ('50LD08519', 41.0, 1492621184.0, 106.9549331665039, 10.739623069763184)],
# dtype=[('vehicle', '<U'), ('speed', '<f4'), ('datetime', '<f4'),
# ('x', '<f4'), ('y', '<f4')])
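For reference, here is a minimal, self-contained sketch of the same selection. It assumes the CSV text is inlined through io.StringIO (a stand-in for the real file) and that genfromtxt is called with dtype=None and encoding="utf-8", so the vehicle field is read as str. If the field holds bytes instead, as in the question's output, compare against b'50LD08519'. The names csv_text, mask and moving are illustrative only.

```python
import io
import numpy as np

# A few rows from the question, inlined so the sketch runs without the file.
csv_text = """vehicle,speed,datetime,x,y
61C22276,0.0,1.4926212E9,106.33695,11.12652
60C28912,0.0,1.4926212E9,106.84327166666667,10.90424
50LD08519,41.0,1.4926212E9,106.95493,10.739623333333334
50LD07182,0.0,1.4926212E9,106.917225,10.746073333333333
"""

# dtype=None infers one type per column; encoding="utf-8" yields str, not bytes.
my_data = np.genfromtxt(
    io.StringIO(csv_text),
    names=True,
    delimiter=",",
    dtype=None,
    encoding="utf-8",
)

# Indexing by field name returns a plain 1-D array; comparing it gives a boolean mask.
mask = my_data["vehicle"] == "50LD08519"
selection = my_data[mask]

# Masks combine with & and |; each comparison needs its own parentheses.
moving = my_data[(my_data["speed"] > 0) & (my_data["x"] > 106.9)]

print(selection)
print(moving["vehicle"])
```

The mask has one entry per record, so indexing with it keeps whole records (all fields) rather than single values.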
pandas gives you friendlier I/O and a more intuitive syntax:
In [521]: my_data=pd.read_csv('data.csv')
     vehicle  speed       datetime    x   y
0   61C22276      0  1,492,621,200  106  11
1   60C28912      0  1,492,621,200  107  11
2   51D06538      0  1,492,621,200  107  11
3  50LD08519      0  1,492,621,200  107  11
4  50LD08519     41  1,492,621,200  107  11
5  50LD07182      0  1,492,621,200  107  11
In [522]: my_data[my_data['vehicle']=='50LD08519']
Out[522]:
     vehicle  speed       datetime    x   y
3  50LD08519      0  1,492,621,200  107  11
4  50LD08519     41  1,492,621,200  107  11
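The pandas route can be sketched as a script too. This again assumes the CSV text is inlined via io.StringIO in place of the real file. The names df, rows and same are illustrative, and DataFrame.query is shown only as an equivalent spelling of the boolean mask.

```python
import io
import pandas as pd

# A few rows from the question, inlined so the sketch runs without the file.
csv_text = """vehicle,speed,datetime,x,y
61C22276,0.0,1.4926212E9,106.33695,11.12652
60C28912,0.0,1.4926212E9,106.84327166666667,10.90424
50LD08519,41.0,1.4926212E9,106.95493,10.739623333333334
50LD07182,0.0,1.4926212E9,106.917225,10.746073333333333
"""

df = pd.read_csv(io.StringIO(csv_text))

# Boolean mask on a column, same idea as with the structured array.
rows = df[df["vehicle"] == "50LD08519"]

# Equivalent selection written as a query string.
same = df.query("vehicle == '50LD08519'")

print(rows)
```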