Pandas: why did my column data type change?
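Problem description:
Can someone explain why, when I create a simple heterogeneous DataFrame with pandas, the data types change when I access each row individually?
For example: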
import numpy as np
import pandas as pd
scene_df = pd.DataFrame({
    'magnitude': np.random.uniform(0.1, 0.3, (10,)),
    'x-center': np.random.uniform(-1, 1, (10,)),
    'y-center': np.random.uniform(-1, 1, (10,)),
    'label': np.random.randint(2, size=(10,), dtype='u1')})
scene_df.dtypes
prints:
label uint8
magnitude float64
x-center float64
y-center float64
dtype: object
But when I iterate over the rows:
[r['label'].dtype for i, r in scene_df.iterrows()]
I get float64 for label:
[dtype('float64'),
dtype('float64'),
dtype('float64'),
dtype('float64'),
dtype('float64'),
...
Edit:
To answer what I plan to do with this:
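import matplotlib.pyplot as plt

def square(mag, x, y):
    # Square of side `mag`, centered on (x, y)
    wh = np.array([mag, mag])
    pos = np.array((x, y)) - wh/2
    return plt.Rectangle(pos, *wh)

def circle(mag, x, y):
    # Circle of radius `mag`, centered on (x, y)
    return plt.Circle((x, y), mag)

shape_fn_lookup = [square, circle]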
which ends up in this ugly code:
[shape_fn_lookup[int(s['label'])](
*s[['magnitude', 'x-center', 'y-center']])
for i, s in scene_df.iterrows()]
which gives a bunch of circles and squares that I can plot:
[<matplotlib.patches.Circle at 0x7fcf3ea00d30>,
<matplotlib.patches.Circle at 0x7fcf3ea00f60>,
<matplotlib.patches.Rectangle at 0x7fcf3eb4da90>,
<matplotlib.patches.Circle at 0x7fcf3eb4d908>,
...
]
Even DataFrame.to_dict('records') performs this dtype conversion:
type(scene_df.to_dict('records')[0]['label'])
Answer
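I suggest using itertuples instead of iterrows. iterrows returns a Series for each row, and a Series does not preserve dtypes across the row (a DataFrame only preserves dtypes across columns).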
[type(r.label) for r in scene_df.itertuples()]
Output:
[numpy.uint8,
numpy.uint8,
numpy.uint8,
numpy.uint8,
numpy.uint8,
numpy.uint8,
numpy.uint8,
numpy.uint8,
numpy.uint8,
numpy.uint8]
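One detail worth checking when switching to itertuples is how the hyphenated column names are exposed. The sketch below uses a small made-up frame with the question's column names. It relies on pandas building the row namedtuples with renaming enabled, so names that are not valid Python identifiers become positional names. The exact scalar type of the label may be a NumPy integer or a plain Python int depending on the pandas version, so the sketch only checks that it is not a float.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with the question's column names (values are made up).
df = pd.DataFrame({
    'label': np.array([0, 1], dtype='u1'),
    'magnitude': [0.2, 0.1],
    'x-center': [0.5, -0.5],
    'y-center': [0.0, 0.25],
})

row = next(df.itertuples())

# 'x-center' and 'y-center' are not valid identifiers, so they are
# replaced by positional names; 'label' and 'magnitude' keep theirs.
fields = row._fields

# The label stays an integer instead of being converted to float.
label_is_float = isinstance(row.label, float)

# name=None yields plain tuples, which sidesteps attribute names entirely.
plain = next(df.itertuples(index=False, name=None))
```

With the default namedtuples, the hyphenated columns are easiest to reach by position or by unpacking rather than by attribute.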
Answer
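Because iterrows() returns a Series for each row, whose index is made up of the column names.
A pandas.Series has only one dtype, so the uint8 label is upcast to float64, the common type of all the columns: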
In [163]: first_row = list(scene_df.iterrows())[0][1]
In [164]: first_row
Out[164]:
label 0.000000
magnitude 0.293681
x-center -0.628142
y-center -0.218315
Name: 0, dtype: float64 # <--------- NOTE
In [165]: type(first_row)
Out[165]: pandas.core.series.Series
In [158]: [(type(r), r.dtype) for i, r in scene_df.iterrows()]
Out[158]:
[(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64')),
(pandas.core.series.Series, dtype('float64'))]
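The promotion can be reproduced in isolation. This is a minimal sketch on a made-up two-column frame: np.result_type reports the common type NumPy picks for uint8 and float64, a row pulled out with iloc carries that single dtype, and the column itself is left unchanged.

```python
import numpy as np
import pandas as pd

# Made-up frame: one uint8 column and one float64 column.
df = pd.DataFrame({
    'label': np.array([0, 1], dtype='u1'),
    'magnitude': [0.2, 0.1],
})

# NumPy's promotion rule for the two column dtypes.
common = np.result_type(np.uint8, np.float64)

# A row extracted as a Series must use one dtype for all its values.
row_dtype = df.iloc[0].dtype

# The column keeps its original dtype.
col_dtype = df['label'].dtype

# Reading the value through the column avoids building a row Series.
first_label = df['label'].iloc[0]
```

Accessing values column-first, as in the last line, is another way to keep the integer label when only a few cells are needed.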
Yes, this works better for my use case: `[shape_fn_lookup[s](*rest) for i, s, *rest in scene_df.itertuples()]`
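The one-liner in this comment depends on label being the first column after the index. A slightly more defensive sketch selects the columns explicitly before unpacking. The square and circle functions here are hypothetical stand-ins that return tuples so the example runs without matplotlib, and the data values are made up.

```python
import numpy as np
import pandas as pd

# Stand-ins for the question's shape factories (assumption: they return
# tuples here instead of matplotlib patches).
def square(mag, x, y):
    return ('square', mag, x, y)

def circle(mag, x, y):
    return ('circle', mag, x, y)

shape_fn_lookup = [square, circle]

scene_df = pd.DataFrame({
    'magnitude': [0.2, 0.1],
    'x-center': [0.5, -0.5],
    'y-center': [0.0, 0.25],
    'label': np.array([1, 0], dtype='u1'),
})

# Select columns explicitly so unpacking does not depend on column order.
cols = ['label', 'magnitude', 'x-center', 'y-center']
shapes = [shape_fn_lookup[label](*rest)
          for label, *rest in scene_df[cols].itertuples(index=False, name=None)]
```

Because the label arrives as an integer, it can index shape_fn_lookup directly without the int(...) cast needed in the iterrows version.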