Merging indexed arrays in Python
Question:
Suppose I have two numpy arrays of the form
x = [[1,2]
[2,4]
[3,6]
[4,NaN]
[5,10]]
y = [[0,-5]
[1,0]
[2,5]
[5,20]
[6,25]]
Is there an efficient way to merge them so that I get
xmy = [[0, NaN, -5 ]
[1, 2, 0 ]
[2, 4, 5 ]
[3, 6, NaN]
[4, NaN, NaN]
[5, 10, 20 ]
[6, NaN, 25 ]]
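I can implement a simple function that uses a search to find the indices, but that is not elegant and is potentially inefficient for many arrays and large dimensions. Any pointers are appreciated.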
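For reference, the search-based approach mentioned above might look roughly like the following. This is only a sketch, not the asker's actual code: the helper name `outer_merge` is hypothetical, and it assumes that keys are unique within each array and that no key is NaN.

```python
import numpy as np

def outer_merge(x, y):
    """Outer-join two 2-column arrays on their first (key) column.

    Hypothetical helper; assumes unique, non-NaN keys in each input.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sorted union of all keys that appear in either array
    keys = np.union1d(x[:, 0], y[:, 0])
    # Start with NaN everywhere, then fill in the key column
    out = np.full((keys.size, 3), np.nan)
    out[:, 0] = keys
    # keys is sorted and contains every key of x and y, so searchsorted
    # returns the exact output row for each input key
    out[np.searchsorted(keys, x[:, 0]), 1] = x[:, 1]
    out[np.searchsorted(keys, y[:, 0]), 2] = y[:, 1]
    return out

x = np.array([[1, 2], [2, 4], [3, 6], [4, np.nan], [5, 10]])
y = np.array([[0, -5], [1, 0], [2, 5], [5, 20], [6, 25]])
xmy = outer_merge(x, y)
print(xmy)
```

Because the result is a single float array, the key column is stored as float here as well.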
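Answer
See numpy.lib.recfunctions.join_by.
It only works on structured arrays or recarrays, so there are a couple of kinks.
First, you need to be at least somewhat familiar with structured arrays. If you're not, see here.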
import numpy as np
import numpy.lib.recfunctions

# Define the starting arrays as structured arrays with two fields ('key' and 'field')
dtype = [('key', int), ('field', float)]
x = np.array([(1, 2),
              (2, 4),
              (3, 6),
              (4, np.nan),
              (5, 10)],
             dtype=dtype)
y = np.array([(0, -5),
              (1, 0),
              (2, 5),
              (5, 20),
              (6, 25)],
             dtype=dtype)

# You want an outer join, rather than the default inner join
# (all values are returned, not just ones with a common key)
join = np.lib.recfunctions.join_by('key', x, y, jointype='outer')

# Now we have a structured array with three fields: 'key', 'field1', and 'field2'
# (since 'field' was in both arrays, it renamed x['field'] to 'field1', and
# y['field'] to 'field2')

# This returns a masked array. If you want it filled with NaNs, do the following.
# NaN cannot be stored in the integer 'key' field, so give one fill value per
# field; the 0 for 'key' is only a placeholder.
join.fill_value = (0, np.nan, np.nan)
join = join.filled()

# Just displaying it... Keep in mind that as a structured array,
# it has one dimension, where each row contains the 3 fields
for row in join:
    print(row)
This outputs (the exact float formatting depends on the NumPy version):
(0, nan, -5.0)
(1, 2.0, 0.0)
(2, 4.0, 5.0)
(3, 6.0, nan)
(4, nan, nan)
(5, 10.0, 20.0)
(6, nan, 25.0)
Hope that helps!
Edit 1: Added example. Edit 2: Really shouldn't be joining on floats... Changed the 'key' field to an int.
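Thanks for the insightful response. Pardon my ignorance, but is there a simple way to convert the structured array to an ndarray? Thanks. – leon 2010-05-05 18:49:25
@leon - Here's one way (using the "join" array from the example...): join.view(float).reshape((join.size, 3)) Hope that helps! – 2010-05-05 19:22:12
That doesn't actually work, because the first column is cast as int. That's why I asked. – leon 2010-05-05 19:30:48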
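One way around the mixed int/float fields, offered as a sketch rather than as the answerer's method, is to cast each field to float separately and stack the results as columns. The small `join` array below is a stand-in for the filled array from the answer, and the name `plain` is chosen only for illustration.

```python
import numpy as np

# Stand-in for the filled `join` array from the answer:
# an integer key plus two float fields
join = np.array([(0, np.nan, -5.0),
                 (1, 2.0, 0.0),
                 (3, 6.0, np.nan)],
                dtype=[('key', int), ('field1', float), ('field2', float)])

# Cast every field to float on its own, then stack the fields as columns.
# Unlike a raw view, this converts the int key instead of reinterpreting its bytes.
plain = np.column_stack([join[name].astype(float) for name in join.dtype.names])

print(plain.shape)   # one row per record, one column per field
print(plain)
```

Newer NumPy releases also provide numpy.lib.recfunctions.structured_to_unstructured, which performs a similar conversion.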