2.8 Sorting

import numpy as np

x = np.array([2, 1, 4, 3, 5])

np.sort(x)  # returns a sorted copy; x itself is unchanged

array([1, 2, 3, 4, 5])

x.sort()  # sorts x in place; the method itself returns None

x

array([1, 2, 3, 4, 5])

x = np.array([2, 1, 4, 3, 5])

i = np.argsort(x)  # indices that would sort x

i

array([1, 0, 3, 2, 4], dtype=int32)

x[i]

array([1, 2, 3, 4, 5])
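
Because argsort returns indices rather than values, the same index array can reorder a second array or be reversed for descending order. The sketch below is only an illustration; the `labels` array is a hypothetical companion array that does not appear in the original example.

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])
labels = np.array(["b", "a", "d", "c", "e"])  # hypothetical companion array

order = np.argsort(x)            # indices that sort x in ascending order
sorted_labels = labels[order]    # reorder labels so they follow the sorted x

descending = x[order[::-1]]      # reverse the index array for descending order

print(sorted_labels)
print(descending)
```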

rand = np.random.RandomState(42)

X = rand.randint(0, 10, (4, 6))

array([[6, 3, 7, 4, 6, 9],
       [2, 6, 7, 4, 3, 7],
       [7, 2, 5, 4, 1, 7],
       [5, 1, 4, 0, 9, 5]])

np.sort(X, axis=0)  # sort each column independently

array([[2, 1, 4, 0, 1, 5],
       [5, 2, 5, 4, 3, 7],
       [6, 3, 7, 4, 6, 7],
       [7, 6, 7, 4, 9, 9]])

np.sort(X, axis=1)  # sort each row independently

array([[3, 4, 6, 6, 7, 9],
       [2, 3, 4, 6, 7, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 5, 9]])
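
Sorting along an axis treats every row or column as an independent array, so values that used to share a row no longer do. If whole rows should stay together, one possible approach (a sketch, not the only way) is to argsort a single column and use the result to index the rows. The matrix below is the X printed above, typed in by hand; the names `row_order` and `X_by_first_col` are introduced here for illustration.

```python
import numpy as np

# Same values as the X shown above, hard-coded for reproducibility
X = np.array([[6, 3, 7, 4, 6, 9],
              [2, 6, 7, 4, 3, 7],
              [7, 2, 5, 4, 1, 7],
              [5, 1, 4, 0, 9, 5]])

row_order = np.argsort(X[:, 0])   # row order by first-column value
X_by_first_col = X[row_order]     # whole rows move together

print(row_order)
print(X_by_first_col)
```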

To find the K smallest values in an array without fully sorting it, use np.partition:

x = np.array([7, 2, 3, 1, 6, 5, 4])

np.partition(x, 3)  # K=3

array([2, 1, 3, 4, 6, 5, 7])

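np.partition returns a new array whose first K positions hold the K smallest values. The values within each of the two partitions are in arbitrary order.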

np.partition(X, 2, axis=1)  # partition each row so that its 2 smallest values come first

array([[3, 4, 6, 7, 6, 9],
       [2, 3, 4, 7, 6, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 9, 5]])
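
The partition guarantee can be checked directly. The following sketch, which reuses the one-dimensional x from above, sorts only the first K entries after partitioning and uses np.argpartition to recover the positions of those values in the original array. The variable names are chosen here for illustration.

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
K = 3

part = np.partition(x, K)
smallest = np.sort(part[:K])       # sort only the K smallest values

idx = np.argpartition(x, K)[:K]    # positions of the K smallest values in x

print(smallest)
print(np.sort(x[idx]))             # same values, looked up through their indices
```

The element at index K of the partitioned array is also in its final sorted position, which is why `part[K]` equals the (K+1)-th smallest value.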

K Nearest Neighbors

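This example shows how to use argsort along multiple axes to quickly find the nearest neighbors of each point in a set. First, create 10 random two-dimensional points: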

X = rand.rand(10, 2)

%matplotlib inline

import matplotlib.pyplot as plt

import seaborn; seaborn.set()

plt.scatter(X[:, 0], X[:, 1], s=100);

# For each pair of points, compute the difference in each coordinate

differences = X[:, np.newaxis, :] - X[np.newaxis, :, :]

differences.shape

(10, 10, 2)

# Square the coordinate differences

sq_differences = differences**2

sq_differences.shape

(10, 10, 2)

# Sum the squared differences over the last axis to get the squared distance

dist_sq = sq_differences.sum(-1)

dist_sq.shape

(10, 10)

dist_sq.diagonal()  # distance from each point to itself; should be all zeros

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
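
The broadcasting step is the least obvious part: inserting np.newaxis gives shapes (10, 1, 2) and (1, 10, 2), which broadcast to (10, 10, 2). As a sanity check, the sketch below compares the broadcast result with an explicit double loop. It uses its own points with an arbitrarily chosen seed (an assumption for illustration), not the X plotted above.

```python
import numpy as np

rng = np.random.RandomState(0)   # seed chosen only for this illustration
pts = rng.rand(10, 2)

# Broadcasting: (10, 1, 2) - (1, 10, 2) -> (10, 10, 2)
diff = pts[:, np.newaxis, :] - pts[np.newaxis, :, :]
dist_sq = (diff ** 2).sum(-1)

# Reference computation with explicit loops
loop = np.empty((10, 10))
for i in range(10):
    for j in range(10):
        loop[i, j] = np.sum((pts[i] - pts[j]) ** 2)

print(np.allclose(dist_sq, loop))
```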

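# With the matrix of pairwise squared distances in hand,

# np.argsort can sort along each row.

# The leftmost columns then give the indices of the nearest neighbors.

# Note: the first column is 0-9 in order, because each point's nearest neighbor is itself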

nearest = np.argsort(dist_sq, axis=1)

nearest

array([[0, 3, 9, 7, 1, 4, 2, 5, 6, 8],
       [1, 4, 7, 9, 3, 6, 8, 5, 0, 2],
       [2, 1, 4, 6, 3, 0, 8, 9, 7, 5],
       [3, 9, 7, 0, 1, 4, 5, 8, 6, 2],
       [4, 1, 8, 5, 6, 7, 9, 3, 0, 2],
       [5, 8, 6, 4, 1, 7, 9, 3, 2, 0],
       [6, 8, 5, 4, 1, 7, 9, 3, 2, 0],
       [7, 9, 3, 1, 4, 0, 5, 8, 6, 2],
       [8, 5, 6, 4, 1, 7, 9, 3, 2, 0],
       [9, 7, 3, 0, 1, 4, 5, 8, 6, 2]], dtype=int32)
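
Since column 0 of the argsort result is always the point itself, the closest other point sits in column 1. A minimal sketch with four hand-picked points (hypothetical values, chosen so the answer is easy to verify by eye):

```python
import numpy as np

# Two well-separated pairs of points; values are made up for illustration
pts = np.array([[0.0, 0.0],
                [1.0, 0.0],
                [5.0, 5.0],
                [5.0, 6.5]])

diff = pts[:, np.newaxis, :] - pts[np.newaxis, :, :]
dist_sq = (diff ** 2).sum(-1)

nearest = np.argsort(dist_sq, axis=1)
closest_other = nearest[:, 1]   # skip column 0, which is the point itself

print(closest_other)
```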

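# If only the k nearest neighbors are needed,

# it is enough to partition each row so that the k+1 smallest squared distances come first

# (k+1 because each point is its own nearest neighbor);

# the larger distances fill the remaining positions of the row in arbitrary order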

k = 2

nearest_partition = np.argpartition(dist_sq, k+1, axis=1)

plt.scatter(X[:, 0], X[:, 1], s=100);

# Draw a line from each point to its two nearest neighbors

for i in range(X.shape[0]):

    for j in nearest_partition[i, :k+1]:

        # Plot a line segment from X[i] to X[j];

        # zip regroups the two points into x-coordinates and y-coordinates

        plt.plot(*zip(X[j], X[i]), color='black')
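
To confirm that the partial sort finds the same neighbors as the full sort, the two results can be compared as sets, since argpartition does not order the entries within the first k+1 positions. This is a sketch on freshly generated points with an arbitrary seed (an assumption for illustration), not the X used in the plot.

```python
import numpy as np

rng = np.random.RandomState(1)   # seed chosen only for this illustration
pts = rng.rand(10, 2)
dist_sq = ((pts[:, np.newaxis, :] - pts[np.newaxis, :, :]) ** 2).sum(-1)

k = 2
by_partition = np.argpartition(dist_sq, k + 1, axis=1)[:, :k + 1]
by_sort = np.argsort(dist_sq, axis=1)[:, :k + 1]

# Row by row, both approaches should select the same k+1 indices
same_sets = all(set(p) == set(s) for p, s in zip(by_partition, by_sort))
print(same_sets)
```

For large inputs the partition-based version avoids fully sorting every row, which is the reason to prefer it when only the k nearest neighbors matter.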
