【ML】Principal Component Analysis (PCA)
Theory
Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
In a word, its effect is dimensionality reduction. PCA reduces the dimensionality (the number of variables) of a data set while retaining as much variance as possible.
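Given n samples with p features measured on each, the data form an n × p feature map (data matrix). At most min(n − 1, p) orthogonal principal components with nonzero variance can be extracted from this feature map once it is centered, and usually far fewer are kept.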
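The limit on the number of components can be checked numerically. The sketch below assumes NumPy and a made-up random data set with n = 5 samples and p = 10 features (both values are arbitrary choices for illustration). After centering, the data matrix has rank at most n − 1, so only that many components carry variance.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

n, p = 5, 10                    # hypothetical sizes: fewer samples than features
X = rng.normal(size=(n, p))     # made-up n x p feature map

# Centering removes one degree of freedom from the rows.
Xc = X - X.mean(axis=0)

# Singular values of the centered data; squared, they are proportional
# to the variances along the principal components.
singular_values = np.linalg.svd(Xc, compute_uv=False)

# Number of directions that actually carry variance.
rank = np.linalg.matrix_rank(Xc)
print(rank, min(n - 1, p))
```

With more samples than features (n > p), the same check gives p for generic data.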
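PCA is mainly used for feature extraction during data pre-processing. Feature extraction is not the same as feature selection:
- feature selection: pick a specific subset directly from the original features
- feature extraction: construct a new set of features from the original features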
Details
PCA from different perspectives
PCA of a multivariate Gaussian distribution centered at (1,3) with a standard deviation of 3 in roughly the (0.866, 0.5) direction and of 1 in the orthogonal direction. The vectors shown are the eigenvectors of the covariance matrix scaled by the square root of the corresponding eigenvalue, and shifted so their tails are at the mean.
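The figure described above can be reproduced numerically. This is a minimal sketch, assuming NumPy; the sample size and seed are arbitrary, and plotting is left out. It draws samples from the Gaussian in the caption, estimates the covariance matrix, and builds the arrows as eigenvectors scaled by the square root of their eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

mean = np.array([1.0, 3.0])
u1 = np.array([0.866, 0.5])
u1 = u1 / np.linalg.norm(u1)          # unit vector along the long axis
u2 = np.array([-u1[1], u1[0]])        # orthogonal unit vector
U = np.column_stack([u1, u2])

# Covariance with standard deviation 3 along u1 and 1 along u2.
cov_true = U @ np.diag([3.0**2, 1.0**2]) @ U.T
X = rng.multivariate_normal(mean, cov_true, size=5000)

# Eigen-decomposition of the sample covariance matrix.
cov_est = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov_est)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]            # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Column j is eigenvector j scaled by sqrt(eigenvalue j).
arrows = eigvecs * np.sqrt(eigvals)
tails = X.mean(axis=0)                       # arrows start at the sample mean
```

The sign of each eigenvector is arbitrary, so an arrow may point in the opposite direction from the one in the figure.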
Variance maximization from a probabilistic perspective
Low variance can often be assumed to represent undesired background noise. The dimensionality of the data can therefore be reduced, without loss of relevant information, by extracting a lower dimensional component space covering the highest variance. Using a lower number of principal components instead of the high-dimensional original data is a common pre-processing step that often improves results of subsequent analyses such as classification.
For visualization, the first and second component can be plotted against each other to obtain a two-dimensional representation of the data that captures most of the variance (assumed to be most of the relevant information), useful to analyze and interpret the structure of a data set.
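A two-dimensional representation of this kind can be sketched as follows, assuming NumPy. The data set is made up for illustration (200 samples, 5 features, generated from 2 latent factors plus small noise), and the SVD route is one common way to get the components, not the only one.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Hypothetical data: 5 observed features driven by 2 latent factors.
latent = rng.normal(size=(200, 2)) * np.array([4.0, 2.0])
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 5))

# Center, then take the SVD; rows of Vt are the principal directions.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of the total variance captured by each component.
explained_ratio = s**2 / np.sum(s**2)

# Coordinates of every sample on the first two components.
scores_2d = Xc @ Vt[:2].T
print(explained_ratio[:2].sum())
```

Plotting `scores_2d[:, 0]` against `scores_2d[:, 1]` with any plotting library gives the 2D view; `explained_ratio` indicates how much variance that view keeps.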
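From a geometric perspective
PCA learns a linear orthogonal projection, a rotation, that aligns the directions of greatest variance, in order, with the axes of the new space.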
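Properties
- PCA transforms the data into a representation whose elements are mutually uncorrelated, which can help remove unknown factors of variation in the data, i.e. noise.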
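Both the rotation and the decorrelation can be verified on a small example. The sketch below assumes NumPy and a made-up correlated data set (the mixing matrix `A` is hypothetical). It checks that the matrix of eigenvectors `W` is orthogonal and that the covariance of the transformed data is diagonal, with variances in decreasing order.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed

# Hypothetical mixing matrix that makes the 3 features correlated.
A = np.array([[2.0, 0.0, 0.0],
              [1.5, 1.0, 0.0],
              [0.5, 0.3, 0.2]])
X = rng.normal(size=(1000, 3)) @ A.T

Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)           # off-diagonal entries are nonzero

eigvals, W = np.linalg.eigh(cov)         # ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # largest variance first
eigvals, W = eigvals[order], W[:, order]

Z = Xc @ W                               # data expressed in the new axes
cov_Z = np.cov(Z, rowvar=False)          # diagonal up to rounding error

print(np.round(cov, 2))
print(np.round(cov_Z, 2))
```

Because eigenvector signs are arbitrary, `W` is orthogonal but may include a reflection rather than being a pure rotation.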
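Algorithm Flow
1. Center the data: subtract from each feature its own mean. (To keep differences in units and orders of magnitude between features from dominating the result, standardize the features first.)
2. Compute the covariance matrix.
3. Compute the eigenvalues and eigenvectors of the covariance matrix.
4. Sort the eigenvalues in descending order.
5. Keep the k eigenvectors with the largest eigenvalues.
6. Project the data into the new space spanned by these k eigenvectors.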
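The flow above can be sketched in a few lines of NumPy. The function name `pca`, its `standardize` flag, and the demo data are assumptions made for this illustration. The sketch also assumes at least two features and, when standardizing, that no feature is constant.

```python
import numpy as np

def pca(X, k, standardize=False):
    """Project X (n samples x p features) onto its top-k principal components."""
    X = np.asarray(X, dtype=float)

    # Center each feature; optionally scale to unit variance.
    Xc = X - X.mean(axis=0)
    if standardize:
        Xc = Xc / Xc.std(axis=0, ddof=1)   # assumes no constant feature

    # Covariance matrix of the features (p x p).
    cov = np.cov(Xc, rowvar=False)

    # Eigenvalues and eigenvectors (eigh suits symmetric matrices).
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Sort eigenvalues in descending order.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Keep the k eigenvectors with the largest eigenvalues.
    components = eigvecs[:, :k]

    # Project the data into the k-dimensional space.
    scores = Xc @ components
    return scores, components, eigvals[:k]

# Usage on made-up data: 100 samples, 4 correlated features.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
scores, components, variances = pca(X_demo, k=2)
print(scores.shape, variances)
```

The returned `variances` are the eigenvalues of the kept components, which equal the sample variances of the corresponding columns of `scores`.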
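Application Scenarios
- In hand gesture recognition, PCA is applied to the captured depth image and the first three principal components are taken as the X, Y and Z axes, which gives the length, width and thickness directions of the palm. See Robust 3D Hand Pose Estimation in Single Depth Images: from Single-View CNN to Multi-View CNNs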
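The idea of reading length, width and thickness directions off a set of 3D points can be illustrated with a toy example. This is not the cited paper's pipeline. It assumes NumPy and a made-up point cloud: a flat box with extents 10 × 6 × 2 that is arbitrarily rotated and shifted, standing in for the points of a palm.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed

# Hypothetical "palm": points uniform in a 10 x 6 x 2 box.
extents = np.array([10.0, 6.0, 2.0])
box = (rng.random((3000, 3)) - 0.5) * extents

# Random orthogonal matrix (via QR) as an unknown orientation,
# plus an arbitrary offset.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
points = box @ Q.T + np.array([5.0, -2.0, 40.0])

# PCA on the 3D points.
centered = points - points.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigvals)[::-1]

axes = eigvecs[:, order]          # columns: length, width, thickness directions
spread = np.sqrt(eigvals[order])  # standard deviation along each axis
print(np.round(spread, 2))
```

Up to sign, the columns of `axes` recover the columns of `Q`, i.e. the rotated box axes ordered from longest to shortest.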
Ref
- Principal component analysis – Wikipedia
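- PCA - Principal Component Analysis: good figures
- 主成分分析PCA: the core workflow
- PCA主成分分析(入门计算+深入解析): quite in-depth, and followed by a Python implementation
- 主成分分析(PCA)一次讲个够: a very good summary of the workflow, with a novel walkthrough of the calculation in Excel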