Machine Learning (2): Estimate the Probability Density -- Mixture of Gaussians
Chenjing Ding
2018/02/21
notation | meaning |
---|---|
$M$ | the number of mixture components |
$p(j)$ | weight of mixture component $j$ |
$p(\mathbf{x} \mid \theta_j)$ | mixture component |
$p(\mathbf{x} \mid \theta)$ | mixture density |
$\theta_j = (\mu_j, \Sigma_j)$ | parameters of the $j$-th component |
1. Mixture of Multivariate Gaussians
In some cases, one Gaussian distribution cannot represent $p(\mathbf{x})$ (see the red model in figure 1); thus in this chapter we want to estimate the mixture density of multivariate Gaussians.
1.1 Obtaining the mixture density
Weight of mixture component: $p(j)$, with $\sum_{j=1}^{M} p(j) = 1$

Mixture component: $p(\mathbf{x} \mid \theta_j) = \mathcal{N}(\mathbf{x} \mid \mu_j, \Sigma_j)$

Mixture density: $p(\mathbf{x} \mid \theta) = \sum_{j=1}^{M} p(\mathbf{x} \mid \theta_j)\, p(j)$
figure 1: mixture density
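As a quick illustration, the mixture density above is just a weighted sum of Gaussian pdfs. Below is a minimal NumPy/SciPy sketch; the component values are made up for the example and the function name is my own.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, weights, means, covs):
    """Evaluate p(x | theta) = sum_j p(j) * N(x | mu_j, Sigma_j)."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

# Illustrative two-component mixture in 2D (values chosen arbitrarily).
weights = [0.6, 0.4]                         # p(j), must sum to 1
means = [np.zeros(2), np.array([3.0, 3.0])]  # mu_j
covs = [np.eye(2), 0.5 * np.eye(2)]          # Sigma_j
print(mixture_density(np.array([1.0, 1.0]), weights, means, covs))
```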
2. Maximum Likelihood
Using maximum likelihood to estimate $\theta = (\theta_1, \ldots, \theta_M)$, we minimize the negative log-likelihood:

$$E = -\ln L(\theta) = -\sum_{n=1}^{N} \ln p(\mathbf{x}_n \mid \theta) = -\sum_{n=1}^{N} \ln \sum_{j=1}^{M} p(\mathbf{x}_n \mid \theta_j)\, p(j)$$

Problem with estimation

Setting $\frac{\partial E}{\partial \mu_j} = 0$ gives

$$\hat{\mu}_j = \frac{\sum_{n=1}^{N} \gamma_j(\mathbf{x}_n)\, \mathbf{x}_n}{\sum_{n=1}^{N} \gamma_j(\mathbf{x}_n)}, \qquad \gamma_j(\mathbf{x}_n) = \frac{p(\mathbf{x}_n \mid \theta_j)\, p(j)}{\sum_{k=1}^{M} p(\mathbf{x}_n \mid \theta_k)\, p(k)}$$

$\hat{\mu}_j$ depends on $\gamma_j(\mathbf{x}_n)$, and $\gamma_j(\mathbf{x}_n)$ also depends on $\hat{\mu}_j$, so there is no analytical solution.
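Although no closed-form minimizer exists, the objective $E$ itself is straightforward to evaluate; this is the quantity EM (section 4) decreases iteratively. A short sketch, computed in log space for numerical stability (function name is my own):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def neg_log_likelihood(X, weights, means, covs):
    """E = -sum_n ln sum_j p(j) * N(x_n | mu_j, Sigma_j), for X of shape (N, d)."""
    # log_p[n, j] = ln p(j) + ln N(x_n | mu_j, Sigma_j)
    log_p = np.stack([np.log(w) + multivariate_normal.logpdf(X, mean=m, cov=c)
                      for w, m, c in zip(weights, means, covs)], axis=1)
    # logsumexp over the components avoids underflow for tiny densities
    return -logsumexp(log_p, axis=1).sum()
```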
3. K-Means Clustering
K-Means clustering aims to assign each data point to one of K clusters according to its distance to the mean of each cluster; a NumPy sketch of the full procedure follows figure 2.
3.1 Steps
step 1: Initialization: pick K arbitrary centroids (cluster means)
step 2: Assign each sample to the closest centroid.
step 3: Adjust the centroids to be the means of the samples assigned to them.
step 4: Go to step 2 until the centroids no longer change in step 3.
figure 2: the process of K-Means clustering (K = 2)
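A minimal NumPy sketch of steps 1 to 4, assuming Euclidean distance and samples stored as rows of X; it also returns the within-cluster squared error J used in section 3.2 (all names are my own):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-Means: X has shape (N, d); returns centroids, assignments, error J."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # step 1: pick K arbitrary samples as the initial centroids
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
    assign = None
    for _ in range(n_iters):
        # step 2: assign each sample to the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):  # step 4: stop when nothing changes
            break
        assign = new_assign
        # step 3: move each centroid to the mean of its assigned samples
        for j in range(K):
            if np.any(assign == j):
                centroids[j] = X[assign == j].mean(axis=0)
    J = ((X - centroids[assign]) ** 2).sum()  # within-cluster squared error
    return centroids, assign, J
```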
3.2 Objective function
K-Means optimizes the following objective function:

$$J = \sum_{n=1}^{N} \sum_{j=1}^{K} r_{nj}\, \|\mathbf{x}_n - \mu_j\|^2$$

where $r_{nj} = 1$ if sample $\mathbf{x}_n$ is assigned to cluster $j$, and $r_{nj} = 0$ otherwise.
3.3 Advantages and Disadvantages
Advantages:
- simple and fast to compute
- converges to a local minimum of the within-cluster squared error
Disadvantages:
- sensitive to initialization
- sensitive to outliers
- difficult to set K properly
- only detects spherical clusters
figure 3: the problem of K-Means clustering (K = 2)
4. EM Algorithm
Once we use K-Means clustering to get the mean of each cluster, we have initial values for $\mu_j$; we can then estimate the "responsibility" $\gamma_j(\mathbf{x}_n)$ of component $j$ for the mixture density $p(\mathbf{x}_n \mid \theta)$, as defined in section 2.
4.1 K-Means Clustering Revisited
step 1: Initialization: pick K arbitrary centroids [initialize the component means $\mu_j$]
step 2: Assign each sample to the closest centroid. [E-step: compute the assignments]
step 3: Adjust the centroids to be the means of the samples assigned to them. [M-step: re-estimate the parameters]
step 4: Go to step 2 (until no change).
The process is almost the same as in K-Means clustering, but in K-Means each point belongs to exactly one cluster (a hard assignment), so there is no concept like the soft responsibility $\gamma_j(\mathbf{x}_n)$.
4.2 E-step & M-step
E-step: softly assign samples to mixture components

$$\gamma_j(\mathbf{x}_n) = \frac{p(j)\, \mathcal{N}(\mathbf{x}_n \mid \mu_j, \Sigma_j)}{\sum_{k=1}^{M} p(k)\, \mathcal{N}(\mathbf{x}_n \mid \mu_k, \Sigma_k)}$$

M-step: re-estimate the parameters (separately for each mixture component) based on the soft assignments:

$$\hat{N}_j = \sum_{n=1}^{N} \gamma_j(\mathbf{x}_n), \qquad \hat{p}(j) = \frac{\hat{N}_j}{N}, \qquad \hat{\mu}_j = \frac{1}{\hat{N}_j} \sum_{n=1}^{N} \gamma_j(\mathbf{x}_n)\, \mathbf{x}_n, \qquad \hat{\Sigma}_j = \frac{1}{\hat{N}_j} \sum_{n=1}^{N} \gamma_j(\mathbf{x}_n)\, (\mathbf{x}_n - \hat{\mu}_j)(\mathbf{x}_n - \hat{\mu}_j)^{\top}$$
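Putting the two steps together, here is a minimal NumPy/SciPy sketch of one EM iteration for a mixture of Gaussians (variable names are my own; in practice this is repeated until the log-likelihood stops improving):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, weights, means, covs):
    """One EM iteration for MoG: X has shape (N, d)."""
    N = len(X)
    # E-step: responsibilities gamma[n, j], proportional to p(j) * N(x_n | mu_j, Sigma_j)
    gamma = np.stack([w * multivariate_normal.pdf(X, mean=m, cov=c)
                      for w, m, c in zip(weights, means, covs)], axis=1)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: re-estimate p(j), mu_j, Sigma_j from the soft counts N_j
    Nj = gamma.sum(axis=0)               # effective number of samples per component
    weights = Nj / N                     # new p(j)
    means = (gamma.T @ X) / Nj[:, None]  # new mu_j
    covs = []
    for j in range(len(Nj)):
        d = X - means[j]
        covs.append((gamma[:, j, None] * d).T @ d / Nj[j])  # new Sigma_j
    return weights, means, covs
```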
4.3 Advantages
- Very general, can represent any (continuous) distribution.
- Once trained, very fast to evaluate.
- Can be updated online.
4.4 Caveats
- EM for MoG suffers from singularities: if a mixture component collapses onto a single sample, the likelihood goes to infinity. Introduce regularization: instead of $\Sigma_j$, use $\Sigma_j + \sigma_0^2 I$ to keep the likelihood from going to infinity.
- Initialize with K-Means to get better results. Typical steps:
  - Run K-Means several times (e.g. 10~100 runs)
  - Pick the best result (lowest error J)
  - Use this result to initialize EM
- EM for MoG is computationally expensive.
- Need to select the number of mixture components $M$ properly (a model selection problem).
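For reference, library implementations already handle these caveats; for example, scikit-learn's GaussianMixture (not part of the original notes, shown as one possible tool) exposes both the covariance regularization and the K-Means initialization discussed above:

```python
from sklearn.mixture import GaussianMixture

# reg_covar adds a small constant to the diagonal of each covariance matrix
# (the regularization above); init_params="kmeans" seeds the components from
# a K-Means run, and n_init repeats the fit and keeps the best result.
gmm = GaussianMixture(n_components=3, reg_covar=1e-6,
                      init_params="kmeans", n_init=10)
# gmm.fit(X)  # X: array of shape (n_samples, n_features)
```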