[Machine Learning Notes by Ng] Gradient Descent


Linear Regression Model
h_\theta(x) = \theta_0 + \theta_1 x
J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
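
To make the two formulas concrete, here is a minimal NumPy sketch; the function names and the toy data are my own, for illustration only:

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x, applied elementwise."""
    return theta0 + theta1 * x

def compute_cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) over m training examples."""
    m = len(y)
    errors = hypothesis(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)

# Toy data: y = 2x, so theta0 = 0, theta1 = 2 gives zero cost.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(compute_cost(0.0, 2.0, x, y))  # 0.0
print(compute_cost(0.0, 0.0, x, y))  # (4 + 16 + 36) / 6 ≈ 9.33
```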


Gradient Descent algorithm

repeat until convergence {
              \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)
              (simultaneously for j = 0 and j = 1)
}
α is the learning rate; it controls the size of each gradient descent step.
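
Translating the update rule into code, a sketch of one full run. It reuses hypothesis, x, and y from the snippet above, and plugs in the standard partial derivatives of the squared-error cost, \frac{1}{m}\sum(h_\theta(x^{(i)}) - y^{(i)}) and \frac{1}{m}\sum(h_\theta(x^{(i)}) - y^{(i)})x^{(i)} (derived later in the course):

```python
def gradient_step(theta0, theta1, x, y, alpha):
    """One gradient descent step; returns the updated (theta0, theta1)."""
    m = len(y)
    errors = hypothesis(theta0, theta1, x) - y
    grad0 = np.sum(errors) / m       # dJ/dtheta0
    grad1 = np.sum(errors * x) / m   # dJ/dtheta1
    # Both gradients use the old thetas: this is the simultaneous update.
    return theta0 - alpha * grad0, theta1 - alpha * grad1

theta0, theta1 = 0.0, 0.0
for _ in range(1000):  # "repeat until convergence", fixed count here
    theta0, theta1 = gradient_step(theta0, theta1, x, y, alpha=0.1)
print(theta0, theta1)  # approaches (0, 2) on the toy data
```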


Correct: Simultaneous update
temp0 := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)
temp1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)
\theta_0 := temp0
\theta_1 := temp1

Note: update θ_0 and θ_1 simultaneously, i.e. compute both temp values from the old parameters before assigning either. The sketch below contrasts the correct and incorrect orderings.
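
Here dJ_dtheta0 and dJ_dtheta1 are hypothetical helper names of my own; hypothesis, x, and y are reused from the earlier sketch:

```python
def dJ_dtheta0(theta0, theta1):
    return np.sum(hypothesis(theta0, theta1, x) - y) / len(y)

def dJ_dtheta1(theta0, theta1):
    return np.sum((hypothesis(theta0, theta1, x) - y) * x) / len(y)

alpha, theta0, theta1 = 0.1, 0.0, 0.0

# Correct: both derivatives are evaluated at the old (theta0, theta1).
temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
theta0, theta1 = temp0, temp1

# Incorrect: theta0 is overwritten first, so dJ_dtheta1 is evaluated
# at a mixed point (new theta0, old theta1), which is no longer a
# true gradient step.
theta0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
theta1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
```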

Different starting points may lead to different results (gradient descent can converge to different local minima), as the figures below show.
[figures: gradient descent from two different starting points, reaching different minima]


The size of the learning rate α

If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
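
A tiny one-dimensional experiment makes this concrete. Take J(\theta) = \theta^2 (a made-up cost with gradient 2\theta and minimum at \theta = 0) and run five steps from \theta = 1 with different values of α:

```python
def run(alpha, theta=1.0, steps=5):
    """A few gradient steps on J(theta) = theta**2 (gradient: 2 * theta)."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(run(alpha=0.01))  # 0.904...: too small, barely moved after 5 steps
print(run(alpha=0.45))  # 1e-05: converges quickly
print(run(alpha=1.5))   # -32.0: every step overshoots and |theta| doubles -> diverges
```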