[Machine Learning notebook by NG] Gradient Descent
Linear Regression Model
hθ(x) = θ0 + θ1·x
J(θ0, θ1) = 1/(2m) · Σᵢ₌₁ᵐ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
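The cost function above can be sketched directly in code. This is a minimal illustration (the function name `compute_cost` is my own, not from the notes):

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h - y)^2)."""
    m = len(x)
    h = theta0 + theta1 * x          # hypothesis h_theta(x) for every example
    return np.sum((h - y) ** 2) / (2 * m)

# Points lying exactly on y = 2x give zero cost at theta0=0, theta1=2
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(compute_cost(0.0, 2.0, x, y))  # 0.0
```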
Gradient Descent algorithm
repeat until convergence {
    θj := θj − α · ∂/∂θj J(θ0, θ1)   (for j = 0 and j = 1)
}
α is the learning rate; it controls the size of each gradient step.
Correct: Simultaneous update
temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)
θ0=temp0
θ1=temp1
Notice: update θ0 and θ1 simultaneously — compute both temp values before assigning either.
Different starting points may lead to different local minima, as the pictures show.
The size of the learning rate α
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
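Both failure modes are easy to see on a toy one-dimensional quadratic. This hypothetical example (J(t) = t², minimized at t = 0, with dJ/dt = 2t) is not from the notes, just an illustration of the two regimes:

```python
def gd_costs(alpha, steps=20):
    """Run gradient-descent steps on J(t) = t^2 and record the cost per step."""
    t = 1.0
    costs = []
    for _ in range(steps):
        t = t - alpha * 2 * t        # gradient step using dJ/dt = 2t
        costs.append(t * t)
    return costs

small = gd_costs(alpha=0.01)   # too small: cost shrinks, but very slowly
large = gd_costs(alpha=1.5)    # too large: each step overshoots and diverges
```

With α = 0.01 each step multiplies t by 0.98, so the cost decreases but slowly; with α = 1.5 each step multiplies t by −2, so |t| doubles every iteration and the cost blows up.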