Machine Learning Notes ---- Logistic Regression

Logistic Regression

1. Problems of Linear Regression When Applied to Classification Problems

1) h(x) may fall outside the range [0, 1]
2) a few unusual (outlier) feature values can shift the fitted line enough to misclassify otherwise easy examples

2. Logistic Regression Model

1) h_\theta(x) = g(\theta^T x) = P(y = 1 \mid x;\, \theta)

where g(z) = \frac{1}{1 + e^{-z}} is called the Sigmoid Function / Logistic Function
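A minimal NumPy sketch of this model, assuming a design matrix X whose rows are examples (the function names and the use of NumPy are my own choices, not from the notes):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid / logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, X):
    """Hypothesis h_theta(x) = g(theta^T x), vectorized over the rows of X."""
    return sigmoid(X @ theta)
```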

3. Decision Boundary

y = 1: h_\theta(x) > 0.5, i.e. \theta^T x > 0
y = 0: h_\theta(x) < 0.5, i.e. \theta^T x < 0
decision boundary: h_\theta(x) = 0.5, i.e. \theta^T x = 0 (may be nonlinear)
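Since g is monotonic, h_\theta(x) ≥ 0.5 exactly when \theta^T x ≥ 0, so a prediction sketch (building on the code above; the function name is my own) can skip the sigmoid entirely:

```python
def predict(theta, X):
    """Predict y = 1 exactly when theta^T x >= 0 (equivalently h_theta(x) >= 0.5)."""
    return (X @ theta >= 0).astype(int)
```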

4. Cost Function

\mathrm{Cost}(h(x), y) = \begin{cases} -\log(h(x)), & y = 1 \\ -\log(1 - h(x)), & y = 0 \end{cases} = -y\log(h(x)) - (1 - y)\log(1 - h(x))

J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\mathrm{Cost}\left(h(x^{(i)}), y^{(i)}\right) = -\frac{1}{m}\left[y^T\log(h) + (1 - y)^T\log(1 - h)\right]

where h = g(X\theta)
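A vectorized sketch of this cost, reusing the sigmoid above (again, the names are mine):

```python
def cost(theta, X, y):
    """J(theta) = -(1/m) [ y^T log(h) + (1-y)^T log(1-h) ], with h = g(X theta)."""
    m = len(y)
    hv = sigmoid(X @ theta)
    return -(y @ np.log(hv) + (1 - y) @ np.log(1 - hv)) / m
```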

5. Iteration Formula

\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)x_j^{(i)}

vectorized formula:
\theta := \theta - \alpha\frac{1}{m}X^T\left(g(X\theta) - y\right)

(identical in form to the linear-regression update, except that here h = g(X\theta))
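A sketch of the corresponding batch gradient-descent loop, building on the earlier sketches (the learning rate α and iteration count are left as parameters):

```python
def gradient_descent(theta, X, y, alpha, num_iters):
    """Repeat theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m = len(y)
    for _ in range(num_iters):
        theta = theta - (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta
```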

6. Some Optimization Algorithms

Conjugate Gradient / BFGS / L-BFGS
These need no manual choice of α and usually converge faster than plain gradient descent, but they are more complex; a usage sketch follows below.
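For example, BFGS is available through SciPy's scipy.optimize.minimize; a sketch using the cost function above plus its gradient (the tiny dataset here is made up purely for illustration):

```python
from scipy.optimize import minimize

def gradient(theta, X, y):
    """Gradient of J(theta): (1/m) * X^T (g(X theta) - y)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Made-up toy data: first column of X is the intercept term.
X = np.array([[1.0, 0.5], [1.0, 2.5], [1.0, 1.5], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

res = minimize(cost, np.zeros(X.shape[1]), args=(X, y), jac=gradient, method='BFGS')
theta_opt = res.x  # optimized parameters; res.fun is the final cost
```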

7. Multiclass Classification: one-vs-all

Train one classifier h_\theta^{(i)}(x) for each class i.
To predict, pick the class with the highest score: \max_i h_\theta^{(i)}(x)
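A one-vs-all sketch built on the gradient-descent routine above (classes are assumed to be labeled 0..K-1; all names are mine):

```python
def one_vs_all(X, y, num_labels, alpha=0.1, num_iters=1000):
    """Row i of Theta holds the parameters of the classifier for class i,
    trained on the binary labels (y == i)."""
    Theta = np.zeros((num_labels, X.shape[1]))
    for i in range(num_labels):
        Theta[i] = gradient_descent(np.zeros(X.shape[1]), X,
                                    (y == i).astype(float), alpha, num_iters)
    return Theta

def predict_one_vs_all(Theta, X):
    """Pick, per example, the class whose classifier gives the largest h_theta(x)."""
    return np.argmax(sigmoid(X @ Theta.T), axis=1)
```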

8. Overfitting Problems

underfit: high bias, too few features
overfit: high variance, too many features, fails to generalize to new examples

2 solutions:
1) Reduce the number of features
2) Regularization: keep all features but shrink the magnitudes of some parameters \theta_j

9. Regularization

add \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2 to J(\theta)
Note that the sum does not include \theta_0!
\lambda: the regularization parameter; larger \lambda pushes the \theta_j toward smaller values
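A sketch of the regularized logistic cost, building on cost above (theta[0] is excluded, matching the formula):

```python
def cost_reg(theta, X, y, lam):
    """J(theta) + (lambda / 2m) * sum_{j=1..n} theta_j^2; theta_0 is not penalized."""
    m = len(y)
    return cost(theta, X, y) + lam / (2.0 * m) * np.sum(theta[1:] ** 2)
```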

10. Regularized Linear Regression

(1) Gradient Descent

J(\theta) = \frac{1}{2m}\left(\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right)
Note that the regularization term does not include \theta_0!

\theta_j := \theta_j - \alpha\left(\frac{1}{m}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right) \quad\text{for } j \neq 0
which is also
\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \quad\text{for } j \neq 0
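A sketch of this update for linear regression (h = X\theta here; the shrinkage factor 1 - αλ/m is applied to every parameter except \theta_0):

```python
def gradient_descent_reg(theta, X, y, alpha, lam, num_iters):
    """theta_j := theta_j * (1 - alpha*lam/m) - (alpha/m) * sum_i (h - y) x_j,
    with theta_0 left unshrunk (only the plain gradient step applies to it)."""
    m = len(y)
    for _ in range(num_iters):
        grad = X.T @ (X @ theta - y) / m      # linear-regression gradient
        shrink = np.full_like(theta, 1.0 - alpha * lam / m)
        shrink[0] = 1.0                       # do not regularize theta_0
        theta = shrink * theta - alpha * grad
    return theta
```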

(2) Normal Equation

\theta = \left(X^T X + \lambda\,\mathrm{diag}(0, 1, 1, \ldots, 1, 1)\right)^{-1}X^T y, where the \mathrm{diag}(\cdot) matrix is (n+1) \times (n+1)
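A sketch of this closed-form solution (np.linalg.solve is used instead of an explicit inverse, which is the usual numerically safer choice):

```python
def normal_equation_reg(X, y, lam):
    """Solve (X^T X + lambda * diag(0, 1, ..., 1)) theta = X^T y."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0                             # theta_0 is not regularized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```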