Machine Learning Notes ---- Logistic Regression

Logistic Regression

1. Problems of Linear Regression When Applied to Classification Problems

1) h(x) may fall outside the range [0, 1]
2) a few unusual (outlier) feature values can shift the fitted line enough to misclassify otherwise easy examples

2. Logistic Regression Model

1) h_\theta(x) = g(\theta^T x) = P(y = 1 \mid x;\, \theta)

where g(z) = \frac{1}{1 + e^{-z}} is called the Sigmoid Function / Logistic Function
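A minimal NumPy sketch of this model, assuming a design matrix X whose rows are examples (the function names and the use of NumPy are my own choices, not from the notes):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid / logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, X):
    """Hypothesis h_theta(x) = g(theta^T x), vectorized over the rows of X."""
    return sigmoid(X @ theta)
```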

3. Decision Boundary

y = 1: h_\theta(x) > 0.5, i.e. \theta^T x > 0
y = 0: h_\theta(x) < 0.5, i.e. \theta^T x < 0
decision boundary: h_\theta(x) = 0.5, i.e. \theta^T x = 0 (may be nonlinear)
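Since g is monotonic, h_\theta(x) ≥ 0.5 exactly when \theta^T x ≥ 0, so a prediction sketch (building on the code above; the function name is my own) can skip the sigmoid entirely:

```python
def predict(theta, X):
    """Predict y = 1 exactly when theta^T x >= 0 (equivalently h_theta(x) >= 0.5)."""
    return (X @ theta >= 0).astype(int)
```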

4. Cost Function

\mathrm{Cost}(h(x), y) = \begin{cases} -\log(h(x)), & y = 1 \\ -\log(1 - h(x)), & y = 0 \end{cases} = -y\log(h(x)) - (1 - y)\log(1 - h(x))

J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\mathrm{Cost}\left(h(x^{(i)}), y^{(i)}\right) = -\frac{1}{m}\left[y^T\log(h) + (1 - y)^T\log(1 - h)\right]

where h = g(X\theta)
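A vectorized sketch of this cost, reusing the sigmoid above (again, the names are mine):

```python
def cost(theta, X, y):
    """J(theta) = -(1/m) [ y^T log(h) + (1-y)^T log(1-h) ], with h = g(X theta)."""
    m = len(y)
    hv = sigmoid(X @ theta)
    return -(y @ np.log(hv) + (1 - y) @ np.log(1 - hv)) / m
```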

5. Iteration Formula

\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)x_j^{(i)}

vectorized formula:
\theta := \theta - \alpha\frac{1}{m}X^T\left(g(X\theta) - y\right)

(identical in form to the linear-regression update, except that here h = g(X\theta))
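A sketch of the corresponding batch gradient-descent loop, building on the earlier sketches (the learning rate α and iteration count are left as parameters):

```python
def gradient_descent(theta, X, y, alpha, num_iters):
    """Repeat theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m = len(y)
    for _ in range(num_iters):
        theta = theta - (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta
```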

6. Some Optimization Algorithms

Conjugate Gradient / BFGS / L-BFGS
These need no manual choice of α and usually converge faster than plain gradient descent, but they are more complex; a usage sketch follows below.
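For example, BFGS is available through SciPy's scipy.optimize.minimize; a sketch using the cost function above plus its gradient (the tiny dataset here is made up purely for illustration):

```python
from scipy.optimize import minimize

def gradient(theta, X, y):
    """Gradient of J(theta): (1/m) * X^T (g(X theta) - y)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Made-up toy data: first column of X is the intercept term.
X = np.array([[1.0, 0.5], [1.0, 2.5], [1.0, 1.5], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

res = minimize(cost, np.zeros(X.shape[1]), args=(X, y), jac=gradient, method='BFGS')
theta_opt = res.x  # optimized parameters; res.fun is the final cost
```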

7. Multiclass Classification: one-vs-all

Train one classifier h_\theta^{(i)}(x) for each class i.
To predict, pick the class with the highest score: \max_i h_\theta^{(i)}(x)
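A one-vs-all sketch built on the gradient-descent routine above (classes are assumed to be labeled 0..K-1; all names are mine):

```python
def one_vs_all(X, y, num_labels, alpha=0.1, num_iters=1000):
    """Row i of Theta holds the parameters of the classifier for class i,
    trained on the binary labels (y == i)."""
    Theta = np.zeros((num_labels, X.shape[1]))
    for i in range(num_labels):
        Theta[i] = gradient_descent(np.zeros(X.shape[1]), X,
                                    (y == i).astype(float), alpha, num_iters)
    return Theta

def predict_one_vs_all(Theta, X):
    """Pick, per example, the class whose classifier gives the largest h_theta(x)."""
    return np.argmax(sigmoid(X @ Theta.T), axis=1)
```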

8. Overfitting Problems

underfit: high bias, too few features
overfit: high variance, too many features, fails to generalize to new examples

2 solutions:
1) Reduce the number of features
2) Regularization: keep all features but shrink the magnitudes of some parameters \theta_j

9. Regularization

add \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2 to J(\theta)
Note that the sum does not include \theta_0!
\lambda: the regularization parameter; larger \lambda pushes the \theta_j toward smaller values
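A sketch of the regularized logistic cost, building on cost above (theta[0] is excluded, matching the formula):

```python
def cost_reg(theta, X, y, lam):
    """J(theta) + (lambda / 2m) * sum_{j=1..n} theta_j^2; theta_0 is not penalized."""
    m = len(y)
    return cost(theta, X, y) + lam / (2.0 * m) * np.sum(theta[1:] ** 2)
```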

10. Regularized Linear Regression

(1) Gradient Descent

J(\theta) = \frac{1}{2m}\left(\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right)
Note that the regularization term does not include \theta_0!

\theta_j := \theta_j - \alpha\left(\frac{1}{m}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right) \quad\text{for } j \neq 0
which is also
\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)x_j^{(i)} \quad\text{for } j \neq 0
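A sketch of this update for linear regression (h = X\theta here; the shrinkage factor 1 - αλ/m is applied to every parameter except \theta_0):

```python
def gradient_descent_reg(theta, X, y, alpha, lam, num_iters):
    """theta_j := theta_j * (1 - alpha*lam/m) - (alpha/m) * sum_i (h - y) x_j,
    with theta_0 left unshrunk (only the plain gradient step applies to it)."""
    m = len(y)
    for _ in range(num_iters):
        grad = X.T @ (X @ theta - y) / m      # linear-regression gradient
        shrink = np.full_like(theta, 1.0 - alpha * lam / m)
        shrink[0] = 1.0                       # do not regularize theta_0
        theta = shrink * theta - alpha * grad
    return theta
```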

(2) Normal Equation

\theta = \left(X^T X + \lambda\,\mathrm{diag}(0, 1, 1, \ldots, 1, 1)\right)^{-1}X^T y, where the \mathrm{diag}(\cdot) matrix is (n+1) \times (n+1)
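A sketch of this closed-form solution (np.linalg.solve is used instead of an explicit inverse, which is the usual numerically safer choice):

```python
def normal_equation_reg(X, y, lam):
    """Solve (X^T X + lambda * diag(0, 1, ..., 1)) theta = X^T y."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0                             # theta_0 is not regularized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```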