Machine Learning — Wu Enda (Andrew Ng), Notes 3

Chapter 27: Multiple features

We start with a new version of linear regression, a more powerful one that works with multiple variables, or multiple features.


Notation:

$n$ = number of features

$m$ = number of training examples

$x^{(i)}$ = input (features) of the $i$-th training example.

$x_j^{(i)}$ = value of feature $j$ in the $i$-th training example.

Hypothesis:

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$

For convenience of notation, define $x_0 = 1$.

$$x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^{n+1}, \qquad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix} \in \mathbb{R}^{n+1}$$

so

$h_\theta(x) = \theta^T x$

Multivariate linear regression.
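Below is a minimal sketch of this vectorized hypothesis in Python with NumPy (the course itself uses Octave/MATLAB; the names `predict`, `X`, and `theta` here are illustrative):

```python
import numpy as np

def predict(theta, X):
    """Vectorized hypothesis h_theta(x) = theta^T x for every row of X.

    X is an m x (n+1) design matrix whose first column is x0 = 1.
    """
    return X @ theta

# Example: m = 3 training examples, n = 2 features, with x0 = 1 prepended.
X = np.array([[1.0, 2104.0, 3.0],
              [1.0, 1600.0, 3.0],
              [1.0, 2400.0, 4.0]])
theta = np.array([80.0, 0.1, 10.0])
print(predict(theta, X))  # one prediction per training example
```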

Chapter 28: Gradient descent for multiple variables

How to fit the parameters of that hypothesis: how to use gradient descent for linear regression with multiple features.

Hypothesis: $h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$

Parameters: $\theta$ here is an $(n+1)$-dimensional vector.

Cost function:

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Gradient descent:

Repeat {

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$

} (simultaneously update for every $j = 0, \dots, n$)

New algorithm ($n \ge 1$):

Repeat {

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

} (simultaneously update $\theta_j$ for $j = 0, \dots, n$)

For example:

$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$

$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_1^{(i)}$
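The per-parameter updates above vectorize into a single matrix expression, $\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)$. A minimal NumPy sketch, assuming a design matrix `X` with the $x_0 = 1$ column already prepended:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for multivariate linear regression.

    X : (m, n+1) design matrix with x0 = 1 in the first column
    y : (m,) vector of targets
    Implements the simultaneous update
        theta := theta - alpha * (1/m) * X^T (X theta - y).
    """
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        errors = X @ theta - y                 # h_theta(x^(i)) - y^(i) for all i
        theta -= (alpha / m) * (X.T @ errors)  # updates every theta_j at once
    return theta
```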

Chapter 29: Gradient descent in practice 1: Feature scaling

Practical tricks for making gradient descent work well.

Feature Scaling:

Idea: make sure features are on a similar scale.


Get every feature into approximately a $-1 \le x_i \le 1$ range.

Mean normalization

Replace $x_i$ with $x_i - \mu_i$ to make features have approximately zero mean (do not apply to $x_0 = 1$):

$x_1 \leftarrow \frac{x_1 - \mu_1}{s_1}$

where $\mu_1$ is the average value of $x_1$ in the training set, and $s_1$ is the range of values of that feature (max minus min) or its standard deviation.
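A minimal NumPy sketch of mean normalization (here $s$ is taken as the standard deviation; max minus min would work as well):

```python
import numpy as np

def feature_normalize(X):
    """Mean-normalize each feature column: (x - mu) / s.

    Apply this only to the real features; the x0 = 1 column
    should be added after normalizing. The same mu and s must
    be reused to scale any future input before predicting.
    """
    mu = X.mean(axis=0)
    s = X.std(axis=0)
    return (X - mu) / s, mu, s
```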

Chapter 30: Gradient descent in practice 2: Learning rate

This chapter centers on the learning rate $\alpha$:

  • “Debugging”: how to make sure gradient descent is working correctly.
  • How to choose the learning rate $\alpha$.

(Figure: $J(\theta)$ plotted against the number of iterations; the curve should decrease on every iteration and flatten out as gradient descent converges.)

Example automatic convergence test: declare convergence if $J(\theta)$ decreases by less than $10^{-3}$ in one iteration.

But choosing this threshold is pretty difficult, so in order to check that gradient descent has converged, people tend to look at the plot instead.


  • For sufficiently small $\alpha$, $J(\theta)$ should decrease on every iteration.
  • But if $\alpha$ is too small, gradient descent can be slow to converge.
  • If $\alpha$ is too large, $J(\theta)$ may not decrease on every iteration and may not converge (see the diagnostic sketch below).
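A small diagnostic sketch reusing the `gradient_descent` setup above: record $J(\theta)$ after every iteration so the J-vs-iterations curve can be plotted, and compare learning rates spaced roughly 3x apart (0.001, 0.003, 0.01, 0.03, ...) as the lectures suggest:

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    errors = X @ theta - y
    return (errors @ errors) / (2 * len(y))

def run_with_history(X, y, alpha, num_iters=400):
    """Gradient descent that also records J(theta) per iteration."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(num_iters):
        theta -= (alpha / m) * (X.T @ (X @ theta - y))
        history.append(cost(theta, X, y))
    return theta, history

# Plot each history (e.g. with matplotlib): a healthy curve
# decreases on every iteration; a rising or oscillating curve
# means alpha is too large.
```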

Chapter 31: Features and polynomial regression

The choice of features you have, and how with different choices of features you can get different learning algorithms.

For example, to fit housing prices you can take the single feature $x$ = size and create new features $x_1 = x$, $x_2 = x^2$, $x_3 = x^3$, so that the cubic model $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$ becomes ordinary multivariate linear regression.

If you're using gradient descent, it is important to apply feature scaling to get these polynomial features into comparable ranges of values (if size runs up to $10^3$, size$^3$ runs up to $10^9$).

You have broad choices in the features you use; for instance, a square-root feature $\sqrt{x}$ gives a curve that flattens out instead of coming back down (see the sketch below).

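A sketch of how polynomial regression reduces to the machinery above; the helper name `cubic_features` is illustrative:

```python
import numpy as np

def cubic_features(x):
    """Map a single feature x (shape (m,)) to [1, x, x^2, x^3].

    Fitting theta on these columns with plain linear regression
    fits the cubic h(x) = t0 + t1*x + t2*x^2 + t3*x^3.
    """
    return np.column_stack([np.ones_like(x), x, x**2, x**3])

x = np.array([1.0, 2.0, 3.0, 4.0])
X = cubic_features(x)
# Remember feature scaling: if x ranges up to 1000, x^3 ranges
# up to 10^9, so normalize the columns before gradient descent.
```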

Chapter 32: Normal equation

For some linear regression problems, the normal equation will give us a much better way to solve for the optimal value of the parameters $\theta$.

Normal equation: a method to solve for $\theta$ analytically.

Let $X$ be the $m \times (n+1)$ design matrix whose $i$-th row is $(x^{(i)})^T$, and let $y$ be the $m$-vector of targets. The normal equation solves for the minimizer of $J(\theta)$ in closed form:

$\theta = (X^T X)^{-1} X^T y$
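A minimal NumPy sketch of this closed form (using `np.linalg.solve` instead of forming the inverse explicitly, a standard numerical choice rather than anything the course mandates):

```python
import numpy as np

def normal_equation(X, y):
    """Solve theta = (X^T X)^{-1} X^T y analytically.

    Solving the linear system X^T X theta = X^T y is equivalent
    to the textbook formula but numerically better behaved.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)
```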

Derivation of the normal equation: https://zhuanlan.zhihu.com/p/22474562

Matrix derivatives: https://blog.****.net/nomadlx53/article/details/50849941

Advantages and disadvantages of gradient descent vs. the normal equation:

  • Gradient descent: you need to choose $\alpha$, and it needs many iterations; but it works well even when $n$ is large.
  • Normal equation: no need to choose $\alpha$, and no iterations; but it must compute $(X^T X)^{-1}$, which is roughly $O(n^3)$, so it is slow if $n$ is very large.

The normal equation method actually does not work for the more sophisticated learning algorithms we will see later; there we still have to use gradient descent.

Chapter 33: Normal equation and non-invertibility (optional)

What if $X^T X$ is non-invertible?

  • Redundant features (linearly dependent), e.g., $x_1$ = size in feet², $x_2$ = size in m²; since $x_1 = (3.28)^2\, x_2$, the two columns are linearly dependent.

  • Too many features (e.g., $m \le n$): delete some features, or use regularization.
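The lectures note that Octave's `pinv` still returns a usable $\theta$ even when $X^T X$ is singular; a NumPy sketch of the same idea:

```python
import numpy as np

def normal_equation_pinv(X, y):
    """Normal equation via the pseudo-inverse.

    Unlike a plain inverse, pinv is defined even when X^T X is
    singular (redundant features, or m <= n), and still yields
    a least-squares solution for theta.
    """
    return np.linalg.pinv(X.T @ X) @ (X.T @ y)
```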