Course 3 Regression

Example

Stock market forecast: take data about companies as input and regress an output such as tomorrow's Dow Jones Industrial Average.

Self-driving car: take sensor data as input and output the steering direction.

Example Application

Estimate a Pokémon’s Combat Power value after evolution

Input: a Pokémon x, where x_cp is its Combat Power before evolution, x_s its species, x_hp its hit points (HP), x_w its weight, and x_h its height.

Output: y, the Combat Power after evolution.

Step 1: Model

A model is a set of candidate functions.

Suppose we choose a linear model:
$y=b+w \times x_{cp}$
where w and b are parameters.

More generally, a linear model has the form:
$y=b+\sum_i w_i \times x_i$
where x_i is an attribute of the input x (called a feature), w_i is a weight, and b is the bias.
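The general linear model can be sketched in a few lines of Python. The function name `predict` and the numbers are illustrative assumptions, not values from the course.

```python
def predict(features, weights, bias):
    """Return y = b + sum_i w_i * x_i for a single input."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return bias + sum(w * x for w, x in zip(weights, features))

# Hypothetical values, chosen only for illustration.
x = [100.0, 40.0]   # e.g. x_cp and x_hp
w = [2.0, 0.5]      # one weight per feature
b = 10.0            # bias
y = predict(x, w, b)
```

Each choice of `w` and `b` picks one function out of the model's set of functions.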

Step 2: Goodness of Function

Let x¹ denote one complete input example and y¹ the corresponding output.

Collect many xⁱ and their corresponding yⁱ as pairs
$(x^{i},y^{i})$
which can be plotted in a graph.

Machine Learning Notes Course 3

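With the training data, we can define the goodness of a function using a loss function.
Loss function:

Input: a function

Output: how bad it is, called the estimation error

For the linear model below, the loss sums the squared errors over the 10 training examples, where $\hat y^n$ is the observed output of example n: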

$y=b+w \times x_{cp}$

$L(f)=L(w,b)= \sum_{n=1}^{10}(\hat y^n-(b+w \times x^{n}_{cp}))^2$
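As a rough illustration, the loss above can be computed directly. The training pairs here are made up (generated from y = 5 + 2x) and are not real Pokémon data.

```python
def loss(w, b, xs, ys):
    """Sum of squared errors of y = b + w * x_cp over the training pairs."""
    return sum((y - (b + w * x)) ** 2 for x, y in zip(xs, ys))

# Made-up training pairs (x_cp, observed output).
xs = [10.0, 20.0, 30.0]
ys = [25.0, 45.0, 65.0]   # generated from y = 5 + 2 * x

perfect = loss(2.0, 5.0, xs, ys)   # the generating parameters
worse = loss(1.0, 5.0, xs, ys)     # a poorer choice of w
```

A function that fits the data better gets a smaller loss, so the loss gives a way to rank the functions in the model.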

Step 3: Best Function

Choose the best function from the set by minimizing the loss function:
$f^*=\arg\min_f L(f)$

$w^*,b^*=\arg\min_{w,b}L(w,b)$

that is, choose the f, or equivalently the w and b, that makes the loss smallest.

The minimization can be done with Gradient Descent.

Consider loss function L(w) with only one parameter w:

Randomly choose an initial value w⁰

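Compute
$\left.\frac{dL}{dw}\right|_{w=w^0}$
If the derivative is negative, increase w; if it is positive, decrease w.

The next value w¹ is:
$w^1=w^0 - \eta\left.\frac{dL}{dw}\right|_{w=w^0}$
where η is the learning rate. Repeat the process until the derivative is zero. The point reached may be a local minimum rather than the global minimum.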
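The one-parameter update can be sketched as follows. The toy loss L(w) = (w - 3)², the starting point, the learning rate, and the step count are all assumptions chosen for illustration.

```python
def gradient_descent_1d(dL_dw, w0, eta, steps):
    """Repeat w <- w - eta * dL/dw for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w = w - eta * dL_dw(w)
    return w

# Toy loss L(w) = (w - 3)^2, whose derivative is 2 * (w - 3).
w_final = gradient_descent_1d(lambda w: 2.0 * (w - 3.0), w0=0.0, eta=0.1, steps=100)
```

Starting to the left of the minimum, the derivative is negative, so each update increases w toward 3.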

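With two parameters w and b the procedure is the same: choose initial values w⁰ and b⁰, compute both partial derivatives there, and update each parameter:
$w^1=w^0-\eta\left.\frac{\partial L}{\partial w}\right|_{w=w^0,b=b^0}, \quad b^1=b^0-\eta\left.\frac{\partial L}{\partial b}\right|_{w=w^0,b=b^0}$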

Note: for linear regression the loss function is convex, so gradient descent cannot get stuck in a local minimum that is not the global minimum.
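For the linear model y = b + w × x_cp with the squared-error loss, the two-parameter case might look like the sketch below. The data, learning rate, and step count are illustrative assumptions; the gradients are the partial derivatives of the sum of squared errors.

```python
def fit_linear(xs, ys, eta, steps, w=0.0, b=0.0):
    """Gradient descent on L(w, b) = sum_n (y_n - (b + w * x_n))^2."""
    for _ in range(steps):
        # Partial derivatives of the squared-error loss at the current (w, b).
        grad_w = sum(-2.0 * x * (y - (b + w * x)) for x, y in zip(xs, ys))
        grad_b = sum(-2.0 * (y - (b + w * x)) for x, y in zip(xs, ys))
        # Update both parameters from the same point.
        w, b = w - eta * grad_w, b - eta * grad_b
    return w, b

# Made-up data generated from y = 5 + 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [7.0, 9.0, 11.0, 13.0]
w_star, b_star = fit_linear(xs, ys, eta=0.01, steps=5000)
```

Because this loss is convex, the run should approach w = 2 and b = 5 from any starting point, provided the learning rate is small enough.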

Generalization

Choose another 10 Pokémon as test data and compute the error on them, since what matters is the error on data not used for training.
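Comparing training and test error could be sketched like this, assuming the average absolute error as the metric. The notes do not fix a metric here, and the parameters and data are invented for illustration.

```python
def average_abs_error(w, b, xs, ys):
    """Mean of |y - (b + w * x)| over a data set."""
    return sum(abs(y - (b + w * x)) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical fitted parameters and made-up data.
w, b = 2.0, 5.0
train_xs, train_ys = [10.0, 20.0, 30.0], [26.0, 44.0, 66.0]
test_xs, test_ys = [15.0, 25.0], [38.0, 54.0]

train_err = average_abs_error(w, b, train_xs, train_ys)
test_err = average_abs_error(w, b, test_xs, test_ys)
```

The test data play no part in choosing w and b, so the test error is the better estimate of how the function behaves on new Pokémon.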

Another Model

$y=b+w_1\times x_{cp}+w_2 \times (x_{cp})^2$

The same gradient descent method is used to find the best function in this model.

Other, more complex models are also possible, such as:
$y=b+w_1\times x_{cp}+w_2 \times (x_{cp})^2+w_3 \times (x_{cp})^3$
A more complex model gives a lower error on the training data but may give a larger error on the test data.
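The polynomial models stay linear in their parameters: the powers of x_cp simply act as extra features. A small sketch, with hypothetical function names and weights:

```python
def poly_features(x_cp, degree):
    """Return [x_cp, x_cp^2, ..., x_cp^degree]."""
    return [x_cp ** k for k in range(1, degree + 1)]

def predict_poly(x_cp, weights, bias):
    """y = b + w_1 * x_cp + w_2 * x_cp^2 + ... (one weight per power)."""
    feats = poly_features(x_cp, len(weights))
    return bias + sum(w * f for w, f in zip(weights, feats))

# Hypothetical cubic model evaluated at x_cp = 2.
y = predict_poly(2.0, [1.0, 0.5, 0.25], 3.0)
```

Since each higher-degree model contains the lower-degree ones (set the extra weights to zero), its best training error can only be equal or lower. This says nothing about its test error.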

Some other factors

A Pokémon's species may also influence the result, so the model can be redesigned to use a different linear function for each species, where x_s denotes the species:
$\text{if } x_s=\text{Pidgey}: \quad y=b_1+w_1 \times x_{cp}$

$\text{if } x_s=\text{Weedle}: \quad y=b_2+w_2 \times x_{cp}$

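All of the cases above can be combined into a single linear function by using indicator terms, where $\delta(x_s=\text{Pidgey})$ is 1 if x_s is Pidgey and 0 otherwise:
$y=b_1 \delta(x_s=\text{Pidgey})+w_1 \delta(x_s=\text{Pidgey}) x_{cp}+b_2 \delta(x_s=\text{Weedle})+w_2 \delta(x_s=\text{Weedle}) x_{cp}$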
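One way to combine the per-species functions into a single linear function is with 0/1 indicator values per species. A minimal sketch, where the function name and the (b, w) values are hypothetical:

```python
def predict_by_species(x_cp, species, params):
    """Sum b_k * delta_k + w_k * delta_k * x_cp over all known species.

    params maps a species name to its (b, w) pair; delta_k is 1 for the
    Pokémon's own species and 0 for every other species.
    """
    y = 0.0
    for name, (b, w) in params.items():
        delta = 1.0 if species == name else 0.0
        y += b * delta + w * delta * x_cp
    return y

# Hypothetical parameters, for illustration only.
params = {"Pidgey": (1.0, 2.0), "Weedle": (3.0, 0.5)}
y_pidgey = predict_by_species(10.0, "Pidgey", params)
y_weedle = predict_by_species(10.0, "Weedle", params)
```

The indicator values and their products with x_cp act as features, so the result is still linear in every b_k and w_k and can be trained with the same gradient descent.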

More factors such as weight and height can also be taken into account. This would probably lower the training error further but raise the testing error, because overfitting can occur.

To avoid overfitting, a strategy called Regularization can be applied to the model.
E.g.
$L=\sum_{n}(\hat y^n-(b+\sum_i w_i \times x^n_i))^2+\lambda \sum_i(w_i)^2$
In this loss, the term
$\lambda \sum(w_i)^2$
measures how sensitive the function is to its input: with smaller weights, the output changes less when the input changes, so the function is less affected by input noise. A function for which this term is smaller is therefore preferred, and λ is a parameter that controls how strongly it counts.
A larger λ puts more emphasis on keeping the w_i small than on the difference between the predictions and the training data, which means the training error is considered less.
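The regularized loss can be sketched as below. The bias is left out of the penalty, as in the formula; the data, weights, and λ values are made-up assumptions.

```python
def regularized_loss(weights, bias, rows, ys, lam):
    """Squared error over the data plus lam * sum_i w_i^2."""
    squared_error = sum(
        (y - (bias + sum(w * x for w, x in zip(weights, row)))) ** 2
        for row, y in zip(rows, ys)
    )
    penalty = lam * sum(w * w for w in weights)
    return squared_error + penalty

# Made-up data that the weights below fit exactly.
rows = [[1.0, 2.0], [3.0, 4.0]]
ys = [5.0, 11.0]
weights, bias = [1.0, 2.0], 0.0

no_reg = regularized_loss(weights, bias, rows, ys, lam=0.0)
with_reg = regularized_loss(weights, bias, rows, ys, lam=0.1)
```

Even with a perfect fit, a positive λ leaves a nonzero loss proportional to the squared weights, which is what pushes the minimizer toward smaller weights.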