Machine Learning Notes Course 3
Course 3 Regression
Example
Stock market forecast: input data about a company and regress an output such as tomorrow's Dow Jones Industrial Average.
Self-driving car: input sensor data and output a direction.
Example Application
Estimate a Pokémon’s Combat Power value after evolution
Input: a Pokémon x, where $x_{cp}$ is its combat power before evolution, $x_s$ its species, $x_{hp}$ its hit points (HP), $x_w$ its weight, and $x_h$ its height.
Output: y, the combat power after evolution.
Step 1: Model
A model is a set of candidate functions.
Suppose we choose a linear model such as:
$y = b + w \cdot x_{cp}$
where w and b are parameters.
More generally, a linear model has the form:
$y = b + \sum_i w_i x_i$
where $x_i$ is an attribute of the input x, called a feature, b is called the bias, and $w_i$ is called a weight.
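The linear model can be written as a small function. This is only a sketch: the name predict, the feature order and all numbers are made up for illustration and do not come from the course data.

```python
def predict(x, w, b):
    """Linear model y = b + sum_i w_i * x_i for one input x (a list of features)."""
    return b + sum(w_i * x_i for w_i, x_i in zip(w, x))


# Hypothetical Pokémon with features [x_cp, x_hp, x_w, x_h] (made-up numbers).
x = [100.0, 40.0, 6.0, 0.5]
w = [2.0, 0.5, 0.0, 0.0]  # weights, chosen arbitrarily for illustration
b = 10.0                  # bias
y = predict(x, w, b)      # 10 + 2*100 + 0.5*40
```

Each different choice of w and b gives a different function in the model's function set.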
Step 2: Goodness of Function
Let $x^1$ denote one complete input example and $y^1$ its corresponding true output.
Collect many pairs $(x^n, y^n)$ as training data,
which can be plotted on a graph.
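With all the training data, we can define the goodness of a function using a loss function L:
Input: a function
Output: how bad it is, called the estimation error
For the linear model with the single feature $x_{cp}$, using the squared error:
$L(f) = L(w, b) = \sum_{n=1}^{N} (y^n - (b + w \cdot x_{cp}^n))^2$
where $(x^n, y^n)$ is the n-th training pair and N is the number of training examples.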
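A loss function takes a function (here identified by its parameters w and b) and returns one number. The sketch below assumes the sum-of-squared-errors loss and uses made-up training pairs; the names loss and train are illustrative.

```python
def loss(w, b, data):
    """Sum of squared errors of y = b + w * x_cp over (x_cp, y) pairs."""
    return sum((y - (b + w * x_cp)) ** 2 for x_cp, y in data)


# Made-up training pairs (x_cp before evolution, y after evolution).
train = [(10.0, 25.0), (20.0, 45.0), (30.0, 65.0)]

loss_good = loss(2.0, 5.0, train)  # this function passes through every point
loss_bad = loss(1.0, 0.0, train)   # a worse function gets a larger loss
```

A smaller loss means the function fits the training data better.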
Step 3: Best Function
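Choose the best function from the set using the loss function:
$f^* = \arg\min_f L(f)$, or equivalently $w^*, b^* = \arg\min_{w,b} L(w, b)$
that is, choose the w and b (and hence the function f) that minimize L(f) = L(w, b).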
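Method: Gradient Descent
Consider a loss function L(w) with only one parameter w:
Randomly choose an initial value $w^0$.
Compute the derivative $\frac{dL}{dw}\big|_{w=w^0}$.
If it is negative, increase w; if it is positive, decrease w.
The next value is $w^1 = w^0 - \eta \frac{dL}{dw}\big|_{w=w^0}$, where $\eta$ is the learning rate (step size).
Repeat the process above until it converges. In general the result may be a local optimum rather than the global optimum.
With two parameters w and b, the same procedure uses partial derivatives:
$w^1 = w^0 - \eta \frac{\partial L}{\partial w}\big|_{w=w^0, b=b^0}$, $b^1 = b^0 - \eta \frac{\partial L}{\partial b}\big|_{w=w^0, b=b^0}$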
Note: for linear regression the loss function is convex, so gradient descent cannot get stuck in a local optimum that is not the global one.
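Gradient descent for the two parameters w and b can be sketched as follows. It assumes the sum-of-squared-errors loss and a fixed learning rate; the data, the learning rate and the step count are made-up values picked so that this toy example converges.

```python
def gradient_descent(data, lr=0.01, steps=5000, w=0.0, b=0.0):
    """Minimise sum_n (y - (b + w*x))^2 over w and b with a fixed step size."""
    for _ in range(steps):
        # Partial derivatives of the loss at the current (w, b).
        grad_w = sum(-2.0 * (y - (b + w * x)) * x for x, y in data)
        grad_b = sum(-2.0 * (y - (b + w * x)) for x, y in data)
        # Step against the gradient: a negative derivative increases the parameter.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up training pairs that lie exactly on y = 2x + 5.
train = [(1.0, 7.0), (2.0, 9.0), (3.0, 11.0), (4.0, 13.0)]
w_star, b_star = gradient_descent(train)
```

If the learning rate is too large the updates overshoot and diverge, so in practice it has to be tuned to the scale of the features.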
Generalization
Choose another 10 Pokémon as test data and calculate the error on them. What matters is the error on new data, not the error on the training data.
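Measuring generalization means evaluating the already chosen w and b on pairs that were not used for training. The sketch below assumes the average absolute error as the measure; the data and parameter values are made up.

```python
def average_error(w, b, data):
    """Mean absolute difference between true y and the prediction b + w * x_cp."""
    return sum(abs(y - (b + w * x_cp)) for x_cp, y in data) / len(data)


# Made-up data: the parameters below fit the training pairs exactly.
train = [(1.0, 7.0), (2.0, 9.0), (3.0, 11.0)]
test = [(4.0, 14.0), (5.0, 14.0)]  # unseen pairs, not used to choose w and b
w, b = 2.0, 5.0

train_error = average_error(w, b, train)
test_error = average_error(w, b, test)
```

The test error is usually somewhat larger than the training error, as it is here.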
Another Model
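The same method, gradient descent, is used to find the best function in a different model.
Other models can also be tried, for example one that adds higher-order terms of $x_{cp}$:
$y = b + w_1 x_{cp} + w_2 (x_{cp})^2$
A more complex model fits the training data better but may give a larger error on the test data.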
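The effect of model complexity can be illustrated on made-up data. To keep the sketch short it does not use gradient descent: the line is fitted with the closed-form least-squares solution, and the complex model is the cubic polynomial through all four training points (Lagrange interpolation), which is what a least-squares cubic fit gives on four points. All names and numbers are illustrative.

```python
def fit_line(data):
    """Closed-form least-squares fit of y = b + w * x; returns (w, b)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    w = (sum((x - mx) * (y - my) for x, y in data)
         / sum((x - mx) ** 2 for x, _ in data))
    return w, my - w * mx


def interpolate(data, x):
    """Value at x of the Lagrange polynomial through every point in data."""
    total = 0.0
    for i, (xi, yi) in enumerate(data):
        term = yi
        for j, (xj, _) in enumerate(data):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def sq_error(predict, data):
    """Sum of squared errors of a prediction function over (x, y) pairs."""
    return sum((y - predict(x)) ** 2 for x, y in data)


# Made-up data: roughly y = 2x + 5, with noise on the training pairs.
train = [(1.0, 7.5), (2.0, 8.5), (3.0, 11.5), (4.0, 12.5)]
test = [(5.0, 15.0), (6.0, 17.0)]

w, b = fit_line(train)


def line(x):
    return b + w * x


def cubic(x):
    return interpolate(train, x)


line_train, line_test = sq_error(line, train), sq_error(line, test)
cubic_train, cubic_test = sq_error(cubic, train), sq_error(cubic, test)
```

On this toy data the cubic has essentially zero training error but a much larger test error than the line.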
Some other factors
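A Pokémon's species may also influence the result, so the model can be redesigned:
Choose a different linear function for each species, where $x_s$ denotes the species:
if $x_s = s$: $y = b_s + w_s \cdot x_{cp}$
All of these can be combined into a single linear function:
$y = \sum_s \delta(x_s = s) (b_s + w_s \cdot x_{cp})$
where $\delta(x_s = s)$ is 1 if the Pokémon's species is s and 0 otherwise.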
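The per-species model can be sketched with an indicator that switches on exactly one pair of parameters. The species names and parameter values below are hypothetical, chosen only for illustration.

```python
def predict_by_species(x_cp, species, params):
    """y = sum over s of delta(species == s) * (b_s + w_s * x_cp)."""
    return sum(
        (1.0 if species == s else 0.0) * (b_s + w_s * x_cp)
        for s, (w_s, b_s) in params.items()
    )


# Hypothetical (w_s, b_s) per species; the values are made up.
params = {"Pidgey": (2.0, 5.0), "Eevee": (3.0, 1.0)}

y_pidgey = predict_by_species(10.0, "Pidgey", params)  # uses w=2, b=5
y_eevee = predict_by_species(10.0, "Eevee", params)    # uses w=3, b=1
```

Because the output is still a weighted sum of the features $\delta(x_s = s)$ and $\delta(x_s = s) \cdot x_{cp}$, the combined model remains linear in its parameters.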
More factors such as weight and height can also be taken into account. This will probably lower the training error but can raise the testing error, because the model may overfit.
To avoid overfitting, a strategy called regularization can be applied to the model.
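E.g. add a penalty on the weights to the loss function:
$L = \sum_n (y^n - (b + \sum_i w_i x_i^n))^2 + \lambda \sum_i (w_i)^2$
The term $\lambda \sum_i (w_i)^2$ reflects how sensitive the function is to its input: with smaller $w_i$ the output changes less when the input changes, so the function is less affected by input noise. A function for which this term is smaller is therefore preferred, and λ is a parameter that sets how much this term counts.
A larger λ puts more emphasis on keeping the $w_i$ small than on the difference between the predicted outputs and the training data, which means the training error is considered less.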
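The effect of λ can be sketched with gradient descent on a regularized loss for a single feature. This assumes a squared-error loss plus the penalty λ·w², with the bias left out of the penalty; the data, learning rate and step count are made-up values.

```python
def fit_regularized(data, lam, lr=0.01, steps=5000):
    """Gradient descent on sum_n (y - (b + w*x))^2 + lam * w^2."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # The penalty adds 2 * lam * w to the gradient for w; b is not penalised.
        grad_w = sum(-2.0 * (y - (b + w * x)) * x for x, y in data) + 2.0 * lam * w
        grad_b = sum(-2.0 * (y - (b + w * x)) for x, y in data)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def training_error(w, b, data):
    """Sum of squared errors on the data, without the penalty term."""
    return sum((y - (b + w * x)) ** 2 for x, y in data)


# Made-up training pairs that lie exactly on y = 2x + 5.
train = [(1.0, 7.0), (2.0, 9.0), (3.0, 11.0), (4.0, 13.0)]

w_plain, b_plain = fit_regularized(train, lam=0.0)
w_reg, b_reg = fit_regularized(train, lam=10.0)
```

With the larger λ the learned w is smaller and the training error is higher, which is the trade-off described above.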