[coursera/ImprovingDL/week1]Practical aspects of Deep Learning(summary&question)

1.1 Setting up your Machine Learning Application

Train/Dev/Test sets

How to split depends on how much data you have: with a very large dataset, something like 98/1/1 (train/dev/test) is reasonable, rather than the traditional 60/20/20.

worst case: high bias and high variance at the same time.

Train set error high (e.g. 50%): high bias, i.e. underfitting.

Dev set error far above the train set error: high variance, i.e. overfitting.
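A minimal sketch of this diagnosis rule in Python (the threshold and error values are illustrative assumptions, not numbers from the course):

```python
def diagnose(train_err, dev_err, bayes_err=0.0):
    """Rough bias/variance diagnosis from train/dev errors (fractions in [0, 1])."""
    bias = train_err - bayes_err    # avoidable bias: gap to the best achievable error
    variance = dev_err - train_err  # variance: gap between dev and train error
    labels = []
    if bias > 0.05:                 # illustrative threshold
        labels.append("high bias (underfitting)")
    if variance > 0.05:             # illustrative threshold
        labels.append("high variance (overfitting)")
    return labels or ["looks fine"]

print(diagnose(train_err=0.15, dev_err=0.30))  # both high bias and high variance
```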

Basic recipe for machine learning: first ask whether the model has high bias (poor training-set performance); if so, try a bigger network, training longer, or a different architecture. Then ask whether it has high variance (poor dev-set performance); if so, get more data, add regularization, or again try a different architecture.

In the deep-learning era we no longer have to trade bias against variance: a bigger network and more data can drive each down without hurting the other.


1.2 Regularizing your neural network

high variance: regularization / add more training data

L2 regularization counters overfitting by penalizing large weights in the cost function.

The L2 penalty is also known as "weight decay".


On every iteration, w is shrunk by a small percentage of itself before the usual gradient step; the update still looks just like gradient descent, only with an extra decay factor.
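In formulas, the L2-regularized cost and the gradient-descent update it produces (standard notation from the course):

```latex
J(W,b) = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}\big(\hat{y}^{(i)}, y^{(i)}\big)
       + \frac{\lambda}{2m}\sum_{l=1}^{L}\big\lVert W^{[l]}\big\rVert_F^2

W^{[l]} := W^{[l]} - \alpha\Big(dW^{[l]}_{\text{backprop}} + \frac{\lambda}{m}W^{[l]}\Big)
         = \Big(1 - \frac{\alpha\lambda}{m}\Big)W^{[l]} - \alpha\, dW^{[l]}_{\text{backprop}}
```

The factor (1 - αλ/m) < 1 multiplies w on every step; that is the "small percent" shrinkage and the reason L2 is called weight decay.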

So why does regularization reduce overfitting so effectively?


In short, a large regularization parameter λ pushes the weights W toward 0, so z = Wa + b stays small and the activation function (tanh/sigmoid) operates mostly in its nearly linear region around the origin. With roughly linear activations, every layer computes close to a linear function, so the whole network behaves like a much simpler, nearly linear model and cannot fit an over-complex decision boundary.
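A one-line way to see the "linear region" point: near zero, the Taylor expansion of tanh is almost exactly linear, so small weights keep every unit in that regime,

```latex
\tanh(z) = z - \frac{z^3}{3} + O(z^5) \approx z \quad \text{for } \lvert z\rvert \ll 1
```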

Dropout regularization: at first glance it seems like a highly risky operation, randomly knocking out units on every training pass.

Set a keep probability keep_prob and draw a random 0/1 mask from it (keep_prob itself is a scalar probability; the mask is the random 0/1 matrix). Each layer can have its own keep_prob, and a flag can control whether dropout is applied at all, e.g. switching it off at test time. A sketch follows below.
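A minimal numpy sketch of inverted dropout for one layer (the names a3, d3, keep_prob follow the lecture's layer-3 example; the activation values are random placeholders):

```python
import numpy as np

keep_prob = 0.8                              # probability that a unit is kept
a3 = np.random.randn(50, 100)                # layer-3 activations (units x examples)

# Training-time forward pass with inverted dropout:
d3 = np.random.rand(*a3.shape) < keep_prob   # 0/1 mask, True with probability keep_prob
a3 = a3 * d3                                 # knock out the dropped units
a3 = a3 / keep_prob                          # scale up so the expected value is unchanged

# Test time: no mask and no scaling; run the full network as-is.
```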


Other regularization methods

data augmentation: horizontal/vertical flips, rotations; it enlarges the training set without collecting new data.
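A minimal numpy sketch of these augmentations (image is a hypothetical H x W x 3 array):

```python
import numpy as np

image = np.random.rand(64, 64, 3)            # hypothetical RGB image

flipped_h = image[:, ::-1, :]                # horizontal flip (mirror left-right)
flipped_v = image[::-1, :, :]                # vertical flip (mirror top-bottom)
rotated   = np.rot90(image, k=1)             # rotate 90 degrees in the H-W plane

augmented = [image, flipped_h, flipped_v, rotated]  # 4x the examples for free
```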

early stopping: halt training when dev-set error starts to rise. It avoids overfitting, but it also stops minimizing the cost function early, which can increase bias.
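A minimal early-stopping loop (train_one_epoch, dev_error, and snapshot are hypothetical helpers standing in for a real training setup):

```python
def fit_with_early_stopping(model, patience=5, max_epochs=200):
    """Stop once dev error has not improved for `patience` consecutive epochs."""
    best_err, best_weights, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)               # hypothetical: one pass over the train set
        err = dev_error(model)               # hypothetical: error on the dev set
        if err < best_err:
            best_err, best_weights, bad_epochs = err, snapshot(model), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                        # dev error keeps rising: stop early
    return best_weights                      # weights from the best dev-error epoch
```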


1.3 Setting up your optimization problem

normalizing inputs: subtract the per-feature mean and divide by the standard deviation. The lecture slide contrasts the cost contours before and after normalization: they go from elongated to round, so gradient descent can use a larger learning rate and converge faster.
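The operation itself is simple; the important detail is that the test set must be normalized with the training set's mean and variance (a minimal numpy sketch with made-up data):

```python
import numpy as np

X_train = np.random.randn(1000, 20) * 5 + 3   # hypothetical data (examples x features)
X_test  = np.random.randn(200, 20) * 5 + 3

mu    = X_train.mean(axis=0)                  # per-feature mean
sigma = X_train.std(axis=0) + 1e-8            # per-feature std (epsilon avoids divide-by-zero)

X_train_norm = (X_train - mu) / sigma
X_test_norm  = (X_test - mu) / sigma          # reuse the *training* mu and sigma
```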


Deep networks suffer from vanishing/exploding gradients: with many layers, activations and gradients can shrink or grow exponentially with depth.
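The lecture's intuition in formula form: ignoring b and using linear activations, the output is a product of L weight matrices, so anything slightly above or below the identity compounds exponentially with depth,

```latex
\hat{y} = W^{[L]} W^{[L-1]} \cdots W^{[1]} x, \qquad
W^{[l]} = 1.5\,I \;\Rightarrow\; \hat{y} \sim 1.5^{\,L}x, \qquad
W^{[l]} = 0.5\,I \;\Rightarrow\; \hat{y} \sim 0.5^{\,L}x
```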


Weight initialization for deep networks: give w a sensible default scale tied to how many inputs feed each unit, so activations start out neither too large nor too small.
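A minimal sketch of fan-in-scaled initialization (He initialization for ReLU, Xavier-style for tanh):

```python
import numpy as np

def init_weights(n_in, n_out, activation="relu"):
    """Scale random weights by the fan-in so z = Wa + b starts at a sane magnitude."""
    if activation == "relu":
        scale = np.sqrt(2.0 / n_in)   # He initialization: Var(w) = 2/n_in
    else:
        scale = np.sqrt(1.0 / n_in)   # Xavier-style: Var(w) = 1/n_in, common for tanh
    return np.random.randn(n_out, n_in) * scale

W1 = init_weights(n_in=784, n_out=128, activation="relu")
```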



Gradient checking: numerically estimate each partial derivative and compare it with the backprop gradient. Use it only for debugging, not during training, and note that it does not work together with dropout.
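A minimal grad-check sketch: estimate each partial derivative with a two-sided difference and compare it to the analytic gradient by relative error (cost_fn, theta, and grad are assumed inputs, with the parameters flattened into vectors):

```python
import numpy as np

def grad_check(cost_fn, theta, grad, eps=1e-7):
    """Compare the analytic gradient `grad` with a numerical estimate at `theta`."""
    grad_approx = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += eps
        minus[i] -= eps
        grad_approx[i] = (cost_fn(plus) - cost_fn(minus)) / (2 * eps)
    # Relative error: ~1e-7 great, ~1e-5 worth a closer look, ~1e-3 probably a bug.
    return np.linalg.norm(grad - grad_approx) / (
        np.linalg.norm(grad) + np.linalg.norm(grad_approx))
```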


Quiz answers I got wrong:

2. The dev and test sets should come from the same distribution.

6. Increasing the regularization hyperparameter lambda pushes the weights toward smaller values (closer to 0).
