Hands-On Machine Learning with Scikit-Learn and TensorFlow, Study Notes (19): Ridge Regression
There are many ways to keep a regression model from overfitting, but the most common and easiest to apply is to constrain the range of values the model's parameters can take. Ridge Regression, Lasso Regression, and Elastic Net are three ways of constraining how much each feature contributes. (By contribution I mean that in the equation y = 0.1*x1 + 2*x2, feature x2 clearly contributes more, since its coefficient is larger.)
1. Ridge Regression
Ridge Regression adds a regularization term to the cost function (MSE) of Linear Regression; this forces the model to keep the weights as small as possible while fitting the training set. One thing to watch out for: once the model is trained and we want to judge how good it is, the regularization term should be dropped, so that we see how the system actually performs with the learned parameters.
My translation of this passage is not great, so here is the original:
Ridge Regression (also called Tikhonov regularization) is a regularized version of Linear Regression: a regularization term equal to $\alpha \sum_{i=1}^{n} \theta_i^2$ is added to the cost function. This forces the learning algorithm to not only fit the data but also keep the model weights as small as possible. Note that the regularization term should only be added to the cost function during training. Once the model is trained, you want to use the unregularized performance measure to evaluate the model’s performance.
The hyperparameter α controls the strength of the constraint. When α = 0, Ridge Regression is just plain Linear Regression; when α is very large, the algorithm will do whatever it takes to shrink the θ values in order to minimize the cost function, so every weight ends up tiny and the fitted line approaches a horizontal line parallel to the x axis. Why a line parallel to the x axis, and not the x axis itself?
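In the book's notation (matching the regularization term quoted above), the cost function being minimized during training is:

$J(\theta) = \mathrm{MSE}(\theta) + \alpha \sum_{i=1}^{n} \theta_i^2$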
Notice that the summation in the regularization term starts at i = 1, so the bias term θ0 is not constrained at all; that is why the final line flattens out at the height of θ0, parallel to the x axis, instead of collapsing onto the axis itself.
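A minimal sketch of this effect with scikit-learn's Ridge (the data and α values here are made up for illustration, not from the book): as α grows, the slope is driven toward 0 while the intercept is left alone.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(42)
X = 2 * rng.rand(100, 1)
y = 4 + 3 * X[:, 0] + rng.randn(100)  # true relation: y = 4 + 3x + noise

for alpha in (1e-6, 1, 1000):  # alpha ~ 0 is essentially plain Linear Regression
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha}: coef={model.coef_}, intercept={model.intercept_:.2f}")
# As alpha grows the coefficient shrinks toward 0, but the intercept does not,
# so the fitted line flattens into a horizontal line near the mean of y.
```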
As the figure above shows, polynomial regression (degree = 10) can be constrained the same way: as α increases, the polynomial curve becomes flatter and flatter, which increases bias and decreases variance (a rough sketch of this follows the definitions below). So what exactly are bias and variance?
Bias
This part of the generalization error is due to wrong assumptions, such as assuming that the data is linear when it is actually quadratic. A high-bias model is most likely to underfit the training data.
Variance
This part is due to the model’s excessive sensitivity to small variations in the training data. A model with many degrees of freedom (such as a high-degree polynomial model) is likely to have high variance and thus overfit the training data.
Irreducible error
This part is due to the noisiness of the data itself. The only way to reduce this part of the error is to clean up the data (e.g., fix the data sources, such as broken sensors, or detect and remove outliers).
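Here is the rough sketch promised above: a degree-10 polynomial model regularized with Ridge at a few α values (again, the data and α values are my own, chosen just to show the flattening):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.RandomState(42)
X = 3 * rng.rand(20, 1)
y = 0.5 * X[:, 0] + rng.randn(20) / 3

X_new = np.linspace(0, 3, 100).reshape(-1, 1)
for alpha in (1e-5, 1e-2, 10):
    model = make_pipeline(
        PolynomialFeatures(degree=10, include_bias=False),
        StandardScaler(),  # Ridge penalizes raw weights, so scaling matters
        Ridge(alpha=alpha),
    )
    model.fit(X, y)
    # A crude "flatness" measure: the spread of the predictions shrinks as
    # alpha grows, i.e. the curve gets flatter (more bias, less variance).
    print(f"alpha={alpha}: prediction std = {model.predict(X_new).std():.3f}")
```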
So how do we compute the optimal parameters of Ridge Regression directly? As with Linear Regression, there is a closed-form solution:

$\hat{\theta} = \left(\mathbf{X}^\top \mathbf{X} + \alpha \mathbf{A}\right)^{-1} \mathbf{X}^\top \mathbf{y}$

where A is the (n+1)×(n+1) identity matrix, except with a 0 in the top-left cell, so that the bias term θ0 is not regularized (consistent with the summation starting at i = 1).
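A minimal sketch of this closed-form solution in NumPy, with the bias column added by hand and the top-left entry of A zeroed out so θ0 stays unregularized; scikit-learn's Cholesky-based solver should reproduce the same parameters:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(42)
X = 2 * rng.rand(100, 1)
y = 4 + 3 * X[:, 0] + rng.randn(100)

alpha = 1.0
X_b = np.c_[np.ones((100, 1)), X]  # add x0 = 1 to every instance
A = np.eye(X_b.shape[1])
A[0, 0] = 0  # do not regularize the bias term theta_0
theta = np.linalg.inv(X_b.T @ X_b + alpha * A) @ X_b.T @ y
print(theta)  # roughly [4, 3]

# scikit-learn's closed-form (Cholesky) solver gives the same parameters:
ridge = Ridge(alpha=alpha, solver="cholesky").fit(X, y)
print(ridge.intercept_, ridge.coef_)
```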