Stacking
Basic Idea
The basic idea behind stacked generalization is to train a pool of base learners and then train another learner on their predictions, with the aim of reducing the generalization error.
Algorithms
Standard Stacking
- Split the training set $D$ into $K$ folds. For learner $M_i$, fit it on $K-1$ folds and use it to predict the held-out fold, one fold at a time, to obtain out-of-fold predictions on all of $D$, denoted $P_i$.
- Meanwhile, each time a fold model is fitted, use it to make predictions on the whole test set, so we obtain $K$ sets of test predictions $T_{i1}, \dots, T_{iK}$.
- Combine them as $T_i = \frac{1}{K}\sum_{k=1}^{K} T_{ik}$, or with other averaging methods.
- Repeat the above steps for $i = 1$ to $n$ with $n$ different learners, which we collectively call Model 1, to obtain $P = (P_1, \dots, P_n)$ and $T = (T_1, \dots, T_n)$.
- Treat $P$ and $T$ as the new training set and test set respectively, then train and predict with Model 2 (usually logistic regression; popular non-linear choices are GBM, KNN, NN, RF and ET (extra trees)) to get the final results.

In a word, it models

$$\hat{y} = \sum_{i=1}^{n} w_i p_i,$$

where $w_i$ is the weight of the $i$-th learner and $p_i$ is the corresponding prediction. Some details deserve attention (a code sketch of the whole procedure follows these notes):

- Averaging works for regression problems, and for classification problems where the Model 1 learners output probabilities. In other cases, voting can be better than averaging.
- In fact, you can also get $T_i$ by simply training $M_i$ on the whole training set and predicting the test set once, which may consume more computing resources but slightly lowers the coding complexity.
- The fold partition of the training set must be the same for all estimators, especially when working as a team; otherwise it leads to information leakage and therefore over-fitting.
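Putting the steps together, here is a minimal sketch of standard stacking, assuming scikit-learn, a binary classification task, and toy data in place of a real dataset; the helper name `get_oof` and the choice of base learners are illustrative, not prescribed by the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Toy data standing in for a real training set and test set.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train = X[:400], X[400:], y[:400]

def get_oof(model, X_train, y_train, X_test, n_folds=5):
    """Return P_i (out-of-fold predictions on the training set) and
    T_i (test predictions averaged over the K fold models)."""
    oof = np.zeros(len(X_train))
    test_preds = np.zeros((n_folds, len(X_test)))
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)  # same partition for every learner
    for k, (fit_idx, oof_idx) in enumerate(kf.split(X_train)):
        model.fit(X_train[fit_idx], y_train[fit_idx])
        oof[oof_idx] = model.predict_proba(X_train[oof_idx])[:, 1]  # predictions on the held-out fold
        test_preds[k] = model.predict_proba(X_test)[:, 1]           # T_ik
    return oof, test_preds.mean(axis=0)

# Model 1: the pool of base learners.
base_learners = [RandomForestClassifier(n_estimators=100, random_state=0),
                 ExtraTreesClassifier(n_estimators=100, random_state=0)]
oof_cols, test_cols = zip(*(get_oof(m, X_train, y_train, X_test) for m in base_learners))
P, T = np.column_stack(oof_cols), np.column_stack(test_cols)

# Model 2: logistic regression on the new training set P and new test set T.
meta = LogisticRegression().fit(P, y_train)
final_predictions = meta.predict_proba(T)[:, 1]
```

Note the fixed `random_state` in `KFold`: it keeps the fold partition identical across learners, which is exactly the leakage concern raised in the last note above.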
Feature-Weighted Linear Stacking
Replace the constant weight $w_i$ with $\sum_{k} v_{ik} x_k$, where $x_k$ is the $k$-th feature of a sample and $v_{ik}$ is the corresponding weight, so the model becomes $\hat{y} = \sum_{i}\sum_{k} v_{ik} x_k p_i$, i.e., each learner's weight is a linear function of the features.
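Since the model is linear in the products $x_k p_i$, it can be fitted with ordinary linear regression on those cross terms. A minimal sketch, assuming random toy arrays in place of real predictions and features (all names illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_samples, n_learners, n_features = 200, 3, 4
P = rng.random((n_samples, n_learners))       # p_i: base-learner predictions
X_feat = rng.random((n_samples, n_features))  # x_k: per-sample features
y = rng.random(n_samples)                     # toy regression target

# Cross terms x_k * p_i form the design matrix; the fitted coefficients are the
# weights v_ik, so the model computes sum_i (sum_k v_ik x_k) p_i.
cross = (X_feat[:, :, None] * P[:, None, :]).reshape(n_samples, -1)
fwls = LinearRegression().fit(cross, y)
```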
Further
We can also append the predictions of Model 1 to the original features to obtain an expanded feature set. Note that the predictions and the original features live on different scales, so normalization is necessary.
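A minimal sketch of this feature expansion, assuming `X_train` holds the original features and `P` holds the Model 1 out-of-fold predictions (toy arrays here); `StandardScaler` is one common way to put both on a comparable scale:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 20))  # original features on an arbitrary scale
P = rng.random((400, 2))              # probability-valued Model 1 predictions

# Standardize the original features so they are comparable to the predictions,
# then concatenate to form the expanded training set.
X_expanded = np.hstack([StandardScaler().fit_transform(X_train), P])
```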
References
- Stacked Generalization: http://www.machine-learning.martinsewell.com/ensembles/stacking/Wolpert1992.pdf
- Feature-Weighted Linear Stacking: https://arxiv.org/pdf/0911.0460.pdf
- Kaggle Ensemble Guide: https://mlwave.com/kaggle-ensembling-guide/
- A Kaggler's Guide to Model Stacking in Practice: http://blog.kaggle.com/2016/12/27/a-kagglers-guide-to-model-stacking-in-practice/
- The Stacking Process Explained in Detail (详解stacking过程): https://blog.csdn.net/wstcjf/article/details/77989963
- Using Stacking in Classification Problems (Stacking Learning在分类问题中的使用, with more valuable links): https://blog.csdn.net/MrLevo520/article/details/78161590
- A Detailed Python Implementation of Stacking (详解 Stacking 的 python 实现): https://www.jianshu.com/p/5905f19c4df6