Algorithm: Boosting model with XGBoost
Difference between bagging and boosting:
We call each sub-model in an ensemble a weak learner. In a random forest, the weak learner is a decision tree.
Weak learner: a model that cannot predict the result well on its own.
Overfitting: the model predicts well on the training data but poorly on the test data.
Underfitting: the model can't fit the training data well.
We can think of a bagging model as a committee of "professors": we combine many strong learners to prevent overfitting.
A boosting model is a committee of "slacker students": we combine many weak learners to overcome underfitting.
How to train a boosting model?
Residual
A bagging model can be trained in parallel, while a boosting model is trained sequentially on residuals.
We use a first weak learner to predict, and calculate the residuals.
We train weak learner 2 on those residuals,
then calculate the residuals with respect to model 2,
then train model 3 in the same way, and so on.
Once we have trained a set of weak learners, we sum the predictions of all the models to get the final prediction.
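A minimal sketch of this residual-fitting loop, assuming shallow scikit-learn decision trees as the weak learners (the toy data, depth limit, and learning rate are illustrative choices, not a specific library's algorithm):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (assumed for illustration).
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

n_rounds = 50          # number of weak learners
learning_rate = 0.1    # shrinkage applied to each weak learner
trees = []

# Start from a constant prediction and repeatedly fit trees to the residuals.
prediction = np.full_like(y, y.mean())
for _ in range(n_rounds):
    residual = y - prediction                      # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2)      # a shallow tree = weak learner
    tree.fit(X, residual)
    prediction += learning_rate * tree.predict(X)  # add this learner's contribution
    trees.append(tree)

# The final prediction is the base value plus the sum of all (scaled) weak learners.
def predict(X_new):
    out = np.full(len(X_new), y.mean())
    for tree in trees:
        out += learning_rate * tree.predict(X_new)
    return out

print("training MSE:", np.mean((predict(X) - y) ** 2))
```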
XGBoost reference
https://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf
XGBoost Model
How to understand XGBoost?
1) How do we build the objective function?
2) How do we approximate the objective function? Taylor expansion.
3) How do we express the objective function in terms of the tree data structure, i.e. parameterize the tree?
4) How do we optimize the objective function? Greedy algorithm.
The objective function
Once the model is trained, we sum the predictions from all of the sub-models to get the final prediction.
f_k(x_i) denotes the prediction of the k-th sub-tree for the i-th input sample.
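In symbols (following the notation of the XGBoost paper), the prediction for sample x_i is the sum over all K sub-trees:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$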
The loss (error) function
The objective function combines the loss function with a regularization term; the full form is written out below.
For regression problems we can use the mean squared error (MSE).
For classification problems we can use the cross-entropy.
For regularization we can use L1, L2, or the elastic net.
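Written out, for n samples and K trees the objective described above is:

$$\text{Obj} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$

where l measures the prediction error and Omega penalizes the complexity of each tree.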
How do we define the complexity of an XGBoost tree?
By its depth, its number of leaves, and the predicted value (weight) of each leaf node.
If we have already trained the first k-1 sub-models, how do we obtain the k-th one?
Additive Training
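Concretely (as in the XGBoost paper), we keep the already-trained trees fixed and add one new tree per round:

$$\hat{y}_i^{(k)} = \hat{y}_i^{(k-1)} + f_k(x_i)$$

so the objective for round k, with the earlier trees' terms absorbed into a constant, is:

$$\text{Obj}^{(k)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(k-1)} + f_k(x_i)\right) + \Omega(f_k) + \text{const}$$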
Taylor expansion of the objective function
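Taking a second-order Taylor expansion of the loss around the previous prediction (as in the XGBoost paper), with g_i and h_i the first and second derivatives of the loss,

$$g_i = \frac{\partial\, l(y_i, \hat{y}_i^{(k-1)})}{\partial\, \hat{y}_i^{(k-1)}}, \qquad h_i = \frac{\partial^2 l(y_i, \hat{y}_i^{(k-1)})}{\partial\, \big(\hat{y}_i^{(k-1)}\big)^2}$$

the objective becomes:

$$\text{Obj}^{(k)} \approx \sum_{i=1}^{n} \left[ g_i f_k(x_i) + \tfrac{1}{2} h_i f_k^2(x_i) \right] + \Omega(f_k) + \text{const}$$

since the term l(y_i, \hat{y}_i^{(k-1)}) does not depend on the new tree f_k.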
This gives a new objective function, but we still need to parameterize f_k(x_i) and the complexity of the tree.
Parameterizing the tree
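Following the XGBoost paper, a tree with T leaves can be written as a leaf-assignment function q plus a vector of leaf weights w:

$$f_k(x) = w_{q(x)}, \qquad q: \mathbb{R}^d \to \{1, \dots, T\}, \qquad w \in \mathbb{R}^T$$

i.e. q(x) is the index of the leaf that sample x falls into, and the tree predicts that leaf's weight.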
Parameterizing the complexity of the tree (the regularization term)
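The regularization term from the XGBoost paper penalizes the number of leaves and the size of the leaf weights:

$$\Omega(f_k) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$

where gamma and lambda are hyperparameters.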
The new objective function
Substituting these into the expanded objective gives the final objective function, which we now need to optimize.
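Grouping the samples by the leaf they fall into, with I_j the set of samples in leaf j, G_j the sum of their g_i, and H_j the sum of their h_i, the objective becomes:

$$\text{Obj}^{(k)} = \sum_{j=1}^{T} \left[ G_j w_j + \tfrac{1}{2}\,(H_j + \lambda)\, w_j^2 \right] + \gamma T$$

For a fixed tree structure q, each leaf weight has a closed-form optimum, and substituting it back gives the best objective that structure can achieve:

$$w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad \text{Obj}^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$

This score is what we use to compare candidate tree structures.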
Optimization
Optimizing over the structure of the tree is essentially a search problem.
We could use brute force, a greedy algorithm, etc.,
but brute force over all possible tree structures has exponential complexity.
Greedy approach, similar to building a decision tree
When we build a decision tree, we use entropy (or standard deviation reduction) to select the best feature to split on.
When we build an XGBoost tree, we use our objective function to choose the best split, as shown below.
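Concretely, for a candidate split that sends a leaf's samples into a left set L and a right set R, the reduction in the objective (the split gain from the XGBoost paper) is:

$$\text{Gain} = \frac{1}{2}\left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma$$

The tree is grown greedily: at each leaf we enumerate candidate splits, keep the one with the largest gain, and stop splitting when no candidate gain is positive.

A minimal usage sketch with the xgboost Python package (the toy data and parameter values are illustrative assumptions); reg_lambda and gamma correspond to the lambda and gamma above:

```python
import numpy as np
import xgboost as xgb

# Toy regression data (assumed for illustration).
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

model = xgb.XGBRegressor(
    n_estimators=50,    # number of boosting rounds (trees)
    max_depth=3,        # depth limit of each tree
    learning_rate=0.1,  # shrinkage applied to each tree's contribution
    reg_lambda=1.0,     # lambda: L2 penalty on leaf weights
    gamma=0.0,          # gamma: minimum gain required to make a split
)
model.fit(X, y)
print(model.predict(X[:5]))
```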