Algorithm: Boosting model with XGBoost

Difference between bagging and boosting:

Each sub-model in an ensemble is called a weak learner. In a random forest, the weak learner is a decision tree.

Weak learner: a model that cannot predict the result well on its own.

Overfitting: the model performs well on the training data but poorly on the test data.

Underfitting: the model can't fit the training data well.

We can think of a bagging model as a committee of "professors" (strong learners that would each overfit on their own); averaging many of them helps prevent overfitting.

Boosting is a committee of "slacker students" (weak learners that each underfit); adding them one after another helps overcome underfitting.

 

How to train a boosting model? 

Residual

A bagging model can be trained in parallel, while a boosting model is trained sequentially: each new learner is fitted to the residuals of the ensemble so far.

We use weak learner 1 to make a prediction and compute the residuals.

We then train weak learner 2 on those residuals.

Next, we compute the residuals with respect to model 2.

Then we train model 3 in the same way, and so on.

Once we have trained a set of weak learners, we sum the predictions of all of them to obtain the final prediction, as in the sketch below.
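As a rough illustration of this loop (a plain residual-boosting sketch, not the full XGBoost algorithm), the following Python snippet fits scikit-learn decision trees to residuals; the toy dataset, tree depth, learning rate, and number of rounds are illustrative assumptions.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

n_rounds = 50        # number of weak learners
learning_rate = 0.1  # shrinkage applied to each learner's contribution
trees = []
prediction = np.zeros_like(y)

for _ in range(n_rounds):
    residual = y - prediction                  # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2)  # a deliberately weak learner
    tree.fit(X, residual)
    trees.append(tree)
    prediction += learning_rate * tree.predict(X)

# final prediction = sum of every weak learner's (shrunk) prediction
final_prediction = sum(learning_rate * t.predict(X) for t in trees)
print("training MSE:", np.mean((y - final_prediction) ** 2))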

Reference: the XGBoost paper (Chen & Guestrin, KDD 2016)

https://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf
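For orientation before the derivation, here is a minimal usage sketch with the xgboost Python package (assuming it is installed); the synthetic dataset and hyperparameter values are illustrative, not recommendations.

import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = xgb.XGBRegressor(
    n_estimators=100,   # number of boosted trees (weak learners)
    max_depth=3,        # depth of each tree
    learning_rate=0.1,  # shrinkage on each tree's contribution
    reg_lambda=1.0,     # L2 regularization on leaf weights
    gamma=0.0,          # minimum loss reduction required to make a split
)
model.fit(X_train, y_train)
print("test R^2:", model.score(X_test, y_test))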

 

XGBoost Model

How to understand XGBoost? 

1) How do we build the objective function?

2) How do we approximate the objective function? Taylor expansion.

3) How do we express the objective function in terms of the tree data structure (parameterizing the tree)?

4) How do we optimize the objective function? Greedy algorithm.

 

The objective function

Once the model is trained, we sum the predictions of all of the sub-models to obtain the final prediction.

f_k(x_i) denotes the prediction of the k-th sub-tree for the i-th sample of the input data.
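In symbols (following the notation of the XGBoost paper), the final prediction for sample i is the sum of the outputs of all K trees:

\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)

where each f_k is one regression tree.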

The loss (error) function

The objective function combines the loss function with a regularization term.

For regression problems we can use the mean squared error (MSE).

For classification problems we can use the cross-entropy.

For regularization we can use L1, L2, or elastic net.
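Putting these together, the objective has the standard form from the XGBoost paper: the training loss l (e.g. MSE or cross-entropy) summed over the n samples, plus a complexity penalty Omega for every tree:

\text{Obj} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)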

How do we define the complexity of an XGBoost tree?

By the depth, the number of leaves, and the prediction value (weight) of each leaf node.

If we have already trained K-1 sub-models, how do we obtain the K-th one?

Additive Training
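Additive training means we keep the K-1 trees we already have fixed and only fit the new tree f_K in this round:

\hat{y}_i^{(K)} = \hat{y}_i^{(K-1)} + f_K(x_i),
\qquad
\text{Obj}^{(K)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(K-1)} + f_K(x_i)\right) + \Omega(f_K) + \text{const}

where the constant collects the regularization of the K-1 fixed trees.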

Taylor expansion of the objective function
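Expanding the loss to second order around the previous prediction \hat{y}_i^{(K-1)} (the standard XGBoost derivation) and dropping constant terms gives

\text{Obj}^{(K)} \approx \sum_{i=1}^{n} \left[ g_i f_K(x_i) + \frac{1}{2} h_i f_K^2(x_i) \right] + \Omega(f_K),
\qquad
g_i = \partial_{\hat{y}^{(K-1)}} l(y_i, \hat{y}_i^{(K-1)}),
\quad
h_i = \partial^2_{\hat{y}^{(K-1)}} l(y_i, \hat{y}_i^{(K-1)})

where g_i and h_i are the first and second derivatives of the loss for sample i.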

We get the new objective function, but we still cannot parameterize f_K(x_i) or the complexity of the tree.

 

Parameterization of the tree

Parameterizing the complexity of the tree (the regularization term)
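Following the XGBoost paper, both the tree and its complexity are written in terms of its leaves: q maps a sample to a leaf index, T is the number of leaves, and w_j is the prediction value (weight) of leaf j:

f_K(x) = w_{q(x)},
\qquad
\Omega(f_K) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2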

The new objective function

We obtain the final objective function, which we now need to optimize.
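Grouping samples by the leaf they fall into (I_j is the set of samples in leaf j) and writing G_j = \sum_{i \in I_j} g_i and H_j = \sum_{i \in I_j} h_i, the objective for a fixed tree structure is a quadratic in each leaf weight, so the best weights and the best objective value have closed forms (the standard result from the paper):

\text{Obj}^{(K)} = \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2} (H_j + \lambda) w_j^2 \right] + \gamma T
\;\Rightarrow\;
w_j^{*} = -\frac{G_j}{H_j + \lambda},
\qquad
\text{Obj}^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T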

Optimization

Finding the best tree structure is an optimization problem, and it is equivalent to a search problem.

We could use brute force, a greedy algorithm, etc.

But brute-force search has exponential complexity, since the number of possible tree structures grows exponentially.

Greedy approach, similar to building a decision tree

When we build a decision tree, we use entropy (or standard deviation) to select the best feature to split on.

When we build an XGBoost tree, we use our objective function to choose the best split (decision boundary), as in the gain formula below.
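Concretely, for a candidate split that sends samples to a left leaf L and a right leaf R, the gain is the improvement of Obj* (the gamma term charges for the extra leaf), and the split with the largest gain is chosen greedily:

\text{Gain} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma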