XGBoost

XGBoost predicts with an additive ensemble of \(K\) trees, where \(\mathcal{F}\) is the space of regression trees:

\[\hat{y}_{i} = \sum_{k=1}^{K}f_{k}(x_{i}),\quad f_{k}\in\mathcal{F}\]

The objective function:

\[\mbox{obj} = \sum_{i=1}^{n}l(y_{i}, \hat{y}_{i}) + \sum_{k=1}^{K}\Omega(f_{k})\]

where \(l\) is the training loss and \(\Omega\) is the regularization term that penalizes the complexity of each tree.
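Each \(f_{k}\) maps \(x_{i}\) to the value of the leaf it falls into, so the prediction is just the sum of the trees' outputs. A minimal sketch of that additive form, using two made-up step functions in place of fitted trees (not the library's API):

"""additive prediction, illustrative only"""
import numpy as np

# Two toy "trees": each maps x to the value of the region (leaf) it falls into.
trees = [
    lambda x: np.where(x < 0.0, -1.0, 1.0),   # f_1
    lambda x: np.where(x < 0.5, 0.2, -0.3),   # f_2
]

x = np.array([-0.4, 0.1, 0.7])
y_hat = sum(f(x) for f in trees)               # \hat{y}_i = sum_k f_k(x_i)
print(y_hat)                                   # [-0.8  1.2  0.7]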

Second Order Objective

Expand the objective at boosting round \(t\), where the first \(t-1\) trees are already fixed:

\[\begin{split}
\mbox{obj}^{(t)} &= \sum_{i=1}^{n}l(y_{i}, \hat{y}_{i}^{(t)}) + \sum_{k=1}^{t}\Omega(f_{k}) \\
&= \sum_{i=1}^{n}l(y_{i}, \hat{y}_{i}^{(t - 1)} + f_{t}(x_{i})) + \Omega(f_{t}) + \mbox{constant}
\end{split}\]

Take the Taylor expansion of the loss up to second order around \(\hat{y}_{i}^{(t-1)}\):

\[\mbox{obj}^{(t)} \approx \sum_{i=1}^{n}[l(y_{i}, \hat{y}_{i}^{(t - 1)}) + g_{i}f_{t}(x_{i}) + \frac{1}{2}h_{i}f_{t}(x_{i})^{2}] + \Omega(f_{t}) + \mbox{constant}\]
\[g_{i} = \frac{\partial l(y_{i}, \hat{y}_{i}^{(t - 1)})}{\partial \hat{y}_{i}^{(t-1)}}\]
\[h_{i} = \frac{\partial^{2} l(y_{i}, \hat{y}_{i}^{(t - 1)})}{{\partial \hat{y}_{i}^{(t-1)}}^2}\]

After removing all the constants, the specific objective at step \(t\) becomes:

\[\sum_{i=1}^{n}[g_{i}f_{t}(x_{i}) + \frac{1}{2}h_{i}f_{t}(x_{i})^{2}] + \Omega(f_{t})\]
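For a concrete loss the gradient \(g_{i}\) and hessian \(h_{i}\) are simple closed forms. A sketch, assuming the squared-error loss \(\frac{1}{2}(y_{i} - \hat{y}_{i})^{2}\) and the logistic loss, with made-up labels and previous-round predictions:

"""g_i and h_i for two common losses (sketch)"""
import numpy as np

y = np.array([1.0, 0.0, 1.0])
y_hat_prev = np.array([0.3, -0.2, 1.5])   # \hat{y}^{(t-1)} (raw scores)

# squared error: l = 1/2 * (y - y_hat)^2
g_sq = y_hat_prev - y                     # first derivative w.r.t. y_hat
h_sq = np.ones_like(y)                    # second derivative is constant 1

# logistic loss: l = -[y*log(p) + (1-y)*log(1-p)], with p = sigmoid(y_hat)
p = 1.0 / (1.0 + np.exp(-y_hat_prev))
g_log = p - y
h_log = p * (1.0 - p)

These are the same (gradient, hessian) pairs that a custom objective passes back to the library.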

Optimization Given Tree Structure

We can define the complexity as follows, where \(T\) is the number of leaves and \(w_{j}\) is the value of the \(j\)-th leaf:

\[\Omega(f_{t}) = \gamma{T} + \frac{1}{2}\lambda\sum_{j=1}^{T}w_{j}^{2}\]
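A quick numeric check of the penalty, with hypothetical values \(\gamma = 1\), \(\lambda = 1\) and a three-leaf tree:

"""regularization term for a toy tree (sketch)"""
import numpy as np

gamma, lam = 1.0, 1.0
w = np.array([2.0, -1.0, 0.5])                       # leaf values, T = 3
omega = gamma * len(w) + 0.5 * lam * np.sum(w ** 2)  # gamma*T + 1/2 * lambda * sum(w_j^2)
print(omega)                                          # 3 + 0.5 * 5.25 = 5.625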

Rearrange the objective by grouping the data points according to the leaf they fall into, noting that \(f_{t}(x_{i})\) equals the weight of that leaf:

\[\begin{split}
\mbox{obj}^{(t)} &= \sum_{i=1}^{n}[g_{i}f_{t}(x_{i}) + \frac{1}{2}h_{i}f_{t}(x_{i})^{2}] + \gamma{T} + \frac{1}{2}\lambda\sum_{j=1}^{T}w_{j}^{2} \\
&= \sum_{j=1}^{T}[(\sum_{i\in{I_{j}}}g_{i})w_{j} + \frac{1}{2}(\sum_{i\in{I_j}}h_{i} + \lambda)w_{j}^2] + \gamma{T}
\end{split}\]

where \(I_{j} = \{i \mid q(x_{i})=j\}\) is the set of indices of the data points assigned to the \(j\)-th leaf (\(q\) maps a data point to its leaf).

Denote \(G_{j} = \sum_{i\in{I_{j}}}g_{i}, H_{j} = \sum_{i\in{I_j}}h_{i}\):

\[\mbox{obj}^{(t)} = \sum_{j=1}^{T}[G_{j}w_{j} + \frac{1}{2}(H_{j} + \lambda)w_{j}^2] + \gamma{T}\]

The objective is now a sum of \(T\) independent quadratics in the \(w_{j}\), so each leaf weight has a closed-form optimum:

\[w_{j}^{\ast} = -\frac{G_{j}}{H_{j} + \lambda}\]
\[\mbox{obj}^{\ast} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_{j}^2}{H_{j} + \lambda} + \gamma{T}\]
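Given a fixed tree structure (a fixed assignment of data points to leaves), both results follow directly from the per-leaf sums. A sketch with a made-up leaf assignment, reusing squared-error gradients where \(h_{i} = 1\):

"""optimal leaf weights and structure score (sketch)"""
import numpy as np

lam, gamma = 1.0, 1.0
g = np.array([0.6, -0.4, 1.1, -0.9])        # g_i for the current round
h = np.array([1.0, 1.0, 1.0, 1.0])          # h_i (constant 1 for squared error)
leaf = np.array([0, 0, 1, 1])               # q(x_i): the leaf each point falls into
T = leaf.max() + 1

G = np.array([g[leaf == j].sum() for j in range(T)])     # G_j
H = np.array([h[leaf == j].sum() for j in range(T)])     # H_j

w_star = -G / (H + lam)                                   # optimal leaf weights
obj_star = -0.5 * np.sum(G ** 2 / (H + lam)) + gamma * T  # structure score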

Learn Tree Structure

When we try to split a leaf into two leaves, the score gain is:

\[\mbox{Gain} = \frac{1}{2}\left [\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right ] - \gamma\]

Select the split that maximizes this gain and grow the tree greedily, one split at a time; a sketch of the exact greedy search follows.
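A sketch of that exact greedy search over a single feature, maintaining prefix sums \(G_L, H_L\) over the sorted feature values (the function and argument names here are made up for illustration):

"""exact greedy split search on one feature (sketch)"""
import numpy as np

def best_split(x, g, h, lam=1.0, gamma=0.0):
    # Sort by feature value so every prefix is a candidate left child.
    order = np.argsort(x)
    g, h = g[order], h[order]
    G_total, H_total = g.sum(), h.sum()
    G_L = H_L = 0.0
    best_gain, best_threshold = 0.0, None
    for k in range(len(x) - 1):              # split between position k and k + 1
        G_L += g[k]
        H_L += h[k]
        G_R, H_R = G_total - G_L, H_total - H_L
        gain = 0.5 * (G_L ** 2 / (H_L + lam)
                      + G_R ** 2 / (H_R + lam)
                      - G_total ** 2 / (H_total + lam)) - gamma
        if gain > best_gain:
            best_gain = gain
            best_threshold = 0.5 * (x[order[k]] + x[order[k + 1]])
    return best_gain, best_threshold

Run once per feature at every node, this is the exact greedy algorithm; XGBoost also offers approximate, histogram-based variants for large datasets.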

Examples

"""quadratic dataset"""
import numpy as np

np.random.seed(42)
X = np.random.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * np.random.randn(100)
"""basic xgboost"""
import xgboost

reg = xgboost.XGBRegressor()
reg.fit(X, y)
Fitting echoes the estimator's configuration, i.e. the default hyperparameters: n_estimators=100, max_depth=6, learning_rate=0.3, reg_lambda=1 (the \(\lambda\) above), gamma=0 (the \(\gamma\) above), subsample=1, tree_method='exact', and so on.
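The fitted estimator can then be used like any scikit-learn regressor; a short usage sketch (assuming scikit-learn is available for the metric):

"""evaluate the fit (sketch)"""
from sklearn.metrics import mean_squared_error

y_pred = reg.predict(X)
print(mean_squared_error(y, y_pred))   # training-set MSE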