LightGBM#

LightGBM is a highly efficient gradient boosting decision tree (GBDT) framework.

XGBoost uses a pre-sorted algorithm to find the best split; its complexity is approximately:

\[\mbox{Complexity} = \mbox{num_features} * \mbox{num_split_per_feature} * \mbox{num_samples}\]

LightGBM optimizes all three of these factors.

Histogram#

Instead of finding split points on the sorted feature values, the histogram-based algorithm buckets continuous feature values into discrete bins and uses these bins to construct feature histograms during training.

Histogram reduces \(\mbox{num_split_per_feature}\).

When computing a node’s histogram, we can use the subtraction trick:

\[\mbox{Histogram of a node} = \mbox{Histogram of its parent} - \mbox{Histogram of its sibling}\]
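A minimal NumPy sketch of the idea (not LightGBM’s actual implementation; the bin count of 255 matches LightGBM’s default max_bin, and the array names are made up for illustration):

import numpy as np

# Bucket a continuous feature into 255 discrete bins.
feature = np.random.rand(1000)
gradients = np.random.randn(1000)
bin_edges = np.histogram_bin_edges(feature, bins=255)
bin_ids = np.digitize(feature, bin_edges[1:-1])  # bin index for every sample

# A histogram accumulates the gradient sum per bin; candidate splits are bin boundaries.
parent_hist = np.bincount(bin_ids, weights=gradients, minlength=255)

# Subtraction trick: after a split, build the histogram of one child only,
# then obtain the sibling's histogram by subtracting it from the parent's.
left_mask = feature < 0.5
left_hist = np.bincount(bin_ids[left_mask], weights=gradients[left_mask], minlength=255)
right_hist = parent_hist - left_hist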

GOSS#

GOSS stands for Gradient-based One-Side Sampling.

GOSS keeps all the instances with large gradients and performs random sampling on the instances with small gradients.

Specifically, GOSS first sorts the data instances according to the absolute value of their gradients and selects the top \(a\times{100\%}\) instances.

Then it randomly samples \(b\times{100\%}\) instances from the rest of the data and amplifies the sampled data by a constant \(\frac{1 - a}{b}\) when computing the information gain.

GOSS reduces \(\mbox{num_samples}\).
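A minimal NumPy sketch of the sampling step described above (the function name and the default values of \(a\) and \(b\) are illustrative, not LightGBM’s internals):

import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Return selected indices and the weights applied when computing information gain."""
    rng = np.random.default_rng(seed)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))        # sort by |gradient|, descending
    top_k = int(a * n)
    top_idx = order[:top_k]                       # keep all large-gradient instances
    rest_idx = order[top_k:]
    small_idx = rng.choice(rest_idx, size=int(b * n), replace=False)  # sample the rest
    idx = np.concatenate([top_idx, small_idx])
    weights = np.ones(len(idx))
    weights[top_k:] = (1 - a) / b                 # amplify the sampled small gradients
    return idx, weights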

EFB#

EFB stands for Exclusive Feature Bundling.

In a sparse feature space, many features are mutually exclusive, i.e., they never take nonzero values simultaneously. We can safely bundle exclusive features into a single feature.

To determine which features should be bundled together:

  1. Reduce the optimal bundling problem to the graph coloring problem by taking features as vertices and adding an edge between every two features that are not mutually exclusive.

  2. Use a greedy algorithm which can produce reasonably good results.

To construct the bundle, we can add offsets to the original values of the features to make sure the bundle is invertible. For example, suppose we have two features: feature A takes values from \(\left[0, 10\right)\) and feature B takes values from \(\left[0, 20\right)\). We then add an offset of 10 to the values of feature B so that the bundled feature takes values from \(\left[0, 30\right)\).
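A minimal NumPy sketch of the offset trick on this toy example (toy arrays, not LightGBM’s internal bundling code):

import numpy as np

# Two mutually exclusive features: at most one of them is nonzero per row.
feature_a = np.array([3, 0, 7, 0, 0])   # values in [0, 10)
feature_b = np.array([0, 5, 0, 0, 12])  # values in [0, 20)

# Add an offset of 10 to feature B's nonzero values so the two ranges do not overlap.
bundle = feature_a + np.where(feature_b > 0, feature_b + 10, 0)  # values in [0, 30)

# The bundle is invertible: values in [0, 10) came from A, values in [10, 30) from B.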

EFB reduces \(\mbox{num_features}\).

Leaf-Wise#

XGBoost uses level-wise tree growth (splitting all leaves at the same depth before going deeper), while LightGBM uses leaf-wise tree growth (always splitting the leaf with the largest loss reduction), which typically achieves lower loss for the same number of leaves but can overfit when the dataset is small.
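In LightGBM, the size of a leaf-wise tree is therefore controlled mainly by num_leaves, with max_depth as an optional safeguard; a minimal sketch of the relevant parameters (the values shown are just the library defaults):

import lightgbm as lgb

# Leaf-wise growth is bounded by the number of leaves rather than the depth of the tree.
# max_depth=-1 means the depth is not limited; setting it explicitly helps prevent overfitting.
clf = lgb.LGBMClassifier(num_leaves=31, max_depth=-1)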

Examples#

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)
import lightgbm as lgb

gbm = lgb.LGBMRegressor(objective="regression",
                        num_leaves=25,
                        learning_rate=0.05,
                        n_estimators=20)

gbm.fit(X_train, y_train,
        eval_set=[(X_test, y_test)],
        eval_metric='l2',
        early_stopping_rounds=10)  # in LightGBM >= 4.0, pass callbacks=[lgb.early_stopping(10)] instead
[1]	valid_0's l2: 0.680646
Training until validation scores don't improve for 10 rounds
[2]	valid_0's l2: 0.623751
[3]	valid_0's l2: 0.572918
[4]	valid_0's l2: 0.526781
[5]	valid_0's l2: 0.485581
[6]	valid_0's l2: 0.448184
[7]	valid_0's l2: 0.414807
[8]	valid_0's l2: 0.384515
[9]	valid_0's l2: 0.357492
[10]	valid_0's l2: 0.33297
[11]	valid_0's l2: 0.311105
[12]	valid_0's l2: 0.29222
[13]	valid_0's l2: 0.275212
[14]	valid_0's l2: 0.259897
[15]	valid_0's l2: 0.246108
[16]	valid_0's l2: 0.233694
[17]	valid_0's l2: 0.222521
[18]	valid_0's l2: 0.212465
[19]	valid_0's l2: 0.202953
[20]	valid_0's l2: 0.194433
Did not meet early stopping. Best iteration is:
[20]	valid_0's l2: 0.194433
LGBMRegressor(learning_rate=0.05, n_estimators=20, num_leaves=25,
              objective='regression')
from sklearn.metrics import mean_squared_error

y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration_)
mean_squared_error(y_test, y_pred)
0.19443314290792704