AdaBoost#

Given a dataset \(\left\{(x_{1}, y_{1}),\dots,(x_{N}, y_{N})\right\}\) with labels \(y_{i} \in \left\{-1, 1\right\}\), the initial sample distribution is uniform:

\(D_{1} = (\frac{1}{N},\dots,\frac{1}{N}) = (w_{1,1},\dots,w_{1,N})\)

At round \(m\), AdaBoost trains a weak classifier \(G_{m}\) on \(D_{m}\); its weighted misclassification error is \(e_{m}\). The weight update rule is:

\[\begin{split} \begin{equation} w_{m+1, i} = \begin{cases} \frac{w_{m,i}}{Z_{m}}& \text{correctly classified}\\ {\frac{1 - e_{m}}{e_{m}}}\frac{w_{m,i}}{Z_{m}}& \text{misclassified} \end{cases} \end{equation} \end{split}\]

where \(Z_{m}\) is a normalization factor that makes \(D_{m+1}\) a valid distribution.

Finally, the weak classifiers are combined, each weighted by its accuracy (the prediction is \(\operatorname{sign}(f(x))\)):

\[f(x) = \sum_{m=1}^{M}\log\frac{1 - e_{m}}{e_{m}}G_{m}(x)\]
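The update rule above can be sketched in a few lines of NumPy. This is an illustrative single round, not a library API: `adaboost_round` is a hypothetical helper that takes the current weights, the true labels, and one weak learner's \(\pm 1\) predictions, and returns the updated distribution and the learner's weight \(\alpha_{m} = \frac{1}{2}\log\frac{1-e_{m}}{e_{m}}\).

```python
import numpy as np

# A minimal sketch of one AdaBoost round. `adaboost_round` is an
# illustrative name, not a library function; `pred` holds the +/-1
# predictions of the weak classifier G_m on the training set.
def adaboost_round(w, y, pred):
    miss = pred != y                       # misclassified samples
    e_m = np.sum(w[miss])                  # weighted error of G_m
    alpha_m = 0.5 * np.log((1 - e_m) / e_m)
    # multiply misclassified weights by (1 - e_m)/e_m, then renormalize;
    # equivalent to w * exp(-alpha_m * y * pred) up to a common factor
    w_new = w.copy()
    w_new[miss] *= (1 - e_m) / e_m
    w_new /= w_new.sum()                   # Z_m: normalize to a distribution
    return w_new, alpha_m

y = np.array([1, 1, -1, -1])
pred = np.array([1, -1, -1, -1])           # one mistake, on sample index 1
w = np.full(4, 0.25)
w2, alpha = adaboost_round(w, y, pred)
print(w2, alpha)
```

After one round, the misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.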
```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
```
```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=100,
                             # SAMME is a multiclass generalization of AdaBoost;
                             # use SAMME.R if the base estimator can estimate
                             # class probabilities
                             algorithm="SAMME.R",
                             # a lower learning_rate generally needs a larger
                             # n_estimators
                             learning_rate=0.5)
ada_clf.fit(X_train, y_train)
```

```
AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=1),
                   learning_rate=0.5, n_estimators=100)
```
```python
from sklearn.metrics import accuracy_score, roc_auc_score

y_pred = ada_clf.predict(X_test)
y_prob = ada_clf.predict_proba(X_test)

accuracy_score(y_test, y_pred), roc_auc_score(y_test, y_prob[:, 1])
```

```
(0.912, 0.9732325819672131)
```
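Since AdaBoost builds the ensemble one estimator at a time, `staged_predict` can report the test accuracy after every boosting round, which helps choose `n_estimators` without refitting. The sketch below is self-contained so it can run on its own; it rebuilds the same moons dataset and classifier as above (without pinning `algorithm`, so the library default applies).

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=100, learning_rate=0.5)
ada_clf.fit(X_train, y_train)

# accuracy of the partial ensemble after each boosting round
staged_acc = [accuracy_score(y_test, pred)
              for pred in ada_clf.staged_predict(X_test)]
best_m = int(np.argmax(staged_acc)) + 1   # best number of rounds on this split
print(best_m, max(staged_acc))
```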

Boosting Tree#

The boosting tree model is an additive sum of \(M\) trees \(T(x; \theta_{m})\), each parameterized by \(\theta_{m}\):

\[f(x) = \sum_{m=1}^{M}T(x; \theta_{m})\]

For binary classification, the exponential loss is used, which makes the boosting tree equivalent to AdaBoost with a decision tree base learner (\(\Leftrightarrow\) AdaBoost + DecisionTreeClassifier):

\[L(y, f(x)) = \exp(-yf(x))\]

For regression problems, the squared-error loss is used:

\[L(y, f(x)) = (y - f(x))^2\]

By the forward stagewise algorithm, each step adds the single tree that most reduces the loss of the current model \(f_{m-1}\):

\[\hat\theta_{m} = \underset{\theta_{m}}{\arg\min}\sum_{i=1}^{N}L(y_{i}, f_{m-1}(x_{i}) + T(x_{i};\theta_{m}))\]

For the squared-error loss this becomes

\[L(y_{i}, f_{m-1}(x_{i}) + T(x_{i};\theta_{m})) = [y_{i} - f_{m-1}(x_{i}) - T(x_{i};\theta_{m})]^{2} = [r_{i} - T(x_{i};\theta_{m})]^{2}\]

where \(r_{i} = y_{i} - f_{m-1}(x_{i})\) is the residual of the current model.

So each new tree only needs to fit the residuals of the current model.
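The residual-fitting loop can be sketched directly with `DecisionTreeRegressor`. This is an illustrative implementation on synthetic data (a noisy sine curve is an assumption, not from the text), not a substitute for sklearn's `GradientBoostingRegressor`:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data: noisy sine curve (illustrative choice)
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Forward stagewise boosting for squared loss: each tree fits the
# residuals r = y - f_{m-1}(x) of the current ensemble.
M = 20
trees = []
f = np.zeros_like(y)                  # f_0(x) = 0
for m in range(M):
    r = y - f                         # residuals of the current model
    tree = DecisionTreeRegressor(max_depth=2).fit(X, r)
    trees.append(tree)
    f += tree.predict(X)              # f_m = f_{m-1} + T(x; theta_m)

mse = np.mean((y - f) ** 2)
print(mse)
```

Prediction for a new point sums the outputs of all trees, exactly mirroring \(f(x) = \sum_{m=1}^{M} T(x; \theta_{m})\); adding a shrinkage factor on each `tree.predict` term would turn this into gradient boosting with a learning rate.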