AdaBoost#
Given a training set \(\left\{(x_{1}, y_{1}),\dots,(x_{N}, y_{N})\right\}\) with labels \(y_{i} \in \left\{-1, 1\right\}\).
Initialize the sample distribution \(D_{1} = (\frac{1}{N},\dots,\frac{1}{N}) = (w_{1,1},\dots,w_{1,N})\).
At round \(m\), AdaBoost trains a weak classifier \(G_{m}\) on \(D_{m}\); its weighted misclassification error is \(e_{m}\). Weight-update rule:
\[\begin{split}
\begin{equation}
w_{m+1, i} =
\begin{cases}
\frac{w_{m,i}}{Z_{m}}& \text{correct classified}\\
{\frac{1 - e_{m}}{e_{m}}}\frac{w_{m,i}}{Z_{m}}& \text{misclassified}
\end{cases}
\end{equation}
\end{split}\]
where \(Z_{m}\) is a normalizing factor chosen so that \(D_{m+1}\) is a valid probability distribution.
Finally, the ensemble classifier is \(\mathrm{sign}(f(x))\) with:
\[f(x) = \sum_{m=1}^{M}\log\frac{1 - e_{m}}{e_{m}}G_{m}(x)\]
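The update rule above can be sketched directly in a few lines. This is a minimal illustration, not a library API: the function names are mine, and sklearn decision stumps stand in for the weak learners \(G_{m}\).

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=30):
    """Train up to M stumps with the weight-update rule above; y must be in {-1, 1}."""
    N = len(y)
    w = np.full(N, 1.0 / N)              # D_1 = (1/N, ..., 1/N)
    stumps, alphas = [], []
    for m in range(M):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        e = np.sum(w[pred != y])         # weighted misclassification error e_m
        if e == 0 or e >= 0.5:           # stop if the stump is perfect or no better than chance
            break
        # misclassified samples get their weight multiplied by (1 - e_m)/e_m,
        # then everything is renormalized (this renormalization is Z_m)
        w = np.where(pred != y, (1 - e) / e * w, w)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(np.log((1 - e) / e))   # coefficient of G_m in f(x)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    f = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(f)

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
y = 2 * y - 1                            # map {0, 1} labels to {-1, 1}
stumps, alphas = adaboost_fit(X, y, M=30)
pred = adaboost_predict(X, stumps, alphas)
print(f"training accuracy: {(pred == y).mean():.3f}")
```

sklearn's `AdaBoostClassifier` below implements the same idea with more safeguards; the sketch only shows how the weight updates and coefficients fit together.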
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=100,
                             # SAMME is the multiclass generalization of AdaBoost;
                             # SAMME.R uses class probabilities, so the base
                             # predictor must support predict_proba
                             algorithm="SAMME.R",
                             # a smaller learning_rate typically needs a larger n_estimators
                             learning_rate=0.5)
ada_clf.fit(X_train, y_train)
AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=1),
learning_rate=0.5, n_estimators=100)
from sklearn.metrics import accuracy_score, roc_auc_score
y_pred = ada_clf.predict(X_test)
y_prob = ada_clf.predict_proba(X_test)
accuracy_score(y_test, y_pred), roc_auc_score(y_test, y_prob[:, 1])
(0.912, 0.9732325819672131)
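To see how accuracy evolves as stumps are added, `AdaBoostClassifier.staged_predict` yields the ensemble's prediction after each boosting round. A self-contained sketch on the same moons data (the `algorithm` argument is omitted here so it runs across sklearn versions, so the exact numbers may differ slightly from the cell above):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=100, learning_rate=0.5)
ada_clf.fit(X_train, y_train)

# staged_predict generates the ensemble prediction after round 1, 2, ..., M,
# so we can watch test accuracy improve as stumps accumulate
staged_acc = [accuracy_score(y_test, y_pred)
              for y_pred in ada_clf.staged_predict(X_test)]
print(f"after 1 stump: {staged_acc[0]:.3f}, after all rounds: {staged_acc[-1]:.3f}")
```

Plotting `staged_acc` against the round number is a common way to pick `n_estimators`.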
Boosting Tree#
Boosting Tree model:
\[f(x) = \sum_{m=1}^{M}T(x; \theta_{m})\]
For binary classification, use the exponential loss (forward stagewise optimization of this loss with tree weak learners is equivalent to AdaBoost + DecisionTreeClassifier):
\[L(y, f(x)) = \exp(-yf(x))\]
For regression problem, loss function:
\[L(y, f(x)) = (y - f(x))^2\]
By the forward stagewise algorithm:
\[\hat\theta_{m} = \underset{\theta_{m}}{\operatorname{argmin}}\sum_{i=1}^{N}L(y_{i}, f_{m-1}(x_{i}) + T(x_{i};\theta_{m}))\]
For the squared loss this becomes:
\[L(y_{i}, f_{m-1}(x_{i}) + T(x_{i};\theta_{m})) = [y_{i} - f_{m-1}(x_{i}) - T(x_{i};\theta_{m})]^{2} = [r_{i} - T(x_{i};\theta_{m})]^{2}\]
where \(r_{i} = y_{i} - f_{m-1}(x_{i})\) is the residual of the current model, so each new tree simply fits the residuals.
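The residual-fitting loop above can be sketched as follows, with sklearn regression trees playing the role of \(T(x;\theta_{m})\). The synthetic data and hyperparameters are illustrative choices, not from the text:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

trees = []
residual = y.copy()                # r_i = y_i initially, since f_0(x) = 0
for m in range(50):
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)          # T(x; theta_m) fits the current residuals
    residual -= tree.predict(X)    # new residual: r_i - T(x_i; theta_m)
    trees.append(tree)

# f(x) = sum of all fitted trees
def f(X_new):
    return sum(t.predict(X_new) for t in trees)

train_mse = np.mean((y - f(X)) ** 2)
print(f"train MSE: {train_mse:.4f}")
```

Each round shrinks the training residuals, which is exactly the forward stagewise argument: minimizing the squared loss at stage \(m\) reduces to ordinary regression on \(r_{i}\).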