Random Forest#

Each tree in a random forest is trained on a different random subset of the training data.

When the sampling is performed with replacement, the method is called bagging (short for bootstrap aggregating); a random forest uses bagging to form these random subsets.

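As a rough illustration of bagging with scikit-learn's generic ensemble API (this BaggingClassifier sketch is for exposition only; RandomForestClassifier bundles the same idea for trees):

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# bagging: each of the 100 trees sees its own bootstrap sample
# (drawn with replacement) of the training set
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,   # sample with replacement
    random_state=42,
)
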
A random forest also introduces extra randomness when growing the trees: instead of searching for the very best feature at each split, it searches for the best feature among a random subset of features.

\[\mbox{Random Forest = Random Subset + Random Split Candidates}\]
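
In scikit-learn this extra randomness is controlled by the max_features parameter of RandomForestClassifier. The snippet below is a minimal sketch; in recent scikit-learn versions "sqrt" is already the default for classification, so passing it explicitly is only for illustration.

from sklearn.ensemble import RandomForestClassifier

# at each split, only sqrt(n_features) randomly chosen features are
# evaluated as split candidates, instead of all features
rnd_clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)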

Examples#

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
from sklearn.ensemble import RandomForestClassifier

# an ensemble of decision trees, each grown on a bootstrap sample of the training set
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
RandomForestClassifier()
from sklearn.metrics import accuracy_score

accuracy_score(y_test, clf.predict(X_test))
0.88
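
Note that RandomForestClassifier() above was created without a fixed random_state, so the exact accuracy will vary slightly between runs. A sketch of a reproducible variant on the same split (the specific value 42 is just an example seed):

clf = RandomForestClassifier(n_estimators=100, random_state=42)  # fix the seed for repeatable results
clf.fit(X_train, y_train)
accuracy_score(y_test, clf.predict(X_test))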