TensorFlow Machine Learning Projects

Random forests

A random forest is a technique in which you construct multiple decision trees and train each of them on the classification or regression task. The predictions of the individual trees are then aggregated to produce the final result.

A random forest is an ensemble of random, uncorrelated, fully grown decision trees. Because the trees are fully grown, each one has low bias and high variance. Because the trees are uncorrelated, averaging them gives the largest possible decrease in variance. By uncorrelated, we mean that each decision tree in the random forest is given a randomly selected subset of the features and a randomly selected subset of the dataset for those features.
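The effect of correlation on variance can be made concrete with a standard result. It is sketched here under simplifying assumptions that are not part of the original text: the forest averages $B$ trees whose predictions $T_b(x)$ each have variance $\sigma^2$ and share the same pairwise correlation $\rho$.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

With $\rho = 0$ only the second term remains, and it shrinks toward zero as $B$ grows. With correlated trees, the first term sets a floor that adding more trees cannot remove.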

The original paper describing random forests is available at the following link: https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf.

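The random forest technique does not reduce bias. Averaging leaves the bias of the individual trees essentially unchanged, and because each tree sees only a random sample of the data and the features, the forest typically has a slightly higher bias than a single fully grown tree trained on the whole dataset.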

Random forests in their current form were introduced by Leo Breiman, and Random Forests is a trademark of Leo Breiman and Adele Cutler. More information is available at the following link: https://www.stat.berkeley.edu/~breiman/RandomForests.

Intuitively, in the random forest model, a large number of decision trees are trained on different samples of the data, and each tree either fits or overfits its sample. Because the trees overfit in different ways, averaging their predictions causes much of the overfitting to cancel out.
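The cancellation can be illustrated with a small simulation. This is only a sketch under strong assumptions: each "tree" is modeled as the true value plus independent Gaussian noise, and the names `simulate_variance`, `true_value`, and `noise_std` are hypothetical, introduced purely for illustration. No real decision trees are trained here.

```python
import random
import statistics


def simulate_variance(n_trees, n_trials=2000, noise_std=1.0, seed=0):
    """Estimate the variance of the average of n_trees noisy predictions.

    Assumption: every tree predicts true_value plus independent noise,
    which stands in for trees that overfit in unrelated ways.
    """
    rng = random.Random(seed)
    true_value = 5.0  # hypothetical target, chosen arbitrarily
    averages = []
    for _ in range(n_trials):
        predictions = [true_value + rng.gauss(0.0, noise_std)
                       for _ in range(n_trees)]
        averages.append(sum(predictions) / n_trees)
    return statistics.pvariance(averages)


single_tree_variance = simulate_variance(n_trees=1)
forest_variance = simulate_variance(n_trees=50)

print("one tree:", single_tree_variance)
print("50 trees:", forest_variance)
```

Under these assumptions, the variance of the averaged prediction should come out roughly 50 times smaller than that of a single tree. Real trees are never perfectly independent, so the reduction in practice is smaller.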

Random forests seem similar to bagging (bootstrap aggregating), but the two are different. In bagging, a random sample drawn with replacement is used to train each tree in the ensemble, and each tree is trained on all the features. In random forests, the features are also sampled randomly: at each candidate split, only a random subset of the features is considered.
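The difference between the two sampling schemes can be sketched in a few lines of plain Python. The helper names `bootstrap_sample` and `feature_subset` are hypothetical, and using the square root of the feature count as the subset size is a common heuristic assumed here, not something the text prescribes.

```python
import math
import random


def bootstrap_sample(n_rows, rng):
    """Row indices drawn with replacement, as used to train one tree."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def feature_subset(n_features, rng, k=None):
    """Feature indices drawn without replacement for one candidate split."""
    if k is None:
        # Assumption: sqrt(n_features) is a common default, not a fixed rule.
        k = max(1, math.isqrt(n_features))
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(42)
n_rows, n_features = 10, 9

# Both bagging and random forests train each tree on a bootstrap sample.
rows = bootstrap_sample(n_rows, rng)

# Bagging: every split may use all of the features.
bagging_features = list(range(n_features))

# Random forest: each candidate split sees only a random subset.
forest_features = feature_subset(n_features, rng)

print("rows:", rows)
print("bagging features:", bagging_features)
print("forest features at one split:", forest_features)
```

In an actual random forest, `feature_subset` would be called again at every candidate split, so different splits in the same tree consider different features.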

For regression problems, the random forest model averages the predictions of the individual decision trees. For classification problems, it takes a majority vote over the classes predicted by the individual decision trees.
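The two aggregation rules can be sketched as follows. The function names and the sample predictions are hypothetical, and the per-tree predictions are simply given as lists rather than produced by trained trees.

```python
from collections import Counter
from statistics import mean


def aggregate_regression(tree_predictions):
    """Average the numeric predictions of the individual trees."""
    return mean(tree_predictions)


def aggregate_classification(tree_predictions):
    """Return the class predicted by the most trees.

    On a tie, Counter.most_common keeps the class seen first in the list.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions from three trees for a single input.
regression_result = aggregate_regression([2.0, 3.0, 4.0])
classification_result = aggregate_classification(["cat", "dog", "cat"])

print(regression_result)       # 3.0
print(classification_result)   # cat
```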

An interesting explanation of random forests can be found at the following link: https://machinelearning-blog.com/2018/02/06/the-random-forest-algorithm/