Ensemble learning combines multiple weak learners to build a strong learner.
Majority voting is generally used in classification tasks: take the
majority of the model predictions as the final prediction.
Averaging is generally used in regression tasks: take the
average of predictions from all the models as the final prediction; in
classification, average the predicted probabilities instead.
Weighted averaging extends averaging by assigning different weights to different models, defining the importance of each model.
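The three combination rules above can be sketched with NumPy, using hypothetical predictions and probabilities from three models (all numbers below are made up for illustration):

```python
import numpy as np

# Hypothetical class-label predictions from three classifiers for four samples
preds = np.array([
    [1, 0, 1, 1],   # model 1
    [1, 1, 0, 1],   # model 2
    [0, 0, 1, 1],   # model 3
])

# Majority voting: the most frequent label per sample wins
vote = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)

# Averaging (classification): average the predicted class-1 probabilities
probs = np.array([
    [0.9, 0.4, 0.6, 0.8],
    [0.7, 0.6, 0.3, 0.9],
    [0.2, 0.3, 0.8, 0.7],
])
avg = probs.mean(axis=0)

# Weighted averaging: weights reflect each model's assumed importance
weights = np.array([0.5, 0.3, 0.2])
weighted = np.average(probs, axis=0, weights=weights)
```

For regression, the same averaging applies directly to the raw predicted values rather than probabilities.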
Stacking (a.k.a. Stacked Generalization, or Super Learner) employs a number of first-layer individual learners (model 1-5 / Tier-1 in the below figures) generated from the training data set, followed by a second-level learner (model 6 / Tier-2, a.k.a. meta-learner).
Blending is similar to stacking, but uses only a holdout (dev) set from the training set to make predictions. The holdout set and its predictions are used to build the second-level model, which then makes the final predictions on the test set.
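A minimal stacking sketch with scikit-learn (assumed available, version ≥ 1.2; the base learners and dataset here are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Tier-1 (first-layer) individual learners
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
]

# Tier-2 meta-learner, trained on out-of-fold predictions of the base learners
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_tr, y_tr)
score = stack.score(X_te, y_te)
```

`StackingClassifier` uses cross-validated (out-of-fold) predictions to train the meta-learner, which avoids leaking the base learners' training fit into the second level.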
Bagging (Bootstrap Aggregating):
- Create bootstrapped subsamples of the training set;
- Fit a base model on each of these subsets;
- Models run in parallel, independently of each other;
- The final prediction is determined by combining all model predictions.
Pros:
- Robust against outliers and noise;
- Reduces variance and typically avoids overfitting;
- Fast run time.
Cons:
- Can be slow to score as complexity increases;
- Lack of transparency due to the complexity of multiple trees.
Boosting is a sequential process, where each subsequent model attempts to correct the errors of the previous model.
1. Create a subset of the full dataset.
2. Initially, all data points are given equal weights.
3. Fit a base model on this subset.
4. Use this base model to predict on the whole dataset.
5. Calculate errors by comparing the predictions with the ground truth.
6. Give higher weights to the wrongly predicted data points.
7. Create another model by repeating steps 3-6, so that it corrects the errors of the previous model.
8. Multiple models are created this way, each correcting the errors of the previous one.
9. The final model (strong learner) is the weighted mean of all the models.
Pros:
- Often yields the best possible model;
- Directly optimizes the cost function.
Cons:
- Not robust against outliers and noise;
- Can overfit;
- Requires finding a proper stopping point.
Model error arises from noise, bias, and variance.
- Noise is the irreducible error inherent in the target itself;
- Bias is the error from the algorithm being unable to learn the target function;
- Variance is the error from sensitivity to the particular training sample.
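For squared loss, this three-way split is the standard bias-variance decomposition: with target $y = f(x) + \varepsilon$ and noise variance $\sigma^2$, the expected error of a learned model $\hat{f}$ decomposes as

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\sigma^2}_{\text{noise}}
  + \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
```

The expectation is over training samples: averaging many independently trained models (bagging) shrinks the variance term, while fitting successive error-correcting models (boosting) attacks the bias term.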
Boosting is recommended for models with high bias, while Bagging is recommended for models with high variance.
Ensemble methods are also useful for large datasets.