Machine learning ensembles combine multiple models to improve predictive performance. LightGBM, a popular gradient boosting framework, provides several techniques for ensemble learning. In this blog post, we will explore some of these strategies and demonstrate how to implement them in Python.
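The snippets below all refer to a training set wrapped in a LightGBM Dataset. Nothing about the techniques depends on the specific data, so here is a minimal, hypothetical setup using scikit-learn's make_classification; the variable names (train_data, test_data, true_labels) are assumptions carried through the rest of the post.
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset, used only so the snippets below are runnable
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

train_data = lgb.Dataset(X_train, label=y_train)  # LightGBM's native data wrapper
test_data = X_test     # Booster.predict() takes a raw feature matrix
true_labels = y_test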
1. Bagging
Bagging is a technique that involves training multiple models on different subsets of the training data and combining their predictions. LightGBM supports bagging through the bagging_fraction and bagging_freq parameters.
import lightgbm as lgb

params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'bagging_fraction': 0.8,  # fraction of rows used for each bag
    'bagging_freq': 5,        # draw a new bag every 5 iterations
    'metric': 'binary_logloss'
}

model = lgb.train(params, train_data, num_boost_round=100)
In this example, bagging_fraction is set to 0.8, so each tree is trained on a random 80% subsample of the training rows (LightGBM samples without replacement). bagging_freq is set to 5, so a fresh subsample is drawn every 5 iterations; note that bagging only takes effect when bagging_freq is greater than 0 and bagging_fraction is less than 1.0.
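As a quick sanity check, you can score the bagged model on the held-out split from the hypothetical setup above. Booster.predict() returns probabilities for a binary objective, so rounding thresholds at 0.5:
from sklearn.metrics import accuracy_score

preds = model.predict(test_data)  # predicted probabilities
print('accuracy:', accuracy_score(true_labels, preds.round()))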
2. Feature Subsampling
Feature subsampling is a technique that involves training models on different subsets of features. LightGBM supports feature subsampling through the feature_fraction parameter.
import lightgbm as lgb

params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'feature_fraction': 0.6,  # fraction of features used per tree
    'metric': 'binary_logloss'
}

model = lgb.train(params, train_data, num_boost_round=100)
In this example, feature_fraction is set to 0.6, so LightGBM randomly selects 60% of the features before building each tree.
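Row and feature subsampling are complementary and are commonly combined (this is essentially what makes random forests diverse). A minimal sketch, reusing the same hypothetical train_data:
params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'bagging_fraction': 0.8,  # row subsampling
    'bagging_freq': 5,
    'feature_fraction': 0.6,  # column (feature) subsampling
    'metric': 'binary_logloss'
}
model = lgb.train(params, train_data, num_boost_round=100)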
3. Voting
Voting is a technique that involves combining the predictions of multiple models using majority voting or weighted voting. LightGBM provides a convenient way to perform voting using the predict()
method.
import lightgbm as lgb
from sklearn.metrics import accuracy_score

# train_data1 and train_data2 are lgb.Dataset objects built from different
# samples (or folds) of the training data; params is reused from above
model1 = lgb.train(params, train_data1, num_boost_round=100)
model2 = lgb.train(params, train_data2, num_boost_round=100)

preds1 = model1.predict(test_data)  # predicted probabilities
preds2 = model2.predict(test_data)

preds_combined = (preds1 + preds2) / 2  # simple average (soft voting)

# Alternatively, use weighted voting (weights should sum to 1)
w1, w2 = 0.7, 0.3
preds_combined = preds1 * w1 + preds2 * w2

# Evaluate the combined predictions; round() thresholds at 0.5
accuracy = accuracy_score(true_labels, preds_combined.round())
In this example, two models (model1 and model2) are trained on different training datasets (train_data1 and train_data2). Their probability predictions are then combined with either a simple or a weighted average before being rounded to class labels.
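If you would rather not combine predictions by hand, the same idea can be expressed with scikit-learn's VotingClassifier wrapped around LGBMClassifier. A minimal sketch, assuming the X_train/y_train split from the setup at the top; the differing random_state values are just one simple way to get diverse models:
from lightgbm import LGBMClassifier
from sklearn.ensemble import VotingClassifier

ensemble = VotingClassifier(
    estimators=[
        ('lgbm1', LGBMClassifier(n_estimators=100, random_state=1)),
        ('lgbm2', LGBMClassifier(n_estimators=100, random_state=2)),
    ],
    voting='soft',       # average predicted probabilities
    weights=[0.7, 0.3],  # optional: weighted soft voting
)
ensemble.fit(X_train, y_train)
print('accuracy:', ensemble.score(X_test, y_test))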
Ensemble learning with LightGBM can significantly improve predictive performance. By combining techniques such as bagging, feature subsampling, and voting, you can build models that are more robust and more accurate.