[Python] Ensemble Learning Strategies in LightGBM

Machine learning ensembles combine multiple models to improve predictive performance. LightGBM, a popular gradient boosting framework, provides several techniques for ensemble learning. In this blog post, we will explore some of these strategies and demonstrate how to implement them in Python.

1. Bagging

Bagging trains the ensemble's trees on different random subsets of the training data, which reduces variance and helps prevent overfitting. LightGBM supports bagging through the bagging_fraction and bagging_freq parameters; both must be set (bagging_fraction below 1.0 and bagging_freq above 0) for bagging to take effect.

import lightgbm as lgb

# X_train and y_train are assumed to be your feature matrix and labels
train_data = lgb.Dataset(X_train, label=y_train)

params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'bagging_fraction': 0.8,  # fraction of data sampled in each bagging round
    'bagging_freq': 5,  # resample the data every 5 iterations
    'metric': 'binary_logloss'
}

model = lgb.train(params, train_data, num_boost_round=100)

In this example, bagging_fraction is set to 0.8 and bagging_freq to 5, so every 5 boosting iterations LightGBM draws a fresh random 80% subsample of the training data (sampled without replacement) and grows the subsequent trees on it. Note that this differs from classical bagging, which fits independent models on bootstrap samples drawn with replacement; a hand-rolled version of that is sketched below.
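For comparison, here is a minimal sketch of classical bagging built on top of LightGBM, assuming X_train, y_train, and X_test are existing NumPy arrays and params is the dictionary defined above:

import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(42)
boosters = []
for _ in range(5):
    # draw a bootstrap sample (with replacement) for each ensemble member
    idx = rng.choice(len(X_train), size=len(X_train), replace=True)
    ds = lgb.Dataset(X_train[idx], label=y_train[idx])
    boosters.append(lgb.train(params, ds, num_boost_round=100))

# average the members' predicted probabilities
preds = np.mean([b.predict(X_test) for b in boosters], axis=0)

Averaging the probabilities of independently trained boosters is what makes this bagging, rather than simply training one larger model.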

2. Feature Subsampling

Feature subsampling trains each tree on a randomly chosen subset of the features, which decorrelates the trees in the ensemble. LightGBM supports feature subsampling through the feature_fraction parameter.

import lightgbm as lgb

params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'feature_fraction': 0.6,  # fraction of features randomly chosen for each tree
    'metric': 'binary_logloss'
}

# train_data is the lgb.Dataset created in the bagging example above
model = lgb.train(params, train_data, num_boost_round=100)

In this example, feature_fraction is set to 0.6, meaning each tree is grown on a randomly selected 60% of the features, with a fresh subset drawn at every iteration. Combining feature subsampling with bagging yields a random-forest-style ensemble, as sketched below.
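LightGBM exposes this combination directly through boosting_type 'rf', in which trees are built independently on subsampled data and averaged rather than boosted. A minimal sketch, reusing the train_data from above ('rf' mode requires bagging to be enabled):

import lightgbm as lgb

params_rf = {
    'boosting_type': 'rf',  # random-forest mode: trees are averaged, not boosted
    'objective': 'binary',
    'bagging_fraction': 0.8,  # 'rf' mode requires bagging_fraction below 1.0
    'bagging_freq': 1,  # ...and bagging_freq above 0
    'feature_fraction': 0.6,
    'metric': 'binary_logloss'
}

model_rf = lgb.train(params_rf, train_data, num_boost_round=100)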

3. Voting

Voting combines the predictions of multiple models, either by a majority vote on predicted class labels (hard voting) or by averaging predicted probabilities (soft voting), optionally with weights. LightGBM has no built-in voting mechanism, but the outputs of predict() are easy to combine by hand.

import lightgbm as lgb
from sklearn.metrics import accuracy_score

# train_data1 and train_data2 are assumed to be two lgb.Dataset objects,
# e.g. built from different subsets of the training data
model1 = lgb.train(params, train_data1, num_boost_round=100)
model2 = lgb.train(params, train_data2, num_boost_round=100)

preds1 = model1.predict(test_data)  # predicted probabilities
preds2 = model2.predict(test_data)

preds_combined = (preds1 + preds2) / 2  # soft voting: average the probabilities

# Alternatively, use weighted voting (the weights should sum to 1)
w1, w2 = 0.7, 0.3  # example weights
preds_combined = preds1 * w1 + preds2 * w2

# Evaluate the combined predictions against the held-out labels
accuracy = accuracy_score(true_labels, preds_combined.round())

In this example, two models (model1 and model2) are trained on different training datasets (train_data1 and train_data2), and their predicted probabilities are combined by a plain or weighted average (soft voting); rounding the averaged probabilities yields the final class labels. For true majority (hard) voting, scikit-learn's VotingClassifier can wrap LightGBM's sklearn interface, as sketched below.
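A minimal sketch of hard voting, assuming X_train, y_train, and X_test exist; the three ensemble members here differ only in their random seeds (an odd number of voters avoids ties):

from lightgbm import LGBMClassifier
from sklearn.ensemble import VotingClassifier

clf1 = LGBMClassifier(n_estimators=100, random_state=1)
clf2 = LGBMClassifier(n_estimators=100, random_state=2)
clf3 = LGBMClassifier(n_estimators=100, random_state=3)

# voting='hard' takes a majority vote over predicted class labels;
# voting='soft' would average predicted probabilities instead
voter = VotingClassifier(
    estimators=[('lgbm1', clf1), ('lgbm2', clf2), ('lgbm3', clf3)],
    voting='hard'
)
voter.fit(X_train, y_train)
labels = voter.predict(X_test)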

Ensemble learning with LightGBM can significantly improve predictive performance. By leveraging techniques such as bagging, feature subsampling, and voting, you can build more robust and accurate models.