[파이썬] lightgbm Python API 활용

LightGBM is a popular gradient boosting framework that is widely used in machine learning tasks. It is known for its efficiency and high performance, especially when dealing with large and complex datasets. In this blog post, we will explore how to use the LightGBM Python API to train and evaluate a model.

Installation

First, let’s install LightGBM using pip:

pip install lightgbm

Make sure you have numpy and scikit-learn installed as well, as LightGBM depends on them.

Loading and Preparing the Data

To train a model with LightGBM, we need to load and prepare the data. In this example, we will be using the famous Iris dataset as an illustration.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert data to LightGBM Dataset format
train_data = lgb.Dataset(X_train, label=y_train)
test_data = lgb.Dataset(X_test, label=y_test)

Training the Model

With the data prepared, let’s move on to training the model using LightGBM. Here’s an example of training a simple classification model:

import lightgbm as lgb

# Set the hyperparameters for the model
params = {
    'boosting_type': 'gbdt',
    'objective': 'multiclass',
    'num_classes': 3,
    'metric': 'multi_logloss',
    'learning_rate': 0.1,
}

# Train the model
model = lgb.train(params, train_data, num_boost_round=100)

# Predict on the test data
y_pred = model.predict(X_test)

In the code above, we set the hyperparameters for the model, such as the boosting type, the objective function, and the learning rate. We then train the model using the lgb.train function, specifying the parameters, the training data, and the number of boosting rounds. Finally, we predict on the test data using the trained model.

Evaluating the Model

After training the model, it’s important to evaluate its performance. LightGBM provides several evaluation metrics for classification and regression tasks.

from sklearn.metrics import accuracy_score

# Convert predicted probabilities to class labels
y_pred_labels = np.argmax(y_pred, axis=1)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred_labels)
print(f"Accuracy: {accuracy}")

In this example, we use the accuracy_score metric from scikit-learn to calculate the accuracy of the model predictions.

Conclusion

In this blog post, we explored how to use the LightGBM Python API to train and evaluate a model. We covered the steps involved in loading and preparing the data, training the model, and evaluating its performance. LightGBM offers a powerful and efficient gradient boosting framework, making it a valuable tool in machine learning projects.

Remember to experiment with different hyperparameters and techniques to improve your model’s performance. Happy coding!