[파이썬] catboost 분류 모델 학습

CatBoost Logo

CatBoost is a powerful open-source gradient boosting library developed by Yandex. It is known for its ability to handle categorical features efficiently and produce accurate predictions. In this blog post, we will learn how to train a CatBoost classification model in Python.

Installation

Before we start, make sure you have CatBoost installed in your Python environment. You can install it using pip:

pip install catboost

Importing Libraries

Once CatBoost is installed, we can import the necessary libraries for our classification model:

import pandas as pd
import catboost as cb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Loading the Data

Next, we need to load our dataset. For this example, let’s assume we have a CSV file named data.csv with features and labels. We can use the pandas library to load the data into a DataFrame:

data = pd.read_csv('data.csv')

# Separate features and labels
X = data.drop('label', axis=1)
y = data['label']

Splitting the Data

Before training our CatBoost model, we should split the data into training and test sets. This will allow us to evaluate the performance of our model on unseen data. We can use the train_test_split function from sklearn for this:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Training the Model

Now, we can train our CatBoost classification model. We need to define a CatBoostClassifier object and specify the hyperparameters we want to use. Here is an example:

model = cb.CatBoostClassifier(iterations=100, learning_rate=0.1, depth=6)
model.fit(X_train, y_train, verbose=False)

In this example, we have set the number of iterations to 100, learning rate to 0.1, and max depth to 6. Feel free to adjust these hyperparameters based on your dataset and problem.

Evaluating the Model

Once the model is trained, we can evaluate its performance on the test set. In this example, we will use the accuracy score as our evaluation metric:

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Conclusion

In this blog post, we learned how to train a CatBoost classification model in Python. We covered the installation of CatBoost, loading the data, splitting the data into train and test sets, training the model, and evaluating its performance. CatBoost is a powerful tool for classification tasks, especially when dealing with categorical features. Feel free to experiment with different hyperparameters and explore more advanced techniques with CatBoost!