XGBoost is a popular machine learning library known for its powerful gradient boosting algorithms. Two of the most important hyperparameters when training an accurate XGBoost model are the learning rate and the maximum depth of its decision trees. In this blog post, we will explore how to adjust these parameters in Python to achieve optimal results.
Importing the Required Libraries
Before we begin, we need to import the necessary libraries. Make sure you have XGBoost installed in your Python environment.
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
Loading the Dataset
Let’s start by loading a sample dataset to work with. For this example, we will use the Iris dataset available in the scikit-learn library.
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
Splitting the Dataset
Next, we need to split the dataset into training and testing sets. We will use the train_test_split() function from scikit-learn for this purpose.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Defining and Training the XGBoost Model
Now let’s define the XGBoost classifier and train the model using the training set.
model = xgb.XGBClassifier(learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)
Evaluating the Model
After training the model, we can evaluate its performance by making predictions on the testing set and calculating the accuracy score.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Adjusting the Learning Rate and Depth
To adjust the learning rate and depth of the XGBoost model, simply modify their values when creating the XGBClassifier object.
model = xgb.XGBClassifier(learning_rate=0.01, max_depth=5)
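To see the effect of the change, you can retrain and re-evaluate exactly as before. This is just a quick illustrative check that reuses the variables defined earlier in the post:
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Accuracy with new settings:", accuracy_score(y_test, y_pred))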
- The learning_rate parameter controls the step size applied at each boosting iteration. Lower values can help prevent overfitting, but they typically require more boosting rounds (a higher n_estimators) to converge, as sketched below.
- The max_depth parameter sets the maximum depth of each decision tree. Increasing it makes the model more expressive, but may also lead to overfitting.
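Because a lower learning rate usually needs more trees, a common pattern is to raise n_estimators and let early stopping choose the actual number of boosting rounds. Below is a minimal sketch of that pattern; note that in recent XGBoost versions (1.6+) early_stopping_rounds is passed to the constructor, while older versions accepted it in fit(), so check the version you have installed. In a real project you would also hold out a separate validation split for early stopping rather than reusing the test set as done here for brevity.
# Pair a small learning rate with many trees and stop when the
# validation metric stops improving (XGBoost 1.6+ style).
model = xgb.XGBClassifier(
    learning_rate=0.01,
    max_depth=5,
    n_estimators=1000,
    early_stopping_rounds=10,
)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
print("Best iteration:", model.best_iteration)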
Experiment with different values for these parameters to find the optimal combination that yields the best results for your specific problem.
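A systematic way to run these experiments is a small grid search with cross-validation. The sketch below uses scikit-learn's GridSearchCV; the grid values are only illustrative starting points, not recommendations for every dataset.
from sklearn.model_selection import GridSearchCV

# Candidate values for the two hyperparameters discussed above.
param_grid = {
    "learning_rate": [0.01, 0.1, 0.3],
    "max_depth": [3, 5, 7],
}

# 5-fold cross-validated search over all combinations.
grid = GridSearchCV(
    xgb.XGBClassifier(),
    param_grid,
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Best cross-validated accuracy:", grid.best_score_)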
Conclusion
In this blog post, we discussed the importance of adjusting the learning rate and depth in XGBoost models. We also provided an example of how to implement this in Python using the XGBoost library. Remember to fine-tune these parameters based on your specific dataset and problem to achieve the best possible accuracy.