[파이썬][scikit-learn] scikit-learn에서 GridSearchCV

GridSearchCV is a powerful tool in scikit-learn for hyperparameter tuning. It helps in finding the best set of hyperparameters for a machine learning model by systematically searching through the parameter grid and evaluating multiple combinations.

Why is hyperparameter tuning important?

Hyperparameters are the configuration settings of a machine learning algorithm that cannot be learned from the data. They are set by the data scientist before the model training process. Tuning these hyperparameters is essential to achieve optimal performance and avoid issues like overfitting or underfitting.

How does GridSearchCV work?

GridSearchCV is a technique that automates the process of tuning hyperparameters by searching over all possible combinations of hyperparameters in a predefined grid. It trains and evaluates the model for each combination and returns the combination that yields the best performance.

Example of using GridSearchCV in scikit-learn

Let’s take an example of using GridSearchCV with the scikit-learn library in Python. We will use the popular Random Forest algorithm for classification and tune its hyperparameters using GridSearchCV.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()

# Define the parameter grid
param_grid = {
    'n_estimators': [50, 100, 150],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10]
}

# Create a Random Forest classifier
rf = RandomForestClassifier()

# Create a GridSearchCV object
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5, scoring='accuracy')

# Fit the GridSearchCV object to the data
grid_search.fit(iris.data, iris.target)

# Print the best set of hyperparameters
print("Best parameters:", grid_search.best_params_)

# Print the best accuracy score
print("Best accuracy:", grid_search.best_score_)

In the above example, we first import the necessary libraries like RandomForestClassifier, GridSearchCV, and load_iris dataset from scikit-learn. Then, we define the parameter grid with different values for n_estimators, max_depth, and min_samples_split.

Next, we create a Random Forest classifier and a GridSearchCV object. We pass the classifier, parameter grid, cross-validation parameter cv, and scoring metric scoring to the GridSearchCV object. Finally, we fit the GridSearchCV object to the data and print the best set of hyperparameters and the corresponding accuracy score.

GridSearchCV simplifies the hyperparameter tuning process and helps in finding the optimal hyperparameters for your machine learning model. It saves time and effort by automating the search process. Try using GridSearchCV to improve the performance of your models.