[파이썬][scikit-learn] scikit-learn에서 Confusion Matrix

Confusion Matrix is a popular tool in machine learning and data science for evaluating the performance of a classification model. It provides a detailed report of how well the model predicts the target labels by showing the counts of true positives, false positives, true negatives, and false negatives.

In the scikit-learn library, you can easily generate a Confusion Matrix using the confusion_matrix function from the sklearn.metrics module. Let’s see an example of how to use it.

First, you need to import the necessary libraries:

import numpy as np
from sklearn.metrics import confusion_matrix

Next, you need to define the ground truth labels and the predicted labels. These labels can be in the form of a list, numpy array, or pandas Series:

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 1, 0, 0, 0, 1])

Now, you can calculate the Confusion Matrix:

cm = confusion_matrix(y_true, y_pred)
print(cm)

The output will be a 2x2 matrix representing the confusion matrix:

[[3 2]
 [1 4]]

In the above matrix, the row indices represent the true labels, and the column indices represent the predicted labels. The values inside the matrix represent the counts of each category. For example, the value at index (0, 0) corresponds to true negatives, (0, 1) corresponds to false positives, (1, 0) corresponds to false negatives, and (1, 1) corresponds to true positives.

You can further enhance the visualization of the Confusion Matrix using the heatmap function from the seaborn library:

import seaborn as sns
import matplotlib.pyplot as plt

sns.heatmap(cm, annot=True, cmap='Blues')
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.show()

This will display a color-coded matrix with the counts of each category.

Understanding the Confusion Matrix is crucial for evaluating the performance of classification models. It provides valuable insights into the types of errors made by the model and can help in identifying areas for improvement.

By using scikit-learn’s confusion_matrix function, you can easily generate and analyze the Confusion Matrix for your classification models.