[파이썬] matplotlib 범주형 데이터 시각화

Data visualization is an essential part of data analysis and interpretation. It helps to understand patterns, relationships, and trends in the data. Matplotlib is a popular data visualization library in Python that provides a wide range of plotting options. In this blog post, we will explore how to create visualizations for categorical data using Matplotlib.

Importing Matplotlib

First, let’s import the matplotlib.pyplot module, which provides the plotting functions and objects.

import matplotlib.pyplot as plt

Creating a Bar Chart

A bar chart is a great way to visualize categorical data. It represents the data using rectangular bars, where the length of each bar corresponds to the quantity or frequency of the category.

To create a bar chart, we need two arrays: one for the categories and another for their corresponding values. Let’s assume we have the following data:

categories = ['Apples', 'Oranges', 'Bananas']
values = [25, 30, 15]

We can use the bar() function to create a bar chart:

plt.bar(categories, values)
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Chart')
plt.show()

The xlabel(), ylabel(), and title() functions are used to label the x-axis, y-axis, and the chart title respectively.

Creating a Pie Chart

A pie chart is another useful visualization for categorical data. It represents the data as slices of a pie, where each slice corresponds to a category and the size of the slice represents the proportion or percentage of that category.

To create a pie chart, we need the same two arrays: one for the categories and another for their corresponding values. Let’s use the same data as before:

categories = ['Apples', 'Oranges', 'Bananas']
values = [25, 30, 15]

We can use the pie() function to create a pie chart:

plt.pie(values, labels=categories, autopct='%1.1f%%')
plt.title('Pie Chart')
plt.show()

The labels parameter is used to provide labels for each slice, and the autopct parameter is used to format the percentage of each slice.

Creating a Stacked Bar Chart

A stacked bar chart is useful when we want to compare the total values of different categories and their subcategories. It represents the data using rectangular bars, where each bar is divided into sections corresponding to the subcategories.

To create a stacked bar chart, we need three arrays: one for the categories, one for the subcategories, and another for their corresponding values.

categories = ['Apples', 'Oranges', 'Bananas']
subcategories = ['Red', 'Orange', 'Yellow']
values = [[10, 5, 10], [15, 10, 5], [5, 5, 5]]

We can use the bar() function with the bottom parameter to create a stacked bar chart:

bottom = None
colors = ['red', 'orange', 'yellow']
for i, subcategory in enumerate(subcategories):
    plt.bar(categories, values[i], bottom=bottom, label=subcategory, color=colors[i])
    if bottom is None:
        bottom = values[i]
    else:
        bottom = [sum(x) for x in zip(bottom, values[i])]

plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Stacked Bar Chart')
plt.legend()
plt.show()

The bottom parameter is used to specify the starting point for each subcategory, and the label parameter is used to provide a legend for each subcategory.

Conclusion

Matplotlib provides a range of functions and options to create visually appealing and informative visualizations for categorical data. In this blog post, we explored how to create bar charts, pie charts, and stacked bar charts using Matplotlib in Python. With these techniques, you can effectively communicate and analyze categorical data. Matplotlib’s versatility and flexibility make it a valuable tool for data visualization in Python.