[파이썬] seaborn 범주형 변수의 통계 시각화

Data visualization plays a crucial role in data analysis as it helps us understand patterns, trends, and relationships within datasets. One common type of data is categorical variables, which represent groups or categories. Seaborn is a powerful Python library that provides functionalities for visualizing categorical variables.

In this blog post, we will explore various techniques and examples of visualizing categorical variables using Seaborn.

Bar Plot

A bar plot is a common way to display the distribution of a categorical variable. It shows the frequency or proportion of each category using vertical bars.

import seaborn as sns
import matplotlib.pyplot as plt

# Load the example tips dataset from Seaborn
tips = sns.load_dataset("tips")

# Plot a bar plot of the "day" variable
sns.countplot(x="day", data=tips)

# Add labels and title
plt.xlabel("Day of the Week")
plt.ylabel("Count")
plt.title("Distribution of Tips by Day")

# Display the plot
plt.show()

The above code generates a bar plot of the “day” variable from the tips dataset. It shows the count of tips for each day of the week. Adding labels and a title helps provide context to the plot.

Cat Plot

Cat plot is a versatile function in Seaborn that can create various types of categorical plots, including bar plots, point plots, and box plots.

import seaborn as sns

# Load the example tips dataset from Seaborn
tips = sns.load_dataset("tips")

# Plot a categorical plot of tip vs. day and hue by time
sns.catplot(x="day", y="tip", hue="time", data=tips, kind="bar")

# Add labels and title
plt.xlabel("Day of the Week")
plt.ylabel("Tip")
plt.title("Tips by Day and Time")

# Display the plot
plt.show()

In the above code, we use the catplot function to create a bar plot of the “tip” variable against the “day” variable, with the hue (color) being differentiated by the “time” variable. This plot helps visualize the relationship between tips, days of the week, and times.

Box Plot

A box plot is a graphical representation of the distribution of a continuous variable across different categories. It shows the median, quartiles, and potential outliers.

import seaborn as sns

# Load the example tips dataset from Seaborn
tips = sns.load_dataset("tips")

# Plot a box plot of the tip amount by day
sns.boxplot(x="day", y="tip", data=tips)

# Add labels and title
plt.xlabel("Day of the Week")
plt.ylabel("Tip Amount")
plt.title("Distribution of Tip Amount by Day")

# Display the plot
plt.show()

The above code generates a box plot of the “tip” variable by the “day” variable. It visualizes the distribution of tip amounts for each day of the week. The plot helps identify any outliers and provides insights into the central tendency and spread of the tip amounts.

Conclusion

Seaborn is a powerful library that provides convenient functions for visualizing categorical variables. It offers flexibility in creating bar plots, cat plots, and box plots, among other options. By effectively visualizing categorical data, you can gain insights into your datasets and make informed decisions.

Seaborn documentation: https://seaborn.pydata.org/