[파이썬] ggplot 그래프 패턴 및 스타일 분석

Introduction

In data visualization, ggplot is a widely used library in Python that provides a flexible and powerful framework for creating high-quality graphics. ggplot follows the grammar of graphics concept, allowing users to easily create customized visualizations with a few lines of code.

In this blog post, we will explore some common graph patterns and styles that can be implemented using ggplot in Python. These patterns and styles will help you enhance the visual aesthetics and clarity of your graphs.

Scatter Plot with Trend Line

A scatter plot is a useful visualization for examining relationships between two numerical variables. We can use ggplot to create a scatter plot and add a trend line to visualize the overall trend in the data.

import pandas as pd
from plotnine import *

# Create sample data
data = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [3, 5, 7, 9, 11]})

# Create scatter plot with trend line
scatter_plot = (
    ggplot(data, aes(x='x', y='y')) +
    geom_point() +
    geom_smooth(method='lm', se=False)
)

# Display the scatter plot
print(scatter_plot)

In the code above, we first import the necessary libraries - pandas and plotnine. We then create a sample dataframe (data) with two columns representing the x and y variables. Using the ggplot function, we specify the dataframe (data) and map the x and y variables to the aesthetics of the scatter plot.

Next, we use geom_point to add the individual data points to the plot, and geom_smooth with the method='lm' parameter to add a trend line based on linear regression. The se=False parameter removes the shaded confidence interval around the trend line.

Finally, we print the scatter plot to see the visualization.

Bar Chart with Custom Color Palette

Bar charts are commonly used to compare categorical variables. By default, ggplot uses a generic color palette for bar charts. However, we can easily customize the color palette to match our preferences.

# Create sample data
data = pd.DataFrame({'category': ['A', 'B', 'C'], 'value': [20, 30, 40]})

# Create bar chart with custom color palette
bar_chart = (
    ggplot(data, aes(x='category', y='value', fill='category')) +
    geom_bar(stat='identity') +
    scale_fill_manual(values=['#FF0000', '#00FF00', '#0000FF'])
)

# Display the bar chart
print(bar_chart)

In the code above, we create a sample dataframe (data) with two columns representing the categories and the corresponding values. We use ggplot to specify the dataframe and map the category and value variables to the aesthetics of the bar chart.

Next, we use geom_bar with the stat='identity' parameter to create a bar chart, where the height of each bar corresponds to the value. We also use scale_fill_manual to customize the fill color of the bars. In this example, we set the color palette to red, green, and blue using hexadecimal color codes.

Finally, we print the bar chart to see the visualization.

Line Plot with Multiple Lines

Line plots are frequently used to visualize trends over time or continuous variables. We can create line plots with multiple lines to compare multiple trends within the same graph.

# Create sample data
data = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y1': [10, 20, 30, 40, 50], 'y2': [5, 10, 15, 20, 25]})

# Create line plot with multiple lines
line_plot = (
    ggplot(data, aes(x='x')) +
    geom_line(aes(y='y1'), color='red') +
    geom_line(aes(y='y2'), color='blue')
)

# Display the line plot
print(line_plot)

In the above code, we create a sample dataframe (data) with three columns representing the x variable and two y variables. We use ggplot to specify the dataframe and map the x variable to the aesthetics of the line plot.

Next, we create two separate lines using geom_line and map the y variables to the aesthetics. We specify different colors for each line using the color parameter.

Finally, we print the line plot to see the visualization.

Conclusion

ggplot is a powerful library for creating visually appealing and informative graphs in Python. By exploring different graph patterns and styles, you can enhance the visual representation of your data and communicate your findings more effectively. Experiment with various ggplot functionalities to create customized graphs that best suit your needs.