[파이썬] 데이터 피벗 테이블 생성

In data analysis and data preprocessing tasks, creating pivot tables is a common technique to summarize and analyze data in a structured manner. A pivot table aggregates and calculates values based on one or more variables. In this blog post, we will explore how to create pivot tables in Python using the popular pandas library.

Prerequisites

To follow along with the code examples in this tutorial, make sure you have the following prerequisites:

pip install pandas

Importing the necessary libraries

In Python, we need to import the pandas library to create pivot tables. Additionally, we will import a dataset to work with. Let’s start by importing the necessary libraries:

import pandas as pd

Loading the dataset

For this tutorial, let’s assume we have a dataset called sales_data.csv that contains information about sales transactions. We will load this dataset into a pandas dataframe:

data = pd.read_csv('sales_data.csv')

Creating a pivot table

Now that we have our dataset loaded, we can create a pivot table using the pivot_table function provided by pandas. Let’s say we want to calculate the sum of revenue for each product category and region. Here’s how we can accomplish that:

pivot_table_data = data.pivot_table(values='revenue', index='category', columns='region', aggfunc='sum')

In the above code, we specify the values parameter as the column name that we want to aggregate (‘revenue’ in this case). The index parameter represents the variable that we want to group our data by (‘category’ in this case), and the columns parameter represents the second variable that we want to group by (‘region’ in this case). Finally, the aggfunc parameter defines the aggregation function to be applied (‘sum’ in this case).

Displaying the pivot table

To view the generated pivot table, we can simply print the pivot_table_data dataframe:

print(pivot_table_data)

This will display the pivot table in a tabular format with the sum of revenue for each product category and region.

Conclusion

Creating pivot tables in Python is a powerful tool for summarizing and analyzing data. In this blog post, we explored how to create pivot tables using the pandas library. We learned how to load a dataset, create a pivot table, and display the results. By utilizing pivot tables, we can gain valuable insights from our data and make informed decisions in data analysis tasks.