[파이썬] pandas 조건부 합계 및 평균

Pandas is a powerful data manipulation library in Python that provides various functionalities for working with structured data. One of the common tasks in data analysis is calculating conditional sums and averages based on specific conditions. In this blog post, we will explore how to calculate conditional sums and averages using Pandas in Python.

Calculating Conditional Sum

To calculate a conditional sum in Pandas, we can use the groupby() function combined with the sum() function. The groupby() function allows us to group the data based on a column or multiple columns, and then we can apply an aggregation function, such as sum(), to calculate the sum of the groups.

Let’s consider an example where we have a DataFrame with sales data:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
   'Product': ['A', 'A', 'B', 'B', 'A', 'B'],
   'Sales': [100, 200, 150, 250, 300, 400]
})

# Calculate conditional sum based on Product column
sum_by_product = df.groupby('Product')['Sales'].sum()

print(sum_by_product)

Output:

Product
A    600
B    800
Name: Sales, dtype: int64

In the above example, we grouped the DataFrame by the ‘Product’ column and calculated the sum of the ‘Sales’ column for each group. The resulting Series gives us the conditional sum based on the ‘Product’ column.

Calculating Conditional Average

Similar to calculating the conditional sum, we can use the groupby() function and the mean() function to calculate conditional averages in Pandas.

Consider the same sales data example:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
   'Product': ['A', 'A', 'B', 'B', 'A', 'B'],
   'Sales': [100, 200, 150, 250, 300, 400]
})

# Calculate conditional average based on Product column
average_by_product = df.groupby('Product')['Sales'].mean()

print(average_by_product)

Output:

Product
A    200
B    300
Name: Sales, dtype: int64

In the above code, we used the same approach as calculating the conditional sum, but used the mean() function instead of sum(). The resulting Series gives us the conditional average based on the ‘Product’ column.

Conclusion

In this blog post, we learned how to calculate the conditional sum and average using Pandas in Python. Using the groupby() function, we can group the data based on a specific column and then apply the desired aggregation function, such as sum() or mean(), to calculate the conditional sum or average.

Pandas provides a rich set of functionalities for data manipulation and analysis, making it a versatile library for working with structured data. By utilizing these functionalities, we can efficiently perform complex calculations and gain valuable insights from our data.