[파이썬] pandas explode와 implode

Pandas, the popular data manipulation library in Python, provides several useful functionalities for working with complex data structures. Two important functions in pandas are explode and implode, which allow you to flatten and reshape your data in different ways. In this blog post, we will explore how these functions can be used to simplify data transformations and make your code more efficient.

Explode: Flattening a Column with Nested Values

The explode function in pandas is used to flatten a column that contains nested values. It is particularly useful when you have a column with lists, tuples, or other iterable objects, and you want to expand each element in the iterable into its own row. This can be helpful for conducting further analysis or processing on the expanded data.

Let’s consider an example to understand how explode works. Suppose we have a DataFrame with a column named fruits that contains lists of fruits:

import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'fruits': [['apple', 'banana'], ['orange', 'mango'], ['grape', 'kiwi']]
}

df = pd.DataFrame(data)

Our DataFrame df looks like this:

      name           fruits
0    Alice  [apple, banana]
1      Bob  [orange, mango]
2  Charlie    [grape, kiwi]

To flatten the fruits column, we can use the explode function as follows:

df_exploded = df.explode('fruits')

The resulting DataFrame df_exploded will have one row for each fruit:

      name   fruits
0    Alice    apple
0    Alice   banana
1      Bob   orange
1      Bob    mango
2  Charlie    grape
2  Charlie     kiwi

As you can see, the fruits column has been flattened, and each fruit is now in a separate row.

Implode: Combining Rows into a Single Column

On the other hand, the implode function is used to combine multiple rows from a DataFrame into a single column. It can be handy when you want to merge values from multiple rows based on a common identifier. implode can also handle different types of data structures, such as lists or strings, depending on your requirements.

Let’s illustrate the implode function using the following example. Suppose we have a DataFrame with multiple rows for each person, where each row represents a different fruit preference:

import pandas as pd

data = {
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Charlie', 'Charlie'],
    'fruit': ['apple', 'banana', 'orange', 'mango', 'grape', 'kiwi']
}

df = pd.DataFrame(data)

Our DataFrame df looks like this:

      name   fruit
0    Alice   apple
1    Alice  banana
2      Bob  orange
3      Bob   mango
4  Charlie   grape
5  Charlie    kiwi

To combine the fruits for each person into a single column, we can use the implode function as follows:

df_imploded = df.groupby('name')['fruit'].apply(','.join).reset_index()

The resulting DataFrame df_imploded will have one row for each person, with their fruits combined into a single column:

      name            fruit
0    Alice    apple,banana
1      Bob   orange,mango
2  Charlie    grape,kiwi

As you can see, the fruits for each person have been merged into a comma-separated string within the fruit column.

Conclusion

In this blog post, we explored the explode and implode functions in pandas. explode is useful for flattening columns with nested values, while implode can be handy for combining rows into a single column. These functions enable you to transform your data in a more convenient and efficient manner, making your data manipulation tasks easier to accomplish.

By using explode and implode effectively in your pandas code, you can simplify your data transformations, reshape your data structures, and extract valuable insights from your datasets. So, go ahead, give them a try, and see how they can enhance your data manipulation workflow!