[파이썬] 다중 인덱스와 계층적 인덱스

In data analysis, it is often necessary to work with datasets that have multiple levels of indexes. This is where the concepts of 다중 인덱스(MultiIndex) and 계층적 인덱스(Hierarchical Indexing) come into play. These indexing techniques allow us to organize and manipulate data with complex hierarchical structures.

다중 인덱스(MultiIndex)

A MultiIndex is a type of index where each row can be uniquely identified by a combination of multiple key values. It provides a way to represent higher-dimensional data in a two-dimensional DataFrame or Series.

To create a MultiIndex, you can pass a list of arrays or tuples as the index argument when creating a DataFrame or Series. Let’s look at an example:

import pandas as pd

# Create a DataFrame with a MultiIndex
data = {
    ('A', 'x'): [1, 2, 3],
    ('A', 'y'): [4, 5, 6],
    ('B', 'x'): [7, 8, 9],
    ('B', 'y'): [10, 11, 12]
}

df = pd.DataFrame(data, index=[1, 2, 3])
print(df)

Output:

   A     B   
   x  y  x   y
1  1  4  7  10
2  2  5  8  11
3  3  6  9  12

In this example, we created a DataFrame with a MultiIndex consisting of two levels: ‘A’ and ‘B’. The first level represents the columns ‘A’ and ‘B’, while the second level represents the sub-columns ‘x’ and ‘y’. This allows for a more structured representation of the data.

계층적 인덱스(Hierarchical Indexing)

Hierarchical Indexing, also known as 계층적 인덱스, refers to the arrangement of indexes in a hierarchical structure. It allows for indexing and slicing operations on multiple levels, providing more flexibility in data manipulation.

To create a Hierarchical Index, we can use the pd.MultiIndex.from_tuples() or pd.MultiIndex.from_arrays() functions. These functions allow us to specify the levels and labels of the index.

import pandas as pd

# Create a DataFrame with a Hierarchical Index
index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1), ('B', 2)])
data = [10, 20, 30, 40]

df = pd.DataFrame(data, index=index, columns=['Values'])
print(df)

Output:

     Values
A 1      10
  2      20
B 1      30
  2      40

In this example, we created a DataFrame with a Hierarchical Index consisting of two levels: the first level represents the categories ‘A’ and ‘B’, while the second level represents the sub-categories 1 and 2. This allows for more granular data organization and analysis.

Benefits of 다중 인덱스와 계층적 인덱스

  1. Efficient representation of complex data: MultiIndex and Hierarchical Indexing enable us to represent and analyze datasets with complex structures efficiently. This is especially useful when dealing with multi-dimensional data.
  2. Flexible data manipulation: With MultiIndex and Hierarchical Indexing, we can perform indexing and slicing operations on multiple levels, allowing for more flexible data manipulation and analysis.
  3. Clear organization of data: MultiIndex and Hierarchical Indexing provide a structured way to organize and present data with multiple dimensions or categories, making it easier to understand and interpret the information.

In conclusion, MultiIndex and Hierarchical Indexing are powerful tools in Python for managing and analyzing data with complex structures. They provide efficient representation, flexible manipulation, and clear organization of multi-dimensional data. By leveraging these indexing techniques, you can unlock new insights and improve your data analysis workflows.