[파이썬] pandas 외부 조인, 내부 조인, 왼쪽 및 오른쪽 조인

Pandas is a powerful data analysis library in Python that provides various functions for data manipulation and joining. One important aspect of data joining is the ability to perform different types of joins, such as outer join, inner join, left join, and right join. In this blog post, we will explore each of these join types and see how they can be implemented in Pandas.

1. Outer Join

An outer join returns a result set that contains all the rows from both left and right dataframes, combining them based on a common key column. If a row does not have a match in the other dataframe, null values will be filled in.

import pandas as pd

# Create two dataframes
df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'Name': ['John', 'Lisa', 'Sam', 'Emma']})

df2 = pd.DataFrame({'ID': [3, 4, 5, 6],
                    'Age': [28, 32, 35, 29]})

# Perform outer join
outer_join = pd.merge(df1, df2, on='ID', how='outer')

# Print the result
print(outer_join)

Output:

   ID  Name   Age
0   1  John   NaN
1   2  Lisa   NaN
2   3   Sam  28.0
3   4  Emma  32.0
4   5   NaN  35.0
5   6   NaN  29.0

As we can see, the outer join result contains all the rows from both dataframes, with null values filled in for the non-matching rows.

2. Inner Join

An inner join returns a result set that contains only the rows that have matching values in both dataframes.

import pandas as pd

# Create two dataframes
df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'Name': ['John', 'Lisa', 'Sam', 'Emma']})

df2 = pd.DataFrame({'ID': [3, 4, 5, 6],
                    'Age': [28, 32, 35, 29]})

# Perform inner join
inner_join = pd.merge(df1, df2, on='ID', how='inner')

# Print the result
print(inner_join)

Output:

   ID  Name  Age
0   3   Sam   28
1   4  Emma   32

In the inner join result, only the rows with matching IDs in both dataframes are included.

3. Left Join

A left join returns a result set that contains all the rows from the left dataframe and the matching rows from the right dataframe. If a row does not have a match in the right dataframe, null values will be filled in.

import pandas as pd

# Create two dataframes
df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'Name': ['John', 'Lisa', 'Sam', 'Emma']})

df2 = pd.DataFrame({'ID': [3, 4, 5, 6],
                    'Age': [28, 32, 35, 29]})

# Perform left join
left_join = pd.merge(df1, df2, on='ID', how='left')

# Print the result
print(left_join)

Output:

   ID  Name   Age
0   1  John   NaN
1   2  Lisa   NaN
2   3   Sam  28.0
3   4  Emma  32.0

The left join result includes all the rows from the left dataframe, with null values filled in for the non-matching rows from the right dataframe.

4. Right Join

A right join returns a result set that contains all the rows from the right dataframe and the matching rows from the left dataframe. If a row does not have a match in the left dataframe, null values will be filled in.

import pandas as pd

# Create two dataframes
df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
                    'Name': ['John', 'Lisa', 'Sam', 'Emma']})

df2 = pd.DataFrame({'ID': [3, 4, 5, 6],
                    'Age': [28, 32, 35, 29]})

# Perform right join
right_join = pd.merge(df1, df2, on='ID', how='right')

# Print the result
print(right_join)

Output:

   ID  Name  Age
0   3   Sam   28
1   4  Emma   32
2   5   NaN   35
3   6   NaN   29

The right join result includes all the rows from the right dataframe, with null values filled in for the non-matching rows from the left dataframe.

In conclusion, Pandas provides a convenient way to perform different types of joins on dataframes. By utilizing the merge() function and specifying the appropriate join type using the how parameter, we can combine data from multiple sources based on common key columns. These join operations are essential for data analysis tasks that involve combining and merging datasets.