Pandas is a powerful data analysis library in Python that provides various functions for data manipulation and joining. One important aspect of data joining is the ability to perform different types of joins, such as outer join, inner join, left join, and right join. In this blog post, we will explore each of these join types and see how they can be implemented in Pandas.
1. Outer Join
An outer join returns a result set that contains all the rows from both left and right dataframes, combining them based on a common key column. If a row does not have a match in the other dataframe, null values will be filled in.
import pandas as pd
# Create two dataframes
df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
'Name': ['John', 'Lisa', 'Sam', 'Emma']})
df2 = pd.DataFrame({'ID': [3, 4, 5, 6],
'Age': [28, 32, 35, 29]})
# Perform outer join
outer_join = pd.merge(df1, df2, on='ID', how='outer')
# Print the result
print(outer_join)
Output:
ID Name Age
0 1 John NaN
1 2 Lisa NaN
2 3 Sam 28.0
3 4 Emma 32.0
4 5 NaN 35.0
5 6 NaN 29.0
As we can see, the outer join result contains all the rows from both dataframes, with null values filled in for the non-matching rows.
2. Inner Join
An inner join returns a result set that contains only the rows that have matching values in both dataframes.
import pandas as pd
# Create two dataframes
df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
'Name': ['John', 'Lisa', 'Sam', 'Emma']})
df2 = pd.DataFrame({'ID': [3, 4, 5, 6],
'Age': [28, 32, 35, 29]})
# Perform inner join
inner_join = pd.merge(df1, df2, on='ID', how='inner')
# Print the result
print(inner_join)
Output:
ID Name Age
0 3 Sam 28
1 4 Emma 32
In the inner join result, only the rows with matching IDs in both dataframes are included.
3. Left Join
A left join returns a result set that contains all the rows from the left dataframe and the matching rows from the right dataframe. If a row does not have a match in the right dataframe, null values will be filled in.
import pandas as pd
# Create two dataframes
df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
'Name': ['John', 'Lisa', 'Sam', 'Emma']})
df2 = pd.DataFrame({'ID': [3, 4, 5, 6],
'Age': [28, 32, 35, 29]})
# Perform left join
left_join = pd.merge(df1, df2, on='ID', how='left')
# Print the result
print(left_join)
Output:
ID Name Age
0 1 John NaN
1 2 Lisa NaN
2 3 Sam 28.0
3 4 Emma 32.0
The left join result includes all the rows from the left dataframe, with null values filled in for the non-matching rows from the right dataframe.
4. Right Join
A right join returns a result set that contains all the rows from the right dataframe and the matching rows from the left dataframe. If a row does not have a match in the left dataframe, null values will be filled in.
import pandas as pd
# Create two dataframes
df1 = pd.DataFrame({'ID': [1, 2, 3, 4],
'Name': ['John', 'Lisa', 'Sam', 'Emma']})
df2 = pd.DataFrame({'ID': [3, 4, 5, 6],
'Age': [28, 32, 35, 29]})
# Perform right join
right_join = pd.merge(df1, df2, on='ID', how='right')
# Print the result
print(right_join)
Output:
ID Name Age
0 3 Sam 28
1 4 Emma 32
2 5 NaN 35
3 6 NaN 29
The right join result includes all the rows from the right dataframe, with null values filled in for the non-matching rows from the left dataframe.
In conclusion, Pandas provides a convenient way to perform different types of joins on dataframes. By utilizing the merge()
function and specifying the appropriate join type using the how
parameter, we can combine data from multiple sources based on common key columns. These join operations are essential for data analysis tasks that involve combining and merging datasets.