Boolean indexing is a powerful feature in pandas, a popular data manipulation library in Python. With Boolean indexing, we can select subsets of data based on conditional statements, making data filtering and manipulation more efficient.
What is Boolean indexing?
Boolean indexing in pandas selects rows or individual elements of a DataFrame or Series based on one or more conditions. Each condition is typically a comparison such as >= or ==, and multiple conditions are combined with the element-wise logical operators & (and), | (or), and ~ (not).
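As a minimal sketch of how these operators combine conditions, the snippet below filters a small illustrative DataFrame. The variable names (scores, mid_range, extremes, below_90) are assumptions made for this example. Each comparison is wrapped in parentheses because &, | and ~ bind more tightly than comparison operators in Python.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data for this sketch)
scores = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dave'],
    'TestScore': [85, 92, 88, 95],
})

# & (and): keep rows where both conditions hold
mid_range = scores[(scores['TestScore'] >= 88) & (scores['TestScore'] <= 92)]

# | (or): keep rows where at least one condition holds
extremes = scores[(scores['TestScore'] < 86) | (scores['TestScore'] > 94)]

# ~ (not): invert a condition, keeping rows where it is False
below_90 = scores[~(scores['TestScore'] >= 90)]

print(mid_range)
print(extremes)
print(below_90)
```

Python's and, or, and not keywords do not work here: they try to reduce a whole Series to a single truth value, which pandas rejects as ambiguous.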
Example scenario
Let’s say we have a DataFrame that contains information about students and their test scores. We want to filter the DataFrame to only include the rows where the test score is greater than or equal to 90. We can achieve this using Boolean indexing.
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'Dave'],
        'TestScore': [85, 92, 88, 95]}
df = pd.DataFrame(data)
# Boolean indexing to filter rows
filtered_df = df[df['TestScore'] >= 90]
print(filtered_df)
Output:
   Name  TestScore
1   Bob         92
3  Dave         95
Explanation
In the example above, we import the pandas library and create a DataFrame called df with student names (Name) and their respective test scores (TestScore).
We then use Boolean indexing on df by creating a filter condition (df['TestScore'] >= 90) inside square brackets. This condition checks whether the TestScore column is greater than or equal to 90 for each row. The result of this condition is a boolean Series, where True represents the rows that satisfy the condition and False represents the rows that do not.
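To make the intermediate boolean Series visible, here is a short sketch that stores the condition in a variable before using it. The name mask is an assumption chosen for illustration. The DataFrame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dave'],
    'TestScore': [85, 92, 88, 95],
})

# The comparison is evaluated element-wise and yields one boolean per row
mask = df['TestScore'] >= 90

print(mask.tolist())  # [False, True, False, True]
print(mask.dtype)     # bool

# True counts as 1, so summing the mask counts the matching rows
print(mask.sum())
```

Naming the mask like this can make longer filters easier to read and lets the same condition be reused in several places.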
Passing this boolean Series as an index to the original DataFrame df filters the DataFrame and returns only the rows where the condition is True. We assign this filtered DataFrame to a new variable called filtered_df.
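One detail worth checking is what happens to the row labels. The sketch below, which rebuilds the same DataFrame so it runs on its own, suggests that the filtered result keeps the original index labels and that df itself is left unchanged. The renumbered variable and the use of reset_index are additions for illustration, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dave'],
    'TestScore': [85, 92, 88, 95],
})

filtered_df = df[df['TestScore'] >= 90]

# The filtered rows keep the labels they had in df
print(filtered_df.index.tolist())  # [1, 3]

# Filtering returns a new object; df still has all four rows
print(len(df))  # 4

# Optional: renumber the filtered rows from 0 (illustrative addition)
renumbered = filtered_df.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```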
Finally, we print filtered_df, which shows only the rows where the test score is greater than or equal to 90.
Conclusion
Boolean indexing in pandas allows us to efficiently filter and manipulate data based on selected conditions. By using logical operators and conditional statements, we can extract subsets of rows or specific elements from a DataFrame or Series. This feature is extremely useful for data analysis, handling large datasets, and performing complex data manipulations.