Boolean Indexing
In this tutorial, we'll learn about Boolean Indexing, an essential pandas technique for selecting and manipulating data based on conditions. It lets you filter the data in a DataFrame or Series using conditional expressions.
What is Boolean Indexing?
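Boolean Indexing in pandas is a type of indexing that uses Boolean vectors (sequences of True/False values) to filter data: entries that line up with True are kept, and entries that line up with False are dropped. The term 'Boolean' comes from Boolean logic, in which every value is either true or false.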
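To make the idea of a Boolean vector concrete, here is a minimal sketch that filters a small Series with a hand-written list of True/False values. The Series `s` and the names `keep` and `selected` are made up for illustration; in practice the Boolean vector usually comes from a condition rather than being typed out.

```python
import pandas as pd

# A small, made-up Series used only for this illustration.
s = pd.Series([10, 20, 30, 40])

# A Boolean vector with one True/False entry per element of s.
keep = [True, False, True, False]

# Entries paired with True are kept; entries paired with False are dropped.
selected = s[keep]
print(selected.tolist())  # [10, 30]
```

Note that the selected values keep their original index labels (0 and 2 here) rather than being renumbered.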
Creating a DataFrame
Before we start with Boolean Indexing, let's create a simple DataFrame.
import pandas as pd
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)
This code creates a DataFrame with four rows and three columns: 'Name', 'Age' and 'City'.
Basic Boolean Indexing
Let's say we want to select all rows where 'Age' is greater than 30. We can do this using Boolean Indexing.
df[df['Age'] > 30]
In this code, df['Age'] > 30 is a Boolean condition which checks if 'Age' is greater than 30. This condition returns a Boolean Series of the same length as the DataFrame, with 'True' for rows where 'Age' is greater than 30, and 'False' otherwise. When this Boolean Series is used to index the DataFrame df, only rows where the Series is 'True' are selected.
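The two steps described above can be seen separately by storing the condition in a variable before using it as an index. This is only a sketch: it rebuilds the tutorial's DataFrame so that it runs on its own, and the names `mask` and `older` are illustrative.

```python
import pandas as pd

# Rebuild the tutorial's DataFrame so this example is self-contained.
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# Step 1: the comparison alone yields a Boolean Series, one value per row.
mask = df['Age'] > 30
print(mask.tolist())  # [False, False, True, True]

# Step 2: indexing with that Series keeps only the rows marked True.
older = df[mask]
print(older['Name'].tolist())  # ['Peter', 'Linda']
```

Naming the mask is optional, but it can make longer filters easier to read and reuse.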
Multiple Conditions
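You can also combine multiple conditions with the & (and) and | (or) operators. Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and ==.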
df[(df['Age'] > 30) & (df['City'] == 'London')]
This code will select rows where 'Age' is greater than 30 and 'City' is 'London'.
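For comparison, here is a sketch of the same pattern with the | (or) operator next to the & (and) version. The DataFrame is rebuilt so the example runs on its own, the variable names `both` and `either` are illustrative, and the 'Paris' condition is picked arbitrarily for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# & keeps rows where BOTH conditions are True.
both = df[(df['Age'] > 30) & (df['City'] == 'London')]
print(both['Name'].tolist())  # ['Linda']

# | keeps rows where AT LEAST ONE condition is True.
either = df[(df['Age'] > 30) | (df['City'] == 'Paris')]
print(either['Name'].tolist())  # ['Anna', 'Peter', 'Linda']
```

Python's own `and` and `or` keywords do not work here: they ask for a single truth value for the whole Series, and pandas raises a ValueError because that value is ambiguous.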
Using the isin() Method
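The isin() method checks whether each value appears in a given list and returns a Boolean Series, which makes it a convenient way to filter rows against several allowed values.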
df[df['City'].isin(['Paris', 'Berlin'])]
This code will select rows where 'City' is either 'Paris' or 'Berlin'.
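One way to see what isin() is doing is to compare it with the equivalent chain of == tests joined by |. The sketch below rebuilds the DataFrame so it runs on its own; `by_isin` and `by_or` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# Membership test against a list of allowed values.
by_isin = df[df['City'].isin(['Paris', 'Berlin'])]

# The same filter written as two equality tests joined with | (or).
by_or = df[(df['City'] == 'Paris') | (df['City'] == 'Berlin')]

print(by_isin['Name'].tolist())  # ['Anna', 'Peter']
print(by_isin.equals(by_or))     # True
```

With only two values the two forms are about equally readable, but isin() stays short as the list of allowed values grows.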
Using the ~ Operator
The ~ operator inverts a Boolean Series, turning True into False and False into True, so it selects the rows where a condition is not true.
df[~(df['Age'] > 30)]
This code will select rows where 'Age' is not greater than 30.
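The ~ operator is most useful with conditions that have no direct opposite, such as isin(). The sketch below shows both cases on the tutorial's data; the DataFrame is rebuilt so the example runs on its own, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# Negating "Age > 30" keeps the rows where Age is 30 or less.
not_over_30 = df[~(df['Age'] > 30)]
print(not_over_30['Name'].tolist())  # ['John', 'Anna']

# On this data (no missing ages) that matches the <= comparison.
print(not_over_30.equals(df[df['Age'] <= 30]))  # True

# Negating isin() selects rows whose City is NOT in the list.
not_in = df[~df['City'].isin(['Paris', 'Berlin'])]
print(not_in['Name'].tolist())  # ['John', 'Linda']
```

The equivalence with <= assumes there are no missing values. A missing age compares as False for both > and <=, so the ~ version would keep that row while the <= version would drop it.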
Conclusion
Boolean Indexing is a powerful technique to select and manipulate data based on conditions. It allows us to filter data in versatile ways using simple or complex conditions. Try to use Boolean Indexing in your own pandas code to see how it can make data manipulation easier and more efficient!