Selection by Label
In this tutorial, we'll explore the concept of 'Selection by Label' in pandas, a powerful Python library for data manipulation and analysis.
Understanding the Basics
Pandas provides several ways to select data from a DataFrame or Series, and selection by label is among the most common. A label is the name of a column or an entry in the row index, as opposed to an integer position.
DataFrame Creation
Let's start by creating a simple DataFrame:
import pandas as pd
data = {
    'fruit': ['apple', 'banana', 'cherry', 'date'],
    'color': ['red', 'yellow', 'red', 'brown'],
    'weight': [120, 150, 10, 15]
}
df = pd.DataFrame(data)
df
This creates a DataFrame with columns labeled 'fruit', 'color', and 'weight'. Because no index was specified, the rows receive the default integer labels 0 through 3.
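Before selecting anything, it can help to check which labels actually exist. The following is a minimal sketch that rebuilds the same DataFrame and lists its row and column labels; the names row_labels and column_labels are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'cherry', 'date'],
    'color': ['red', 'yellow', 'red', 'brown'],
    'weight': [120, 150, 10, 15],
})

# Row labels come from the index (a default integer range here)
row_labels = list(df.index)

# Column labels come from the dictionary keys, in insertion order
column_labels = list(df.columns)

print(row_labels)     # [0, 1, 2, 3]
print(column_labels)  # ['fruit', 'color', 'weight']
```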
.loc Accessor
Pandas provides the .loc accessor for label-based selection. It's used like so:
df.loc[1, 'fruit']
In this example, 1 is the row label and 'fruit' is the column label. This returns the value of 'fruit' in the row labeled 1, which is 'banana'.
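Row labels do not have to be integers. As an illustration only, the sketch below uses the 'fruit' column as the index so that rows can be selected by name; the variable by_fruit is introduced for this example and is not part of the tutorial's DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'cherry', 'date'],
    'color': ['red', 'yellow', 'red', 'brown'],
    'weight': [120, 150, 10, 15],
})

# Use the 'fruit' column as the row index (by_fruit is a name chosen for this example)
by_fruit = df.set_index('fruit')

# Rows are now addressed by fruit name rather than by integer label
banana_color = by_fruit.loc['banana', 'color']
print(banana_color)  # yellow

# Passing only a row label returns the whole row as a Series
banana_row = by_fruit.loc['banana']
print(banana_row['weight'])  # 150
```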
Selecting Multiple Columns
You can select multiple columns by passing a list of column labels:
df.loc[:, ['fruit', 'color']]
Here, the colon : means "all rows", and ['fruit', 'color'] is a list of the column labels we're interested in.
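Lists of labels work on the row axis as well. The sketch below, using the same data, picks two rows and two columns at once, and also shows that a one-element list keeps the result a DataFrame while a bare label gives a Series. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'cherry', 'date'],
    'color': ['red', 'yellow', 'red', 'brown'],
    'weight': [120, 150, 10, 15],
})

# Rows labeled 0 and 2, columns 'fruit' and 'weight'
subset = df.loc[[0, 2], ['fruit', 'weight']]
print(subset)

# A list with one label returns a DataFrame with a single column
one_col_frame = df.loc[:, ['fruit']]

# A bare label returns a Series
one_col_series = df.loc[:, 'fruit']

print(type(one_col_frame).__name__)   # DataFrame
print(type(one_col_series).__name__)  # Series
```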
Selecting Ranges
You can also use label-based slicing to select a range of rows:
df.loc[1:3]
This returns all rows from label 1 through label 3. Unlike ordinary Python slicing, a label-based slice includes both endpoints.
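The inclusive endpoint is easy to overlook. For contrast, the sketch below compares the label-based slice with the position-based .iloc accessor (not otherwise covered in this tutorial), which follows the usual Python convention of excluding the stop value. It also shows that columns can be sliced by label in the same way.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'cherry', 'date'],
    'color': ['red', 'yellow', 'red', 'brown'],
    'weight': [120, 150, 10, 15],
})

# Label-based slice: includes the rows labeled 1, 2 and 3
by_label = df.loc[1:3]
print(len(by_label))     # 3

# Position-based slice: includes positions 1 and 2 only
by_position = df.iloc[1:3]
print(len(by_position))  # 2

# Column labels can be sliced too, again including both endpoints
fruit_to_color = df.loc[1:3, 'fruit':'color']
print(list(fruit_to_color.columns))  # ['fruit', 'color']
```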
Conditional Selection
The .loc accessor also supports boolean indexing for conditional selection:
df.loc[df['weight'] > 100]
This will return all rows where the 'weight' is greater than 100.
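A boolean condition can be combined with a column label, and several conditions can be joined with & (and) or | (or). The following sketch applies both ideas to the same data; note that each condition needs its own parentheses. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'cherry', 'date'],
    'color': ['red', 'yellow', 'red', 'brown'],
    'weight': [120, 150, 10, 15],
})

# Filter rows by condition and keep only the 'fruit' column
heavy_fruit_names = df.loc[df['weight'] > 100, 'fruit']
print(list(heavy_fruit_names))  # ['apple', 'banana']

# Combine two conditions; parentheses around each one are required
red_and_light = df.loc[(df['color'] == 'red') & (df['weight'] < 100)]
print(list(red_and_light['fruit']))  # ['cherry']
```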
.at Accessor
To access a single scalar value, the .at accessor is a faster alternative:
df.at[1, 'fruit']
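This is similar to .loc but faster, because .at accepts only a single row label and a single column label and so skips the extra handling that .loc needs for slices, lists, and boolean masks.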
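The .at accessor can also assign a value, and it raises a KeyError when asked to read a label that does not exist. The sketch below shows both behaviors on the same data; the replacement weight of 18 and the label 10 are arbitrary values picked for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'cherry', 'date'],
    'color': ['red', 'yellow', 'red', 'brown'],
    'weight': [120, 150, 10, 15],
})

# Read a single value by row and column label
value = df.at[1, 'fruit']
print(value)  # banana

# Assign a single value in place (18 is an arbitrary example value)
df.at[3, 'weight'] = 18
print(df.at[3, 'weight'])  # 18

# Reading a row label that is not in the index raises KeyError
try:
    df.at[10, 'fruit']
    missing = False
except KeyError:
    missing = True
print(missing)  # True
```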
Wrap Up
In this tutorial, we have learned the basics of selection by label in pandas. We have covered how to use .loc and .at to select data based on labels. We've also seen how to select multiple columns, how to select ranges of rows, and how to use conditional selection.
Remember, practice is essential to get the hang of these concepts. Try to play around with these methods with different datasets. Happy data wrangling!