Selection by Position
When working with a pandas DataFrame, you often need to select data by its position rather than by its label. pandas calls this selection by position: rows and columns are picked by their zero-based integer positions, regardless of the index or column labels.
Basics
To begin, we need to understand the basic tool for positional indexing, the iloc[] indexer. It selects data from a DataFrame by integer position only, counting from 0.
Let's consider a DataFrame df with 5 rows and 3 columns:
import pandas as pd
import numpy as np
# Keep the values in an integer array. Mixing labels and numbers in one
# NumPy array would convert every value to a string.
values = np.arange(1, 16).reshape(5, 3)
df = pd.DataFrame(data=values,
                  index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
                  columns=['Col1', 'Col2', 'Col3'])
To select the value at the first row and first column:
df.iloc[0, 0]
This will return: 1
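As a minimal, self-contained sketch of scalar access, the snippet below rebuilds a 5x3 DataFrame holding the integers 1 to 15 (an assumption about what df contains) and reads single cells by position. The variable names first, last, and same are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumed data: 5 rows, 3 columns, integer values 1..15.
df = pd.DataFrame(
    np.arange(1, 16).reshape(5, 3),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Col1", "Col2", "Col3"],
)

first = df.iloc[0, 0]    # first row, first column
last = df.iloc[-1, -1]   # negative positions count from the end
same = df.iat[0, 0]      # iat is the scalar-only positional accessor

print(first, last, same)  # 1 15 1
```

For reading or writing one cell at a time, iat is usually the lighter choice; iloc is the general form that also handles slices and lists.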
Slicing
You can also use slicing when selecting data. For instance:
df.iloc[0:3, 0:2]
This will return the first three rows and the first two columns. As with ordinary Python slices, the end position is excluded, so 0:3 covers positions 0, 1, and 2.
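The slicing rules can be checked with a short sketch. It assumes a 5x3 DataFrame of the integers 1 to 15; the names sub, clipped, and every_other are illustrative.

```python
import numpy as np
import pandas as pd

# Assumed data: 5 rows, 3 columns, integer values 1..15.
df = pd.DataFrame(
    np.arange(1, 16).reshape(5, 3),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Col1", "Col2", "Col3"],
)

# Rows at positions 0, 1, 2 and columns at positions 0, 1 (end excluded).
sub = df.iloc[0:3, 0:2]
print(sub.shape)          # (3, 2)

# Slice bounds past the end are clipped, as with Python lists.
clipped = df.iloc[3:10]
print(clipped.shape)      # (2, 3)

# A step works too: every second row, starting at position 0.
every_other = df.iloc[::2]
print(list(every_other.index))  # ['Row1', 'Row3', 'Row5']
```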
Selecting a Single Column
To select a single column, pass a full slice (:) for the rows, followed by the column's position:
df.iloc[:, 0]
This will return the first column of the DataFrame as a Series.
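Whether the result is a Series or a DataFrame depends on how the column position is written. The sketch below, again assuming a 5x3 DataFrame of the integers 1 to 15, compares an integer position with a one-element list; col and col_df are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumed data: 5 rows, 3 columns, integer values 1..15.
df = pd.DataFrame(
    np.arange(1, 16).reshape(5, 3),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Col1", "Col2", "Col3"],
)

col = df.iloc[:, 0]       # integer position -> one-dimensional Series
col_df = df.iloc[:, [0]]  # one-element list -> DataFrame with one column

print(type(col).__name__, col.name)        # Series Col1
print(col.tolist())                        # [1, 4, 7, 10, 13]
print(type(col_df).__name__, col_df.shape) # DataFrame (5, 1)
```

The list form is handy when later code expects a two-dimensional object.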
Selecting a Single Row
To select a single row, pass only the row position; the column indexer can be left out:
df.iloc[1]
This will return the second row of the DataFrame as a Series whose index is the column labels.
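A brief sketch of row selection, assuming a 5x3 DataFrame of the integers 1 to 15. It also shows what happens when a single position falls outside the DataFrame; the flag out_of_bounds is an illustrative name.

```python
import numpy as np
import pandas as pd

# Assumed data: 5 rows, 3 columns, integer values 1..15.
df = pd.DataFrame(
    np.arange(1, 16).reshape(5, 3),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Col1", "Col2", "Col3"],
)

row = df.iloc[1]            # second row as a Series
explicit = df.iloc[1, :]    # same selection with the column slice spelled out
print(row.name, row.tolist())  # Row2 [4, 5, 6]
print(row.equals(explicit))    # True

# Unlike slices, a single out-of-range position is an error.
out_of_bounds = False
try:
    df.iloc[5]
except IndexError:
    out_of_bounds = True
print(out_of_bounds)           # True
```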
Selecting Multiple Rows and Columns
You can select multiple rows and columns, which need not be adjacent, by passing lists of integer positions. For example:
df.iloc[[0, 2, 4], [1, 2]]
This will return the first, third, and fifth rows, and the second and third columns.
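List-based selection also controls the order of the result. The sketch below assumes a 5x3 DataFrame of the integers 1 to 15; picked and reordered are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumed data: 5 rows, 3 columns, integer values 1..15.
df = pd.DataFrame(
    np.arange(1, 16).reshape(5, 3),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Col1", "Col2", "Col3"],
)

# Rows at positions 0, 2, 4 and columns at positions 1, 2.
picked = df.iloc[[0, 2, 4], [1, 2]]
print(picked.to_numpy().tolist())     # [[2, 3], [8, 9], [14, 15]]

# The result follows the order of the lists, not the original order.
reordered = df.iloc[[4, 0], [2, 0]]
print(list(reordered.index))          # ['Row5', 'Row1']
print(reordered.to_numpy().tolist())  # [[15, 13], [3, 1]]
```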
Boolean Indexing with iloc
iloc also accepts a boolean mask. The mask must be a plain boolean list or array (not a label-indexed Series) with the same length as the axis being indexed. For example:
df.iloc[[True, False, True, False, True]]
This will return the first, third, and fifth rows.
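In practice the mask usually comes from a condition rather than being typed by hand. The sketch below assumes a 5x3 DataFrame of the integers 1 to 15 and a hypothetical threshold of 4 on Col1. The condition yields a boolean Series, so it is converted to a NumPy array before being passed to iloc.

```python
import numpy as np
import pandas as pd

# Assumed data: 5 rows, 3 columns, integer values 1..15.
df = pd.DataFrame(
    np.arange(1, 16).reshape(5, 3),
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Col1", "Col2", "Col3"],
)

# Build a positional mask; to_numpy() drops the labels from the Series.
mask = (df["Col1"] > 4).to_numpy()
print(mask.tolist())        # [False, False, True, True, True]

rows = df.iloc[mask]
print(list(rows.index))     # ['Row3', 'Row4', 'Row5']

# A mask can be applied to the column axis as well.
cols = df.iloc[:, [True, False, True]]
print(list(cols.columns))   # ['Col1', 'Col3']
```

When the mask is already a labeled boolean Series, df.loc or plain df[...] is often the more direct tool; iloc fits best when the mask is purely positional.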
Conclusion
The iloc indexer provides a flexible way to select data based on its position within the DataFrame. It accepts single integers, slices, lists of positions, and boolean masks, which makes it a fundamental tool for data manipulation in pandas.