Selection by Position

When working with a pandas DataFrame, you often need to select data by where it sits rather than by how it is labeled. Selection by position does exactly that: it picks rows and columns by their zero-based integer position, regardless of the index or column labels.

Basics

To begin, we need to understand the basic tool for positional indexing, iloc[]. It is an indexer, used with square brackets rather than called like a method, and it selects data from a DataFrame by integer location only. Positions start at 0.

Let's consider a DataFrame df with 5 rows and 3 columns:

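import pandas as pd
import numpy as np

# Keep the values in a purely numeric array so the cells stay integers.
# Putting labels and numbers in the same NumPy array would turn every cell into a string.
values = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9],
                   [10, 11, 12],
                   [13, 14, 15]])

df = pd.DataFrame(data=values,
                  index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
                  columns=['Col1', 'Col2', 'Col3'])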

To select the value at the first row and first column:

df.iloc[0, 0]

This will return: 1
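
As a small sketch of scalar access, the snippet below rebuilds the same 5x3 frame so it runs on its own and reads one cell from each corner. The variable names (first, last) are illustrative only. Negative positions count from the end, as they do with Python lists.

```python
import pandas as pd

# Rebuild the example frame so this snippet is self-contained.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]],
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Col1", "Col2", "Col3"],
)

first = df.iloc[0, 0]    # top-left cell: 1
last = df.iloc[-1, -1]   # bottom-right cell: 15

print(first, last)
```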

Slicing

You can also use slicing when selecting data. For instance:

df.iloc[0:3, 0:2]

This will return the first three rows and the first two columns. As with ordinary Python slices, the end position is excluded, so 0:3 covers positions 0, 1, and 2.
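
To make the slicing rule concrete, here is a minimal sketch that compares a positional slice with the equivalent label slice. It assumes the same example frame, rebuilt here so the snippet is self-contained. The names by_position, by_label, and tail are illustrative. Note that label slices with loc include their end label, while iloc slices exclude their end position.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]],
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Col1", "Col2", "Col3"],
)

# Positional slice: end positions 3 and 2 are excluded.
by_position = df.iloc[0:3, 0:2]

# Label slice: end labels "Row3" and "Col2" are included.
by_label = df.loc["Row1":"Row3", "Col1":"Col2"]

print(by_position.equals(by_label))  # True

# A slice that runs past the end is truncated rather than raising an error.
tail = df.iloc[3:10]
print(tail.shape)  # (2, 3)
```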

Selecting a Single Column

To select a single column, use a bare colon (:) as the row indexer, which means "all rows", and give the column position:

df.iloc[:, 0]

This will return the first column of the DataFrame as a Series.
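
One detail worth sketching: an integer column position gives back a Series, while a one-element list keeps the result two-dimensional. The example below assumes the same frame, rebuilt for self-containment, and the names col and col_frame are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]],
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Col1", "Col2", "Col3"],
)

col = df.iloc[:, 0]          # integer position -> Series named "Col1"
col_frame = df.iloc[:, [0]]  # one-element list -> DataFrame with one column

print(type(col).__name__, col.name)             # Series Col1
print(type(col_frame).__name__, col_frame.shape)  # DataFrame (5, 1)
```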

Selecting a Single Row

To select a single row, pass only the row position; the column indexer can be left out:

df.iloc[1]

This will return the second row of the DataFrame as a Series.
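
The following sketch, again on a self-contained copy of the example frame, shows what the selected row looks like: its values are indexed by the column labels, and its name is the row label. It also shows that a single integer position beyond the last row raises IndexError. The names row and out_of_bounds are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]],
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Col1", "Col2", "Col3"],
)

row = df.iloc[1]
print(row.name)         # Row2
print(list(row.index))  # ['Col1', 'Col2', 'Col3']
print(row.tolist())     # [4, 5, 6]

# Unlike slices, a single out-of-range position is an error.
try:
    df.iloc[5]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
print(out_of_bounds)    # True
```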

Selecting Multiple Rows and Columns

You can select multiple rows and columns by passing lists of integer positions, one list for the rows and one for the columns. For example:

df.iloc[[0, 2, 4], [1, 2]]

This will return the first, third, and fifth rows, and the second and third columns.
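
A short sketch of two properties of list-based selection: the result follows the order of the lists, and a position may be repeated. The frame is rebuilt so the snippet stands alone, and the names picked and repeated are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]],
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Col1", "Col2", "Col3"],
)

# Rows and columns come back in the order requested, not the original order.
picked = df.iloc[[4, 0], [2, 0]]
print(list(picked.index))    # ['Row5', 'Row1']
print(list(picked.columns))  # ['Col3', 'Col1']

# Repeating a position repeats the row in the result.
repeated = df.iloc[[0, 0]]
print(repeated.shape)        # (2, 3)
```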

Boolean Indexing with iloc

iloc also accepts a boolean mask. The mask must be a list or array of booleans whose length matches the axis being indexed (here, the number of rows). For example:

df.iloc[[True, False, True, False, True]]

This will return the first, third, and fifth rows.
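
In practice the mask usually comes from a condition rather than a hand-written list. The sketch below builds one from the Col1 column of the example frame. As far as I know, iloc does not accept a boolean Series directly, so the mask is converted to a NumPy array first with to_numpy(); treat that detail as an assumption to check against your pandas version. The names mask and selected are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]],
    index=["Row1", "Row2", "Row3", "Row4", "Row5"],
    columns=["Col1", "Col2", "Col3"],
)

# Build a boolean Series from a condition, then strip its index so that
# iloc receives a plain positional mask (assumption: iloc rejects a Series mask).
mask = (df["Col1"] > 5).to_numpy()
print(mask.tolist())         # [False, False, True, True, True]

selected = df.iloc[mask]
print(list(selected.index))  # ['Row3', 'Row4', 'Row5']
```

When the mask is already a labeled boolean Series, indexing with df.loc[...] or df[...] is often the more natural choice, since those align on labels.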

Conclusion

The iloc indexer provides a flexible way to select data by its position within the DataFrame, using single integers, slices, lists of positions, or boolean masks. It is a fundamental tool for data manipulation in pandas, and it is the one to reach for whenever position matters more than labels.