Skip to main content

Pivoting DataFrames

Pivoting is a common data transformation operation that can be used to reshape your data. It is particularly useful when your data is in a "long" format and you want to pivot it into a "wide" format for a different view or analysis.

Let's start by importing necessary libraries.

import pandas as pd
import numpy as np

Creating a DataFrame

First, let's create a DataFrame that we can use for our exercises. We'll create a DataFrame that represents sales data for a store.

data = {
'date': pd.date_range(start='01-01-2020', periods=6),
'product': ['apple', 'banana', 'apple', 'banana', 'apple', 'banana'],
'sales': [10, 15, 9, 16, 12, 17],
'profit': [2.5, 3.5, 2.2, 3.6, 2.7, 3.8]
}
df = pd.DataFrame(data)

Using pivot method

The pivot method is the basic method for pivoting with indexing. The syntax of the pivot function is as follows.

df.pivot(index=None, columns=None, values=None)

Let's pivot our data on the product column to get the sales for each product.

df_pivot = df.pivot(index='date', columns='product', values='sales')

The pivot function takes three arguments: index, columns, and values. The index is the column to use to make new frame’s index. The columns are the columns to make new frame’s columns. The values are the columns to use for populating new frame’s values.

Using pivot_table method

Sometimes, you might have duplicate entries in your data and you want to aggregate them in some way. This is where the pivot_table method comes in handy.

df.pivot_table(index=None, columns=None, values=None, aggfunc='mean')

Let's try to use pivot_table with 'mean' as the aggregation function.

df_pivot_table = df.pivot_table(index='date', columns='product', values='sales', aggfunc='mean')

Handling Missing Data

Pandas pivot automatically handles NaN values and excludes them from the result set. But, if you want to fill them with a specific value, you can use the fillna function.

df_pivot_fill = df_pivot.fillna(0)

Reshaping Data with melt

The melt function is useful to transform a DataFrame from wide format to long format. The function is flexible and allows to specify which columns to keep as identifier variables.

df_melt = df.melt(id_vars=['date', 'product'], value_vars=['sales', 'profit'])

Here, id_vars are the columns to use as identifier variables, and value_vars are the columns to unpivot.

In this tutorial, we have discussed various ways of pivoting and reshaping data in pandas. These methods are very powerful and can be used to transform your data to suit your analysis needs.