Pivoting DataFrames
Pivoting is a common data transformation operation that can be used to reshape your data. It is particularly useful when your data is in a "long" format and you want to pivot it into a "wide" format for a different view or analysis.
Let's start by importing necessary libraries.
import pandas as pd
import numpy as np
Creating a DataFrame
First, let's create a DataFrame that we can use for our exercises. We'll create a DataFrame that represents sales data for a store.
data = {
'date': pd.date_range(start='01-01-2020', periods=6),
'product': ['apple', 'banana', 'apple', 'banana', 'apple', 'banana'],
'sales': [10, 15, 9, 16, 12, 17],
'profit': [2.5, 3.5, 2.2, 3.6, 2.7, 3.8]
}
df = pd.DataFrame(data)
Using pivot
method
The pivot
method is the basic method for pivoting with indexing. The syntax of the pivot
function is as follows.
df.pivot(index=None, columns=None, values=None)
Let's pivot our data on the product
column to get the sales for each product.
df_pivot = df.pivot(index='date', columns='product', values='sales')
The pivot
function takes three arguments: index
, columns
, and values
. The index
is the column to use to make new frame’s index. The columns
are the columns to make new frame’s columns. The values
are the columns to use for populating new frame’s values.
Using pivot_table
method
Sometimes, you might have duplicate entries in your data and you want to aggregate them in some way. This is where the pivot_table
method comes in handy.
df.pivot_table(index=None, columns=None, values=None, aggfunc='mean')
Let's try to use pivot_table
with 'mean' as the aggregation function.
df_pivot_table = df.pivot_table(index='date', columns='product', values='sales', aggfunc='mean')
Handling Missing Data
Pandas pivot
automatically handles NaN values and excludes them from the result set. But, if you want to fill them with a specific value, you can use the fillna
function.
df_pivot_fill = df_pivot.fillna(0)
Reshaping Data with melt
The melt
function is useful to transform a DataFrame from wide format to long format. The function is flexible and allows to specify which columns to keep as identifier variables.
df_melt = df.melt(id_vars=['date', 'product'], value_vars=['sales', 'profit'])
Here, id_vars
are the columns to use as identifier variables, and value_vars
are the columns to unpivot.
In this tutorial, we have discussed various ways of pivoting and reshaping data in pandas. These methods are very powerful and can be used to transform your data to suit your analysis needs.