Data Visualization with Matplotlib and Seaborn

Introduction

Data visualization is an essential part of data analysis. It allows us to visually assess the data, identify patterns, outliers and relationships between variables. Python offers several libraries for data visualization, but we will focus on two of the most powerful and popular ones: Matplotlib and Seaborn.

What is Matplotlib?

Matplotlib is a cross-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It was created by John D. Hunter in 2003 to provide plotting capabilities similar to those of MATLAB, a numerical computing environment with built-in plotting functions.

What is Seaborn?

Seaborn is a Python data visualization library based on Matplotlib. It is used for producing informative and attractive statistical graphics. While Matplotlib tries to make easy things easy and hard things possible, Seaborn tries to make a well-defined set of hard things easy too.

Installation

Before we can use Matplotlib and Seaborn, we need to install them. This can be done using pip:

pip install matplotlib seaborn

Matplotlib basics

To start with, let's import Matplotlib and create a simple line plot.

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.show()

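The plot function draws the curve, and the show function displays the figure. We can also add axis labels, a title and a legend to the plot. In the format string '-g', the dash selects a solid line and g selects green; legend() uses the text passed as label.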

plt.plot(x, np.sin(x), '-g', label='sin(x)')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.title('Sinusoidal curve')
plt.legend()
plt.show()
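
The same labelled plot can also be built through Matplotlib's object-oriented interface, which tends to scale better once a figure holds more than one curve or panel. The following is a minimal sketch rather than the only way to do it; the cosine curve and the title text are additions made purely for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# plt.subplots() returns a Figure and a single Axes to draw on.
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), "-g", label="sin(x)")
ax.plot(x, np.cos(x), "--b", label="cos(x)")  # second curve, added for illustration

# Axes methods mirror the pyplot calls: set_xlabel ~ plt.xlabel, and so on.
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Sine and cosine curves")
ax.legend()
```

Calling plt.show() afterwards displays the figure, exactly as in the pyplot version above.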

Seaborn basics

Seaborn provides a high-level interface to Matplotlib. It typically needs less code for common statistical plots, ships with attractive built-in themes, and offers high-level options for customizing plots. Here is how we can create a histogram in Seaborn.

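import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Draw 100 samples from a standard normal distribution.
data = pd.DataFrame({'x': np.random.normal(size=100)})
# kde=True overlays a kernel density estimate on the histogram.
sns.histplot(data=data, x='x', kde=True)
plt.show()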

Combining Matplotlib and Seaborn

While Seaborn simplifies data visualization, sometimes we might need the power and flexibility of Matplotlib. Fortunately, we can use both libraries together.

tips = sns.load_dataset("tips")
sns.boxplot(x=tips["total_bill"])
plt.title('Box plot of total bill')
plt.show()

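Here Seaborn's boxplot function draws the box plot, and Matplotlib's title function adds a title to it. This works because Seaborn draws onto the current Matplotlib axes, so subsequent pyplot calls apply to the same plot. Note that load_dataset downloads the tips example dataset on first use, so it requires an internet connection.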

Conclusion

Data visualization is a powerful tool for data analysis. With libraries like Matplotlib and Seaborn, Python makes it easier to create informative and attractive plots. Whether you're a beginner or an experienced data scientist, learning these libraries will strengthen your data analysis skills.