Data Visualization with matplotlib and seaborn
Introduction
Data visualization is an essential part of data analysis. It allows us to visually assess the data, identify patterns, outliers and relationships between variables. Python offers several libraries for data visualization, but we will focus on two of the most powerful and popular ones: Matplotlib and Seaborn.
What is Matplotlib?
Matplotlib is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It was created by John D. Hunter in 2003 to provide plotting capabilities similar to those of MATLAB, a numerical computing environment with built-in plotting functions.
What is Seaborn?
Seaborn is a Python data visualization library based on Matplotlib. It is used for producing informative and attractive statistical graphics. While Matplotlib tries to make easy things easy and hard things possible, Seaborn tries to make a well-defined set of hard things easy too.
Installation
Before we can use Matplotlib and Seaborn, we need to install them. This can be done using pip:
pip install matplotlib seaborn
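To confirm that the installation worked, a quick sanity check is to import both packages and print their versions. Both expose the conventional `__version__` attribute; the dictionary name `installed` is just an illustrative choice.

```python
# Confirm that both libraries import and report their installed versions
import matplotlib
import seaborn as sns

installed = {
    "matplotlib": matplotlib.__version__,
    "seaborn": sns.__version__,
}

for name, version in installed.items():
    print(f"{name} {version}")
```

If either import raises `ModuleNotFoundError`, the `pip install` step most likely ran against a different Python environment than the one executing the script.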
Matplotlib basics
To start with, let's import Matplotlib and create a simple line plot.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.show()
The plot() function draws the line on the current axes, and the show() function displays the figure. We can also add axis labels, a title and a legend to the plot.
plt.plot(x, np.sin(x), '-g', label='sin(x)')
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.title('Sinusoidal curve')
plt.legend()
plt.show()
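The examples above use pyplot's state-based interface, where each call acts on an implicit "current" figure. Matplotlib also offers an explicit object-oriented interface: `plt.subplots()` returns a Figure and an Axes, and you call methods on the Axes directly. Below is a sketch of the same labelled sine plot in that style; the function name `plot_sine` is hypothetical and only wraps the steps so they can be reused.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_sine():
    """Draw the labelled sine curve using the object-oriented interface."""
    x = np.linspace(0, 10, 100)
    fig, ax = plt.subplots()                     # explicit Figure and Axes objects
    ax.plot(x, np.sin(x), '-g', label='sin(x)')  # same solid green line as before
    ax.set_xlabel('x')                           # Axes methods use a set_ prefix
    ax.set_ylabel('sin(x)')
    ax.set_title('Sinusoidal curve')
    ax.legend()
    return fig, ax


if __name__ == "__main__":
    plot_sine()
    plt.show()
```

Holding references to `fig` and `ax` becomes useful as soon as a figure contains more than one plot, because each call then states exactly which Axes it modifies.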
Seaborn basics
Seaborn provides a high-level interface on top of Matplotlib. It requires less code for common statistical plots and ships with attractive default themes. Here is how we can create a histogram with a kernel density estimate (KDE) overlay in Seaborn.
import seaborn as sns
import pandas as pd
data = pd.DataFrame({'x': np.random.normal(size=100)})
sns.histplot(data.x, kde=True)
plt.show()
Combining Matplotlib and Seaborn
While Seaborn simplifies data visualization, sometimes we might need the power and flexibility of Matplotlib. Fortunately, we can use both libraries together.
tips = sns.load_dataset("tips")
sns.boxplot(x=tips["total_bill"])
plt.title('Box plot of total bill')
plt.show()
In this code, Seaborn's boxplot() function creates the box plot, and Matplotlib's title() function adds a title to it.
Conclusion
Data visualization is a powerful tool for data analysis. With libraries like Matplotlib and Seaborn, Python makes it easier to create informative and attractive plots. Whether you're a beginner or an experienced data scientist, mastering these libraries will significantly boost your data analysis skills.