
Histogram

Introduction

Histograms are an excellent tool for visualizing the distribution of your data. They group data points into consecutive ranges, called bins, and draw one bar per bin whose height shows how many points fall in that range. In this tutorial, we'll learn how to create histograms using the Pandas library in Python.

Getting Started

First, we need to import the necessary libraries.

import pandas as pd
import matplotlib.pyplot as plt

Loading Data

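We'll use a simple data set for this tutorial, the 'Iris' data set, which can be loaded through the seaborn library. Note that load_dataset() fetches the data from seaborn's online example-data repository, so it needs an internet connection the first time it runs. This data set includes measurements for 150 iris flowers from three different species.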

import seaborn as sns

iris = sns.load_dataset('iris')

You can view the first few rows of this DataFrame using the head() method.

print(iris.head())

Creating a Histogram

In Pandas, we can create a histogram using the plot.hist() method of a Series or DataFrame. Let's try it out on the 'sepal_width' column of our iris DataFrame.

iris['sepal_width'].plot.hist()
plt.show()

This will create a simple histogram with the default of 10 bins. The x-axis shows the value ranges (bins), and the y-axis shows the number of data points that fall in each bin.
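To make the idea of bins and frequencies concrete, here is a minimal sketch that computes the counts behind a histogram with numpy.histogram(). The values are small hand-made numbers chosen for illustration, not the Iris measurements, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hand-made example values (an assumption for illustration, not the Iris data).
widths = pd.Series([2.0, 2.5, 3.0, 3.0, 3.2, 3.5, 4.0])

# Split the range 2.0..4.0 into 4 equal-width bins and count the values in each.
counts, edges = np.histogram(widths, bins=4)

print(edges.tolist())   # bin edges: [2.0, 2.5, 3.0, 3.5, 4.0]
print(counts.tolist())  # values per bin: [1, 1, 3, 2]
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. That is why 3.5 and 4.0 land in the same final bin.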

Customizing the Histogram

Pandas allows us to customize our histograms. We can change the number of bins, the color of the bins, and add labels to our axes.

Changing Bin Number

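The bin number is the number of intervals the data range is divided into, and therefore the number of bars in the histogram. You can change it by setting the bins parameter of plot.hist(). More bins show finer detail but can make the plot look noisy; fewer bins give a smoother but coarser picture.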

iris['sepal_width'].plot.hist(bins=20)
plt.show()
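As a quick check of what bins=20 does, the sketch below inspects the Axes object that plot.hist() returns. It uses randomly generated numbers as a stand-in for the sepal_width column so that it runs without downloading anything; the stand-in data and variable names are assumptions, not part of the tutorial's data set.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for iris['sepal_width'] (an assumption, so the sketch runs offline).
rng = np.random.default_rng(0)
sepal_width = pd.Series(rng.normal(loc=3.0, scale=0.4, size=150), name="sepal_width")

# Draw on an explicit Axes so the bars can be inspected afterwards.
fig, ax = plt.subplots()
sepal_width.plot.hist(bins=20, ax=ax)

# Each bar is a rectangle patch on the Axes, one per bin.
n_bars = len(ax.patches)
total = sum(bar.get_height() for bar in ax.patches)
print(n_bars, total)
```

The bar heights add up to the number of data points, whatever bin count you pick. Call plt.show() afterwards to display the figure.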

Changing Color

Set the color parameter in the plot.hist() method to change the fill color of the bars.

iris['sepal_width'].plot.hist(bins=20, color='skyblue')
plt.show()
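Extra keyword arguments are passed on to matplotlib, so other bar properties can be set the same way. The sketch below adds edgecolor, which is a matplotlib option not covered in this tutorial, to outline each bar. The data is a synthetic stand-in for sepal_width, used here only so the example runs offline.

```python
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for iris['sepal_width'] (an assumption, so the sketch runs offline).
rng = np.random.default_rng(0)
sepal_width = pd.Series(rng.normal(loc=3.0, scale=0.4, size=150), name="sepal_width")

fig, ax = plt.subplots()
# color sets the fill; edgecolor (forwarded to matplotlib) outlines each bar.
sepal_width.plot.hist(bins=20, color="skyblue", edgecolor="black", ax=ax)

# Read the colors back from the first bar to confirm they were applied.
first_bar = ax.patches[0]
fill_hex = mcolors.to_hex(first_bar.get_facecolor())
edge_hex = mcolors.to_hex(first_bar.get_edgecolor())
print(fill_hex, edge_hex)
```

Outlined bars are often easier to tell apart when neighboring bins have similar heights. Call plt.show() afterwards to display the figure.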

Adding Labels

You can label your axes using the plt.xlabel() and plt.ylabel() functions.

iris['sepal_width'].plot.hist(bins=20, color='skyblue')
plt.xlabel('Sepal Width')
plt.ylabel('Frequency')
plt.show()
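An alternative to the plt functions is to set the labels on the Axes object directly, which keeps things unambiguous when a figure has more than one plot. This is a sketch of that style, again on a synthetic stand-in for sepal_width; the title text is an illustrative choice, not something from the tutorial.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for iris['sepal_width'] (an assumption, so the sketch runs offline).
rng = np.random.default_rng(0)
sepal_width = pd.Series(rng.normal(loc=3.0, scale=0.4, size=150), name="sepal_width")

fig, ax = plt.subplots()
sepal_width.plot.hist(bins=20, color="skyblue", ax=ax)

# Label the axes and add a title through the Axes methods.
ax.set_xlabel("Sepal Width")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of Sepal Width")  # illustrative title
```

Call plt.show() afterwards to display the figure, just as in the plt.xlabel() version.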

Conclusion

Histograms are powerful tools for visualizing data distributions. With the help of Pandas, we can easily create and customize histograms to better understand our data. Practice with different data sets and customization options to become more proficient in creating histograms with Pandas.