Viewing and Inspecting Data

Pandas is a Python library whose data structures, chiefly the DataFrame and the Series, make data manipulation and analysis more straightforward. This tutorial walks through viewing and inspecting data with Pandas.

Getting Started

First, import the pandas library. We'll also import NumPy, which comes in handy when dealing with numerical data.

import pandas as pd
import numpy as np

Loading Data

Pandas can read many file formats, such as CSV, Excel, and JSON. This tutorial uses a CSV file as an example.

df = pd.read_csv('your_file.csv')

Replace 'your_file.csv' with the path to the CSV file you want to load.
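If you don't have a CSV file at hand, the following sketch builds one in memory so you can still follow along. The column names and values are made up for illustration; it relies on read_csv() accepting a file-like object as well as a path.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """name,age,city
Alice,30,Paris
Bob,25,Berlin
Carol,35,Paris
"""

# read_csv() accepts a path or a file-like object, so StringIO works here.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

The first line of the CSV is used as the column header by default.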

Viewing Data

Once you have loaded your data into a DataFrame, you will usually want to look at it. The head() and tail() methods return the first and last rows of the DataFrame, respectively. By default, each returns 5 rows, but you can pass a different number as an argument.

df.head() # view first 5 rows
df.tail(3) # view last 3 rows
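As a small illustration, here is a sketch on a made-up ten-row DataFrame (the column name "n" is arbitrary). In a script, as opposed to a notebook, the result has to be printed explicitly to be seen.

```python
import pandas as pd

# Toy DataFrame with ten rows, numbered 0 through 9.
df = pd.DataFrame({"n": range(10)})

first = df.head()   # no argument: the first 5 rows
last = df.tail(3)   # the last 3 rows

print(first)
print(last)
```

Both methods return a new DataFrame and leave the original unchanged.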

Inspecting Data

Now that we can view our data, let's inspect it further.

Shape of DataFrame

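Use the shape attribute to get the number of rows and columns in your DataFrame as a (rows, columns) tuple. Because shape is an attribute, not a method, it is written without parentheses.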

df.shape
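Since shape is an ordinary tuple, it can be unpacked into separate variables. A minimal sketch with made-up data:

```python
import pandas as pd

# Toy DataFrame: 4 rows, 2 columns.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10.0, 20.0, 30.0, 40.0]})

# Unpack the (rows, columns) tuple.
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")
```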

Information about DataFrame

The info() method prints a concise summary of your DataFrame: the column names, the number of non-null entries in each column, each column's data type, and the memory usage.

df.info()
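Note that info() prints its summary and returns None, so assigning its result to a variable captures nothing. If you need the text itself, one option is to pass a buffer through the buf parameter, as in this sketch with made-up data containing missing values:

```python
import io

import numpy as np
import pandas as pd

# Toy DataFrame with one missing value per column.
df = pd.DataFrame({
    "price": [1.5, np.nan, 3.0],
    "label": ["x", "y", None],
})

# Write the summary into a string buffer instead of standard output.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

Each column reports 2 non-null entries out of 3 rows, which is a quick way to spot missing data.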

Descriptive Statistics

The describe() method returns descriptive statistics for the DataFrame: count, mean, standard deviation, minimum, the 25%, 50%, and 75% quartiles, and maximum. The median is not listed under that name; it is the 50% row. By default, only numerical columns are included, but you can include other data types by passing the include parameter.

df.describe() # for numerical columns
df.describe(include='all') # for all columns
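To make the difference between the two calls concrete, here is a sketch on a made-up DataFrame with one numerical and one text column:

```python
import pandas as pd

# Toy data: a numerical column and a categorical text column.
df = pd.DataFrame({
    "score": [1.0, 2.0, 3.0, 4.0, 5.0],
    "group": ["a", "b", "a", "a", "b"],
})

numeric = df.describe()                  # only the "score" column
everything = df.describe(include="all")  # both columns

print(numeric)
print(everything)
```

With include='all', the result gains the rows unique, top (the most frequent value), and freq (how often it occurs) for non-numerical columns. Statistics that do not apply to a column show up as NaN.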

Checking Data Types

You can use the dtypes attribute to check the data type of each column. It returns a Series indexed by column name.

df.dtypes
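As an illustration, the sketch below checks the types of a made-up DataFrame and then uses select_dtypes() to keep only the numerical columns. The exact type reported for text columns depends on your pandas version, so the example does not rely on it.

```python
import pandas as pd

# Toy DataFrame with integer, float, and text columns.
df = pd.DataFrame({
    "id": [1, 2],
    "price": [1.5, 2.5],
    "name": ["a", "b"],
})

print(df.dtypes)

# Keep only the numerical columns ("id" and "price").
numeric_only = df.select_dtypes(include="number")
print(list(numeric_only.columns))
```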

Counting Values

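The value_counts() method is particularly useful for inspecting categorical columns. It counts how many times each unique value occurs and returns the counts sorted from most to least frequent. Missing values are excluded by default.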

df['column_name'].value_counts()

Replace 'column_name' with the name of the column you want to inspect.
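A short sketch on a made-up "color" column, showing both raw counts and, with normalize=True, the share of each value:

```python
import pandas as pd

# Toy categorical column.
df = pd.DataFrame({"color": ["red", "blue", "red", "green", "red", "blue"]})

counts = df["color"].value_counts()                # occurrences per value
shares = df["color"].value_counts(normalize=True)  # fractions summing to 1

print(counts)
print(shares)
```

Here "red" comes first because it occurs most often, three times out of six.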

Unique Values

To get the unique values in a column, use the unique() method. The values are returned in order of first appearance, not sorted.

df['column_name'].unique()
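If you only need to know how many distinct values there are, nunique() returns that number directly. A minimal sketch with a made-up "city" column:

```python
import pandas as pd

# Toy column with one repeated value.
df = pd.DataFrame({"city": ["Paris", "Berlin", "Paris", "Rome"]})

values = df["city"].unique()      # distinct values, in order of appearance
n_distinct = df["city"].nunique() # number of distinct values

print(values)
print(n_distinct)
```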

Conclusion

With these basic functions and methods, you can view and inspect your data in diverse ways. Remember, getting to know your data is one of the most crucial steps in any data analysis task. Happy analyzing!