Pandas Data Structures


This tutorial explores the two fundamental data structures in pandas: the Series and the DataFrame.

Pandas Series

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). Each value is paired with an index label, so a Series behaves somewhat like a dictionary or a single column in a table. Let's look at how to create a Series:

import numpy as np
import pandas as pd

# np.nan marks a missing value
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)

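This code creates a Series s containing several numbers and one NaN. NaN stands for 'Not a Number' and is generally used to represent missing or undefined values. Because NaN is a floating-point value, pandas stores the whole Series as float64, so the integers are displayed as 1.0, 3.0, and so on.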
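The comparison between a Series and a dictionary can be made concrete. The following is a minimal sketch, assuming made-up data (the name `temps` and its weekday labels are hypothetical and not part of the original example). It builds a Series from a dictionary and checks how a missing value affects the dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the dictionary keys become the index labels.
temps = pd.Series({'Mon': 21.5, 'Tue': 19.0, 'Wed': np.nan})

print(temps.index.tolist())  # the labels, in insertion order
print(temps.dtype)           # float64
print(temps.isna().sum())    # number of missing values

# A list of integers plus NaN is also stored as floats.
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s.dtype)
```

Passing a dictionary is only one option. A list combined with the `index=` argument of `pd.Series` would give the same kind of labeled result.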

Pandas DataFrame

While a Series is a single column of data, a DataFrame is several columns, one for each variable. In essence, a DataFrame in pandas is analogous to a table. Here's an example of how to create a DataFrame:

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'],
        'Age': [28, 24, 35],
        'City': ['New York', 'Paris', 'Berlin']}
df = pd.DataFrame(data)

print(df)

This code creates a DataFrame df from the dictionary data. The keys of the dictionary become the column names, and each list of values becomes the data in that column. Since no index is specified, the rows receive the default integer labels 0, 1, and 2.
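To confirm how the dictionary maps onto the table, it can help to inspect the resulting object. This is a short sketch that reuses the same data. The attributes `columns`, `shape`, and `dtypes` are standard DataFrame attributes, though the exact dtype shown for text columns may vary between pandas versions.

```python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'],
        'Age': [28, 24, 35],
        'City': ['New York', 'Paris', 'Berlin']}
df = pd.DataFrame(data)

print(df.columns.tolist())  # column names, taken from the dictionary keys
print(df.shape)             # (number of rows, number of columns)
print(df.dtypes)            # one dtype per column
```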

Accessing Data

You can access an element of a Series by its index label and a column of a DataFrame by its column name. Here's how:

# Accessing data in Series (label 2 of the default integer index)
print(s[2]) # prints: 5.0

# Accessing data in DataFrame (selecting a column returns a Series)
print(df['Name']) # prints the Name column: 0 John, 1 Anna, 2 Peter
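Square brackets cover only the simplest cases. As a hedged sketch of some other common access patterns, the example below uses `.iloc` (position-based) and `.loc` (label-based) on the same data. The variable names such as `first_row` and `over_25` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])
df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'],
                   'Age': [28, 24, 35],
                   'City': ['New York', 'Paris', 'Berlin']})

third_value = s.iloc[2]         # third element by position
first_row = df.iloc[0]          # first row by position, returned as a Series
anna_age = df.loc[1, 'Age']     # row label 1, column 'Age'
subset = df[['Name', 'City']]   # a list of column names returns a DataFrame
over_25 = df[df['Age'] > 25]    # boolean mask keeps only the matching rows

print(third_value)
print(first_row)
print(anna_age)
print(subset)
print(over_25)
```

With the default integer index, labels and positions coincide, so `.loc` and `.iloc` return the same rows. They differ once the index holds custom labels.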

Data Analysis with DataFrame

DataFrames come with built-in functions for simple data analysis. Here are a few examples:

# Get a quick statistical summary of your data
print(df.describe())

# Sort by age (returns a new DataFrame; df itself is unchanged)
print(df.sort_values(by='Age'))

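The describe() method provides a statistical summary (count, mean, standard deviation, minimum, quartiles, and maximum) of the numerical columns, which here means only Age. The sort_values() method returns a new DataFrame sorted by the specified column, in ascending order by default, and leaves the original unchanged.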
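Both methods return new objects, so their results can be stored and queried further. The sketch below, which reuses the same data, reads a single statistic out of the summary and sorts in descending order. The names `summary` and `oldest_first` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'],
                   'Age': [28, 24, 35],
                   'City': ['New York', 'Paris', 'Berlin']})

# describe() returns a DataFrame indexed by statistic name.
summary = df.describe()
print(summary.loc['mean', 'Age'])  # mean of 28, 24 and 35

# ascending=False sorts from largest to smallest.
oldest_first = df.sort_values(by='Age', ascending=False)
print(oldest_first['Name'].tolist())

# The original DataFrame keeps its row order.
print(df['Name'].tolist())
```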

Conclusion

Pandas provides powerful and flexible data structures that make data manipulation and analysis easy. Series and DataFrames form the basic building blocks of data manipulation in pandas.

In this tutorial, we have covered the basics of Series and DataFrames, how to create them, access data within them and perform simple data analysis. There's much more to explore and as you dive deeper you'll find pandas to be an invaluable tool in your data analysis toolkit.

Happy coding!