Pandas Data Structures
In this tutorial, we will explore the two fundamental data structures in pandas: Series and DataFrames.
Pandas Series
A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). It is somewhat similar to a dictionary or a column in a table. Let's look at how to create a Series:
import numpy as np
import pandas as pd
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)
In this code, we create a Series s that contains some numbers and a NaN. NaN stands for 'Not a Number' and is generally used to represent missing or undefined values. Because NaN is a floating point value, pandas stores the whole Series as float64.
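The dictionary comparison above can be made concrete. The following is a minimal sketch, assuming a small set of made-up labels and values (temps and its contents are hypothetical, not part of the tutorial's data). It builds a Series from a dictionary, looks up a value by label, and counts the missing entries:

```python
import numpy as np
import pandas as pd

# Hypothetical labels and values, chosen only for illustration
temps = pd.Series({"mon": 21.5, "tue": np.nan, "wed": 19.0})

# Label-based lookup works much like a dictionary lookup
wed_temp = temps["wed"]

# isna() flags missing entries; summing the flags counts them
missing_count = int(temps.isna().sum())

print(wed_temp)
print(missing_count)
```

The dictionary keys become the index labels of the Series, which is why the lookup by "wed" works.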
Pandas DataFrame
While a Series is a single column of data, a DataFrame is several columns, one for each variable. In essence, a DataFrame in pandas is analogous to a table. Here's an example of how to create a DataFrame:
import pandas as pd
data = {'Name': ['John', 'Anna', 'Peter'],
        'Age': [28, 24, 35],
        'City': ['New York', 'Paris', 'Berlin']}
df = pd.DataFrame(data)
print(df)
In this code, we create a DataFrame df from a dictionary data. The keys of the dictionary become the column names and the values become the data in those columns.
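One way to check how the dictionary maps onto the table is to inspect the result. This sketch rebuilds the same DataFrame and reads back its column names and shape; the variable names column_names, n_rows, n_cols, and age_column are introduced here only for illustration:

```python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'],
        'Age': [28, 24, 35],
        'City': ['New York', 'Paris', 'Berlin']}
df = pd.DataFrame(data)

# The dictionary keys become the column names, in insertion order
column_names = list(df.columns)

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape

# Selecting a single column gives back a Series
age_column = df['Age']

print(column_names)
print(n_rows, n_cols)
print(type(age_column).__name__)
```

Selecting one column returns a Series, which ties the two structures together: a DataFrame can be thought of as a collection of Series that share an index.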
Accessing Data
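You can access data in a Series by its index label and in a DataFrame by its column name. Here's how:
# Accessing data in a Series by index label
print(s[2])  # prints: 5.0 (the NaN makes the whole Series float64)
# Accessing a DataFrame column by name
print(df['Name'])  # prints the Name column: John, Anna, Peter, with index labels 0, 1, 2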
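Square brackets are not the only way to reach data. As a rough sketch, pandas also offers .loc for label-based access and .iloc for position-based access, which keeps the two kinds of lookup explicit. The Series letters and its labels are hypothetical; the DataFrame is a two-column subset of the one used in this tutorial:

```python
import pandas as pd

# Hypothetical Series with string labels, for illustration only
letters = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = letters.loc['b']      # look up by index label
by_position = letters.iloc[0]    # look up by integer position

people = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'],
                       'Age': [28, 24, 35]})

first_row = people.iloc[0]           # first row, returned as a Series
annas_age = people.loc[1, 'Age']     # row label 1, column 'Age'
older = people[people['Age'] > 25]   # boolean mask keeps matching rows

print(by_label, by_position)
print(annas_age)
print(older)
```

With the default integer index, labels and positions happen to coincide, so the difference between .loc and .iloc only becomes visible once the index holds other labels or the rows have been reordered.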
Data Analysis with DataFrame
DataFrames come with built-in functions for simple data analysis. Here are a few examples:
# Get a quick statistical summary of the numeric columns
print(df.describe())
# Sort by age (returns a new DataFrame; df itself is unchanged)
print(df.sort_values(by='Age'))
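The describe() method provides a statistical summary (count, mean, standard deviation, minimum, quartiles, and maximum) of the numerical columns. The sort_values() method returns a new DataFrame sorted by the specified column and leaves the original unchanged.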
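To see what these two methods return, here is a small sketch that reuses the tutorial's data. The names summary, mean_age, sorted_desc, and oldest are introduced only for this example:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'],
                   'Age': [28, 24, 35],
                   'City': ['New York', 'Paris', 'Berlin']})

# describe() returns a DataFrame; by default only numeric columns appear
summary = df.describe()
mean_age = summary.loc['mean', 'Age']

# ascending=False sorts from oldest to youngest
sorted_desc = df.sort_values(by='Age', ascending=False)
oldest = sorted_desc['Name'].iloc[0]

print(summary)
print(mean_age)
print(oldest)
```

Because sort_values() returns a new object, the result has to be assigned to a variable (as with sorted_desc here) if the sorted order is needed later.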
Conclusion
Pandas provides powerful and flexible data structures that make data manipulation and analysis easy. Series and DataFrames form the basic building blocks of data manipulation in pandas.
In this tutorial, we have covered the basics of Series and DataFrames, how to create them, access data within them and perform simple data analysis. There's much more to explore and as you dive deeper you'll find pandas to be an invaluable tool in your data analysis toolkit.
Happy coding!