Creating DataFrames from scratch
Introduction
Pandas is a powerful Python library for data analysis. It provides flexible, efficient data structures that make working with structured (tabular, multidimensional, potentially heterogeneous) data both easy and intuitive. One of the primary structures in Pandas is the DataFrame. In this tutorial, we will learn how to create Pandas DataFrames from scratch.
1. Importing Pandas
Before we start, we need to import the pandas library. We can import it as follows:
import pandas as pd
2. Creating a DataFrame from a Dictionary
One of the simplest ways to create a DataFrame is from a dictionary. In Python, a dictionary is a collection of key-value pairs. When a dictionary is passed to pd.DataFrame, its keys become column names and its values (here, lists of equal length) become the data for those columns.
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
}
df = pd.DataFrame(data)
When we print df, it displays a formatted table with 'Name', 'Age', and 'City' as columns, one row per person, and a default integer index starting at 0 on the left.
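As a quick check on the result, the sketch below rebuilds the same DataFrame and inspects its shape and column labels. It reuses the data from the example above; the extra print calls are only for illustration.

```python
import pandas as pd

data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
}
df = pd.DataFrame(data)

print(df)                 # the full table, with a default integer index
print(df.shape)           # (4, 3): four rows, three columns
print(list(df.columns))   # ['Name', 'Age', 'City'], in dictionary order
print(df.dtypes)          # the data type pandas inferred for each column
```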
3. Creating a DataFrame from a List of Lists
We can also create a DataFrame from a list of lists, where each nested list is a row of data.
data = [
    ['John', 28, 'New York'],
    ['Anna', 24, 'Paris'],
    ['Peter', 35, 'Berlin'],
    ['Linda', 32, 'London']
]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
In this case, we provide the column names separately using the columns parameter. Without it, pandas labels the columns with integers (0, 1, 2).
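To see what the columns parameter changes, here is a minimal sketch that builds the same rows with and without it. The variable names rows, unnamed, and named are chosen here for illustration only.

```python
import pandas as pd

rows = [
    ['John', 28, 'New York'],
    ['Anna', 24, 'Paris'],
]

# Without column names, pandas falls back to integer labels.
unnamed = pd.DataFrame(rows)
print(list(unnamed.columns))   # [0, 1, 2]

# With the columns parameter, each position in a row gets a name.
named = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])
print(list(named.columns))     # ['Name', 'Age', 'City']
```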
4. Creating a DataFrame from a CSV File
Pandas also allows us to create a DataFrame directly from a CSV file.
df = pd.read_csv('your_file.csv')
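To try read_csv without a file on disk, one option is to pass it an in-memory text buffer. The sketch below assumes a small made-up CSV with a header row; csv_text is a stand-in for the contents of a real file.

```python
import io

import pandas as pd

# Hypothetical CSV contents; the first line is treated as the header row.
csv_text = """Name,Age,City
John,28,New York
Anna,24,Paris
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

With a real file, the call is the same except that a path such as 'your_file.csv' replaces the StringIO object.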
5. Creating a DataFrame from an Excel File
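Similarly, an Excel file can be read into a DataFrame. Reading .xlsx files relies on an optional dependency such as openpyxl, which may need to be installed separately.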
df = pd.read_excel('your_file.xlsx')
6. Creating an Empty DataFrame
Sometimes, it's useful to start with an empty DataFrame and add data to it later.
df = pd.DataFrame()
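The following sketch shows two ways data might be added afterwards: assigning a column to the empty DataFrame, or collecting rows in a plain Python list and building the DataFrame once at the end. The names empty, records, and df are illustrative, and the second approach is a common pattern rather than the only option.

```python
import pandas as pd

empty = pd.DataFrame()
print(empty.empty)   # True: no rows and no columns yet
print(empty.shape)   # (0, 0)

# Option 1: add a column to the empty DataFrame.
empty['Name'] = ['John', 'Anna']
print(empty.shape)   # (2, 1)

# Option 2: collect rows as dictionaries, then build the DataFrame once.
records = []
records.append({'Name': 'John', 'Age': 28})
records.append({'Name': 'Anna', 'Age': 24})
df = pd.DataFrame(records)
print(df)
```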
7. Viewing Your DataFrame
Once you have created a DataFrame, you can view it in an interactive session (such as a Jupyter notebook or the Python REPL) by typing its name and pressing Enter. In a script, use print(df) instead. For large DataFrames, it's better to use the head() or tail() methods, which by default display the first or last 5 rows, respectively.
df.head() # displays the first 5 rows
df.tail() # displays the last 5 rows
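Both methods also accept a row count. The sketch below uses a made-up 100-row DataFrame with a single column n to show the default and an explicit count.

```python
import pandas as pd

# A hypothetical 100-row DataFrame holding the numbers 0 to 99.
df = pd.DataFrame({'n': range(100)})

first_three = df.head(3)   # first 3 rows instead of the default 5
last_two = df.tail(2)      # last 2 rows

print(first_three['n'].tolist())   # [0, 1, 2]
print(last_two['n'].tolist())      # [98, 99]
print(len(df.head()))              # 5
```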
Conclusion
Creating DataFrames is one of the first steps in using Pandas for data analysis. Once you have your data in a DataFrame, you can perform a wide array of operations on it, including filtering, aggregation, data transformation, and much more.
Remember, practice is the key to mastery. Write the code yourself and experiment with it to build a better understanding. Happy learning!