Creating DataFrames from scratch

Introduction

Pandas is a powerful Python library for data analysis. It provides flexible, efficient data structures that make working with structured (tabular, multidimensional, potentially heterogeneous) data both easy and intuitive. One of the primary structures in Pandas is the DataFrame. In this tutorial, we will learn how to create Pandas DataFrames from scratch.


1. Importing Pandas

Before we start, we need to import the pandas library. We can import it as follows:

import pandas as pd

2. Creating a DataFrame from a Dictionary

One of the simplest ways to create a DataFrame is from a dictionary. In Python, a dictionary is a collection of key-value pairs. When a dictionary of lists is passed to pd.DataFrame, the keys become column names and the lists become the data for those columns, so every list must have the same length.

data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)

When we print df, it will display a nicely formatted table with 'Name', 'Age', and 'City' as columns and their corresponding values.
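As a minimal sketch of the step above, the following builds the same DataFrame and inspects it. The shape and columns attributes are not mentioned in the text; they are used here only to confirm that each dictionary key became a column.

```python
import pandas as pd

data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
}

df = pd.DataFrame(data)

print(df)                 # one row per person, one column per dictionary key
print(df.shape)           # (4, 3): four rows, three columns
print(list(df.columns))   # ['Name', 'Age', 'City']
```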


3. Creating a DataFrame from a List of Lists

We can also create a DataFrame from a list of lists, where each nested list is a row of data.

data = [
    ['John', 28, 'New York'],
    ['Anna', 24, 'Paris'],
    ['Peter', 35, 'Berlin'],
    ['Linda', 32, 'London']
]

df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])

In this case, we provide the column names separately using the columns parameter. If we omit it, pandas labels the columns with integers starting at 0.
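To illustrate the effect of the columns parameter, here is a small sketch that builds the same rows with and without it. The variable names rows, unnamed, and named are illustrative only.

```python
import pandas as pd

rows = [
    ['John', 28, 'New York'],
    ['Anna', 24, 'Paris'],
]

# Without column names, pandas falls back to integer labels.
unnamed = pd.DataFrame(rows)

# With column names, each position in a row gets a readable label.
named = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])

print(list(unnamed.columns))  # [0, 1, 2]
print(list(named.columns))    # ['Name', 'Age', 'City']
```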


4. Creating a DataFrame from a CSV File

Pandas also allows us to create a DataFrame directly from a CSV file. By default, the first line of the file is used for the column names.

df = pd.read_csv('your_file.csv')
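Since 'your_file.csv' is only a placeholder, the sketch below reads CSV text from an in-memory buffer instead, so it runs without any file on disk. The sample data and the csv_text name are assumptions made for illustration; read_csv accepts a file path in the same position.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = """Name,Age,City
John,28,New York
Anna,24,Paris
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # Age is parsed as an integer column
```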

5. Creating a DataFrame from an Excel File

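Similarly, an Excel file can be read into a DataFrame. Reading .xlsx files relies on an optional engine such as openpyxl, which must be installed separately; by default, the first sheet is read.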

df = pd.read_excel('your_file.xlsx')

6. Creating an Empty DataFrame

Sometimes, it's useful to start with an empty DataFrame and add data to it later.

df = pd.DataFrame()
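One way to add data later, shown here as a sketch rather than the only approach, is to assign whole columns to the empty DataFrame. The empty attribute is used to check that the DataFrame starts with no data. When rows arrive one at a time, collecting them in a plain Python list and building the DataFrame once at the end is usually simpler.

```python
import pandas as pd

df = pd.DataFrame()
was_empty = df.empty   # True: no rows and no columns yet
print(was_empty)

# Add data column by column; every column must have the same length.
df['Name'] = ['John', 'Anna']
df['Age'] = [28, 24]

print(df)
print(df.empty)        # False once data has been added
```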

7. Viewing Your DataFrame

Once you have created a DataFrame, you can view it in an interactive session or notebook by typing its name and pressing Enter; in a script, use print(df). For large DataFrames, it's better to use the head() or tail() methods, which display the first or last 5 rows by default.

df.head()  # displays the first 5 rows
df.tail()  # displays the last 5 rows
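Both methods also accept an optional number of rows. The sketch below uses a made-up 100-row DataFrame with a single value column to show the default and a custom row count.

```python
import pandas as pd

# Hypothetical data: 100 rows holding the integers 0..99.
df = pd.DataFrame({'value': range(100)})

default_head = df.head()   # first 5 rows
first_three = df.head(3)   # first 3 rows
last_two = df.tail(2)      # last 2 rows

print(first_three)
print(last_two)
```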

Conclusion

Creating DataFrames is one of the first steps in using Pandas for data analysis. Once you have your data in a DataFrame, you can perform a wide array of operations on it, including filtering, aggregation, data transformation, and much more.

Remember, practice is the key to mastery, so write the code yourself and experiment with it to build a better understanding. Happy learning!