# Loading Datasets into a DataFrame

Pandas is an open-source Python library that provides flexible data structures and data analysis tools. One of the primary data structures in Pandas is the DataFrame. It is a two-dimensional labeled data structure with columns that can be of different types. In this tutorial, we will discuss how to load datasets into a DataFrame.

## Importing the Required Libraries

Before we dive into loading datasets, let's import the necessary libraries. In our case, we need to import Pandas.

```python
import pandas as pd
```

## Loading a CSV File

CSV (Comma Separated Values) is one of the most common formats of a dataset. Let's learn how to load a CSV file into a DataFrame.

```python
df = pd.read_csv('file.csv')
```

In the above code, we use the `read_csv()` function from pandas to load the CSV file. The argument to this function is the path to the CSV file. The function returns a DataFrame that we store in the variable `df`.
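Real CSV files often use a different delimiter or contain columns you do not need. The following is a minimal sketch of a few commonly used `read_csv()` parameters (`sep`, `usecols`, `dtype`). The CSV content, column names, and the `df_csv` variable are made up for illustration, and an in-memory buffer stands in for a file on disk so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data standing in for a file on disk.
csv_text = "id;name;score\n1;Ada;9.5\n2;Linus;8.0\n"

df_csv = pd.read_csv(
    io.StringIO(csv_text),   # a file path such as 'file.csv' works here too
    sep=";",                 # delimiter used in the data
    usecols=["id", "score"], # load only these columns
    dtype={"id": "int64"},   # pin the type of the id column
)

print(df_csv)
```

Passing `usecols` and `dtype` up front can reduce memory use on large files, since unwanted columns are never loaded and types do not have to be inferred.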

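## Loading an Excel File

Pandas also allows you to load Excel files with the `read_excel()` function.

```python
df = pd.read_excel('file.xlsx')
```

Similar to `read_csv()`, `read_excel()` reads an Excel file and returns a DataFrame. Reading `.xlsx` files relies on an optional dependency such as `openpyxl`, which must be installed separately.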

## Loading a JSON File

Many modern APIs return data in JSON format. To load a JSON file, use the `read_json()` function.

```python
df = pd.read_json('file.json')
```
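How `read_json()` interprets the data depends on the JSON layout. The sketch below assumes two common layouts: a list of records, and JSON Lines (one object per line). The sample data and variable names are hypothetical, and in-memory buffers stand in for files.

```python
import io

import pandas as pd

# Hypothetical API-style payload: a JSON array of records.
json_text = '[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]'
df_json = pd.read_json(io.StringIO(json_text), orient="records")

# Hypothetical JSON Lines data: one JSON object per line.
jsonl_text = '{"id": 1}\n{"id": 2}\n'
df_lines = pd.read_json(io.StringIO(jsonl_text), lines=True)

print(df_json)
print(df_lines)
```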

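## Loading Data from a SQL Database

Pandas can also load data from SQL databases. This requires a connection to the database.

```python
from sqlalchemy import create_engine

# Create an engine for the SQLite database file database.db
engine = create_engine('sqlite:///database.db')
# my_table is a placeholder; replace it with a table in your database
df = pd.read_sql_query("SELECT * FROM my_table", engine)
```

In this example, we first create an engine for the SQLite database `database.db` using SQLAlchemy, an SQL toolkit for Python; the engine manages the connection to the database. We then use the `read_sql_query()` function to execute a SQL query and load the result into a DataFrame. The table name `my_table` is a placeholder: `table` itself is a reserved word in SQL and cannot be used unquoted as a table name.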
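To try the query step without an existing database file, here is a self-contained sketch. It swaps SQLAlchemy for Python's built-in `sqlite3` module, which `read_sql_query()` also accepts for SQLite, and uses an in-memory database. The `users` table, its columns, and its rows are invented for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Ada", 36), ("Linus", 28)],
)
conn.commit()

# Pass values through params instead of formatting them into the SQL string.
df_sql = pd.read_sql_query(
    "SELECT name, age FROM users WHERE age > ?",
    conn,
    params=(30,),
)
conn.close()

print(df_sql)
```

Using `params` keeps user-supplied values out of the query text, which helps avoid SQL injection.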

## Viewing the DataFrame

After loading the dataset into a DataFrame, you can preview it with the `head()` method, which returns the first five rows by default.

```python
df.head()
```
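`head()` also accepts the number of rows to return. The sketch below builds a small hypothetical DataFrame (`df_demo`, not part of the original examples) and shows `head(3)` alongside two related inspection tools, `tail()` and the `shape` attribute.

```python
import pandas as pd

# Hypothetical ten-row DataFrame used only for this demonstration.
df_demo = pd.DataFrame(
    {"id": range(1, 11), "value": [x * 0.5 for x in range(1, 11)]}
)

first_three = df_demo.head(3)  # first three rows instead of the default five
last_two = df_demo.tail(2)     # last two rows

print(first_three)
print(last_two)
print(df_demo.shape)           # (number of rows, number of columns)
```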

## Conclusion

In this tutorial, we learned how to load datasets from various sources into a DataFrame using pandas. This is the first step in data manipulation with pandas. The next steps involve cleaning and processing the data, which we will discuss in subsequent tutorials. Remember, the key to mastering pandas is practice. So, try loading different datasets and exploring them.