# Joining DataFrames

## Introduction

Pandas is an open-source, BSD-licensed Python library that provides high-performance, easy-to-use data structures and data analysis tools. One of its most important data structures is the DataFrame. In this tutorial, we'll explore how to join DataFrames in pandas.

## Understanding Joining

Joining combines two DataFrames into one, based on a common column or on their indexes. It's similar to the JOIN operation in SQL. Pandas provides several tools for combining data: `merge()` and `join()` match rows on key columns or index labels, while `concat()` stacks DataFrames along an axis.

## DataFrame.merge()

The `merge()` function is used to merge DataFrames on a key. Let's start with a simple example.

```python
import pandas as pd

# Creating two DataFrames that share a 'key' column
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'key': ['K0', 'K1', 'K2', 'K3']
})

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3'],
    'key': ['K0', 'K1', 'K2', 'K3']
})

# Merging the DataFrames on the shared 'key' column
df3 = pd.merge(df1, df2, on='key')
print(df3)
```

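In the above code, `merge()` combines `df1` and `df2` on the column `'key'`. Because every key appears in both DataFrames, the result has four rows with the columns `A`, `B`, `key`, `C`, and `D`.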
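Since `merge()` behaves like a SQL JOIN, its `how` parameter decides which keys survive when the two DataFrames do not match perfectly. The following is a minimal sketch, assuming two small hypothetical DataFrames (`left` and `right`, which are not part of the example above) whose keys only partly overlap:

```python
import pandas as pd

# Hypothetical DataFrames whose keys only partly overlap
left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K1', 'K2', 'K3'], 'C': ['C1', 'C2', 'C3']})

# how='inner' (the default) keeps only keys present in both DataFrames
inner = pd.merge(left, right, on='key', how='inner')

# how='left' keeps every row of `left`; unmatched values in 'C' become NaN
left_join = pd.merge(left, right, on='key', how='left')

# how='outer' keeps the union of keys from both DataFrames
outer = pd.merge(left, right, on='key', how='outer')

print(inner)
print(left_join)
print(outer)
```

Here the inner merge keeps only `K1` and `K2`, the left merge keeps all three keys of `left`, and the outer merge keeps all four distinct keys.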

## DataFrame.join()

`join()` is a convenient method for combining the columns of two potentially differently-indexed DataFrames into a single DataFrame. Let's see how `join()` works.

```python
import pandas as pd

# Creating two DataFrames with partly overlapping index labels
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=['K0', 'K1', 'K2'])

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=['K0', 'K2', 'K3'])

# Joining the DataFrames on their indexes
df3 = df1.join(df2)
print(df3)
```

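In the above code, `join()` combines `df1` and `df2` on their indexes. By default it performs a left join, so the resulting DataFrame `df3` contains all rows from `df1` and the matching rows from `df2`. Where a label in `df1` has no match in `df2` (here `K1`), the `C` and `D` columns are filled with `NaN`. The label `K3`, which exists only in `df2`, is dropped.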
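Like `merge()`, `join()` accepts a `how` argument. The sketch below reuses the same two DataFrames to show an inner and an outer join on the index, and also checks that the default `join()` gives the same result as an index-based left `merge()`. The variable names `inner`, `outer`, and `via_merge` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=['K0', 'K1', 'K2'])

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=['K0', 'K2', 'K3'])

# Inner join: only labels found in both indexes (K0 and K2)
inner = df1.join(df2, how='inner')

# Outer join: every label from either index, with NaN where data is missing
outer = df1.join(df2, how='outer')

# The default left join expressed with merge() on both indexes
via_merge = pd.merge(df1, df2, left_index=True, right_index=True, how='left')

print(inner)
print(outer)
print(via_merge.equals(df1.join(df2)))
```

Choosing between the two is mostly a matter of convenience: `join()` is shorter when the keys already live in the index, while `merge()` is more flexible when they live in ordinary columns.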

## pandas.concat()

The `concat()` function is a top-level pandas function (called as `pd.concat()`, not as a DataFrame method) that concatenates two or more DataFrame objects along a particular axis (either rows or columns).

```python
import pandas as pd

# Creating two DataFrames with the same columns
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# Concatenating the DataFrames along the row axis
df3 = pd.concat([df1, df2])
print(df3)
```

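In the above code, `concat()` concatenates `df1` and `df2` along the row axis (`axis=0`, the default). Each DataFrame keeps its original index, so the labels 0, 1, and 2 each appear twice in `df3`.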
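Two options of `concat()` are worth trying early: `ignore_index=True`, which renumbers the rows, and `axis=1`, which places DataFrames side by side. The following is a small sketch; `df_extra` is a hypothetical third DataFrame introduced here only to illustrate column-wise concatenation.

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# Hypothetical DataFrame with different columns but the same index as df1
df_extra = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
})

# Default behaviour: the original index labels are kept and repeat
stacked = pd.concat([df1, df2])

# ignore_index=True discards the old labels and renumbers the rows from 0
renumbered = pd.concat([df1, df2], ignore_index=True)

# axis=1 concatenates along columns, aligning rows on the index
side_by_side = pd.concat([df1, df_extra], axis=1)

print(stacked)
print(renumbered)
print(side_by_side)
```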

## Conclusion

Joining or combining data is an integral part of data manipulation, and pandas provides several ways to do it: `merge()` for key columns, `join()` for indexes, and `concat()` for stacking along an axis. It is important to understand which method fits which data scenario. We've covered the basic join operations, but there's more to explore in pandas. Practice these operations and consult the pandas documentation for more.