
Writing Custom Datasets, DataLoaders and Transforms

Before a PyTorch model can be trained, the data has to be loaded, preprocessed, and batched. The torch.utils.data module provides the Dataset and DataLoader classes for this, and torchvision.transforms provides ready-made preprocessing steps. In this tutorial, we'll explore how to create custom Datasets, DataLoaders, and Transforms.

Custom Datasets

In PyTorch, a dataset is a class that subclasses torch.utils.data.Dataset. A map-style dataset implements two methods: __len__, which returns the number of samples, and __getitem__, which returns the sample at a given index.

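from torch.utils.data import Dataset

class MyCustomDataset(Dataset):
    def __init__(self, data, labels):
        # Initialization logic. The `data` and `labels` arguments are
        # illustrative: two equal-length sequences kept in memory.
        self.data = data
        self.labels = labels

    def __len__(self):
        # Return the size of the dataset
        return len(self.data)

    def __getitem__(self, idx):
        # Get and return the data and label at the given index
        return self.data[idx], self.labels[idx]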
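
As a concrete, runnable illustration of the pattern above, here is a minimal sketch of a map-style dataset backed by in-memory tensors. The class name `SquaresDataset` and its `n` parameter are hypothetical, chosen only for this example; a real dataset would more likely read files or table rows inside `__getitem__`.

```python
import torch
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Toy dataset (hypothetical): sample i is the pair (i, i**2)."""

    def __init__(self, n):
        # Precompute inputs and targets as float tensors of shape (n, 1).
        self.x = torch.arange(n, dtype=torch.float32).unsqueeze(1)
        self.y = self.x ** 2

    def __len__(self):
        # Number of samples in the dataset.
        return self.x.shape[0]

    def __getitem__(self, idx):
        # Return one (input, target) pair.
        return self.x[idx], self.y[idx]


dataset = SquaresDataset(5)
sample_x, sample_y = dataset[3]
print(len(dataset), sample_x.item(), sample_y.item())  # prints: 5 3.0 9.0
```

Implementing these two methods is what lets `len(dataset)` and `dataset[i]` work, and that is all a `DataLoader` needs in order to draw samples by index.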

Custom DataLoader

DataLoader wraps a dataset and iterates over it in batches of batch_size samples. It can also reshuffle the samples at every epoch (shuffle=True) and load data in parallel using worker processes (num_workers).

from torch.utils.data import DataLoader

data_loader = DataLoader(MyCustomDataset(...), batch_size=32, shuffle=True, num_workers=4)
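
To see what the loader yields, the sketch below iterates over a small dataset and records each batch's shape. It uses `TensorDataset` (a built-in dataset that wraps tensors) as a stand-in for `MyCustomDataset`; the sizes are arbitrary, and `num_workers=0` keeps loading in the main process so the snippet stays simple to run.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 10 samples with 3 features each, plus one integer label per sample.
features = torch.randn(10, 3)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# batch_size=4 over 10 samples gives batches of 4, 4, and 2.
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

batch_shapes = []
for batch_features, batch_labels in loader:
    # The default collate function stacks samples along a new first dimension.
    batch_shapes.append((tuple(batch_features.shape), tuple(batch_labels.shape)))
    print(batch_features.shape, batch_labels.shape)
```

The last batch is smaller because 10 is not divisible by 4; passing `drop_last=True` would discard it instead.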

Custom Transforms

Transforms preprocess or augment each sample before it reaches the model, and data augmentation can improve how well the model generalizes. The torchvision.transforms module has many pre-built transformations, but you can also create your own: any callable that takes a sample and returns the transformed sample will work.

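from torchvision import transforms

class MyTransform:
    def __call__(self, x):
        # Transform the input and return the result.
        # Placeholder: the identity; replace with your own logic.
        x_transformed = x
        return x_transformed

custom_transform = transforms.Compose([
    MyTransform(),
    # ... other transforms
])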
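
A transform often needs configuration, which fits naturally in `__init__`. The sketch below shows a hypothetical `AddOffsetAndScale` transform (the name and its parameters are made up for illustration) that operates on tensors. Like `MyTransform`, an instance is just a callable, so it can be called directly or listed inside `transforms.Compose`.

```python
import torch


class AddOffsetAndScale:
    """Hypothetical transform: computes (x + offset) * scale."""

    def __init__(self, offset=0.0, scale=1.0):
        # Configuration is stored once and reused for every sample.
        self.offset = offset
        self.scale = scale

    def __call__(self, x):
        # Return a new tensor; the input is left unmodified.
        return (x + self.offset) * self.scale


transform = AddOffsetAndScale(offset=1.0, scale=2.0)
x = torch.tensor([0.0, 1.0, 2.0])
x_transformed = transform(x)
print(x_transformed)  # tensor([2., 4., 6.])
```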

Applying Transforms to Datasets

To apply transforms, have the dataset's constructor accept an optional transform argument and call it on each sample in __getitem__.

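class MyCustomDataset(Dataset):
    def __init__(self, data, labels, transform=None):
        # `data` and `labels` are illustrative in-memory sequences
        self.data = data
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        data, label = self.data[idx], self.labels[idx]
        # Apply the transform to the sample, if one was given
        if self.transform is not None:
            data = self.transform(data)
        return data, label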

Then, when creating the dataset, you can pass in the composed transformations.

dataset = MyCustomDataset(..., transform=custom_transform)
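
Putting the pieces together, the following sketch defines a small dataset that applies a transform in `__getitem__` and feeds it to a `DataLoader`. The names `ToyDataset` and `double` are hypothetical; a plain function stands in for the composed transform, on the assumption that any callable works in its place.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical dataset holding tensors in memory, with an optional transform."""

    def __init__(self, data, labels, transform=None):
        self.data = data
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample, label = self.data[idx], self.labels[idx]
        # Apply the transform to the sample only, not to the label.
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, label


def double(x):
    # Stand-in transform: maps a sample to a new, transformed sample.
    return x * 2


data = torch.ones(6, 2)
labels = torch.tensor([0, 1, 0, 1, 0, 1])
dataset = ToyDataset(data, labels, transform=double)

loader = DataLoader(dataset, batch_size=3, shuffle=False)
first_batch, first_labels = next(iter(loader))
print(first_batch.shape, first_labels.tolist())
```

Because the transform runs inside `__getitem__`, it is applied each time a sample is fetched rather than once up front, so a random augmentation would produce a different result on every pass over the data.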

Conclusion

In this tutorial, we've learned how to write a custom Dataset, iterate over it in batches with a DataLoader, and apply custom Transforms to each sample. Together, these make up the data pipeline that feeds a PyTorch model during training.

