Image Classification

Introduction

Image classification is a fundamental computer vision task: assigning an image to one of a set of predefined categories. This article is a step-by-step guide to building an image classification model with PyTorch, one of the most popular deep learning libraries.

Prerequisites

Before we begin, make sure PyTorch and torchvision are installed. If not, follow the official PyTorch installation guide.

Step 1: Importing Libraries

First, let's import the necessary libraries:

import torch
from torchvision import datasets, models, transforms
import torch.nn as nn
import torch.nn.functional as F  # provides F.relu, used in the network's forward pass
import torch.optim as optim

Step 2: Preparing the Dataset

We will use CIFAR-10, a popular image classification dataset of 32x32 color images in 10 classes. The transform converts each image to a tensor and then normalizes each of its three channels.

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

trainset = datasets.CIFAR10(root='./data', train=True,
                            download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = datasets.CIFAR10(root='./data', train=False,
                           download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
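
To see what the Normalize step does, here is a small sketch in plain Python rather than the torchvision implementation. It assumes ToTensor has already scaled pixel values to the range [0, 1] and applies the same per-channel rule, (value - mean) / std, with mean and std both set to 0.5 as in the transform. The helper name normalize is made up for this illustration.

```python
def normalize(value, mean=0.5, std=0.5):
    """Apply the per-channel normalization rule: (value - mean) / std."""
    return (value - mean) / std

# Pixel intensities after ToTensor lie in [0, 1].
darkest, middle, brightest = 0.0, 0.5, 1.0

normalized = [normalize(v) for v in (darkest, middle, brightest)]
print(normalized)  # [-1.0, 0.0, 1.0]
```

With these settings each channel ends up in the range [-1, 1], centered on zero.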

Step 3: Define the Network

Next, let's define a simple Convolutional Neural Network (CNN):

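class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels -> 6 feature maps, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)         # halves height and width
        self.conv2 = nn.Conv2d(6, 16, 5)       # 6 -> 16 feature maps, 5x5 kernel
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # flattened 16x5x5 feature maps -> 120
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)           # one output score per class

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)             # flatten everything except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                        # raw class scores, no softmax here
        return x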

This network has two convolutional layers, each followed by a ReLU and max pooling, and then three fully connected layers. The last layer outputs one score per class.
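
The input size of the first fully connected layer, 16 * 5 * 5, follows from how each layer shrinks a 32x32 image. The sketch below works through that arithmetic in plain Python, assuming no padding and the kernel sizes and strides used in the network. The helper conv_out is a hypothetical name for this illustration, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                  # CIFAR-10 images are 32x32
size = conv_out(size, kernel=5)            # conv1: 32 -> 28
size = conv_out(size, kernel=2, stride=2)  # pool:  28 -> 14
size = conv_out(size, kernel=5)            # conv2: 14 -> 10
size = conv_out(size, kernel=2, stride=2)  # pool:  10 -> 5

channels = 16                              # output channels of conv2
flat_features = channels * size * size
print(size, flat_features)  # 5 400
```

If you change the image size or a kernel size, this number changes too, and the first fully connected layer has to be adjusted to match.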

Step 4: Define a Loss Function and Optimizer

We'll use cross-entropy loss and stochastic gradient descent (SGD) with momentum.

net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
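
nn.CrossEntropyLoss takes the raw class scores from the network, not probabilities, which is why the forward pass ends without a softmax. As a rough illustration of what the loss measures, the sketch below computes it for a single example in plain Python. The function cross_entropy and the score values are made up for this example; the real criterion also averages over the batch.

```python
import math

def cross_entropy(scores, target):
    """Cross-entropy for one example: -log(softmax(scores)[target])."""
    # Subtract the largest score first for numerical stability.
    shift = max(scores)
    log_sum_exp = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return log_sum_exp - scores[target]

# Equal scores for all 10 classes: the model has no preference.
uniform = cross_entropy([0.0] * 10, target=3)
# A high score on the true class gives a loss near zero.
confident = cross_entropy([0.0] * 9 + [10.0], target=9)
# The same high score on a wrong class gives a large loss.
wrong = cross_entropy([0.0] * 9 + [10.0], target=0)

print(round(uniform, 3))  # 2.303, which is ln(10)
```

A network that has learned nothing yet tends to start near ln(10), about 2.3, on a 10-class problem, so a printed training loss well below that suggests the model is learning.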

Step 5: Train the Network

Each training step passes a batch of inputs through the network (forward pass), computes the loss, backpropagates to obtain the gradient of the loss with respect to every weight (backward pass), and then lets the optimizer update the weights in the direction that reduces the loss.

for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # each batch is a pair of inputs and labels
        inputs, labels = data

        # clear gradients left over from the previous step
        optimizer.zero_grad()

        # forward pass, loss, backward pass, weight update
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
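
The same forward, loss, backward, update cycle can be seen in miniature without PyTorch. The sketch below fits a single weight to one made-up data point, with the gradient worked out by hand in place of loss.backward(). The update follows the momentum rule documented for optim.SGD with its default settings (velocity = momentum * velocity + gradient, then weight = weight - lr * velocity). The toy problem and all variable names are assumptions chosen for illustration.

```python
# Toy problem: fit y = w * x to one data point (x=2, y=6), so the ideal w is 3.
x, y = 2.0, 6.0
w = 0.0          # the single "weight" of this toy model
velocity = 0.0   # momentum buffer, starts at zero
lr, momentum = 0.01, 0.9

losses = []
for step in range(200):
    prediction = w * x                     # forward pass
    loss = (prediction - y) ** 2           # squared-error loss
    grad = 2 * (prediction - y) * x        # backward pass: d(loss)/dw by hand
    velocity = momentum * velocity + grad  # accumulate momentum
    w = w - lr * velocity                  # optimizer step
    losses.append(loss)

print(round(w, 3), losses[0], losses[-1])  # w approaches 3, loss shrinks toward 0
```

In the real loop, loss.backward() computes the gradients for every weight in the network and optimizer.step() applies this kind of update to all of them at once.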

Step 6: Test the Network

Finally, let's test our model with the test data.

correct = 0
total = 0
# gradients are not needed for evaluation
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        # the class with the highest score is the prediction
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
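
To make the prediction step concrete, here is a plain-Python sketch of what happens to one batch: take the index of the highest score for each image, compare it with the true label, and count the matches. The scores and labels are invented for illustration, and argmax is a hypothetical helper standing in for the index that torch.max returns along dimension 1.

```python
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

def argmax(scores):
    """Index of the largest score in a list."""
    return max(range(len(scores)), key=lambda i: scores[i])

# Made-up scores for a batch of 4 images (10 class scores each).
batch_scores = [
    [0.1, 0.2, 0.0, 2.5, 0.3, 1.9, 0.0, 0.1, 0.2, 0.0],  # highest: index 3 (cat)
    [0.0, 0.1, 0.0, 0.2, 0.1, 0.0, 0.3, 0.1, 3.1, 0.4],  # highest: index 8 (ship)
    [1.2, 0.3, 0.1, 0.0, 0.2, 0.1, 0.0, 0.4, 2.0, 0.1],  # highest: index 8 (ship)
    [2.2, 0.1, 0.9, 0.0, 0.1, 0.0, 0.2, 0.0, 0.7, 0.3],  # highest: index 0 (plane)
]
labels = [3, 8, 0, 0]  # made-up true classes: cat, ship, plane, plane

predicted = [argmax(scores) for scores in batch_scores]
correct = sum(p == t for p, t in zip(predicted, labels))
accuracy = 100 * correct / len(labels)

print([classes[p] for p in predicted])  # ['cat', 'ship', 'ship', 'plane']
print(accuracy)  # 75.0
```

For comparison, guessing at random among 10 classes would give about 10% accuracy.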

This is a simple yet complete guide to training an image classification model using PyTorch. Happy learning!