
Autograd: Automatic Differentiation

Hello, budding AI enthusiasts! In this tutorial, we will explore one of the most powerful features of PyTorch: autograd, its automatic differentiation engine. Autograd is what makes it possible to compute the gradients that neural networks use to learn from data and improve over time. Let's get started!

What is Automatic Differentiation?

In the field of artificial intelligence, we often need to compute gradients to optimize model parameters. Automatic differentiation is a technique that computes the derivatives of functions efficiently and exactly, and PyTorch uses a flavour of it called reverse-mode automatic differentiation, the same idea that underlies backpropagation.
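
As a quick taste, here is a minimal sketch (the value 3.0 is an arbitrary choice) of reverse-mode differentiation at work: for y = x², autograd recovers dy/dx = 2x.

import torch

# Differentiate y = x**2 at x = 3.0
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2

y.backward()   # reverse-mode pass from y back to x
print(x.grad)  # tensor(6.), i.e. dy/dx = 2x = 6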

The Autograd Package

PyTorch's autograd package provides automatic differentiation for all operations on Tensors. It's a define-by-run framework, which means your backpropagation is defined by how your code runs, and every single iteration can be different.
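
To make "define-by-run" concrete, here is a small sketch (the doubling loop and the threshold of 100 are arbitrary choices for illustration): the graph that backward() walks is simply whatever operations happened to execute on this particular run.

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2

# Define-by-run: how many times this loop executes depends on the data,
# so the recorded graph can be different on every run.
while y.norm() < 100:
    y = y * 2

y.sum().backward()
print(x.grad)  # a power of 2 that depends on how many doublings occurred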

How Does Autograd Work?

Let's illustrate this with some code:

import torch

# Create a tensor and set requires_grad=True to track computation with it
x = torch.ones(2, 2, requires_grad=True)
print(x)

Setting requires_grad=True tells autograd to record every operation on this tensor so that gradients can be computed with respect to it during the backward pass. Now let's perform some tensor operations:

y = x + 2
print(y)

y was created as a result of an operation, so it has a grad_fn.

print(y.grad_fn)
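
By contrast, a tensor created directly by the user has no grad_fn. A quick check (note that the exact name printed for y.grad_fn, such as AddBackward0, can vary between PyTorch versions):

print(x.grad_fn)        # None: x was created by the user, not by an operation
print(y.requires_grad)  # True: y inherits gradient tracking from x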

Let's do more operations on y:

z = y * y * 3
out = z.mean()

print(z, out)

Gradients

Let's backprop now. Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).

out.backward()

Print gradients d(out)/dx:

print(x.grad)

You should get a 2×2 matrix filled with 4.5. Let's call the out tensor "o". We have o = (1/4) ∑ z_i with z_i = 3(x_i + 2)², so z_i at x_i = 1 is 3·3² = 27. By the chain rule, ∂o/∂x_i = (1/4) · 6(x_i + 2) = (3/2)(x_i + 2), hence ∂o/∂x_i at x_i = 1 is 9/2 = 4.5.
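
As a sanity check, here is a small sketch that recomputes the example and compares autograd's result with the closed-form gradient (3/2)(x + 2); the explicit torch.tensor(1.) argument is only there to illustrate the equivalence mentioned above.

import torch

x = torch.ones(2, 2, requires_grad=True)
z = 3 * (x + 2) ** 2
out = z.mean()

# Equivalent to out.backward(), since out is a scalar
out.backward(torch.tensor(1.))

# Analytic gradient: d(out)/dx = (3/2) * (x + 2)
expected = 1.5 * (x.detach() + 2)
print(torch.allclose(x.grad, expected))  # True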

Things to Remember about Autograd

  • If you want to stop a tensor from tracking history, you can call .detach() to detach it from the computation history and prevent future computation from being tracked (see the sketch after this list).

  • To prevent tracking history (and using memory), you can also wrap the code block in with torch.no_grad():. This can be particularly helpful when evaluating a model because the model may have trainable parameters with requires_grad=True, but we don't need the gradients.

  • Another class that is central to the autograd implementation is Function.

  • Tensor and Function are interconnected and build up an acyclic graph that encodes the complete history of the computation. Each tensor has a .grad_fn attribute that references the Function that created it (except for Tensors created by the user, whose grad_fn is None).

  • If you want to compute derivatives, you can call .backward() on a Tensor. If the Tensor is a scalar (i.e., it holds a single element), you don't need to pass any arguments to backward(); however, if it has more elements, you need to specify a gradient argument, which is a tensor of matching shape.
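
To tie the last few points together, here is a minimal sketch (the shapes and values are arbitrary) showing .detach(), torch.no_grad(), and the gradient argument needed when backpropagating from a non-scalar tensor.

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2

# .detach() returns a tensor that shares data but is cut off from the graph
print(y.detach().requires_grad)   # False

# torch.no_grad() disables tracking inside the block, handy when evaluating a model
with torch.no_grad():
    print((x * 2).requires_grad)  # False

# Backward from a non-scalar tensor needs a gradient argument of matching shape
v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)
print(x.grad)                     # equals 2 * v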

And that's a wrap! We hope you enjoyed this introduction to Autograd and have a better understanding of how PyTorch calculates gradients. Happy learning!