Autograd: Automatic Differentiation
Hello, budding AI enthusiasts! In this tutorial, we will be exploring one of the most powerful features of PyTorch: autograd, the automatic differentiation library. This is what allows neural networks to learn from data and improve over time. Let's get started!
What is Automatic Differentiation?
In the field of artificial intelligence, we often need to compute gradients to optimize model parameters. Automatic differentiation is a technique that computes the derivatives of functions efficiently, and PyTorch uses a version of this called reverse mode automatic differentiation.
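As a quick, minimal sketch (our own example, not part of the original walkthrough), here is reverse-mode differentiation in action: we ask autograd for the derivative of f(a) = a² + 3a and check it against the hand-computed value 2a + 3.

import torch

# For f(a) = a**2 + 3*a, the analytic derivative is 2*a + 3, so at a = 2.0 we expect 7.0
a = torch.tensor(2.0, requires_grad=True)
f = a ** 2 + 3 * a
f.backward()      # reverse-mode pass: propagates df/df = 1 back to a
print(a.grad)     # tensor(7.)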
The Autograd Package
PyTorch's autograd package provides automatic differentiation for all operations on Tensors. It's a define-by-run framework, which means your backpropagation is defined by how your code runs, and every single iteration can be different.
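Because the graph is rebuilt from scratch on each forward pass, ordinary Python control flow can change the shape of the graph from one iteration to the next. Here is a rough sketch (the helper function below is hypothetical, not from the original tutorial):

import torch

def scale_until_large(t):
    # The number of multiplications depends on the data itself,
    # so the recorded graph can differ on every call.
    while t.abs().sum() < 100:
        t = t * 2
    return t

inp = torch.randn(3, requires_grad=True)
loss = scale_until_large(inp).sum()
loss.backward()
print(inp.grad)   # the gradient reflects however many doublings actually ran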
How Does Autograd Work?
Let's illustrate this with some code:
import torch
# Create a tensor and set requires_grad=True to track computation with it
x = torch.ones(2, 2, requires_grad=True)
print(x)
requires_grad=True indicates that we want to compute gradients with respect to this tensor during the backward pass. Now let's perform some tensor operations:
y = x + 2
print(y)
y was created as a result of an operation, so it has a grad_fn.
print(y.grad_fn)
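As a small aside (our own addition): a tensor created directly by the user has no grad_fn, so the same check on x prints None.

print(x.grad_fn)   # None - x was created by the user, not by an operation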
Let's do more operations on y:
z = y * y * 3
out = z.mean()
print(z, out)
Gradients
Let's backprop now. Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).
out.backward()
Print gradients d(out)/dx:
print(x.grad)
You should have gotten a 2×2 matrix filled with 4.5. Let's call the out tensor "o". We have o = 1/4 ∑ zi, with zi = 3(xi + 2)², so zi|(xi=1) = 3(3)² = 27. Therefore ∂o/∂xi = 3/2 (xi + 2), hence ∂o/∂xi|(xi=1) = 9/2 = 4.5.
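To convince yourself that the formula generalizes, you can rerun the same computation from a different starting point (a sketch of our own; with xi = 2 the formula predicts 3/2 · (2 + 2) = 6):

x2 = torch.full((2, 2), 2.0, requires_grad=True)
out2 = (3 * (x2 + 2) ** 2).mean()
out2.backward()
print(x2.grad)   # a 2x2 tensor filled with 6.0, matching 3/2 * (xi + 2) at xi = 2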
Things to Remember about Autograd
- If you want to stop a tensor from tracking history, you can call .detach() to detach it from the computation history and to prevent future computation from being tracked.
- To prevent tracking history (and using memory), you can also wrap the code block in with torch.no_grad():. This can be particularly helpful when evaluating a model, because the model may have trainable parameters with requires_grad=True, but we don't need the gradients.
- Another important class for the autograd implementation is Function. Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of computation. Each tensor has a .grad_fn attribute that references the Function that created the Tensor (except for Tensors created by the user, whose grad_fn is None).
- If you want to compute the derivatives, you can call .backward() on a Tensor. If the Tensor is a scalar (i.e., it holds a single element of data), you don't need to specify any arguments to backward(); however, if it has more elements, you need to specify a gradient argument that is a tensor of matching shape (these points are illustrated in the sketch after this list).
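To tie these points together, here is a short sketch of our own (not from the original text) that touches each of them: cutting a tensor out of the graph with .detach(), disabling tracking with torch.no_grad(), and passing a gradient argument when calling backward() on a non-scalar tensor.

import torch

t = torch.randn(3, requires_grad=True)

# .detach() returns a tensor that shares data but is cut off from the graph
u = t * 2
u_detached = u.detach()
print(u.requires_grad, u_detached.requires_grad)   # True False

# torch.no_grad() disables tracking inside the block (handy during evaluation)
with torch.no_grad():
    v = t * 2
print(v.requires_grad)                             # False

# For a non-scalar output, backward() needs a gradient argument of matching shape
w = t * 3
w.backward(torch.ones_like(w))                     # vector-Jacobian product
print(t.grad)                                      # tensor([3., 3., 3.])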
And that's a wrap! We hope you enjoyed this introduction to Autograd and have a better understanding of how PyTorch calculates gradients. Happy learning!