This is based on code from the following book

``````%matplotlib inline
import numpy as np
import torch
torch.set_printoptions(edgeitems=2)
``````

We take our input from the previous notebook and apply the same scaling to it.

``````t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0,
3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9,
33.9, 21.8, 48.4, 60.4, 68.4])
t_un = 0.1 * t_u
``````

Same model and loss function as before.

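``````def model(t_u, w, b):
    # linear model: predicted value from the input t_u
    return w * t_u + b
``````
``````def loss_fn(t_p, t_c):
    # mean squared error between predictions t_p and targets t_c
    squared_diffs = (t_p - t_c)**2
    return squared_diffs.mean()
``````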

This time, instead of working out the gradient with respect to the parameters by hand and applying it ourselves, we’ll leverage `torch`’s automatic differentiation feature, autograd.

``````params = torch.tensor([1.0, 0.0], requires_grad=True)
``````

How does `requires_grad` work?

Internally, autograd represents this graph as a graph of Function objects (really expressions), which can be `apply()`ed to compute the result of evaluating the graph. When computing the forwards pass, autograd simultaneously performs the requested computations and builds up a graph representing the function that computes the gradient (the `.grad_fn` attribute of each `torch.Tensor` is an entry point into this graph). When the forwards pass is completed, we evaluate this graph in the backwards pass to compute the gradients. [1]

This can be done as long as our model is differentiable.
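To make the quoted description a little more concrete, here is a minimal sketch of how the graph shows up on tensors. The names `w` and `y` are made up for illustration and are not part of the notebook.

```python
import torch

# A leaf tensor that autograd is asked to track (illustrative name)
w = torch.tensor([1.0, 0.0], requires_grad=True)

# Any tensor computed from it records the operation that produced it
y = (w * 3.0).sum()

print(w.is_leaf, w.grad_fn)  # a leaf tensor has no grad_fn
print(y.is_leaf, y.grad_fn)  # y carries a grad_fn pointing back into the graph

# Walk the graph backwards; the result lands on the leaf's .grad
y.backward()
print(w.grad)  # d(sum(3 * w))/dw is 3 for every element
```

The gradient is stored on the leaf tensor `w`, not on the intermediate result `y`.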

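Torch will track the graph of operations used to compute any tensor derived from `params`. Before a backward pass has been run, the `grad` attribute of `params` is still `None`.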

``````params.grad is None
``````
``````True
``````

We apply a single forward and backward pass and can then print out the gradient that autograd stored in `params.grad`.

``````loss = loss_fn(model(t_u, *params), t_c)
loss.backward()

params.grad
``````
``````tensor([4517.2969,   82.6000])
``````
``````if params.grad is not None:
    # reset the accumulated gradient in place before the next backward pass
    params.grad.zero_()
``````

Notice that we are now ready to write our `training_loop`, and we only had to define our `model` and `loss_fn`; autograd supplies the gradients.

``````def training_loop(n_epochs, learning_rate, params, t_u, t_c):
    for epoch in range(1, n_epochs + 1):
        # clears out the accumulated derivatives at the leaf nodes
        if params.grad is not None:
            params.grad.zero_()

        t_p = model(t_u, *params)
        # computes the loss
        loss = loss_fn(t_p, t_c)
        # accumulate the derivatives at the leaf nodes
        loss.backward()

        # in-place update of params, which autograd does not like;
        # autograd tracking is switched off inside this block to avoid issues
        with torch.no_grad():
            params -= learning_rate * params.grad

        if epoch % 500 == 0:
            print('Epoch %d, Loss %f' % (epoch, float(loss)))

    return params
``````
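The comment about in-place updates in the loop above can be illustrated in isolation. This is a small sketch with a toy loss and a made-up step size of `0.1`: updating a leaf tensor that requires gradients in place raises a `RuntimeError`, while the same update inside `torch.no_grad()` goes through.

```python
import torch

params = torch.tensor([1.0, 0.0], requires_grad=True)
(params ** 2).sum().backward()  # toy loss, gradient is 2 * params

# Outside no_grad, autograd refuses an in-place change to a tracked leaf
try:
    params -= 0.1 * params.grad
    raised = False
except RuntimeError as err:
    raised = True
    print("autograd refused the update:", err)

# Inside no_grad, the update is applied and not recorded in the graph
with torch.no_grad():
    params -= 0.1 * params.grad

print(params)
print(params.requires_grad)  # the tensor is still tracked for later passes
```

The tensor keeps `requires_grad=True` after the block, so the next forward pass builds a fresh graph from the updated values.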
``````training_loop(
    n_epochs = 5000,
    learning_rate = 1e-2,
    params = torch.tensor([1.0, 0.0], requires_grad=True),
    t_u = t_un,  # the scaled input
    t_c = t_c)
``````
``````params.grad tensor([-0.2252,  1.2748])
Epoch 500, Loss 7.860116
Epoch 1000, Loss 3.828538
Epoch 1500, Loss 3.092191
Epoch 2000, Loss 2.957697
Epoch 2500, Loss 2.933134
Epoch 3000, Loss 2.928648
Epoch 3500, Loss 2.927830
Epoch 4000, Loss 2.927679
Epoch 4500, Loss 2.927652
Epoch 5000, Loss 2.927647