There are many libraries out there that we can use for our deep learning cravings, and I’ve tried a few of them. When I started, I tried TensorFlow. Then I tried Keras. Then I tried PyTorch. And if I’m being completely honest with myself, I like PyTorch the most.

I think the reason I like PyTorch much more than TensorFlow is that PyTorch is very pythonic. It follows Python’s own style and idioms, which makes it much easier to understand and use, especially for me, because I code mainly in Python. The problem with TensorFlow is that it requires you to learn a lot of *TensorFlow-specific jargon*. This example shows how you can create a simple neural network in PyTorch.

We’ll create a simple neural network with one hidden layer and a single output unit. We will use the ReLU activation in the hidden layer and the sigmoid activation in the output layer.
First, we need to import the PyTorch library.

```python
import torch
import torch.nn as nn
```

Then we define the sizes of all the layers and the batch size.

```python
# Define the input size, hidden layer size, output size and batch size, respectively
n_in, n_h, n_out, batch_size = 10, 5, 1, 10
```
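With these sizes, the network described above computes the following for a batch $X$ of shape $10 \times 10$ (one row per example). The symbols are my own notation rather than anything PyTorch defines: $W_1, b_1$ and $W_2, b_2$ are the weights and biases of the two linear layers, and, following *nn.Linear*’s convention, each weight matrix has shape (output size, input size), with the bias added to every row.

$$
H = \mathrm{ReLU}\left(X W_1^{\top} + b_1\right), \qquad W_1 \in \mathbb{R}^{5 \times 10},\; b_1 \in \mathbb{R}^{5}, \; H \in \mathbb{R}^{10 \times 5}
$$

$$
\hat{Y} = \sigma\left(H W_2^{\top} + b_2\right), \qquad W_2 \in \mathbb{R}^{1 \times 5},\; b_2 \in \mathbb{R}^{1}, \; \hat{Y} \in \mathbb{R}^{10 \times 1}
$$

That is $5 \cdot 10 + 5 + 1 \cdot 5 + 1 = 61$ trainable parameters in total, and one output between 0 and 1 per example.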

And now, we create some dummy input data *x* and some dummy target data *y*. We use PyTorch tensors to store this data. PyTorch tensors can be used and manipulated much like NumPy arrays, with the added benefit that they can be moved to a GPU and processed there.

```python
# Create dummy input and target tensors (data)
x = torch.randn(batch_size, n_in)
y = torch.tensor([[1.0], [0.0], [0.0], [1.0], [1.0], [1.0], [0.0], [0.0], [1.0], [1.0]])
```
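Since the comparison with NumPy arrays comes up here, this is a small sketch of what it means in practice, assuming NumPy is installed alongside PyTorch. It wraps a NumPy array as a tensor, does some NumPy-style arithmetic on it, and moves it to a GPU only if *torch.cuda.is_available()* reports one, so it also runs on a CPU-only machine. The array values are arbitrary.

```python
import numpy as np
import torch

# A NumPy array and a tensor that shares its memory (torch.from_numpy makes no copy)
a = np.arange(6, dtype=np.float32).reshape(2, 3)
t = torch.from_numpy(a)

# Element-wise arithmetic and reductions look just like NumPy
t2 = t * 2 + 1             # a new tensor: [[1, 3, 5], [7, 9, 11]]
row_sums = t2.sum(dim=1)   # [9, 27]

# Because t shares memory with a, a write through one is visible in the other
a[0, 0] = 100.0
shared_value = t[0, 0].item()

# Pick a GPU if one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t_dev = t2.to(device)

# Move back to the CPU before converting to a NumPy array again
back = t_dev.cpu().numpy()
print(device, back)
```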

And now, for the main course, we define our model in a single statement.

```python
# Create a model: input -> linear -> ReLU -> linear -> sigmoid
model = nn.Sequential(nn.Linear(n_in, n_h),
                      nn.ReLU(),
                      nn.Linear(n_h, n_out),
                      nn.Sigmoid())
```

This creates a model that looks like input -> linear -> ReLU -> linear -> sigmoid. There is another way to define models, which is used for more complicated and custom models: writing the model as a class that subclasses *nn.Module*. You can read about it [here](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-custom-nn-modules).
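For comparison, here is a sketch of the same architecture written as a class. The class name *TwoLayerNet* and its attribute names are made up for this example; what matters is subclassing *nn.Module*, creating the layers in the constructor, and describing the computation in *forward*.

```python
import torch
import torch.nn as nn


class TwoLayerNet(nn.Module):
    """Same layers as the nn.Sequential model: linear -> ReLU -> linear -> sigmoid."""

    def __init__(self, n_in, n_h, n_out):
        super().__init__()
        self.hidden = nn.Linear(n_in, n_h)
        self.relu = nn.ReLU()
        self.output = nn.Linear(n_h, n_out)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # forward() spells out the computation; calling model(x) runs it
        h = self.relu(self.hidden(x))
        return self.sigmoid(self.output(h))


model = TwoLayerNet(n_in=10, n_h=5, n_out=1)
x = torch.randn(10, 10)
y_pred = model(x)  # shape (10, 1), one value between 0 and 1 per example

# 5*10 + 5 (hidden layer) + 1*5 + 1 (output layer) = 61 parameters
n_params = sum(p.numel() for p in model.parameters())
print(y_pred.shape, n_params)
```

Inside *forward* you can use ordinary Python control flow such as loops and conditionals, which is what makes this style handy for the more complicated models mentioned above.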

Now, it is time to construct our loss function. We will use the Mean Squared Error Loss.
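With its default settings (*reduction='mean'*), *nn.MSELoss* averages the squared differences between predictions and targets over all elements. A quick check of that, using made-up prediction values:

```python
import torch
import torch.nn as nn

# Made-up predictions and 0/1 targets, shaped (batch, 1) like in this example
y_pred = torch.tensor([[0.8], [0.3], [0.6]])
y = torch.tensor([[1.0], [0.0], [1.0]])

loss = nn.MSELoss()(y_pred, y)

# The same quantity by hand: ((-0.2)^2 + 0.3^2 + (-0.4)^2) / 3 = 0.29 / 3
manual = ((y_pred - y) ** 2).mean()
print(loss.item(), manual.item())
```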

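Also, don’t forget to define our optimizer. We use stochastic gradient descent (SGD) in this example, with a learning rate of 0.01. Since every step feeds the whole dataset of 10 examples as a single batch, this is effectively plain full-batch gradient descent. *model.parameters()* returns an iterator over our model’s parameters (weights and biases), which tells the optimizer what to update.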

```python
# Construct the loss function (mean squared error)
criterion = nn.MSELoss()

# Construct the optimizer (stochastic gradient descent in this case)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```
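To see exactly what *model.parameters()* hands to the optimizer, you can list the parameters by name with *named_parameters()*. This sketch rebuilds the same *nn.Sequential* model so that it runs on its own. In a *Sequential*, each name is prefixed with the layer’s index; the ReLU (index 1) and sigmoid (index 3) layers have no parameters, so they do not appear.

```python
import torch
import torch.nn as nn

n_in, n_h, n_out = 10, 5, 1
model = nn.Sequential(nn.Linear(n_in, n_h),
                      nn.ReLU(),
                      nn.Linear(n_h, n_out),
                      nn.Sigmoid())

# nn.Linear stores its weight as (out_features, in_features)
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)

# parameters() yields the same four tensors, without the names
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
n_tensors = sum(len(group["params"]) for group in optimizer.param_groups)
print(n_tensors)
```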

And now, for dessert, we run gradient descent for 50 epochs. Each epoch performs the forward pass, the loss computation, the backward pass and the parameter update, in that order.

```python
# Gradient descent
for epoch in range(50):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print the loss
    loss = criterion(y_pred, y)
    print('epoch: ', epoch, ' loss: ', loss.item())

    # Clear the gradients left over from the previous iteration
    optimizer.zero_grad()

    # Perform a backward pass (backpropagation)
    loss.backward()

    # Update the parameters
    optimizer.step()
```

```
epoch:  0  loss:  0.23753903806209564
epoch:  1  loss:  0.23736728727817535
epoch:  2  loss:  0.2371959239244461
epoch:  3  loss:  0.23702484369277954
epoch:  4  loss:  0.2368541806936264
epoch:  5  loss:  0.23668383061885834
epoch:  6  loss:  0.23651380836963654
epoch:  7  loss:  0.2363440990447998
epoch:  8  loss:  0.23617476224899292
epoch:  9  loss:  0.2360057532787323
epoch:  10  loss:  0.23583707213401794
epoch:  11  loss:  0.23566871881484985
epoch:  12  loss:  0.23550067842006683
epoch:  13  loss:  0.23533299565315247
epoch:  14  loss:  0.23516564071178436
epoch:  15  loss:  0.23499858379364014
epoch:  16  loss:  0.23483186960220337
epoch:  17  loss:  0.23466545343399048
epoch:  18  loss:  0.23449937999248505
epoch:  19  loss:  0.2343336045742035
epoch:  20  loss:  0.2341681569814682
epoch:  21  loss:  0.2340030074119568
epoch:  22  loss:  0.23383818566799164
epoch:  23  loss:  0.23367366194725037
epoch:  24  loss:  0.23350946605205536
epoch:  25  loss:  0.23334555327892303
...
```
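Your loss values will differ from the log above, because both the dummy input *x* and the initial weights are random. If you want repeatable numbers, seeding PyTorch’s random number generator with *torch.manual_seed* before creating the data and the model fixes them. The seed 0 below is an arbitrary choice, and results can still differ across PyTorch versions and hardware. This self-contained sketch repeats the whole example but collects the losses in a list instead of printing each one.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed: makes x and the initial weights repeatable

n_in, n_h, n_out, batch_size = 10, 5, 1, 10
x = torch.randn(batch_size, n_in)
y = torch.tensor([[1.0], [0.0], [0.0], [1.0], [1.0], [1.0], [0.0], [0.0], [1.0], [1.0]])

model = nn.Sequential(nn.Linear(n_in, n_h),
                      nn.ReLU(),
                      nn.Linear(n_h, n_out),
                      nn.Sigmoid())
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

losses = []
for epoch in range(50):
    y_pred = model(x)
    loss = criterion(y_pred, y)
    losses.append(loss.item())  # .item() turns the 0-dim loss tensor into a float

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```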

*y_pred* holds the predicted values from a forward pass of our model. We pass it, along with the target values *y*, to the criterion, which calculates the loss. Then *optimizer.zero_grad()* clears the gradients of all the parameters. We need to do this because *loss.backward()* adds new gradients to whatever is already stored, so without it the gradients from previous iterations would keep accumulating. Next, *loss.backward()* is the main PyTorch magic, which uses PyTorch’s Autograd feature. Autograd automatically computes the gradients of the loss with respect to all the parameters, based on the computation graph it builds dynamically during the forward pass. In other words, this is the backward pass (backpropagation) of gradient descent. Finally, we call *optimizer.step()*, which performs a single update of all the parameters using the freshly computed gradients.
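The three calls can be seen at work on a toy problem with a single made-up parameter *w* and the loss $L(w) = (3w - 6)^2$, whose derivative is $dL/dw = 6(3w - 6)$, which equals $-18$ at $w = 1$. This sketch checks that autograd finds that value, shows that a second *backward()* without zeroing adds to the stored gradient, and shows that one step of plain SGD (no momentum, which is the default) computes $w - \text{lr} \cdot dL/dw$.

```python
import torch

# One made-up scalar parameter, starting at w = 1
w = torch.tensor(1.0, requires_grad=True)


def loss_fn(v):
    return (3 * v - 6) ** 2  # dL/dv = 6 * (3v - 6), which is -18 at v = 1


# backward() fills w.grad with dL/dw, computed by autograd
loss_fn(w).backward()
first_grad = w.grad.item()

# A second backward() without zeroing ADDS to w.grad instead of replacing it
loss_fn(w).backward()
accumulated_grad = w.grad.item()

# This is why the training loop calls optimizer.zero_grad() before backward()
optimizer = torch.optim.SGD([w], lr=0.01)
optimizer.zero_grad()
loss_fn(w).backward()
fresh_grad = w.grad.item()

# Plain SGD update: w <- w - lr * dL/dw = 1 - 0.01 * (-18) = 1.18
optimizer.step()
print(first_grad, accumulated_grad, fresh_grad, w.item())
```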

And that’s it. We have successfully trained a simple two-layer neural network in PyTorch and we didn’t really have to go through a ton of random jargon to do it. PyTorch keeps it sweet and simple, just the way everyone likes it.

If you want to learn more about PyTorch and dive deeper into it, take a look at PyTorch’s official documentation and tutorials. They are really well written. You can find them [here](https://pytorch.org/tutorials/).