<a href="https://colab.research.google.com/github/wszdexdrf/self-driving-demo/blob/main/Driving.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Libraries

First we import all the libraries needed. We would be creating the model using pytorch. We would also need pandas to handle csv files. We use opencv for image processing. We store the loaded numpy arrays as a hdf5 file since we would have to run the model on the entire clip, as we have RNNs, so we cannot actually hold the entire image dataset in the System memory. 

In [1]:
import numpy as np
import pandas as pd 
import os
import math
import cv2
import h5py

num_clips = 20
base = 'Data\\'
hfile = h5py.File(base + 'data.hdf5', 'a')

# Loading the Dataset

Now we have to load the dataset. As mentioned earlier, the dataset consists of 20 clips, each having the camera feed of a car being driven by a human in a simulator. The feed is from three cameras, centre, left and right. But here we are only using the centre camera images. There is a csv file containing the location of the images and the respective state of the car (throttle, steering angle) of the car in that position. 

**Note:** the locations of the images are full paths, not relative paths, therefore to run this part of the code, one needs to adjust them, or place the "data" folder according to the paths in the csv files and just "Find and replace" the username from all the paths at once. This might not actually be needed to train the model since the hdf5 file is already provided.

In [None]:
# The function load_img_steering is used to load all the image_paths and convert it into a numpy array.
# It also loads the steering values, which is the target value for us, and returns it as another numpy 
# array. After this, we have two numpy arrays for each clip, one containing the paths of all images in 
# the clip, and another containing the target values.

def load_img_steering(datadir, df):
  image_path = []
  steering = []
  for i in range(len(df)):
    indexed_data = df.iloc[i]
    center, left, right = indexed_data[0], indexed_data[1], indexed_data[2]
    image_path.append(os.path.join(datadir, center.strip()))
    steering.append(float(indexed_data[3]))
  image_paths = np.asarray(image_path)
  steerings = np.asarray(steering)
  return image_paths, steerings

for i in range(num_clips):

  # The location of each clip is the path to the base variable define above + the clip number. 
  # (Note that the base variable includes a extra "\\") .
  datadir = base + str(i+1)

  # The columns in the csv files are included in the list named "columns".
  columns = ['center', 'left', 'right', 'steering', 'throttle', 'reverse', 'speed']
  
  data = pd.read_csv(os.path.join(datadir, 'driving_log.csv'), names = columns)
  image_paths, steering = load_img_steering(datadir, data)

  # The size of the input dataset would be the length of this 1-D array of image_paths concatenated with the 
  # dimensions of each image, which would be (160,320, 3) in our case. Since we need CHW format, therefore 
  # the final dataset will be of the shape (len, 3, 160, 320).
  size = np.shape(image_paths) + (3, 160, 320)
  imgs = np.zeros(size)
  
  
  # We read the images for each path and store this. Then we store the entire numpy array (len, 3, 160, 320) 
  # as a dataset in our hdf5 file, with the name 1x, 2x, etc. The target values are stored as 1y, 2y, etc. 
  # respectively. This hdf5 file resides on permanent storage, since the entire dataset cannot be stored in memory.
  for j,x in enumerate(image_paths):
    imgs[j] = np.transpose(cv2.imread(x)[60::160,:,:], (2, 0, 1))/255
  xset = hfile.create_dataset(str(i + 1) + 'x', np.shape(imgs), h5py.h5t.STD_U8BE, data = imgs)
  yset = hfile.create_dataset(str(i + 1) + 'y', np.shape(steering), data = steering)

# Training the model

The model consists of 5 consecutive CNN layers and then 2 Dense layers. The output of this sequential model is then given to simple RNN cell. The result is finally passed through a Linear layer which converts the output to a single scalar value. The exact specifications are given below in the code.

In [2]:
import torch
import time

class Driver(torch.nn.Module):
  def __init__(self):
    super(Driver, self).__init__()
    self.net = torch.nn.Sequential(
      torch.nn.Conv2d(3, 24, kernel_size = 5, stride = 2),
      torch.nn.Conv2d(24, 36, kernel_size = 5, stride = 2),
      torch.nn.Conv2d(36, 48, kernel_size = 5, stride = 2),
      torch.nn.Conv2d(48, 64, kernel_size = 3),
      torch.nn.Conv2d(64, 64, kernel_size = 3),
      torch.nn.Flatten(),
      torch.nn.Linear(27456, 100),
      torch.nn.Linear(100, 50)
      )
    self.cell = torch.nn.RNNCell(50, 5, nonlinearity = 'relu')
    self.out_layer = torch.nn.Linear(5, 1)
  
  def forward(self, img, hx):
    net_out = self.net(img)
    hidden = self.cell(net_out, hx)
    return self.out_layer(hidden), hidden

model = Driver()

# We are using Mean Square Error as our loss function, since we have to "regress" 
# to the optimum steering value for the given situation. We are using Adam optimizer.
loss_fn = torch.nn.functional.mse_loss
opt = torch.optim.Adam(model.parameters(), lr = 0.0001)

hfile = h5py.File(base + 'data.hdf5', 'r+')
epochs = 5

for epoch in range(epochs):
  for i in range(num_clips):

    print("Clip ", i+1, "Epoch ", epoch+1)
    t = time.time()
    
    # For each clip in each epoch, we load the images and target values into the 
    # system memory
    images = torch.Tensor(hfile['/' + str(i + 1) + 'x']).float()
    steerings = torch.Tensor(hfile['/' + str(i + 1) + 'y']).float()

    # Printing the time spent in loading the clip into memory
    print("Reading ", time.time() - t)

    num_images = int(images.shape[0])
    
    # Initializind the hidden state
    hx = torch.zeros(1, 5).float()

    total_loss = 0
    
    # Combine the Tensors into a TensorDataset object
    train_ds = torch.utils.data.TensorDataset(images, steerings)

    t = time.time()
    for index, (x, y) in enumerate(train_ds):

      # Model expects 4D input
      x = torch.unsqueeze(x, 0)

      # Predictions of the model.
      pred, hx = model(x, hx)
      
      # Computing the loss
      loss = loss_fn(pred, y)

      # Calculating total loss for displaying average loss for the clip. Serves 
      # no purpose other than evaluation.
      total_loss = total_loss + float(loss)

      # Displaying the current average loss
      print('\r', end = '')
      print("Loss ", total_loss / (index + 1), end= '')

    # Printing the time spent in forward pass
    print("\nForward ", time.time() - t)
    
    t = time.time()

    # Bacpropagation of gradients and optimizing the model
    loss.backward()
    opt.step()
    opt.zero_grad()

    # Printing the time spent in Backpropagation
    print("Backpropagation ", time.time() - t, end='\n-------------------------------\n')

Clip  1 Epoch  1
Reading  7.691854476928711
Loss  0.47126671554226623

  loss = loss_fn(pred, y)


Loss  0.23010490270738715
Forward  3.454789876937866
Backpropagation  9.75828766822815
-------------------------------
Clip  2 Epoch  1
Reading  6.373269081115723
Loss  0.07565224059421657
Forward  2.790131092071533
Backpropagation  7.263794422149658
-------------------------------
Clip  3 Epoch  1
Reading  3.766420602798462
Loss  0.01291669097233769
Forward  1.652897834777832
Backpropagation  4.224001407623291
-------------------------------
Clip  4 Epoch  1
Reading  3.5908706188201904
Loss  0.007328916617766921
Forward  1.440337896347046
Backpropagation  3.7534031867980957
-------------------------------
Clip  5 Epoch  1
Reading  16.265552043914795
Loss  0.023909478743391804
Forward  6.750047922134399
Backpropagation  17.15409207344055
-------------------------------
Clip  6 Epoch  1
Reading  5.292711496353149
Loss  0.039229327997739996
Forward  2.237657308578491
Backpropagation  5.564789295196533
-------------------------------
Clip  7 Epoch  1
Reading  4.154463052749634
Loss  0.017

# Saving the Model

We are saving the trained model as in the ONNX format.

In [3]:
# Dummy inputs 
x, hx = torch.randn(1, 3, 160, 320), torch.randn(1, 5)

# Testing if dummy inputs are correct 
pred, hidden = model(x, hx)

# Exporting the model. The dummy inputs are used to save the format of inputs 
# to the model.
torch.onnx.export(model, (x, hx), "Driver.onnx")