# Lab - YOLOv4-Tiny - Darknet Setup
## E6692 Spring 2022


In this lab you will train YOLOv4-Tiny using the Darknet framework. Darknet is an open source neural network framework written in C and CUDA, which keeps it lightweight and can make training faster than TensorFlow/PyTorch for similar networks. However, it may feel slightly less intuitive to a Python programmer. This notebook will guide you through the setup process. You should execute these cells on the GCP instance.

The first step in setting up Darknet is to specify the build configuration. This is done by editing the Makefile. We need to specify `GPU=1`, `CUDNN=1`, `CUDNN_HALF=1`, and `OPENCV=1` to use the GPU for parallel operations, the cuDNN GPU acceleration libraries, half precision operations, and OpenCV for loading images. We also need to specify the GPU architecture by uncommenting the `ARCH` variable that corresponds to your instance's GPU. 

**TODO:** Open **darknet/Makefile** and make the changes specified above. 
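The flag changes described above can also be expressed as a small text rewrite. The following is only a sketch in Python: `SAMPLE_MAKEFILE` is a hypothetical stand-in for the flag lines in your Makefile (your file may contain different defaults or extra flags), and `set_build_flags` is an illustrative helper, not part of Darknet. The `ARCH` line is left for manual editing because it depends on your instance's GPU.

```python
import re

# Hypothetical sample of the flag lines in a Darknet Makefile (assumption).
SAMPLE_MAKEFILE = """GPU=0
CUDNN=0
CUDNN_HALF=0
OPENCV=0
AVX=0
"""


def set_build_flags(makefile_text, settings):
    """Return makefile_text with each `NAME=value` line rewritten per settings."""
    for name, value in settings.items():
        # Match a whole line of the form NAME=<anything>; MULTILINE makes
        # ^ and $ anchor at every line, so CUDNN does not match CUDNN_HALF.
        pattern = re.compile(rf"^{re.escape(name)}=.*$", re.MULTILINE)
        makefile_text, count = pattern.subn(f"{name}={value}", makefile_text)
        if count == 0:
            raise KeyError(f"flag {name} not found in Makefile text")
    return makefile_text


updated = set_build_flags(
    SAMPLE_MAKEFILE,
    {"GPU": 1, "CUDNN": 1, "CUDNN_HALF": 1, "OPENCV": 1},
)
print(updated)
```

Flags that are not listed in `settings` (here `AVX`) are left unchanged, and a misspelled flag name raises an error instead of being silently ignored.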

## Install OpenCV

We need to install the C distribution of OpenCV (Open Source Computer Vision Library) for image processing. 

**TODO:** Open a terminal and enter the command `sudo apt-get install libopencv-dev` to install OpenCV.

## Build Darknet

To compile the Darknet project code, we simply execute the command `make` in the `darknet` directory. The Makefile specifies the configuration details of the Darknet setup. All of the source dependencies are located within the directory.

**TODO**: In the terminal, navigate to the `darknet` directory and enter `make`. You will see the compilation output and several warnings, which is OK. The Darknet code should compile without errors.

After you've compiled the Darknet code and completed the discussion questions below, you can start on the training notebook **darknet/DarknetTraining.ipynb**.

## Discussion Questions

In the previous lab you implemented the forward pass of a basic CNN in CUDA. The Darknet framework works in the same way - CNN layers are defined in CUDA such that they can be calculated on GPUs - but it also has a lot more functionality than what you implemented. For instance, there are libraries of different activation functions, layer types, and loss functions all implemented in CUDA and C that can be combined to generate complex model architectures. To get a better sense of how Darknet is combining CPU and GPU functionality to complete the computations necessary for Deep Learning, take a look at **darknet/src**. This directory contains the Darknet source code for individual model operations.

Open **darknet/src/activation_kernels.cu**, **darknet/src/activation_kernels.c**, and **darknet/src/activation_kernels.h** and look through the functions in these files. Describe the purpose of these functions and their designation to **.cu**, **.c**, or **.h**. How do these C and CUDA functions work to produce activation functions? How do they calculate the gradients of these activation functions?

**TODO:** Your answer here.

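Functions are declared in the .h file and implemented for the CPU in the .c file. The GPU implementations are in the .cu file, and those implementations are called from functions in the .c file. Each activation function has a matching gradient function: the forward pass applies the activation to every element of the array, and the backward pass multiplies each element of the incoming delta by the derivative of the activation at that element.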
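To make the activation/gradient pairing concrete, here is a minimal sketch in Python rather than C or CUDA. The function names echo the Darknet style but are written for illustration; the 0.1 leaky slope and the choice to evaluate the gradient on the activation output are assumptions of this sketch, and the real code updates arrays in place on the GPU instead of returning new lists.

```python
import math


def leaky_activate(x):
    # Leaky ReLU with an assumed negative slope of 0.1.
    return x if x > 0 else 0.1 * x


def leaky_gradient(y):
    # Derivative of leaky ReLU; the output y has the same sign as the input.
    return 1.0 if y > 0 else 0.1


def logistic_activate(x):
    return 1.0 / (1.0 + math.exp(-x))


def logistic_gradient(y):
    # Derivative of the logistic function written in terms of its output y.
    return (1.0 - y) * y


def activate_array(values, activate):
    """Forward pass: apply the activation to every element."""
    return [activate(v) for v in values]


def gradient_array(outputs, gradient, delta):
    """Backward pass: scale each incoming delta by the local derivative."""
    return [d * gradient(y) for y, d in zip(outputs, delta)]


outputs = activate_array([-2.0, 3.0], leaky_activate)
deltas = gradient_array(outputs, leaky_gradient, [1.0, 1.0])
print(outputs, deltas)
```

On a GPU, each list element would be handled by one thread, which is why these element-wise operations map naturally onto CUDA kernels.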

Open **darknet/src/dark_cuda.c**. Describe the purpose of the following functions: **cuda_free()**, **cuda_free_host()**, **cuda_push_array()**, **cuda_pull_array()**, **cuda_pull_array_async()**, and **get_number_of_blocks()**.

**TODO:** Your answer here.

1. cuda_free()

Frees memory allocated on the GPU and checks the returned status for errors.

2. cuda_free_host()

Frees page-locked (pinned) host memory.

3. cuda_push_array()

Copies the array from host to device.

4. cuda_pull_array()

Copies the array from device to host.

5. cuda_pull_array_async()

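Copies the array from device to host asynchronously, so the call can return before the transfer has finished. The direction of the transfer is inferred from the pointer values, which requires unified virtual addressing.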

6. get_number_of_blocks()

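Returns the number of thread blocks needed on the device to cover an array of a given size with a given block size, i.e. the array size divided by the block size, rounded up.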

Reference:

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g18fa99055ee694244a270e4d5101e95b
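The block count computed by get_number_of_blocks() is a ceiling division. The sketch below shows the arithmetic in Python; the block size of 512 in the usage lines is only an illustrative value, not a statement about the value Darknet uses.

```python
def get_number_of_blocks(array_size, block_size):
    """Number of blocks needed so that blocks * block_size >= array_size."""
    full_blocks = array_size // block_size
    # One extra, partially filled block is needed if elements are left over.
    partial_block = 1 if array_size % block_size > 0 else 0
    return full_blocks + partial_block


for size in (1, 1024, 1025):
    print(size, get_number_of_blocks(size, 512))
```

Because the last block may be only partially filled, a kernel launched with this block count still needs a bounds check so that surplus threads do not read or write past the end of the array.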

What is the purpose of the "CHECK_CUDA(status);" lines in these functions?

**TODO:** Your answer here.

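To check whether the preceding CUDA call succeeded. Each CUDA runtime call returns a status code, and `CHECK_CUDA(status)` reports an error if that status indicates a failure, instead of letting the failure pass silently.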
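The check-after-every-call pattern can be sketched in Python as follows. Everything here is hypothetical: `SUCCESS` stands in for the CUDA success status, `fake_free` stands in for a runtime call that returns a status code, and `check_cuda` only illustrates the idea behind the macro, not its actual implementation.

```python
SUCCESS = 0  # stand-in for the CUDA success status (assumption)


class CudaError(RuntimeError):
    """Raised when a simulated runtime call reports a failure."""


def check_cuda(status, where):
    # Fail loudly at the call site instead of continuing with bad state.
    if status != SUCCESS:
        raise CudaError(f"CUDA call failed in {where} with status {status}")


def fake_free(pointer_is_valid):
    """Hypothetical runtime call that returns a status code."""
    return SUCCESS if pointer_is_valid else 1


check_cuda(fake_free(True), "fake_free")  # passes silently

try:
    check_cuda(fake_free(False), "fake_free")
except CudaError as err:
    print(err)
```

Checking the status immediately after each call ties an error to the call that caused it, which matters because a failure left unchecked tends to surface later in an unrelated place.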

Compare your implementations of MaxPool2D and conv2D in the previous lab to the corresponding Darknet implementations: **src/darknet/convolutional_kernels.forward_convolutional_layer_gpu()** and **src/darknet/maxpool_layer_kernels.forward_maxpool_layer_kernel()**. Briefly explain how the backward pass functions work (backward_convolutional_layer_gpu() and backward_maxpool_layer_kernel()).

**TODO:** Your answer here.

1. forward_convolutional_layer_gpu()

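Darknet uses a technique called im2col together with General Matrix Multiply (GEMM) to simplify and accelerate the convolution operation. im2col unrolls every receptive field of the input into a column of a matrix, so the whole convolution becomes a single matrix multiplication with the flattened filters. Shared memory is also used in the function to reduce the number of data fetches from global memory.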

2. forward_maxpool_layer_kernel()

This function is not very different from my implementation. This is probably because max pooling is a simple operation in which the data fetches cannot be avoided.

3. backward_convolutional_layer_gpu()

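Backward propagation through a convolutional layer is also a convolution operation, so it is similar to the forward implementation. It uses the same im2col and GEMM building blocks to compute the weight gradients and the gradients passed back to the previous layer.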

4. backward_maxpool_layer_kernel()

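The backward pass of max pooling has a similar structure to the forward implementation. The forward pass records the index of the maximum element in each pooling window, and the backward pass routes each output gradient back to that input position only.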
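The im2col and GEMM approach mentioned in item 1 can be sketched in plain Python. This is a simplified illustration under stated assumptions (a single input channel, stride 1, no padding, nested lists instead of GPU buffers), and the function names are chosen for this sketch rather than taken from Darknet.

```python
def im2col(image, k):
    """Unroll every k x k patch (stride 1, no padding) into a matrix.

    The result has k*k rows (one per kernel position) and
    out_h*out_w columns (one per output pixel).
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h - k + 1, w - k + 1
    col_matrix = []
    for ky in range(k):
        for kx in range(k):
            col_matrix.append(
                [image[y + ky][x + kx] for y in range(out_h) for x in range(out_w)]
            )
    return col_matrix, out_h, out_w


def gemm(a, b):
    """Plain matrix multiply: (m x n) times (n x p) gives (m x p)."""
    n, p = len(b), len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(n)) for j in range(p)] for row in a]


def conv2d_im2col(image, filters):
    """Convolve one image with several k x k filters via one matrix multiply."""
    k = len(filters[0])
    col_matrix, out_h, out_w = im2col(image, k)
    # Flatten each filter row by row, matching the row order of col_matrix.
    weights = [[v for row in f for v in row] for f in filters]
    flat = gemm(weights, col_matrix)
    # Reshape each flat output row back into an out_h x out_w feature map.
    return [[r[y * out_w:(y + 1) * out_w] for y in range(out_h)] for r in flat]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
filters = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
feature_maps = conv2d_im2col(image, filters)
print(feature_maps)
```

The unrolled matrix duplicates overlapping input pixels, so this approach trades extra memory for the ability to hand the heavy arithmetic to one highly optimized matrix multiplication.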