
KEY: 
- <><><><> == I WRITE
- ???????? == STUDENT WORK


In [3]:
from IPython.display import HTML, SVG, IFrame

In [None]:
#include <stdio.h>
#include "/darknet/include/darknet.h"

int main(void){ printf("hello\n"); return 0; }

# Convolving a Neural Network
---

## Real-time object detection and classification with YOLO

**What you'll learn:** 
- How a Convolutional Neural Network is implemented in a real-world application
- How to build convolutional and (max)pooling layers in C
- How to run YOLO

## Before we get started...

Open up a Terminal using Jupyter. Once you've done that, copy-paste this command: 
```
curl https://pjreddie.com/media/files/yolo.weights -o /notebooks/sp18/data/darknet/yolo.weights --create-dirs
```

This downloads the pre-trained YOLO weights into your darknet directory. The download should be finished by the time we run YOLO.

# 1. Learn what you're working with

Darknet is an elegant neural network framework that can build anything from traditional to recurrent to convolutional to long short-term memory (LSTM) neural networks. For this workshop, we will be focusing on running YOLO by rebuilding functions for the convolutional and pooling layers.


Before we begin, we have to understand Darknet's architecture. Otherwise, we're just grasping at straws! Below is the abstract structure for a layer. I removed variables that are not in the scope of the workshop to avoid confusion.

In [None]:
#include "/darknet/include/darknet.h"
struct layer_ex{
    LAYER_TYPE type;
    ACTIVATION activation;
    COST_TYPE cost_type;
    void (*forward)   (struct layer, struct network); //Forward propagation
    void (*backward)  (struct layer, struct network); //Backward propagation
    void (*update)    (struct layer, update_args);    //updater
    void (*forward_gpu)   (struct layer, struct network);
    void (*backward_gpu)  (struct layer, struct network);
    void (*update_gpu)    (struct layer, update_args);
    
    int batch_normalize; //boolean -> 1 == use batch normalization. There are other normalization methods, but we use this one
    int batch; // # of samples fed in per pass
    
                //vvvv We manipulate the data as matrices, but physically, they are 1D arrays allocated w/ calloc
    int inputs;    // # of values fed into the layer per sample (h*w*c)
    int outputs;   // # of values pushed out of the layer per sample (out_h*out_w*out_c)
    int nweights;  // # of weights 
                //^^^^
    int nbiases;    // # of biases (one is added per filter)

    int h,w,c;  //height h and width w of each input 'matrix', and a count c of all the matrices (channels) fed in
    int out_h, out_w, out_c;  //height and width of each output 'matrix', and a count of all the matrices (channels) returned
    int n;      // # of filters (kernels) in the layer
    
    int groups;     //An AlexNet adaptation of the convolutional layer. Partitions the filters (kernels) into groups
    int size;       //side length of the square filter (a size x size kernel)
    int side;
    int stride; //How many rows/columns the filter (kernel) moves at each step

    float alpha;
    float beta;
    float kappa;

    float coord_scale;
    float object_scale;
    float noobject_scale;
    float mask_scale;
    float class_scale;
    int bias_match;
    int random;
    float thresh;
    int classfix;
    int absolute;

    int onlyforward;
    int stopbackward;
    int dontload;
    int dontloadscales;

    float temperature;
    float probability;
    float scale;

    char  * cweights;
    int   * indexes;
    int   * input_layers;
    int   * input_sizes;
    int   * map;
    float * rand;
    float * cost;
    float * state;
    float * prev_state;
    float * forgot_state;
    float * forgot_delta;
    float * state_delta;
    float * combine_cpu;
    float * combine_delta_cpu;

    float * concat;
    float * concat_delta;

    float * binary_weights;

    float * biases;
    float * bias_updates;

    float * scales;
    float * scale_updates;

    float * weights;
    float * weight_updates;

    float * delta;
    float * output;
    float * squared;
    float * norms;

    float * spatial_mean;
    float * mean;
    float * variance;

    float * mean_delta;
    float * variance_delta;

    float * rolling_mean;
    float * rolling_variance;

    float * x;
    float * x_norm;

    float * m;
    float * v;
    
    float * bias_m;
    float * bias_v;
    float * scale_m;
    float * scale_v;


    float *z_cpu;
    float *r_cpu;
    float *h_cpu;
    float * prev_state_cpu;

    float *temp_cpu;
    float *temp2_cpu;
    float *temp3_cpu;

    float *dh_cpu;
    float *hh_cpu;
    float *prev_cell_cpu;
    float *cell_cpu;
    float *f_cpu;
    float *i_cpu;
    float *g_cpu;
    float *o_cpu;
    float *c_cpu;
    float *dc_cpu; 

    float * binary_input;

    tree *softmax_tree;

    size_t workspace_size;
    
    /*CUT GPU STUFF FOR EXAMPLES...*/
};
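The struct comments mention that the data is handled as matrices but stored as flat 1D arrays. Here is a minimal sketch of what that means in practice. The helper `flat_index` is hypothetical (it is not part of Darknet), and it assumes the layout implied by the indexing used in the maxpool code later in this notebook: columns first, then rows, then matrices (channels), then samples.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Darknet): maps a (sample, channel, row, col)
 * coordinate to its offset inside one flat float array.
 * c, h, w are the channel count, height and width of each sample. */
size_t flat_index(int b, int k, int row, int col, int c, int h, int w)
{
    assert(b >= 0 && k >= 0 && k < c);
    assert(row >= 0 && row < h && col >= 0 && col < w);
    return (size_t)col
         + (size_t)w * ((size_t)row
         + (size_t)h * ((size_t)k
         + (size_t)c * (size_t)b));
}
```

For the 5 x 5 x 3 test input used later, `flat_index(0, 1, 0, 0, 3, 5, 5)` is 25 (the first value of the second matrix) and `flat_index(0, 2, 4, 4, 3, 5, 5)` is 74 (the last value of the third).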

# 2. Breaking down the ConvNet pipeline

### Overview

A ConvNet is a series of convolutional and pooling layers placed in front of a (usually) fully connected neural network (or ANN). These layers preserve the spatial relations between pixels in an image, which a fully connected ANN discards when it flattens its input. Our last workshop went over how to program an ANN from scratch. We now want to show how to add these additional layers to our neural network.

<img src="assets/images/CNN_pipeline.png"/>

### Convolutional Layer

Recall: For each convolutional layer, we must specify the dimensions of the input. Our inputs will strictly be images, so we have width **w** and height **h** of the input matrix. Since we read the data in as one big array, we partition the array into equally sized matrices. **c** is the count of those matrices (the channels of the image).

#### Convolution visualization

At every position where the filter (kernel) fully "fits" within the input matrix, the filter and the patch of input beneath it are multiplied element-wise and the products are summed into a single output value. The filter moves column by column and row by row according to its **stride** (e.g. a stride of 2 moves the filter two rows/columns at a time).

In [4]:
IFrame("assets/cnns/convolution.html", "1000", "500")
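To make the sliding-filter description concrete, here is a rough sketch of a single-matrix convolution in plain C. It is not how Darknet does it (Darknet uses im2col plus a matrix multiplication, as we will see in section 3), and the function name `conv2d_naive` is made up for this example. It also assumes no padding, so the output side length is `(input - size) / stride + 1`.

```c
#include <assert.h>

/* Naive single-matrix convolution without padding (illustrative sketch).
 * input:  h x w values, row-major
 * filter: size x size values, row-major
 * output: out_h x out_w values, row-major, allocated by the caller,
 *         where out_h = (h - size)/stride + 1 and out_w = (w - size)/stride + 1 */
void conv2d_naive(const float *input, int h, int w,
                  const float *filter, int size, int stride,
                  float *output)
{
    assert(size > 0 && stride > 0 && h >= size && w >= size);
    int out_h = (h - size) / stride + 1;
    int out_w = (w - size) / stride + 1;

    for (int i = 0; i < out_h; ++i) {
        for (int j = 0; j < out_w; ++j) {
            float sum = 0.0f;
            /* multiply the filter with the patch under it, then add up */
            for (int n = 0; n < size; ++n) {
                for (int m = 0; m < size; ++m) {
                    int row = i * stride + n;
                    int col = j * stride + m;
                    sum += input[row * w + col] * filter[n * size + m];
                }
            }
            output[i * out_w + j] = sum;
        }
    }
}
```

With a 4 x 4 input holding 1 through 16, a 2 x 2 filter of ones and a stride of 2, the filter fits in four positions and the output is 14, 22, 46, 54.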

In Darknet, our filters have an additional property: **groups**

#### Traditional Convolutional Layers vs. Group Convolutional Layers

So a traditional convolution looks something like this. Given a set of samples (left), we apply our filters (middle) to create a set of feature maps.

<img src="assets/cnns/convlayer_traditional.svg"/>

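With **groups**, we partition the filters: each group of filters is applied only to its own share (c/groups) of the input matrices, which lets different groups of filters specialize on different parts of the input.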

<img src="assets/cnns/convlayer_group.svg"/>
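As a rough sketch of what "partitioning" means in terms of indices, the hypothetical helper below (`group_slice_of` is not a Darknet function) works out which input matrices and which filters belong to a given group. It assumes, as the layer code in this notebook does, that both `c` and `n` divide evenly by `groups`.

```c
#include <assert.h>

/* Hypothetical description of one filter group (not a Darknet type). */
typedef struct {
    int first_channel;  /* first input matrix this group reads */
    int channel_count;  /* how many input matrices it reads (c/groups) */
    int first_filter;   /* first filter that belongs to this group */
    int filter_count;   /* how many filters it owns (n/groups) */
} group_slice;

/* g: group number, c: # of input matrices, n: # of filters */
group_slice group_slice_of(int g, int c, int n, int groups)
{
    assert(groups > 0 && g >= 0 && g < groups);
    assert(c % groups == 0 && n % groups == 0);
    group_slice s;
    s.channel_count = c / groups;
    s.filter_count  = n / groups;
    s.first_channel = g * s.channel_count;
    s.first_filter  = g * s.filter_count;
    return s;
}
```

With 4 input matrices, 6 filters and 2 groups, group 1 reads matrices 2 and 3 and owns filters 3 through 5. Because each filter only sees `c/groups` matrices, the layer needs `c/groups*n*size*size` weights instead of `c*n*size*size`.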

With this in mind, we will develop a function to create a convolutional layer.

In [None]:
#include "/darknet/include/darknet.h"
#include "/darknet/src/convolutional_layer.h"

convolutional_layer make_convolutional_layer_ex(int batch, int h, int w, int c, int n, int groups, int size, int stride, int padding, ACTIVATION activation, int batch_normalize, int binary, int xnor, int adam)
{
    /* WRITE
    
    INSTANTIATE variables
    */
    int i;
    convolutional_layer l = {0};
    l.type = CONVOLUTIONAL;
    

    l.groups = groups;   //Filter groups. This parameter tells this layer to partition its filters into this many groups
    l.h = h;
    l.w = w;
    l.c = c;
    l.n = n;         //# of filters, which is also the # of output matrices
    l.batch = batch;     //# of samples fed in per pass
    l.stride = stride;
    l.size = size;
    l.pad = padding;
    l.batch_normalize = batch_normalize;

    l.weights = calloc(c/groups*n*size*size, sizeof(float));
    l.weight_updates = calloc(c/groups*n*size*size, sizeof(float));

    l.biases = calloc(n, sizeof(float));
    l.bias_updates = calloc(n, sizeof(float));

    l.nweights = c/groups*n*size*size;
    l.nbiases = n;
//////////////////////////////////////////////////////////////////////////
    
    
    /* EXPLAIN
    INSTANTIATE WEIGHTS
    
    
    scale = sqrt(2 / fan_in), where fan_in = size*size*c/groups is the # of inputs feeding one output value (He initialization)
    rand_normal() = draws a standard normal sample (Box-Muller transform)
    size  = side length of each square filter
    c     = # of input matrices (channels)
    
    */

    float scale = sqrt(2./(size*size*c/l.groups));
    for(i = 0; i < l.nweights; ++i) l.weights[i] = scale*rand_normal();
    
    //Calculate the dimensions of the output matrix with this ~~~MAGIC~~~ function! 
    int out_w = convolutional_out_width(l);
    int out_h = convolutional_out_height(l);
///////////////////////////////////////////
    
    /* WRITE
    
    instantiate the data that will be outputted.
    So there will be height and width of each output matrix
    and the total count of them. Denoted as n now
    
    */
    
    
    
    l.out_h = out_h;
    l.out_w = out_w;
    l.out_c = n;
    l.outputs = l.out_h * l.out_w * l.out_c;
    l.inputs = l.w * l.h * l.c;

    l.output = calloc(l.batch*l.outputs, sizeof(float));
    l.delta  = calloc(l.batch*l.outputs, sizeof(float));

    l.forward = forward_convolutional_layer;
    l.backward = backward_convolutional_layer;
    l.update = update_convolutional_layer;
    
//////////////////////////////////////////////////
    /*EXPLAIN*/
    //This is for batch normalization
    //Batch normalization normalizes a layer's outputs over the batch, which reduces internal covariate shift in neural networks

    l.scales = calloc(n, sizeof(float));
    l.scale_updates = calloc(n, sizeof(float));
    for(i = 0; i < n; ++i){
        l.scales[i] = 1;
    }

    l.mean = calloc(n, sizeof(float));
    l.variance = calloc(n, sizeof(float));

    l.mean_delta = calloc(n, sizeof(float));
    l.variance_delta = calloc(n, sizeof(float));

    l.rolling_mean = calloc(n, sizeof(float));
    l.rolling_variance = calloc(n, sizeof(float));
    l.x = calloc(l.batch*l.outputs, sizeof(float));
    l.x_norm = calloc(l.batch*l.outputs, sizeof(float));
   
////////////////////////////////////////////////////////
    //NOT GOING OVER GPU SET UP
///////////////////////////////////////////////////////
    /*WRITE
    
    Now all we have to do is set the workspace size of this layer with our magic function get_workspace_size(l)
    and the activation function we wish to give it. It will almost always be either LEAKY or RELU
    */
    l.workspace_size = get_workspace_size(l);
    l.activation = activation;

    fprintf(stderr, "conv  %5d %2d x%2d /%2d  %4d x%4d x%4d   ->  %4d x%4d x%4d\n", n, size, size, stride, w, h, c, l.out_w, l.out_h, l.out_c);

    return l;
}
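The "magic" output-dimension functions are less mysterious than they look. Below is a sketch of the sizes the constructor derives, assuming `convolutional_out_width` and `convolutional_out_height` follow the standard rule `(in + 2*pad - size) / stride + 1`. The `conv_shape` type and `conv_shape_of` function are made up for this example and are not part of Darknet.

```c
#include <assert.h>

/* Hypothetical summary of the sizes a convolutional layer derives. */
typedef struct {
    int out_h, out_w, out_c;  /* output height, width and matrix count */
    int outputs;              /* values produced per sample */
    int inputs;               /* values consumed per sample */
    int nweights;             /* total number of filter weights */
} conv_shape;

conv_shape conv_shape_of(int h, int w, int c, int n, int groups,
                         int size, int stride, int pad)
{
    assert(stride > 0 && groups > 0 && c % groups == 0);
    assert(h + 2 * pad >= size && w + 2 * pad >= size);
    conv_shape s;
    /* how many times the filter fits along each axis */
    s.out_h = (h + 2 * pad - size) / stride + 1;
    s.out_w = (w + 2 * pad - size) / stride + 1;
    s.out_c = n;                       /* one output matrix per filter */
    s.outputs = s.out_h * s.out_w * s.out_c;
    s.inputs = h * w * c;
    s.nweights = (c / groups) * n * size * size;
    return s;
}
```

For the layer we test in section 3 (5 x 5 x 3 input, 2 filters of size 5, stride 2, padding 1, 1 group), this gives a 2 x 2 x 2 output, so 8 outputs, 75 inputs and 150 weights.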





### Pooling Layer

Now that we have set up the function to create convolutional layers, we must do the same for the pooling layers.

**Recall:** Pooling layers compress the data output by the convolutional layer. 

**How:** Input is sectioned into small pieces. One element is chosen from each section and is fed into a smaller, output matrix.

In this workshop, we will focus on **maxpooling**, which chooses the element with the highest value in each section.

#### Maxpooling visualization

In [None]:
IFrame("assets/cnns/maxpool.html", 600, 350)
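Here is a small sketch of max pooling on a single matrix, just to show the idea in code before we build the real layer. The name `maxpool2d_naive` is made up for this example, and it assumes no padding and the common "window must fit" output size `(input - size) / stride + 1`, which is not necessarily the formula the Darknet layer uses.

```c
#include <assert.h>
#include <float.h>

/* Naive single-matrix max pooling without padding (illustrative sketch).
 * input:  h x w values, row-major
 * output: out_h x out_w values, row-major, allocated by the caller,
 *         where out_h = (h - size)/stride + 1 and out_w = (w - size)/stride + 1 */
void maxpool2d_naive(const float *input, int h, int w,
                     int size, int stride, float *output)
{
    assert(size > 0 && stride > 0 && h >= size && w >= size);
    int out_h = (h - size) / stride + 1;
    int out_w = (w - size) / stride + 1;

    for (int i = 0; i < out_h; ++i) {
        for (int j = 0; j < out_w; ++j) {
            float max = -FLT_MAX;
            /* scan one size x size section and keep its largest value */
            for (int n = 0; n < size; ++n) {
                for (int m = 0; m < size; ++m) {
                    float val = input[(i * stride + n) * w + (j * stride + m)];
                    if (val > max) max = val;
                }
            }
            output[i * out_w + j] = max;
        }
    }
}
```

Pooling a 4 x 4 matrix with `size = 2` and `stride = 2` leaves a 2 x 2 matrix, one value per 2 x 2 section.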

Cool. So now to create the maxpool layer:

In [None]:
#include "/darknet/include/darknet.h"
#include "/darknet/src/maxpool_layer.h"

maxpool_layer make_maxpool_layer_ex(int batch, int h, int w, int c, int size, int stride, int padding)
{
    /*WRITE
    
    Just like in make_convolutional_layer, we have to build our maxpool layer.
    */
    //<><><><><><><><><><><><><>
    maxpool_layer l = {0};
    l.type = MAXPOOL;
    l.batch = batch;
    l.h = h;
    l.w = w;
    l.c = c;
    l.inputs = h*w*c;
    l.pad = padding;
    
    
    l.out_w = (w + 2*padding)/stride;
    l.out_h = (h + 2*padding)/stride;
    l.out_c = c;
    l.outputs = l.out_h * l.out_w * l.out_c;
    int output_size = l.out_h * l.out_w * l.out_c * batch;
    
    l.size = size;
    l.stride = stride;
    
    l.indexes = calloc(output_size, sizeof(int));
    l.output =  calloc(output_size, sizeof(float));
    l.delta =   calloc(output_size, sizeof(float));
    l.forward = forward_maxpool_layer;
    l.backward = backward_maxpool_layer;
    //<><><><><><><><><><><><><><><><><><><><><><><>
    fprintf(stderr, "max          %d x %d / %d  %4d x%4d x%4d   ->  %4d x%4d x%4d\n", size, size, stride, w, h, c, l.out_w, l.out_h, l.out_c);
    return l;
}

# 3. Build!

Ok, now that we have explained the setup of the convolutional and pooling layers, take a shot at implementing the forward / backward passes of each!

### Convolutional

Here are the functions integral to training our ConvNet:
- forward_convolutional_layer() = Applies the convolution of a given convolutional layer to its input

- backward_convolutional_layer() = Applies backpropagation to a given convolutional layer: computes the gradients of its weights and passes the error back to the previous layer


**Things to keep in mind:**
 - net.workspace => scratch memory shared by the layers of the network. Here it holds the current layer's input after im2col has unrolled it into columns
 - l.batch => # of samples fed into the convolutional layer per pass
 - gemm() = General Matrix Multiplication (computes C = ALPHA*A*B + BETA*C)
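If `gemm()` feels like a black box, here is a bare-bones sketch of what it computes. It follows the argument order given in the code comment below (M, N, K, ALPHA, A, lda, B, ldb, BETA, C, ldc) but leaves out the two transpose flags, and `gemm_naive` is a made-up name, so treat it as an illustration and not as Darknet's implementation.

```c
#include <assert.h>

/* Naive GEMM without transposes: C = alpha * A * B + beta * C.
 * A is M x K, B is K x N, C is M x N, all row-major.
 * lda, ldb, ldc are the row strides: how many floats separate the start of
 * one row from the start of the next. */
void gemm_naive(int M, int N, int K, float alpha,
                const float *A, int lda,
                const float *B, int ldb,
                float beta, float *C, int ldc)
{
    assert(lda >= K && ldb >= N && ldc >= N);
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < K; ++k) {
                sum += A[i * lda + k] * B[k * ldb + j];
            }
            /* beta = 1 accumulates into C, beta = 0 overwrites it */
            C[i * ldc + j] = alpha * sum + beta * C[i * ldc + j];
        }
    }
}
```

In the forward pass, A holds the filters (M = filters per group, K = weights per filter) and B holds the unrolled input (N = number of output positions), so one multiplication produces every output value of the group.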

In [None]:
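#include <stdlib.h>
#include "/darknet/include/darknet.h"
#include "/darknet/src/convolutional_layer.h"

//Prototypes for the student functions defined below
void forward_convolutional_layer_student(convolutional_layer l, network net);
void backward_convolutional_layer_student(convolutional_layer l, network net);

int main(void)
{
    //batch=1, h=5, w=5, c=3, n=2 filters, groups=1, size=5, stride=2, padding=1
    convolutional_layer l = make_convolutional_layer(1, 5, 5, 3, 2, 1, 5, 2, 1, RELU, 1, 0, 0, 0);
    l.batch_normalize = 1;
    float data[] = {1,1,1,1,1,
        1,1,1,1,1,
        1,1,1,1,1,
        1,1,1,1,1,
        1,1,1,1,1,
        2,2,2,2,2,
        2,2,2,2,2,
        2,2,2,2,2,
        2,2,2,2,2,
        2,2,2,2,2,
        3,3,3,3,3,
        3,3,3,3,3,
        3,3,3,3,3,
        3,3,3,3,3,
        3,3,3,3,3};
    network net = {0};
    net.input = data;
    net.workspace = calloc(1, l.workspace_size); //scratch memory that im2col writes into
    forward_convolutional_layer_student(l, net);
    backward_convolutional_layer_student(l, net);
    free(net.workspace);
    return 0;
}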

void forward_convolutional_layer_student(convolutional_layer l, network net)
{
    //<><><><><><><><><><><><<><><><><><><><><><><>
    int i, j;

    fill_cpu(l.outputs*l.batch, 0, l.output, 1);

    if(l.xnor){
        binarize_weights(l.weights, l.n, l.c/l.groups*l.size*l.size, l.binary_weights);
        swap_binary(&l);
        binarize_cpu(net.input, l.c*l.h*l.w*l.batch, l.binary_input);
        net.input = l.binary_input;
    }
    //<><><><><><><><><><><><><><><><><><><><><><><><><>
/////////////////////////////////////////////////////////////////////////////////////////
    
    //<><><><><><><><><><><><><><><>
    int m = l.n/l.groups;               /// # of filters in each group
    int k = l.size*l.size*l.c/l.groups; /// # of weights in one filter
    int n = l.out_w*l.out_h;            /// # of output positions (output area size)
    //<><><><><><><><><><><><><><><><>

    //<><><><><><><><><><><><><><><><> except: i < ?????, j < ?????
    for(i = 0; i < l.batch; ++i){
        for(j = 0; j < l.groups; ++j){   //in case of AlexNet implementation, we must account for the groups 
    //<><><<><><><><><><><><><><><><><><>
            //configure the memory placement of the pointers
            float *a = l.weights + j*l.nweights/l.groups;   // = ???  + l.???*j/l.?????          kernel
            float *b = net.workspace;                       // = ???                             current layer
            float *c = l.output + (i*l.groups + j)*n*m;     // = ??? + (i*l.groups + j)*m*?      outputs

            im2col_cpu(net.input + (i*l.groups + j)*l.c/l.groups*l.h*l.w,  
                l.c/l.groups, l.h, l.w, l.size, l.stride, l.pad, b);   /////GIVEN
            gemm(0,0,m,n,k,1,a,k,b,n,1,c,n); // General Matrix Multiplication (TA, TB, M, N, K, ALPHA, A, lda, B, ldb, BETA, C, ldc)
            //To demo GEMM on its own, call: void time_random_matrix(int TA, int TB, int m, int k, int n)
        
        }
    }
///////////////////////////////////////////////////////////////////////////////////////////////////////////
    if(l.batch_normalize){
        forward_batchnorm_layer(l, net);
    } else {
        add_bias(l.output, l.biases, l.batch, l.n, l.out_h*l.out_w);
    }

    activate_array(l.output, l.outputs*l.batch, l.activation);
    if(l.binary || l.xnor) swap_binary(&l);
}

//Backpropagation on a convolutional layer
void backward_convolutional_layer_student(convolutional_layer l, network net)
{
    
    //<><><><><><><><><><>
    int i, j;
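    int m = l.n/l.groups;                 // # of filters in each group
    int n = l.size*l.size*l.c/l.groups;   // # of weights in one filter
    int k = l.out_w*l.out_h;              // # of output positions
    //<><><><><><>
    
    gradient_array(l.output, l.outputs*l.batch, l.activation, l.delta); //scales l.delta by the derivative of the activation function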

    if(l.batch_normalize){
        backward_batchnorm_layer(l, net);
    } else {
        backward_bias(l.bias_updates, l.delta, l.batch, l.n, k);
    }
    
    
////////////////////////////////////////////////////////////////////////////////////////////////
    //<><><><><><><><><><><>
    for(i = 0; i < l.batch; ++i){
        for(j = 0; j < l.groups; ++j){
    //<><<><>><><><><><><><>
            float *a = l.delta + (i*l.groups + j)*m*k;   //updates each image in batch, group for group.
            float *b = net.workspace;
            float *c = l.weight_updates + j*l.nweights/l.groups;

            float *im = net.input+(i*l.groups + j)*l.c/l.groups*l.h*l.w;
                
            //GIVEN
            im2col_cpu(im, l.c/l.groups, l.h, l.w, 
                    l.size, l.stride, l.pad, b);
            
            gemm(0,1,m,n,k,1,a,k,b,k,1,c,n);  //gemm(0,1,?,?,?,1,?,?,?,?,1,?,?)

            if(net.delta){
                a = l.weights + j*l.nweights/l.groups;
                b = l.delta + (i*l.groups + j)*m*k;
                c = net.workspace;

                gemm(1,0,n,k,m,1,a,n,b,k,0,c,k); //apply the general matrix mult

                col2im_cpu(net.workspace, l.c/l.groups, l.h, l.w, l.size, l.stride, 
                    l.pad, net.delta + (i*l.groups + j)*l.c/l.groups*l.h*l.w);
            }
        }
    }
//////////////////////////////////////////////////////////////////////////////////////////
}
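Both passes lean on `im2col_cpu`, which was given to you. Here is a simplified sketch of the idea for one matrix with no padding (the real function also handles several channels and padding, and `im2col_simple` is a made-up name). Each column of the result is one patch of the input, so multiplying the filters by this matrix with `gemm` performs the whole convolution at once.

```c
#include <assert.h>

/* Simplified im2col for one matrix, no padding (illustrative sketch).
 * im:  h x w values, row-major
 * col: (size*size) rows by (out_h*out_w) columns, row-major, allocated by
 *      the caller. Row r holds the value at offset r inside every patch,
 *      column p holds the p-th patch. */
void im2col_simple(const float *im, int h, int w,
                   int size, int stride, float *col)
{
    assert(size > 0 && stride > 0 && h >= size && w >= size);
    int out_h = (h - size) / stride + 1;
    int out_w = (w - size) / stride + 1;
    int n = out_h * out_w;  /* number of patches, i.e. columns */

    for (int r = 0; r < size * size; ++r) {
        int dy = r / size;  /* row offset inside the patch */
        int dx = r % size;  /* column offset inside the patch */
        for (int i = 0; i < out_h; ++i) {
            for (int j = 0; j < out_w; ++j) {
                col[r * n + i * out_w + j] =
                    im[(i * stride + dy) * w + (j * stride + dx)];
            }
        }
    }
}
```

For a 3 x 3 matrix holding 1 through 9 with `size = 2` and `stride = 1`, the result has 4 rows and 4 columns, and its first column is the top-left patch 1, 2, 4, 5.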

### Maxpooling

Don't worry, maxpooling is much, much simpler to create.

In [None]:
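#include <float.h>
#include <stdlib.h>
#include "/darknet/include/darknet.h"
#include "/darknet/src/maxpool_layer.h"

//Prototypes for the student functions defined below
void forward_maxpool_layer_student(const maxpool_layer l, network net);
void backward_maxpool_layer_student(const maxpool_layer l, network net);

int main(void)
{
    //make_maxpool_layer(int batch, int h, int w, int c, int size, int stride, int padding)
    //padding is 0 so that every pooling window overlaps the input and gets a valid max index
    maxpool_layer l = make_maxpool_layer(1, 5, 5, 3, 5, 1, 0);
    float data[] = {1,1,1,1,1,
        1,1,1,1,1,
        1,1,1,1,1,
        1,1,1,1,1,
        1,1,1,1,1,
        2,2,2,2,2,
        2,2,2,2,2,
        2,2,2,2,2,
        2,2,2,2,2,
        2,2,2,2,2,
        3,3,3,3,3,
        3,3,3,3,3,
        3,3,3,3,3,
        3,3,3,3,3,
        3,3,3,3,3};
    network net = {0};
    net.input = data;
    net.delta = calloc(l.inputs*l.batch, sizeof(float)); //the backward pass accumulates the input gradients here
    forward_maxpool_layer_student(l, net);
    backward_maxpool_layer_student(l, net);
    free(net.delta);
    return 0;
}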


void forward_maxpool_layer_student(const maxpool_layer l, network net)
{
    int b,i,j,k,m,n;
    int w_offset = -l.pad;
    int h_offset = -l.pad;

    int h = l.out_h;
    int w = l.out_w;
    int c = l.c;
////////////////////////////////////////////////////////////
    
    //<><><><><><<><><>
    for(b = 0; b < l.batch; ++b){
        for(k = 0; k < c; ++k){
            for(i = 0; i < h; ++i){
                for(j = 0; j < w; ++j){
    //<><><><><><><><><>><><><<><>
                    int out_index = j + w*(i + h*(k + c*b));
                    
                    //?????????????????????????/
                    float max = -FLT_MAX;
                    int max_i = -1;
                    //??????????????????????????
                    for(n = 0; n < l.size; ++n){
                        for(m = 0; m < l.size; ++m){
                            int cur_h = h_offset + i*l.stride + n;
                            int cur_w = w_offset + j*l.stride + m;
                            int index = cur_w + l.w*(cur_h + l.h*(k + b*l.c));
                            int valid = (cur_h >= 0 && cur_h < l.h &&
                                         cur_w >= 0 && cur_w < l.w);
                            float val = (valid != 0) ? net.input[index] : -FLT_MAX;
                            max_i = (val > max) ? index : max_i;
                            max   = (val > max) ? val   : max;
                        }
                    }
                    //
                    
                    l.output[out_index] = max;
                    l.indexes[out_index] = max_i;
                }
            }
        }
    }
////////////////////////////////////////////////////////////////
}

void backward_maxpool_layer_student(const maxpool_layer l, network net)
{
    int i;
    int h = l.out_h;
    int w = l.out_w;
    int c = l.c;
    for(i = 0; i < h*w*c*l.batch; ++i){ // i < ?????????
        int index = l.indexes[i];
        net.delta[index] += l.delta[i]; //??????????????????????/
    }
}
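The backward pass is short because the forward pass already saved the winner of each section in `l.indexes`. The sketch below shows the same routing idea on plain arrays. `maxpool_backward_naive` and its parameters are made up for this example; unlike the layer code, it also checks that every saved index is valid.

```c
#include <assert.h>

/* Max-pool gradient routing on plain arrays (illustrative sketch).
 * indexes[i]   : input position that won section i in the forward pass
 * out_delta[i] : gradient arriving at output i
 * in_delta     : gradient for the input (n_in values), accumulated in place */
void maxpool_backward_naive(const int *indexes, const float *out_delta,
                            int n_out, float *in_delta, int n_in)
{
    for (int i = 0; i < n_out; ++i) {
        int index = indexes[i];
        assert(index >= 0 && index < n_in);
        /* only the winning input receives the gradient of its section */
        in_delta[index] += out_delta[i];
    }
}
```

Every input that did not win a section keeps a gradient of 0, and an input that won several overlapping sections receives the sum of their gradients, which is why the code uses `+=`.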

# 4. Run!

Now that we have finished writing our functions, let's plug them into the source code!

#### Append code to source files

1. In your folder viewer, go to [PATH TO THIS REPO]/meetings/darknet/src/
2. Copy the body of each function and paste it into the function of the corresponding name in the corresponding C file
    - E.g. In **convolutional_layer.c:** forward_convolutional_layer_student() **->** forward_convolutional_layer()

#### Make Darknet

1. Open up your terminal (or Command Prompt) and **cd** to **[PATH TO THIS REPO]/meetings/darknet/**

2. **make**

#### Run YOLO

In [None]:
//Paste this into your terminal/command prompt
./darknet detect cfg/yolo.cfg yolo.weights data/dog.jpg


//You can mess around with different pictures and the threshold level:
./darknet detect cfg/yolo.cfg yolo.weights data/dog.jpg -thresh 0

//And detect multiple objects in an image:
./darknet detect cfg/yolo.cfg yolo.weights data/horses.jpg


//You can even detect and classify objects in real-time!
//Follow the guide here to install CUDA and OpenCV: https://pjreddie.com/darknet/install/#cuda
// ./darknet detector demo cfg/coco.data cfg/yolo.cfg yolo.weights

# yay u did it