# NN from Scratch

This project came to be as the final assignment in a course I took in university. At the time the plan was to port the code from Python to R, which was a cumbersome challenge in itself. Since then I've been trying to learn Julia and what better way to learn than trying to port something I am very familiar with at this point? I will keep the same introduction as I used in the other two notebooks in case anyone ever lands on this notebook first. 

As a conclusion to the STAN48 course I decided to create simple implementation of a feed forward neural network using mainly Numpy and base Python. While there are already exquisite packages that offer these solutions (like Tensorflow and Pytorch), a step by step implementation of a neural network is still valuable for teaching basic programming concepts as well as basic neural network concepts. The code and explanations presented here are inspired heavily from two sources, namely Andrew Ng's course on [Neural Networks and Deep Learning](https://www.coursera.org/learn/neural-networks-deep-learning?specialization=deep-learning), and LUSEM's [Deep Learning and AI Methods](https://www.stat.lu.se/utbildning/kurser/stan47_deep_learning_and_artificial_intelligence_methods) course. The code is then an adaptation of the teachings found in both courses. Additionally, in this project I offer an R version for the project presented here, which can be found in the `R-NN_from_scratch.ipynb` file. As a final disclaimer I must admit that adapting the code in Python was not a hard task, but porting it to R was a strenuous nightmare-like task since the data types can be treated quite differently in both R and Python.

As a general example I will use the [Kaggle Dogs vs. Cats](https://www.microsoft.com/en-us/download/details.aspx?id=54765) dataset to classify whether a given picture shows a cat or not. As the data set only includes two different options, we can assume the `not cat` option to be the same as `dog`. As mentioned, the intent is to have a general example to expose how the algorithm works, and the intricasies of the programming challenge, in other words, it is *not* my intention to implement a functioning neural network from scratch **and** a good model for classifying cats. 

## R or Python? ... Or Julia?

This notebook is originally made for Python. One of the requirements for this project was that whatever the choice of application to be developed, it should be done in both Python **and** R. Well, that I did, now I'm doing it for Julia too cause why not. You can check the notebooks for Python an R in this same repository. 

### The structure

This project includes several files. In this notebook you will find the application related functions, however, many of the base functions used for the calculations are left in a separate file that concentrates all the basic calculation functions. Without those dependencies this notebook will not function as it should. Some basic concepts regarding neural networks will be presented through the notebook, but the focus of this work is exposing the programming challenge behind neural networks.

### The data

The data set contains 25000 images of dogs and cats, but 59 of them were corrupted or in grayscale and, therefore, dropped. The classes are balanced and the angle, depth, light, and dimensions are not uniform. While originally a Kaggle competition data set, I opted to use the version made available by Microsoft because it did not pre divide the data giving me more freedom to split the sets as I please.

In [1]:
function relu(Z)
  A = max.(Z, 0)
  cache = Z
  @assert size(A) == size(Z) "Sizes don't match in relu function"
  return A, cache
end

function sigmoid(Z)
  A = 1 ./ (1 .+ exp.(Z))
  cache = Z
  return A, cache
end

function relu_backprop(dA, cache)
  Z = cache
  dZ = dA
  dZ[Z .<= 0] .= 0
  @assert size(dZ) == size(Z) "Sizes don't match in relu_backprop function"
  return dZ
end

function sigmoid_backprop(dA, cache)
  Z = cache
  s = 1 ./ (1 .+ exp.(-Z))
  dZ = dA .* s .* (1 .- s)
  @assert size(dZ) == size(Z) "Sizes don't match in sigmoid_backprop function"
  return dZ
end

sigmoid_backprop (generic function with 1 method)

In [2]:
write("helper_functions.jl", In[IJulia.n - 1])

581

In [3]:
function init_param(layer_dim)
  L = length(layer_dim)
  parameters = Dict()
  for l in 1:(L - 1)
    parameters[string("W", l)] = rand(layer_dim[l], layer_dim[l + 1]) * 0.01
    parameters[string("b", l)] = zeros(layer_dim[l + 1], 1)
    @assert size(parameters[string("W", l)]) == (layer_dim[l], layer_dim[l + 1]) "Weights size wrong in init_param function"
    @assert size(parameters[string("b", l)]) == (layer_dim[l + 1], 1) "Bias size wrong in init_param function"
  end
  return parameters
end

init_param (generic function with 1 method)

In [4]:
write("init_param.jl", In[IJulia.n - 1])

500

In [5]:
include("helper_functions.jl")

function for_prop(A, W, b)
  Z = (W' * A) .+ b
  @assert size(Z) == (size(W')[1], size(A)[2])
  cache = (A, W, b)
  return Z, cache
end

function for_activation(A_prev, W, b, activ)
  if activ == "sigmoid"
    Z, linear_cache = for_prop(A_prev, W, b)
    A, activ_cache = sigmoid(Z)
  elseif activ == "relu"
    Z, linear_cache = for_prop(A_prev, W, b)
    A, activ_cache = relu(Z)
  end
  @assert size(A) == (size(W')[1], size(A_prev)[2]) "Activation size wrong in for_activation function"
  cache = (linear_cache, activ_cache)
  return A, cache
end

for_activation (generic function with 1 method)

In [6]:
write("for_prop.jl", In[IJulia.n - 1])

582

In [7]:
include("for_prop.jl")

function deep_model(X, parameters)
  caches = []
  A = X
  L = div(length(parameters), 2)
  for l in 1:(L-1)
    A_prev = A
    A, cache = for_activation(A_prev,
                              parameters[string("W", l)],
                              parameters[string("b", l)],
                              "relu")
    push!(caches, cache)
  end
  AV, cache = for_activation(A,
                             parameters[string("W", L)],
                             parameters[string("b", L)],
                             "sigmoid")
  push!(caches, cache)
  @assert size(AV) == (1, size(X)[2]) "AV size wrong in deep_model function"
  return AV, caches
end

deep_model (generic function with 1 method)

In [8]:
write("deep_model.jl", In[IJulia.n - 1])

680

In [9]:
function cost_computation(AV, Y)
  m = length(Y)
  Y = reshape(Y, (length(Y), 1))
  cost = -(1/m) * sum(log.(AV) * Y .+ log.((1 .- AV)) * (1 .- Y))
  return cost
end

cost_computation (generic function with 1 method)

In [10]:
write("cost_computation.jl", In[IJulia.n - 1])

165

In [11]:
include("helper_functions.jl")

function back_prop(dZ, cache)
  A_prev, W, b = cache
  m = size(A_prev)[2]
  dW = (1/m) * dZ * A_prev'
  db = (1/m) * sum(dZ, dims=2)
  dA_prev = W * dZ
  @assert size(dA_prev) == size(A_prev) "dA_prev size wrong in back_prop function"
  @assert size(dW) == size(W') "dW size wrong in back_prop function"
  @assert size(db) == size(b) "db size wrong in back_prop function"
  return dA_prev, dW, db
end

function back_activ(dA, cache, activ)
  for_cache, activ_cache = cache
  if activ == "relu"
    dZ = relu_backprop(dA, activ_cache)
    dA_prev, dW, db = back_prop(dZ, for_cache)
  elseif activ == "sigmoid"
    dZ = sigmoid_backprop(dA, activ_cache)
    dA_prev, dW, db = back_prop(dZ, for_cache)
  end
  return dA_prev, dW, db
end

function deep_model_back(AV, Y, caches)
  grads = Dict()
  L = length(caches)
  m = size(AV)[2]
  Y = reshape(Y, 1, length(AV))
  dAL = -(Y ./ AV) - ((1 .- Y) ./ (1 .- AV))
  present_cache = caches[L]
  grads[string("dA", L)], grads[string("dW", L)], grads[string("db", L)] = back_activ(dAL, present_cache, "sigmoid")
  for l in (L - 1):-1:1
    present_cache = caches[l]
    dA_prev_temp, dW_temp, db_temp = back_activ(grads[string("dA", l+1)], present_cache, "relu")
    grads[string("dA", l)] = dA_prev_temp
    grads[string("dW", l)] = dW_temp
    grads[string("db", l)] = db_temp
  end
  return grads
end

deep_model_back (generic function with 1 method)

In [12]:
write("back_prop.jl", In[IJulia.n - 1])

1377

In [13]:
function update(params, grads, learning_rate)
  parameters = copy(params)
  L = div(length(parameters), 2)
  for l in 1:L
    parameters[string("W", l)] = parameters[string("W", l)] - learning_rate * grads[string("dW", l)]'
    parameters[string("b", l)] = parameters[string("b", l)] - learning_rate * grads[string("db", l)]
  end
  return parameters
end

update (generic function with 1 method)

In [14]:
write("update.jl", In[IJulia.n - 1])

354

In [15]:
using Images, FileIO, InvertedIndices, Suppressor

function process_image(path_vec::Vector{String}, h::Int64, w::Int64, label::Int64)
  result = zeros((h*w*3), length(path_vec))
  class = Int[]
  @suppress begin
    for i in enumerate(path_vec) 
      try
        img = load(i[2])
      catch 
        continue
      end
      img = (img === nothing) ? continue : img
      img = imresize(img,(h,w))
      img = size(img) == (h, w) ? img : continue
      img = vec(img)
      try
        img = [temp(img[i]) for i = 1:length(img), temp in (red, green, blue)]
      catch
        continue
      end
      img = reshape(img, ((h*w*3),1))
      result[:,i[1]] = img
      push!(class, label)
    end
  end
  return result, class
end

function create_dataset(filenames_cat::Vector{String}, filenames_dog::Vector{String}, height::Int64, width::Int64, labels)
  cat_i, cat_l = process_image(filenames_cat, height, width, labels[1])
  dog_i, dog_l = process_image(filenames_dog, height, width, labels[2])
  imgs = hcat(cat_i, dog_i)
  class = vcat(cat_l, dog_l)
  i=1
  while i <= size(imgs)[2]
    imgs = sum(imgs[:,i]) == 0 ? imgs[:, Not(i)] : imgs
    i +=1
  end
  return imgs, class
end

create_dataset (generic function with 1 method)

In [16]:
write("create_dataset.jl", In[IJulia.n - 1])

1184

In [34]:
include("update.jl")
include("back_prop.jl")
include("deep_model.jl")
include("init_param.jl")
include("cost_computation.jl")
function dense_nn(X, Y, layers_dims, learning_rate, num_iterations, print_cost)
  costs = Float64[]
  parameters = init_param(layers_dims)
  for i in 1:num_iterations
    AV, caches = deep_model(X, parameters)
    cost = cost_computation(AV, Y)
    grads = deep_model_back(AV, Y, caches)
    parameters = update(parameters, grads, learning_rate)
    if print_cost && i%100==1 || i==num_iterations-1
      println(string("Cost after iteration ",i,": ",cost))
    end
    if i%100==1 || i==num_iterations
      push!(costs, cost)
    end
  end
  return parameters, costs
end

dense_nn (generic function with 1 method)

In [35]:
write("dense_nn.jl", In[IJulia.n - 1])

698

In [48]:
include("deep_model.jl")
function predict(X, y, parameters)
  m = size(X)[2]
  n = div(length(parameters), 2)
  p = Int[]
  probs, caches = deep_model(X, parameters)
  @show size(probs)
  for i in 1:size(probs)[2]
    if probs[i] > 0.5
      push!(p, 1)
    else
      push!(p, 0)
    end
  end
  print(string("Accuracy ", sum((p==y)/m)))
  return p
end

predict (generic function with 1 method)

In [49]:
write("predict.jl", In[IJulia.n - 1])

353

In [21]:
include("create_dataset.jl")

cat_path = "C:/Users/wtrindad/source/repos/NN_from_scratch/PetImages/Cat/"
cat_imgs = joinpath.(cat_path, readdir(cat_path))
dog_path = "C:/Users/wtrindad/source/repos/NN_from_scratch/PetImages/Dog/"
dog_imgs = joinpath.(dog_path, readdir(dog_path))

img_data, img_label = create_dataset(cat_imgs, dog_imgs, 32, 32, (1,0));

Corrupt JPEG data: 239 extraneous bytes before marker 0xd9
Corrupt JPEG data: 214 extraneous bytes before marker 0xd9
Corrupt JPEG data: 128 extraneous bytes before marker 0xd9
Corrupt JPEG data: 99 extraneous bytes before marker 0xd9
Corrupt JPEG data: 1153 extraneous bytes before marker 0xd9
Corrupt JPEG data: 396 extraneous bytes before marker 0xd9
Corrupt JPEG data: 228 extraneous bytes before marker 0xd9
Corrupt JPEG data: 162 extraneous bytes before marker 0xd9
Corrupt JPEG data: 1403 extraneous bytes before marker 0xd9
Corrupt JPEG data: 252 extraneous bytes before marker 0xd9
Corrupt JPEG data: 2226 extraneous bytes before marker 0xd9
Corrupt JPEG data: 65 extraneous bytes before marker 0xd9


In [22]:
using StatsBase
samples = wsample([1, 0], Weights([0.85, 0.15]), size(img_data)[2], replace=true)

x_train = img_data[:, samples .== 1]
y_train = img_label[samples .== 1]
x_test = img_data[:, samples .== 0]
y_test = img_label[samples .== 0];



In [23]:
@show size(x_train)
@show length(y_train)
@show size(x_test)
@show length(y_test);

size(x_train) = (3072, 21242)
length(y_train) = 21242
size(x_test) = (3072, 3747)
length(y_test) = 3747


In [24]:
layer_dims = [size(x_train)[1],20,7,5,1]

5-element Vector{Int64}:
 3072
   20
    7
    5
    1

In [55]:
include("dense_nn.jl")
parameters, costs = dense_nn(x_train, y_train, layer_dims, 0.01, 1, true)
@show costs;

Cost after iteration 1: 0.6931462031054509
costs = [0.6931462031054509]


In [62]:
parameters, costs = dense_nn(x_train, y_train, layer_dims, 0.15, 150, true);

Cost after iteration 1: 0.6931448241562331
Cost after iteration 101: 67.35089888447875
Cost after iteration 149: 68.64504394849563


In [63]:
include("predict.jl")
predict_train = predict(x_train, y_train, parameters);

size(probs) = (1, 21242)
Accuracy 0.0

In [64]:
predict_test = predict(x_test, y_test, parameters);

size(probs) = (1, 3747)
Accuracy 0.0