# Neural Networks with Flux 

Flux is a popular Julia package for machine learning.

Personally, I think it's the best of both worlds: the speed of TensorFlow combined with the clean interface of Keras.

You can read more about it here: https://fluxml.ai/Flux.jl/stable/ 

In [None]:
] add Flux

In [28]:
using Flux 

## Bare-bones (Core) Flux 

In [29]:
f(x) = 3x^2 + 2x + 1;      # derivative is 6x + 2

In [30]:
gradient(f, 2)

(14,)
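
As a quick sanity check on the value above, the derivative can also be approximated numerically without Flux. This is only a sketch: `numeric_derivative` and its step size `h` are illustrative names chosen here, not part of Flux.

```julia
# Central-difference approximation of the derivative of g at x.
# `numeric_derivative` and `h` are hypothetical helpers for illustration only.
numeric_derivative(g, x; h = 1e-6) = (g(x + h) - g(x - h)) / (2h)

f(x) = 3x^2 + 2x + 1            # same polynomial as in the cell above
analytic(x) = 6x + 2            # derivative worked out by hand

approx = numeric_derivative(f, 2.0)
println(approx)                                        # close to 14
println(isapprox(approx, analytic(2.0); atol = 1e-6))  # numeric and hand-derived values agree
```

Both the numeric estimate and the hand-derived formula should agree with the `14` that `gradient(f, 2)` returns.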

In [31]:
df(x) = gradient(f, x)[1];

In [32]:
df(2)

14

## Functions with Multiple Parameters

In [33]:
f(x, y) = sum((x .- y) .^ 2);        # we'll pass it vectors 

In [34]:
w = [2, 1];
b = [2, 0]; 

gradient(f, w, b)                    # two vectors need to be given 

([0, 2], [0, -2])
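
The two vectors returned above can be checked by hand. Writing the function element-wise (the index $i$ is notation introduced here), the partial derivatives are:

```latex
f(x, y) = \sum_i (x_i - y_i)^2, \qquad
\frac{\partial f}{\partial x_i} = 2\,(x_i - y_i), \qquad
\frac{\partial f}{\partial y_i} = -2\,(x_i - y_i)
```

With `w = [2, 1]` and `b = [2, 0]`, the difference is `[0, 1]`, so the gradients are `[0, 2]` and `[0, -2]`, matching the output.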

In [35]:
gs = gradient(params(w, b)) do       # differentiate with respect to w and b
    f(w, b)                          # the expression being differentiated
end

Grads(...)

In [36]:
gs.grads

IdDict{Any,Any} with 4 entries:
  :(Main.w) => [0, 2]
  :(Main.b) => [0, -2]
  [2, 1]    => [0, 2]
  [2, 0]    => [0, -2]

In [40]:
gs[w]

2-element Array{Int64,1}:
 0
 2

In [41]:
gs[b]

2-element Array{Int64,1}:
  0
 -2

## Basic Model for Flux 

In [42]:
W = rand(2, 5)         # Weights
b = rand(2)            # biases 

2-element Array{Float64,1}:
 0.5792834723593561
 0.6533511251836333

In [43]:
predict(x) = W*x .+ b      # forward pass, uses the global W and b

predict (generic function with 1 method)

In [44]:
function loss(x, y)
  ŷ = predict(x)           # write as y\hat [tab]
  sum((y .- ŷ).^2)
end

loss (generic function with 1 method)

In [45]:
x, y = rand(5), rand(2) # Dummy data

([0.6858514468292465, 0.135899214410778, 0.7533025147545231, 0.3924204473716366, 0.22413119822323835], [0.48450730477280257, 0.16463816502426232])

In [46]:
loss(x, y)       # note the value here 

3.602176949144343

In [47]:
gs = gradient(params(W, b)) do       # derivative with respect to W and b
    loss(x, y)                       # of this expression
end

Grads(...)

In [55]:
# We can write the above more concisely as:
gs = gradient(() -> loss(x, y), params(W, b))

Grads(...)

In [49]:
gs.grads

IdDict{Any,Any} with 6 entries:
  :(Main.y)                 => [-1.33315, -3.55407]
  :(Main.W)                 => [0.914343 0.181174 … 0.523155 0.298801; 2.43756 …
  :(Main.b)                 => [1.33315, 3.55407]
  :(Main.x)                 => [0.575817, 3.00394, 3.56709, 2.87903, 3.21702]
  [0.205791 0.955902 … 0.1… => [0.914343 0.181174 … 0.523155 0.298801; 2.43756 …
  [0.579283, 0.653351]      => [1.33315, 3.55407]
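
The entries above can be cross-checked against the gradients derived by hand for this model. A short derivation, using $\hat{y} = Wx + b$ and the squared-error loss from the cells above (the symbol $L$ for the loss is notation introduced here):

```latex
L = \sum_i (y_i - \hat{y}_i)^2, \qquad \hat{y} = W x + b
\frac{\partial L}{\partial b} = -2\,(y - \hat{y}), \qquad
\frac{\partial L}{\partial W} = -2\,(y - \hat{y})\,x^{\top}, \qquad
\frac{\partial L}{\partial y} = 2\,(y - \hat{y})
```

This is consistent with the printed values: the entries for `b` and `y` are negatives of each other, and each row of the `W` entry is the corresponding `b` entry multiplied by `x` (for example, 1.33315 × 0.68585 ≈ 0.914343).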

In [56]:
W̄ = gs[W]               # write as W\bar [tab]

2×5 Array{Float64,2}:
 0.500959  0.0992633  0.550226  0.286631  0.16371
 1.33552   0.264628   1.46686   0.764136  0.436437

In [None]:
# Let's update weights based on these gradients 

In [57]:
W .-= 0.1 .* W̄

2×5 Array{Float64,2}:
  0.064261  0.927858  0.161026  0.0732403  -0.0380862
 -0.292485  0.411886  0.470538  0.536334    0.778801

In [58]:
loss(x, y)     # loss decreases after the weight update

0.6048184065650787

In [59]:
# We can also update the bias terms 
b̄ = gs[b]
b .-= 0.1 .* b̄

2-element Array{Float64,1}:
 0.37292658357370606
 0.10322037450199653

In [60]:
loss(x, y)     # even better 

0.3245906072543485
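
The two in-place updates above amount to one step of plain gradient descent. In symbols, with $\eta$ standing for the step size (a symbol introduced here; it is 0.1 in the cells above) and $\bar{W}$, $\bar{b}$ the gradients of the loss with respect to $W$ and $b$:

```latex
W \leftarrow W - \eta\,\bar{W}, \qquad
b \leftarrow b - \eta\,\bar{b}, \qquad
\eta = 0.1
```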

That was a very simple single-layer neural network (or perceptron), built from scratch in just a few minutes. It is very similar to TensorFlow, but without all the messy bits. Let's build on this to get a few more layers in.

In [61]:
W = rand(2, 5)         # Weights
b = rand(2)            # biases 

predict(x) = W*x .+ b  

function loss(x, y)
  ŷ = predict(x)           # write as y\hat [tab]
  sum((y .- ŷ).^2)
end

x, y = rand(5), rand(2) # Dummy data
loss(x, y) 

1.8604677763147346

In [80]:
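# One gradient-descent step on W and b. Re-running this cell takes another step,
# which is presumably how the loss below ended up so close to zero.
gs = gradient(() -> loss(x, y), params(W, b))
W̄ = gs[W] 
W .-= 0.1 .* W̄

b̄ = gs[b]

b .-= 0.1 .* b̄
loss(x, y) 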

4.3322346123682517e-19
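
Rather than applying the update by hand one step at a time, the same step can be wrapped in a loop. The following is a minimal sketch, not a definitive implementation: `train_linear`, `η`, `steps`, and the `*_demo` / `*_fit` names are hypothetical, and it uses the explicit-argument form of `gradient` (as in `gradient(f, w, b)` earlier) instead of `params`.

```julia
using Flux

# Hypothetical helper: repeats the manual gradient-descent update from the cells above.
function train_linear(x, y; η = 0.1, steps = 100)
    W = rand(length(y), length(x))      # fresh random weights
    b = rand(length(y))                 # fresh random biases
    loss(W, b) = sum((y .- (W * x .+ b)) .^ 2)   # same squared-error loss as above
    for _ in 1:steps
        gW, gb = gradient(loss, W, b)   # gradients with respect to W and b
        W .-= η .* gW                   # in-place update of the weights
        b .-= η .* gb                   # in-place update of the biases
    end
    return W, b, loss(W, b)
end

x_demo, y_demo = rand(5), rand(2)       # dummy data, as in the cells above
W_fit, b_fit, final_loss = train_linear(x_demo, y_demo)
println(final_loss)                     # should be very close to zero
```

With a single data point, a step size of 0.1, and inputs drawn from `rand(5)`, each step shrinks the residual by a constant factor, so the loss should fall toward zero quickly. Larger step sizes or differently scaled inputs may not behave this way.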