## Setup
To use this notebook, run the following cell to install the dependencies. Note that the first time you use anything (a library, a function, etc.), Julia compiles it. This means that `using CUDA` takes ~1m42s on my workstation the first time I run it. Be warned!

In [28]:
using Pkg; Pkg.add("CUDA"); Pkg.add("IntervalOptimisation"); Pkg.add("Symbolics");

   Resolving package versions...
  No Changes to `~/.julia/environments/v1.8/Project.toml`
  No Changes to `~/.julia/environments/v1.8/Manifest.toml`

## First adventures

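Julia is like Python but also like C++: it's interactive and dynamically typed like Python, but it compiles to fast native code like C++. It also has tons of built-in math; for example, NumPy-style array operations are available by default.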

In [17]:
# An example of array arithmetic
v = [1.0, 2, 3]
v + v

3-element Vector{Float64}:
 2.0
 4.0
 6.0
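The element type of an array literal comes from promoting the types of all of its elements, which is easy to check with `eltype` and `promote_type`. A small sketch (the `Int64` results assume a 64-bit machine, where `Int` is `Int64`):

```julia
# Integer literals alone give an integer vector
@assert eltype([1, 2, 3]) == Int64

# A single Float64 literal promotes the whole array to Float64
@assert eltype([1.0, 2, 3]) == Float64

# The rule behind it: the common type of Int64 and Float64 is Float64
@assert promote_type(Int64, Float64) == Float64

# Other combinations follow the same rule
@assert eltype([1, 2.0f0]) == Float32          # Int64 and Float32 promote to Float32
@assert eltype([1, 2 + 0im]) == Complex{Int64} # Int64 and Complex{Int64} promote to Complex{Int64}
```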

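Note that Julia determined the element type (`Float64`) from the literals used: because `1.0` is a `Float64`, the integers `2` and `3` were converted to `Float64` too. Mathematical operations like vector addition work out of the box, and with the `LinearAlgebra` standard library I can even do a dot product: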

In [4]:
using LinearAlgebra
dot(v,v)

14.0
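For real vectors, `dot` is just the sum of the elementwise products, so a couple of other spellings give the same number (for complex vectors `dot` conjugates its first argument, so `sum(v .* v)` would differ). A quick check:

```julia
using LinearAlgebra

v = [1.0, 2, 3]

d1 = dot(v, v)     # LinearAlgebra's dot product
d2 = sum(v .* v)   # elementwise multiply, then sum
d3 = v' * v        # adjoint (row) vector times column vector gives a scalar

@assert d1 == d2 == d3 == 14.0
```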

`dot(v, v)` isn't sexy enough. Type `v `, then `\cdot` and press Tab to autocomplete it to `⋅` (yes, the Unicode symbols are entered by their LaTeX names), then type `v`, and you get a dot product:

In [18]:
v ⋅ v

14.0
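The same Tab completion works for many other symbols, and most of them are ordinary functions under the hood. A few examples, with the LaTeX-style name to type in each comment:

```julia
using LinearAlgebra

a = [1, 0, 0]
b = [0, 1, 0]

@assert a × b == cross(a, b) == [0, 0, 1]   # \times<Tab>: cross product
@assert √16 == sqrt(16) == 4.0              # \sqrt<Tab>: square root
@assert 0.1 + 0.2 ≈ 0.3                     # \approx<Tab>: isapprox
@assert 0.1 + 0.2 != 0.3                    # ...whereas exact == fails here
@assert 7 ÷ 2 == div(7, 2) == 3             # \div<Tab>: integer division
@assert 2π == 2 * π                         # \pi<Tab>: the constant π
```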

That's right, we have Unicode, and a lot of it is wired up to real mathematical definitions (`⋅` here is just another name for `dot`). Let's make a function, one that uses Julia's cool implicit multiplication:

In [62]:
f(x) = 2 + 5x

f (generic function with 1 method)
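Writing a numeric literal directly before a variable or a parenthesised expression means multiplication. The literal coefficient binds tighter than `*` and `/`, but `^` is applied before it, which is worth checking. This sketch is wrapped in `let` so the test `x` doesn't leak into the notebook:

```julia
let x = 3
    @assert 5x == 5 * x               # literal coefficient: 5x means 5 * x
    @assert 2(x + 1) == 2 * (x + 1)   # also works before parentheses
    @assert 2x^2 == 2 * (x^2) == 18   # ^ applies first: this is not (2x)^2
    @assert 1 / 2x == 1 / (2 * x)     # 2x binds tighter than /
end
```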

What is f?

In [64]:
code_lowered(f)

1-element Vector{Core.CodeInfo}:
 CodeInfo(
1 ─ %1 = 5 * x
│   %2 = 2 + %1
└──      return %2
)

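Cool! I can see the lowered code! Hmm, it looks a little generic: nothing in it says what type `x` is, so it can't commit to specific machine operations yet. What if I put the type information in? Do I see more specific code?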

In [67]:
code_typed(f, (Float64,))

1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = Base.mul_float(5.0, x)::Float64
│   %2 = Base.add_float(2.0, %1)::Float64
└──      return %2
) => Float64
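Each concrete argument type gets its own compiled specialisation of the same one-line definition. A quick check, using a hypothetical copy `h` so that `f` is left alone:

```julia
# Hypothetical copy of f, so the notebook's f is untouched
h(x) = 2 + 5x

# Here the result type follows the argument type
@assert h(3) === 17          # Int in, Int out
@assert h(3.0) === 17.0      # Float64 in, Float64 out
@assert h(3.0f0) === 17.0f0  # Float32 in, Float32 out

# For an Int argument the typed code should use integer intrinsics
# (Base.mul_int / Base.add_int) instead of the _float ones shown above
code_typed(h, (Int,))
```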

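It's compiling to specialised floating-point operations (`Base.mul_float`, `Base.add_float`): the integer literals have already been converted to `5.0` and `2.0`, and the type of every intermediate value is tracked (`::Float64`). Do we hit bare metal?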

In [70]:
code_native(f, (Float64,))

	.text
	.file	"f"
	.section	.rodata.cst8,"aM",@progbits,8
	.p2align	3                               # -- Begin function julia_f_8971
.LCPI0_0:
	.quad	0x4014000000000000              # double 5
.LCPI0_1:
	.quad	0x4000000000000000              # double 2
	.text
	.globl	julia_f_8971
	.p2align	4, 0x90
	.type	julia_f_8971,@function
julia_f_8971:                           # @julia_f_8971
; ┌ @ /home/mjki2mb2/testJulia/test.ipynb:1 within `f`
	.cfi_startproc
# %bb.0:                                # %top
	movabsq	$.LCPI0_0, %rax
; │┌ @ promotion.jl:389 within `*` @ float.jl:385
	vmulsd	(%rax), %xmm0, %xmm0
	movabsq	$.LCPI0_1
...

Awesome, assembly! All of it is generated the first time `f` is called with a given argument type (just-in-time), not before, and each specialisation is optimised for the concrete types involved. Wait: if we write generic functions that only get turned into native code when they're run, can we run them on non-native devices?

## CUDA

Yes, just change the type. Because the argument is a `CuArray`, CUDA.jl takes over the broadcast and compiles it into a GPU kernel instead of CPU code.

In [12]:
using CUDA
gpu_v = CuArray([1,2,3,4,5,6,7,8]) #This array now lives on the GPU

# A generic function: the dot in `.+` makes it an elementwise (broadcast) addition
f(x) = x .+ 12

f(gpu_v)

8-element CuArray{Int64, 1, CUDA.Mem.DeviceBuffer}:
 13
 14
 15
 16
 17
 18
 19
 20
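Nothing in `f` mentions the GPU, so the same definition also works on an ordinary CPU `Array`, and even on a plain number, since broadcasting over a scalar returns a scalar. A sketch that runs without a GPU:

```julia
# Same generic definition as the cell above
f(x) = x .+ 12

cpu_v = [1, 2, 3, 4, 5, 6, 7, 8]   # an ordinary Vector{Int} in CPU memory

@assert f(cpu_v) == 13:20          # same values as the GPU result
@assert f(cpu_v) isa Vector{Int}   # but the result stays on the CPU
@assert f(2) == 14                 # a scalar in gives a scalar out
```

To compare against the GPU result directly, `Array(f(gpu_v))` copies the `CuArray` back into an ordinary `Array`.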

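I gotta see what this looks like in assembly (warning: it's filled with code inlined from the library). Note that `@code_native` shows the host-side x86 code that prepares the arrays and launches the kernel, not the GPU code itself; CUDA.jl's `@device_code_ptx` and `@device_code_sass` macros show the code that actually runs on the device.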

In [25]:
using InteractiveUtils # provides the @code_native macro, which picks the method specialisation from an example call instead of a tuple of types
@code_native f(gpu_v)

	.text
	.file	"f"
	.section	.rodata.cst16,"aM",@progbits,16
	.p2align	4                               # -- Begin function julia_f_5246
.LCPI0_0:
	.quad	1                               # 0x1
	.quad	12                              # 0xc
	.text
	.globl	julia_f_5246
	.p2align	4, 0x90
	.type	julia_f_5246,@function
julia_f_5246:                           # @julia_f_5246
; ┌ @ /home/mjki2mb2/testJulia/test.ipynb:5 within `f`
	.cfi_startproc
# %bb.0:                                # %top
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	pushq	%r15
	pushq	%r14
...
; │││││┌ @ /home/mjki2mb2/.julia/packages/CUDA/DfvRa/src/array.jl:19 within `ArrayStorage`
; ││││││┌ @ atomics.jl:78 within `Atomic`
	movq	16(%r13), %rdi
	movabsq	$ijl_gc_pool_alloc, %r12
	movl	$1392, %esi                     # imm = 0x570
	movl	$16, %edx
	callq	*%r12
	movq	%rax, %rbx
	leaq	712144(%r14), %rax
	movq	%rax, -8(%rbx)
	movq	$1, (%rbx)
	movq	%rbx, 144(%rsp)
; │││││└└
; │││││ @ /home/mjki2mb2/.julia/packages/CUDA/DfvRa/src/array.jl:136 within `CuArray` @ /home/mjki2mb2/
...
$9223372036854775807, %r15      # imm = 0x7FFFFFFFFFFFFFFF
; │││││└
	movq	%rbx, 368(%rsp)
	setne	376(%rsp)
; │││││┌ @ boot.jl:603 within `NamedTuple` @ boot.jl:607
	movq	%r15, 208(%rsp)
	movq	%rbx, 128(%rsp)
	movabsq	$.LCPI0_0, %rax
; │││││└
	vmovaps	(%rax), %xmm0
	vmovups	%xmm0, 384(%rsp)
	movq	56(%rsp), %rax                  # 8-byte Reload
	movq	%rax, 400(%rsp)
	movq	%rbx, 40(
...
	callq	*%rbx
	movq	%rax, %r13
	movq	%rax, 160(%rsp)
	movq	16(%rsp), %rbx                  # 8-byte Reload
	movq	16(%rbx), %rdi
	movl	$1440, %esi                     # imm = 0x5A0
	movl	$32, %edx
	movabsq	$ijl_gc_pool_alloc, %r15
	callq	*%r15
	movq	%r15, %rcx
	movq	%rax, %r15
	movabsq	$140561802599280, %r12          # imm = 0x7FD718501B70
	leaq	-446699264(%r12), %rax
	movq	%rax, -8
...
; │││││││││││││││┌ @ /home/mjki2mb2/.julia/packages/CUDA/DfvRa/src/array.jl:276 within `pointer`
; ││││││││││││││││┌ @ /home/mjki2mb2/.julia/packages/CUDA/DfvRa/src/array.jl:321 within `unsafe_convert`
	movabsq	$j_getproperty_5251, %rax
	movabsq	$140566802556928, %rdi          # imm = 0x7FD842556800
	callq	*%rax
	ud2
.LBB0_22:                               # %L121
	movabsq	$ijl_throw, %rax
	movabsq	$140566773045744, %rdi          # imm = 0x7FD8409319F0
	callq	*%rax
.LBB0_23:                               # %L133
; ││││││││││││││└└└
; ││││││││││││││ @ /home/mjki2mb2/.julia/packages/Adapt/LAQOx/src/base.jl:30 within `adapt_structure`
; ││││││││││││││┌ @ /home/mjki2mb
...

## Symbolics

In [31]:
using Symbolics

@variables x, y, t

x^2 + y^2

x^2 + y^2
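These are real symbolic expressions, so you can differentiate them and substitute numbers into them. A small sketch using `Symbolics.derivative` and `substitute`:

```julia
using Symbolics

@variables x y

expr = x^2 + y^2

# Partial derivative with respect to x (should display as 2x)
dexpr = Symbolics.derivative(expr, x)

# Substitute numbers for the variables: 3^2 + 4^2
val = substitute(expr, Dict(x => 3, y => 4))

# The result is wrapped as a symbolic number; Symbolics.value unwraps it
@assert Symbolics.value(val) == 25
@assert Symbolics.value(substitute(dexpr, Dict(x => 3))) == 6   # 2 * 3
```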

Some more things to try:
- Discovering the governing differential equations behind data with DataDrivenDiffEq.jl: https://github.com/SciML/DataDrivenDiffEq.jl
- Global non-linear constrained optimisation (for small enough dimensionality) with EAGO.jl: https://github.com/PSORLab/EAGO.jl