# Additional Information on Automatic Differentiation

## Import Packages 

In [3]:
using ForwardDiff
using QuadGK
using HCubature
using StaticArrays 
using LinearAlgebra
using BenchmarkTools
using Plots 

In [4]:
const Point3D = SVector{3,Float64}

SVector{3, Float64} (alias for SArray{Tuple{3}, Float64, 1, 3})

## Section 1: Introduction 
Our goal is to construct the local MoM matrix using automatic differentiation, assuming that the magnetization has only one component. 

### To do

1. define reference triangular element;
2. study coordinate transformation from reference triangle to reference square; 
3. define the 3 nodal basis functions;
4. define the x-component of the magnetization as a linear combination of basis functions; 
5. define integrand as kernel times basis-function-1 times basis-function-2; 
6. define residual 3-vector by integration; 
7. define 3-by-3 MoM matrix by computing the Jacobian of the residual vector.
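
The steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the final implementation: the kernel is replaced by the constant 1 (so the Jacobian reduces to the mass matrix of the reference triangle), the reference triangle is assumed to have vertices (0,0), (1,0), (0,1), and the names `basis`, `square_to_triangle`, `unit_kernel` and `residual` are hypothetical.

```julia
using ForwardDiff, HCubature, StaticArrays

# Step 3: the three nodal (linear) basis functions on the reference triangle
# with vertices (0,0), (1,0), (0,1) (assumed vertex ordering).
basis(x, y) = SVector(1 - x - y, x, y)

# Step 2: map the reference square [0,1]^2 onto the reference triangle,
# (u, v) -> (x, y) = (u, v*(1 - u)); the Jacobian determinant is (1 - u).
square_to_triangle(u, v) = (u, v * (1 - u), 1 - u)

# Placeholder kernel (assumption): the actual MoM kernel is not defined here.
unit_kernel(x, y) = 1.0

# Steps 4-6: residual R_i(c) = integral over the triangle of
# kernel * phi_i * m, where m = sum_j c_j * phi_j is the x-component
# of the magnetization.
function residual(c)
    val, _ = hcubature((0.0, 0.0), (1.0, 1.0)) do uv
        x, y, detJ = square_to_triangle(uv[1], uv[2])
        phi = basis(x, y)
        m = sum(c .* phi)
        unit_kernel(x, y) * m * detJ * phi
    end
    return val
end

# Step 7: the 3-by-3 matrix is the Jacobian of the residual with respect to c.
# The residual is linear in c, so the linearization point is arbitrary.
M = ForwardDiff.jacobian(residual, SVector(1.0, 1.0, 1.0))
```

With the unit kernel, `M` should agree with the analytic mass matrix `[2 1 1; 1 2 1; 1 1 2] / 24`, which gives a quick correctness check before a singular kernel is substituted.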

## Section 2: Definition of the Integrand 

In [5]:
# failing version 1-of-3 - the version below *allocates* three times 
# we assume (?) that one allocation is due to storage of intermediate r - rp (p for prime)
# we fail to understand what causes the other 2 allocations  
# observe the use of the in-place normalize!() (with a bang!) function 
# the in-place normalize!() does not accept SVectors as input; the difference is therefore converted to a SizedArray
r = Point3D(0.,0,0); rp = Point3D(5.,0,0)
@btime normalize!(SizedArray(r-rp))

  183.455 ns (3 allocations: 128 bytes)


3-element SizedVector{3, Float64, Vector{Float64}} with indices SOneTo(3):
 -1.0
  0.0
  0.0

In [6]:
# failing version 2-of-3 - the version below *allocates* twice  
# we again assume that one allocation is due to storage of intermediate r - rp
# we fail to understand why the change from SizedArray to MArray lowers the allocation count by one 
# observe the use of the in-place normalize!() (with a bang!) function 
# the in-place normalize!() does not accept SVectors as input; the difference is therefore converted to an MArray
r = Point3D(0.,0,0); rp = Point3D(5.,0,0)
@btime normalize!(MArray(r-rp))

  152.879 ns (2 allocations: 64 bytes)


3-element MVector{3, Float64} with indices SOneTo(3):
 -1.0
  0.0
  0.0

In [7]:
# failing version 3-of-3 - the version below allocates once 
# allocation occurs even though the construction of the intermediate r - rp is placed *outside* 
# of the profiling
# observe the use of the normalize() (without a bang) function
r = Point3D(0.,0,0); rp = Point3D(5.,0,0)
diff = r-rp
@btime normalize(diff)

  47.902 ns (1 allocation: 32 bytes)


3-element SVector{3, Float64} with indices SOneTo(3):
 -1.0
  0.0
  0.0

In [8]:
# partially working version - the version below does *not* allocate (hurray!)
# observe the use of the in-place normalize!()
# works for both the MArray and the SizedArray 
# the allocation required to compute the difference vector is not taken into account in the profiling
# this is probably fine - we expect to allocate the memory for the difference vector only once. 
r = Point3D(0.,0,0); rp = Point3D(5.,0,0)
# diff = MArray(r-rp)      # MArray alternative
diff = SizedArray(r - rp)  # variant benchmarked below (the output is a SizedVector)
@btime normalize!(diff)

  40.783 ns (0 allocations: 0 bytes)


3-element SizedVector{3, Float64, Vector{Float64}} with indices SOneTo(3):
 -1.0
  0.0
  0.0
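
One likely explanation for the allocations in the failing versions, consistent with the "avoid global variables in benchmarking" remark used later in this notebook, is that `r`, `rp` and `diff` are non-constant globals. A minimal sketch, assuming that is the cause: interpolate the globals into `@btime` with `$`, or wrap the computation in a function (the name `unit_direction` is hypothetical). Timings are machine-dependent, so none are shown.

```julia
using StaticArrays, LinearAlgebra, BenchmarkTools

const Point3D = SVector{3,Float64}

# Hypothetical helper: unit vector pointing from rp to r.
# SVectors are immutable, so the out-of-place normalize() is used.
unit_direction(r::Point3D, rp::Point3D) = normalize(r - rp)

r = Point3D(0.0, 0.0, 0.0); rp = Point3D(5.0, 0.0, 0.0)

# `$` splices the current values of the globals into the benchmark,
# so the benchmarked expression no longer looks up untyped globals.
@btime unit_direction($r, $rp)
@btime normalize($r - $rp)
```

If the allocation count drops to zero here, the out-of-place `normalize` on an `SVector` is sufficient and the detour through `MArray` or `SizedArray` is not needed.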

## Section 3: Computing the Integral using hcubature 
This section is based on input from the Julia Discourse thread [avoiding-allocations-when-normalizing-a-vector](https://discourse.julialang.org/t/avoiding-allocations-when-normalizing-a-vector/113913/6). 

In normal usage, `hcubature(...)` allocates a buffer for internal computations. You can instead pass a buffer preallocated with `hcubature_buffer` as the `buffer` argument. This buffer can be reused across multiple calls to avoid repeated allocation.

In [9]:
# define the integrand - p stands for prime 
function integrand(r,rp)
    return normalize(r-rp) 
end

integrand (generic function with 1 method)

In [10]:
# evaluate the integrand 
r = Point3D(0.,0,0); rp = Point3D(5.,0,0)
@btime integrand(r,rp)

  49.131 ns (1 allocation: 32 bytes)


3-element SVector{3, Float64} with indices SOneTo(3):
 -1.0
  0.0
  0.0

In [13]:
?hcubature

search: hcubature HCubature hcubature_buffer



```
hcubature(f, a, b; norm=norm, rtol=sqrt(eps), atol=0, maxevals=typemax(Int),
initdiv=1, buffer=nothing)
```

Compute the n-dimensional integral of f(x), where `n == length(a) == length(b)`, over the hypercube whose corners are given by the vectors (or tuples) `a` and `b`. That is, dimension `x[i]` is integrated from `a[i]` to `b[i]`.  The return value of `hcubature` is a tuple `(I, E)` of the estimated integral `I` and an estimated error `E`.

`f` should be a function `f(x)` that takes an n-dimensional vector `x` and returns the integrand at `x`.   The integrand can be any type that supports `+`, `-`, `*` real, and `norm` functions.  For example, the integrand can be real or complex numbers, vectors, matrices, etcetera.

The integrand `f(x)` will always be passed an `SVector{n,T}`, where `SVector` is an efficient vector type defined in the `StaticArrays` package and `T` is a floating-point type determined by promoting the endpoint `a` and `b` coordinates to a floating-point type. (Your integrand `f` should be type-stable: it should always return a value of the same type, given this type of `x`.)

The integrand will never be evaluated exactly at the boundaries of the integration volume.  (So, for example, it is possible to have an integrand that blows up at the boundaries, as long as the integral is finite, though such singularities will slow convergence.)

The integration volume is adaptively subdivided, using a cubature rule due to Genz and Malik (1980), until the estimated error `E` satisfies `E ≤ max(rtol*norm(I), atol)`, i.e. `rtol` and `atol` are the relative and absolute tolerances requested, respectively. It also stops if the number of `f` evaluations exceeds `maxevals`. If neither `atol` nor `rtol` are specified, the default `rtol` is the square root of the precision `eps(T)` of the coordinate type `T` described above. Initially, the volume is divided into `initdiv` segments along each dimension.

The error is estimated by `norm(I - I′)`, where `I′` is an alternative estimated integral (via an "embedded" lower-order cubature rule.) By default, the norm function used (for both this and the convergence test above) is `norm`, but you can pass an alternative norm by the `norm` keyword argument.  (This is especially useful when `f` returns a vector of integrands with different scalings.)

In normal usage, `hcubature(...)` will allocate a buffer for internal computations. You can instead pass a preallocated buffer allocated using `hcubature_buffer` as the `buffer` argument. This buffer can be used across multiple calls to avoid repeated allocation.


In [12]:
# avoid global variables in benchmarking 
foo(r) = hcubature(rp->integrand(r, Point3D(rp[1],rp[2],0)), (0,0), (1,1))[1]
@btime foo($(Point3D(0,0,0)));

  69.541 μs (5 allocations: 65.80 KiB)
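
The `buffer` option described in the docstring above could be used along the following lines. This is a sketch, assuming `hcubature_buffer(f, a, b)` accepts the same `f`, `a`, `b` arguments as `hcubature`; the wrapper `integrate_with_buffer` and the variables `r0`, `buf`, `I1`, `I2` are hypothetical names.

```julia
using HCubature, StaticArrays, LinearAlgebra

const Point3D = SVector{3,Float64}
integrand(r, rp) = normalize(r - rp)

# Hypothetical wrapper: integrate over the unit square in the z = 0 plane,
# reusing a caller-supplied buffer for hcubature's internal work storage.
function integrate_with_buffer(r::Point3D, buffer)
    f = rp -> integrand(r, Point3D(rp[1], rp[2], 0))
    return hcubature(f, (0.0, 0.0), (1.0, 1.0); buffer = buffer)[1]
end

# Allocate the buffer once, using a sample integrand with the same
# argument and return types as the ones integrated later.
r0 = Point3D(0, 0, 0)
buf = hcubature_buffer(rp -> integrand(r0, Point3D(rp[1], rp[2], 0)),
                       (0.0, 0.0), (1.0, 1.0))

# Reuse the same buffer for several observation points.
I1 = integrate_with_buffer(Point3D(0, 0, 0), buf)
I2 = integrate_with_buffer(Point3D(0, 0, 1), buf)
```

Whether this removes most of the 65.80 KiB reported above would need to be checked with `@btime integrate_with_buffer($r0, $buf)`.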


In [66]:
function integrand(r,rp)
    return normalize(r-rp)
end

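r = Point3D(0.,0,0); rp = Point3D(5.,0,0)

# note: the closure below captures the non-constant global r
# (the global rp is shadowed by the closure argument of the same name);
# this is the likely cause of the large allocation count compared with foo(r) above
@btime hcubature($(rp->integrand(r, Point3D(rp[1],rp[2],0))), $(0,0), $(1,1))[1]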

  2.003 ms (59838 allocations: 2.02 MiB)


3-element SVector{3, Float64} with indices SOneTo(3):
 -0.6477935749432309
 -0.6477935748958243
  0.0
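
As a sanity check, the result above can be compared with a closed-form value. For `r` at the origin, the x-component is $-\int_0^1\int_0^1 x/\sqrt{x^2+y^2}\,dx\,dy = -\left[(\sqrt{2} + \operatorname{asinh} 1)/2 - 1/2\right] \approx -0.647794$, and the y-component is equal by symmetry. The sketch below also wraps the integral in a function so that the closure captures a typed local instead of a global; `integrate_direction`, `exact_x`, `Inum` and `abs_err` are hypothetical names.

```julia
using HCubature, StaticArrays, LinearAlgebra

const Point3D = SVector{3,Float64}
integrand(r, rp) = normalize(r - rp)

# r is a function argument here, so the closure captures a typed local
# rather than a non-constant global.
integrate_direction(r::Point3D) =
    hcubature(rp -> integrand(r, Point3D(rp[1], rp[2], 0)), (0, 0), (1, 1))[1]

# Closed form of the x-component for r = (0, 0, 0).
exact_x = -((sqrt(2) + asinh(1)) / 2 - 1 / 2)

Inum = integrate_direction(Point3D(0, 0, 0))
abs_err = abs(Inum[1] - exact_x)
```

The small difference between the x- and y-components in the output above (they agree to about ten digits) is consistent with the default relative tolerance of `hcubature`.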