# EE4375: Fourth Lab Session: FEM One-Dimensional Poisson Equation: Functions, StaticArrays, StructArrays, Type Stability and Benchmarking

## Import Packages  

In [54]:
using LinearAlgebra 
using SparseArrays 
using StructArrays
using StaticArrays
using StaticRanges

using IterativeSolvers
using Preconditioners

using BenchmarkTools
using Profile
using ProfileView

using Plots 

## Section 1: Introduction 

Here we develop a one-dimensional Galerkin finite element code that constructs the coefficient matrix directly in <b>sparse</b> format, without first constructing a full (dense) variant. Furthermore, the code is <b>type-stable</b> and uses a number of memory <b>allocations</b> that is almost independent of the problem size. We also explore the use of [StructArrays](https://github.com/JuliaArrays/StructArrays.jl) to allow a contiguous memory layout in the assembly process. We currently lack the tools to verify whether this latter choice actually improves efficiency. 

## Section 2: One-Dimensional Mesh Generation 

Exercise: extend the mesh generation from a uniform mesh to a non-uniform mesh with local refinement to capture the skin effect at material interfaces.  
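
As one possible starting point for this exercise, the sketch below generates node coordinates on 0 <= x <= 1 that are geometrically refined towards a single interface at `x0`. The function `gradedNodes` and its keyword parameters `hmin`, `hmax` and `ratio` are hypothetical choices, not part of the lab code; the returned vector could take the place of `x = Vector{Float64}(0:h:1)` in `generateMesh()`, with `nelements = length(x)-1` and element areas computed as `x[i+1]-x[i]` as before.

```julia
# Hypothetical helper (not part of the lab code): node coordinates on [0, 1]
# refined geometrically towards an interface located at x0, with 0 < x0 < 1.
function gradedNodes(x0::Float64; hmin=1e-3, hmax=0.05, ratio=1.2)
    # distances from x0 whose spacing grows by `ratio` up to hmax, all < len
    function offsets(len)
        d = Float64[]
        s, h = 0.0, hmin
        while s + h < len
            s += h
            push!(d, s)
            h = min(h * ratio, hmax)
        end
        # drop the last point if it would leave a sliver element at the boundary
        if !isempty(d) && len - d[end] < 0.5 * h
            pop!(d)
        end
        return d
    end
    left  = x0 .- offsets(x0)          # points in (0, x0), decreasing
    right = x0 .+ offsets(1.0 - x0)    # points in (x0, 1), increasing
    return vcat(0.0, reverse(left), x0, right, 1.0)
end

x = gradedNodes(0.5)
h = diff(x)    # element lengths: smallest (about hmin) next to the interface
```

The geometric growth factor keeps the sizes of neighbouring elements comparable; how small `hmin` must be relative to the skin depth is left open here.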

In [55]:
# struct to hold a single mesh element
# all members of the struct should be of concrete type 
struct Element
  p1::Float64    # coordinate left-most node
  p2::Float64    # coordinate right-most node
  e1::Int64      # global index left-most node
  e2::Int64      # global index right-most node
  area::Float64  # area of the element 
end

# struct to hold entire mesh
struct Mesh
  nnodes::Int64
  nelements::Int64 
  # specify the one-dimensional array of elements as an array of structs. 
  # we worry about using StructArrays (if at all) later. 
  Elements::Array{Element,1}
  bndNodeIds::Vector{Int64}
  dofPerElement::Int64       
end 

In [88]:
# function to generate a mesh on the interval 0 <= x <= 1.   
# we limit the type of the input to Int64 
# when is an explicit type annotation on the return value important? 
function generateMesh(nelements::Int64)::Mesh
    h = 1/nelements 
    nnodes = nelements+1
    dofPerElement = 2 
    x = Vector{Float64}(0:h:1)    
    # what does the undef do here? 
    Elements = Array{Element,1}(undef,nelements)
    for i in 1:nelements
        Elements[i] = Element(x[i],x[i+1],i,i+1,x[i+1]-x[i])
    end
    mesh = Mesh(nnodes,nelements,Elements,[1,nelements+1],dofPerElement)     
    return mesh;
end 

generateMesh (generic function with 1 method)

In [89]:
# test function for correctness of output 
mesh = generateMesh(4)
typeof(mesh)

Mesh

In [90]:
# test function for type stability 
# the Union type reported for the iterator local @_3 (in orange font) is considered to be harmless 
#@code_warntype generateMesh(4);
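
Besides reading the `@code_warntype` output, type stability can be checked in a script with `@inferred` from the Test standard library: it throws an error when the return type inferred by the compiler does not match the type of the value actually returned. The two toy functions below are hypothetical illustrations; for the lab code one could write, for instance, `@inferred generateMesh(4)`.

```julia
using Test  # standard library; provides @inferred

# type-unstable: the return type (Int64 or Float64) depends on the value of x
unstable(x) = x > 0 ? 1 : 0.0
# type-stable: both branches return a Float64
stable(x) = x > 0 ? 1.0 : 0.0

@inferred stable(2.0)    # passes and returns 1.0

# inferred type Union{Float64, Int64} differs from the returned Int64, so this throws
unstableDetected = try
    @inferred unstable(2.0)
    false
catch
    true
end
```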

In [91]:
# test function for number of allocations and CPU time
# observe how the number of memory allocations only depends mildly on the mesh size 
@time generateMesh(10);
@time generateMesh(100);
@time generateMesh(1000);
@time generateMesh(10000);

  0.000006 seconds (3 allocations: 720 bytes)
  0.000007 seconds (3 allocations: 5.016 KiB)
  0.000010 seconds (4 allocations: 47.188 KiB)
  0.059853 seconds (5 allocations: 468.984 KiB, 99.88% gc time)


In [96]:
@code_lowered generateMesh(10)

CodeInfo(
1 ─       Core.NewvarNode(:(mesh))
│   %2  = Main.Mesh
│         h = 1 / nelements
│         nnodes = nelements + 1
│         dofPerElement = 2
│   %6  = Core.apply_type(Main.Vector, Main.Float64)
│   %7  = 0:h:1
│         x = (%6)(%7)
│   %9  = Core.apply_type(Main.Array, Main.Element, 1)
│         Elements = (%9)(Main.undef, nelements)
│   %11 = 1:nelements
│         @_3 = Base.iterate(%11)
│   %13 = @_3 === nothing
│   %14 = Base.not_int(%13)
└──       goto #4 if not %14
2 ┄ %16 = @_3
│         i = Core.getfield(%16, 1)
│   %18 = Core.getfield(%16, 2)
│   %19 = Base.getindex(x, i)
│   %20 = x
│   %21 = i + 1
│   %22 = Base.getindex(%20, %21)
│   %23 = i
│   %24 = i + 1
│   %25 = x
│   %26 = i + 1
│   %27 = B

In [99]:
a = 1; b = 2; 
c = @less [a b]

## Section 3: Linear System Generation 

### Section 3.1: Coefficient Matrix Generation

Note that in the function generateMatrix() given below, the local arrays Iloc, Jloc and Aloc returned by generateLocalMatrix() do not require pre-allocation. Declaring them as static arrays (SVector from StaticArrays) instead appears to be sufficient to obtain a type-stable function.  
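
The approach of generateMatrix() relies on the coordinate (triplet) format: each element contributes four (row, column, value) entries, and `sparse(I, J, V)` from SparseArrays adds up values that share the same (row, column) pair, so no full matrix is ever formed. A minimal hand-written sketch for a hypothetical two-element mesh with h = 0.5:

```julia
using SparseArrays

# Hypothetical two-element mesh on [0, 1] with h = 0.5; node 2 is shared
h = 0.5
# triplets of both 2x2 local matrices, row by row (element 1, then element 2)
rows = [1, 1, 2, 2,   2, 2, 3, 3]
cols = [1, 2, 1, 2,   2, 3, 2, 3]
vals = [1/h, -1/h, -1/h, 1/h,   1/h, -1/h, -1/h, 1/h]

# sparse() adds values with the same (i, j) pair, so A[2,2] = 1/h + 1/h = 4.0
A = sparse(rows, cols, vals)
Matrix(A)   # dense copy for inspection only: [2 -2 0; -2 4 -2; 0 -2 2]
```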

In [60]:
aaa=srange(3,8)

static(3:8)

In [61]:
aaa[2]

4

In [109]:
function generateLocalMatrix(element::Element)
    h     = element.area 
    e1    = element.e1
    e2    = element.e2
    Iloc  = SVector(e1, e1, e2, e2) 
    Jloc  = SVector(e1, e2, e1, e2) 
    # Kloc: local variable to be used in the future: goes fine - *no* memory allocations
    Kloc  = SVector(1., 2., 3., 4.)
    # Lloc: another local variable to be used in the future: fails - causes memory allocations
    Lloc  = SMatrix{2,2}(1., 2., 3., 4.)  
    Aloc  = SVector(1/h, -1/h, -1/h, 1/h) 
    return Iloc, Jloc, Aloc
end

# assemble the global coefficient matrix from the element contributions in triplet format
function generateMatrix(mesh::Mesh)
    
    #..recover number of elements  
    nelements = mesh.nelements
    dofperelem = 4; # number of entries of each 2x2 local matrix
    
    #..preallocate the memory for local matrix contributions 
    Avalues = zeros(Float64,dofperelem*nelements)
    I = zeros(Int64,length(Avalues))
    J = zeros(Int64,length(Avalues))

    for i = 1:nelements #..loop over number of elements..
        element = mesh.Elements[i]
        Iloc, Jloc, Aloc = generateLocalMatrix(element) 
        irng = mrange(4*i-3,4*i) 
        I[irng] .= Iloc 
        J[irng] .= Jloc 
        Avalues[irng] .= Aloc         
    end
    
    A = sparse(I,J,Avalues)
   
    return A; 
end

generateMatrix (generic function with 1 method)
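
The four values in Aloc are the entries of the element stiffness matrix of linear finite elements, assuming the weak form of -u'' = f. On an element [p_1, p_2] of length h = element.area, the basis functions φ_1 = (p_2 - x)/h and φ_2 = (x - p_1)/h have constant derivatives -1/h and 1/h, so that

$$
A^e_{ij} = \int_{p_1}^{p_2} \phi_i'(x)\,\phi_j'(x)\,dx, \qquad
A^e = \frac{1}{h}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.
$$

Read row by row, these are the values (1/h, -1/h, -1/h, 1/h) stored in Aloc at the positions given by Iloc = (e1, e1, e2, e2) and Jloc = (e1, e2, e1, e2).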

In [110]:
mesh = generateMesh(10);
#@code_warntype generateMatrix(mesh)

In [111]:
# test function for number of allocations and CPU time
# observe how the number of memory allocations only depends mildly on the mesh size
mesh = generateMesh(10);    @time generateMatrix(mesh); # force function compilation 
mesh = generateMesh(10);    @time generateMatrix(mesh); 
mesh = generateMesh(100);   @time generateMatrix(mesh);
mesh = generateMesh(1000);  @time generateMatrix(mesh); 
mesh = generateMesh(10000); @time generateMatrix(mesh); 

  0.000041 seconds (14 allocations: 3.109 KiB)
  0.000010 seconds (14 allocations: 3.109 KiB)
  0.000010 seconds (14 allocations: 23.953 KiB)
  0.000043 seconds (21 allocations: 227.844 KiB)
  0.000755 seconds (24 allocations: 2.214 MiB)
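
The static arrays returned by `generateLocalMatrix()` avoid per-element heap allocations because an `SVector` stores its entries in an immutable `NTuple`. The sketch below illustrates the same effect with plain Base tuples, so that it runs without StaticArrays, and compares it with returning a `Vector`; the function names are hypothetical and `@allocated` (from Base) reports the bytes allocated by one call.

```julia
# Local 2x2 matrix entries returned as a heap-allocated Vector
localHeap(h) = [1/h, -1/h, -1/h, 1/h]
# Same entries returned as an NTuple{4,Float64}: an immutable value, no heap allocation needed
localTuple(h) = (1/h, -1/h, -1/h, 1/h)

# accumulate the diagonal of the 1D stiffness matrix element by element
function assembleDiag!(d::Vector{Float64}, h::Float64, localFct::F) where {F}
    for i in 1:length(d)-1
        a = localFct(h)
        d[i]   += a[1]   # entry (1,1) of the local matrix
        d[i+1] += a[4]   # entry (2,2) of the local matrix
    end
    return d
end

# bytes allocated by one assembly call, measured after a warm-up call
function bytesAllocated(localFct::F, n::Int) where {F}
    d = zeros(n + 1)
    assembleDiag!(d, 1/n, localFct)   # warm-up: triggers compilation
    return @allocated assembleDiag!(d, 1/n, localFct)
end

bytesTuple = bytesAllocated(localTuple, 1000)
bytesHeap  = bytesAllocated(localHeap, 1000)
```

One would typically expect `bytesTuple == 0`, while `bytesHeap` is nonzero because every call of `localHeap` creates a new `Vector`; the exact byte counts depend on the Julia version.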


### Section 3.3: Right-Hand Side Vector Generation
Use a callable struct to pass the source function as an argument to the assembly of the right-hand-side vector. 
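
A callable struct is a struct type with a method attached to its instances, so an instance can be evaluated like a function while its fields carry parameters of the source term. In the hypothetical sketch below (`SinSource`, its field `k` and the helper `trapezoid` are not part of the lab code), the field type is a type parameter `T`, so a function receiving a `SinSource{Float64}` is compiled for that concrete type.

```julia
# Hypothetical parametric source term f(x) = sin(k*π*x); T is a type parameter
struct SinSource{T<:Real}
    k::T
end

# attach a method to the instances: src(x) evaluates the source at x
(src::SinSource)(x) = sin(src.k * π * x)

# composite trapezoidal rule for any callable `fct` on the nodes x
function trapezoid(fct, x::AbstractVector)
    s = 0.0
    for i in 1:length(x)-1
        s += (x[i+1] - x[i]) * (fct(x[i]) + fct(x[i+1])) / 2
    end
    return s
end

src = SinSource(1.0)                                   # of type SinSource{Float64}
integral = trapezoid(src, range(0, 1, length=1001))    # close to 2/π ≈ 0.6366
```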

In [45]:
# callable struct allowing type-stable implementation of RHS-vector assembly 
struct SrcFunction{T<:Real}   # T is a type parameter: SrcFunction(0.) has type SrcFunction{Float64}
    dummy::T
end

In [46]:
# source function attached to the callable struct 
function (srcFunction::SrcFunction)(x)
    return sin(π*x)*x 
end 

In [47]:
mySrcFunction = SrcFunction(0.)

SrcFunction{Float64}(0.0)

In [48]:
mySrcFunction(.2)

0.11755705045849463

In [49]:
typeof(mySrcFunction)

SrcFunction{Float64}

In [50]:
function generateLocalVector(element::Element, sourceFct::SrcFunction)
    h = element.area 
    Iloc = SVector(element.e1, element.e2)
    floc = (h/2)*SVector(sourceFct(element.p1), sourceFct(element.p2))
    return Iloc, floc
end

function generateVector(mesh::Mesh, sourceFct::SrcFunction)
    
    #..recover number of elements  
    nelements = mesh.nelements 
    nnodes = mesh.nnodes 
    
    #..initialize global vector  
    f = zeros(Float64,nnodes)

    for i = 1:nelements #..loop over number of elements..
        element = mesh.Elements[i]
        Iloc, floc = generateLocalVector(element,sourceFct) 
        f[Iloc] .+= floc          
    end
   
    return f; 
end

generateVector (generic function with 1 method)
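
The local vector floc in generateLocalVector() can be read as the trapezoidal rule applied to the element load integral. With linear basis functions satisfying φ_i(p_j) = δ_ij on an element [p_1, p_2] of length h,

$$
f^e_i = \int_{p_1}^{p_2} f(x)\,\phi_i(x)\,dx \approx \frac{h}{2}\Big( f(p_1)\,\phi_i(p_1) + f(p_2)\,\phi_i(p_2) \Big) = \frac{h}{2}\, f(p_i), \qquad i = 1, 2,
$$

which is the contribution that the loop in generateVector() accumulates with `f[Iloc] .+= floc`.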

In [51]:
@code_warntype generateVector(mesh, mySrcFunction)

MethodInstance for generateVector(::Mesh, ::SrcFunction{Float64})
  from generateVector(mesh::Mesh, sourceFct::SrcFunction) in Main at In[50]:8
Arguments
  #self#::Core.Const(generateVector)
  mesh::Mesh
  sourceFct::SrcFunction{Float64}
Locals
  @_4::Union{Nothing, Tuple{Int64, Int64}}
  f::Vector{Float64}
  nnodes::Int64
  nelements::Int64
  @_8::Int64
  i::Int64
  floc::SVector{2, Float64}
  Iloc::SVector{2, Int64}
  element::Element
Body::Vector{Float64}
1 ─       (nelements = Base.getproperty(mesh, :nelements))
│         (nnodes = Base.getproperty(mesh, :nnodes))
│         (f = Main.zeros(Main.Float64, nnodes))
│   %4  = (1:nelements)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│         (@_4 = Base.iterate(%4))
│   %6  = (@_4 === nothing)::Bool

In [52]:
# observe again how the number of memory allocations only depends mildly on the mesh size
mesh = generateMesh(10);    @time f = generateVector(mesh,mySrcFunction); # force function compilation
mesh = generateMesh(10);    @time f = generateVector(mesh,mySrcFunction);
mesh = generateMesh(100);   @time f = generateVector(mesh,mySrcFunction);
mesh = generateMesh(1000);  @time f = generateVector(mesh,mySrcFunction);
mesh = generateMesh(10000); @time f = generateVector(mesh,mySrcFunction);

  0.018552 seconds (69.17 k allocations: 3.521 MiB, 99.89% compilation time)
  0.000004 seconds (1 allocation: 144 bytes)
  0.000004 seconds (1 allocation: 896 bytes)
  0.000013 seconds (1 allocation: 8.000 KiB)
  0.000104 seconds (2 allocations: 78.234 KiB)


## Section 4: Solve Process 

In [None]:
N = 500 
mesh = generateMesh(N)

A = generateMatrix(mesh) # force compilation 
f = generateVector(mesh,mySrcFunction) # force compilation

function generateSolution(A,f)
    #..handle the essential boundary condition u(0) = 0 at the left-most node
    #..(the right-most node keeps the natural condition u'(1) = 0)
    A[1,1] = 1.; A[1,2] = 0.; f[1] = 0.;
    u = A\f 
    return u 
end

u = generateSolution(A,f) # force compilation 
@time u = generateSolution(A,f)

plot(u)
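
As a hedged sketch of imposing essential conditions u(0) = u(1) = 0 at both ends (generateSolution() above treats only the left-most node) and of checking the assembly, the self-contained code below builds the same linear-element matrix for -u'' = 1 on a uniform mesh and compares with the exact solution x(1-x)/2. In one dimension, linear elements with an exactly integrated load reproduce the exact solution at the nodes, so the error should be at rounding level. Instead of replacing the first row of `A`, the sketch removes the boundary unknowns from the system, which is possible because the prescribed values are zero. The function name `solveConstantSource` is hypothetical.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical check: -u'' = 1 on (0,1) with u(0) = u(1) = 0, exact solution x(1-x)/2
function solveConstantSource(n::Int)
    h = 1/n
    x = collect(range(0, 1, length=n+1))
    rows, cols, vals = Int[], Int[], Float64[]
    f = zeros(n+1)
    for e in 1:n
        i, j = e, e+1                            # global node indices of element e
        append!(rows, (i, i, j, j))
        append!(cols, (i, j, i, j))
        append!(vals, (1/h, -1/h, -1/h, 1/h))    # local stiffness matrix, row by row
        f[i] += h/2; f[j] += h/2                 # trapezoidal rule with f(x) = 1
    end
    A = sparse(rows, cols, vals)
    # homogeneous essential conditions: eliminate the first and last unknowns
    inner = 2:n
    u = zeros(n+1)
    u[inner] = A[inner, inner] \ f[inner]
    return x, u
end

x, u = solveConstantSource(100)
err = maximum(abs.(u .- x .* (1 .- x) ./ 2))   # expected at rounding level
```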

In [None]:
# observe again how the number of memory allocations only depends mildly on the mesh size
# observe how assembly is much faster than the solve 
mesh = generateMesh(10);    A = generateMatrix(mesh); f = generateVector(mesh,mySrcFunction); @time u = generateSolution(A,f);
mesh = generateMesh(100);   A = generateMatrix(mesh); f = generateVector(mesh,mySrcFunction); @time u = generateSolution(A,f);
mesh = generateMesh(1000);  A = generateMatrix(mesh); f = generateVector(mesh,mySrcFunction); @time u = generateSolution(A,f);
mesh = generateMesh(10000); A = generateMatrix(mesh); f = generateVector(mesh,mySrcFunction); @time u = generateSolution(A,f);

## Sandbox