# Performance optimisation in Julia

We will cover:
- profiling
- run-time classes
- code tuning

<div class="alert alert-block alert-info" title="Aside 1">

These slides are a [Jupyter notebook](https://jupyter.org/), a browser-based computational notebook.

Code cells are executed by putting the cursor into the cell and hitting `shift + enter`.  For more
info see the [documentation](https://jupyter-notebook.readthedocs.io/en/stable/).

</div>

## Profiling

To improve the performance of a program, first find out which parts of it take the most time to execute. This is done by profiling. Profiling is also useful for comparing the performance of two different implementations of the same task.

Basic timing can be done with the `@time` macro, which prints how long a given expression took to execute, along with the number and size of its memory allocations. More detailed analysis is possible with `@profile`, which shows how much processing time each part of the code required.
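When a timing needs to be stored or compared in code rather than just printed, Base also provides `@elapsed` and `@allocated`. The following is a minimal sketch; the function `sumsquares` is a hypothetical workload chosen only for illustration.

```julia
# Hypothetical workload: sum of the squares of 1..n
sumsquares(n) = sum(i^2 for i in 1:n)

# Warm-up call so that compilation time is not included in the measurement
sumsquares(10)

# @elapsed returns the run time in seconds as a Float64 instead of printing it
t = @elapsed sumsquares(1_000_000)

# @allocated returns the number of bytes allocated while evaluating the expression
bytes = @allocated sumsquares(1_000_000)

println("took $t seconds, allocated $bytes bytes")
```

Both macros measure a single run, so the same caveat about random fluctuation applies as for `@time`.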

### Exercise 1: converting to Celsius

Imagine you are working with a data source that uses Fahrenheit, but your calculations require degrees Celsius. How do you best convert a large number of values from one to the other?

In [36]:
using Random, Profile

## Option A: the "naive" function using a for loop and building up the result vector as we go
function converttocelsiusA(fahrenheit::Vector)
    celsius = []
    for f in fahrenheit
        push!(celsius, (f-32)*(5/9))
    end
    celsius
end

## Option B: using an anonymous function and `map`
function converttocelsiusB(fahrenheit::Vector)
    celsius = f -> (f-32)*(5/9)
    map(celsius, fahrenheit)
end

## create a random vector of 1 million numbers
fahrenheits = rand(0:250, 1000000)

## run the functions once to make sure they are compiled
## (the first execution is always much slower!)
converttocelsiusA(fahrenheits);
converttocelsiusB(fahrenheits);

## Note: adding a semi-colon after a line of code suppresses
## the REPL output - in this case, we don't want to see the array
## that is produced, we're just interested in the runtime

## then use the @time macro to find out how efficient each function is
@time(converttocelsiusA(fahrenheits));
@time(converttocelsiusB(fahrenheits));

  0.019291 seconds (1.00 M allocations: 25.040 MiB, 15.71% gc time)
  0.001368 seconds (2 allocations: 7.629 MiB)


Run the code above several times to get a feel for how the execution times vary, as there is always some random fluctuation. (The very first run is also slower because it includes compilation time.) Still, you should see that the second option is much faster: in the run shown above, roughly 14 times.
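Another vectorised way to write the conversion is Julia's dot-broadcasting syntax. This is a sketch of one possible variant, not part of the original exercise; the name `converttocelsiusD` and the vector `sample` are made up for illustration.

```julia
# Option D (hypothetical): broadcasting with dot syntax.
# The dotted operators are fused into a single loop over the input,
# so only the result vector is allocated.
converttocelsiusD(fahrenheit::Vector) = (fahrenheit .- 32) .* (5/9)

# Small sample input: freezing point, boiling point, and -40
sample = [32, 212, -40]
result = converttocelsiusD(sample)
println(result)
```

It can be timed with `@time` in the same way as the other options.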

However, vectorised code can be harder to write and understand than code that uses loops. So is it possible to speed up the first function? Let's take a look at this adapted version:

In [47]:
## Option C / A2: uses a loop like the first function, but preallocates the vector to avoid using push!()
function converttocelsiusA2(fahrenheit::Vector)
    celsius = Vector{Float64}(undef, length(fahrenheit))
    for i in eachindex(fahrenheit)
        celsius[i] = (fahrenheit[i]-32)*(5/9)
    end
    celsius
end

converttocelsiusA2(fahrenheits);

@time(converttocelsiusA2(fahrenheits));

  0.001180 seconds (2 allocations: 7.629 MiB)


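Preallocating a typed result vector removes the one-allocation-per-element cost of option A (1.00 M allocations there, 2 here), which brings the loop to a similar speed as the vectorised code. Part of that cost comes from `celsius = []`, which creates a `Vector{Any}` that cannot store `Float64` values inline.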
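The loop can also keep `push!` if the result vector is given a concrete element type. In option A, `[]` creates a `Vector{Any}`, so every pushed value is stored as a separately allocated object. The sketch below is an assumption about how one might fix this, under the hypothetical name `converttocelsiusA3`. It uses `Float64[]` and `sizehint!` to reserve capacity up front.

```julia
# Option A3 (hypothetical): still uses push!, but on a typed vector
function converttocelsiusA3(fahrenheit::Vector)
    celsius = Float64[]                      # Vector{Float64} instead of Vector{Any}
    sizehint!(celsius, length(fahrenheit))   # reserve capacity to limit regrowth
    for f in fahrenheit
        push!(celsius, (f-32)*(5/9))
    end
    celsius
end

# Compare the element types of the two kinds of empty vector
println(eltype([]))          # Any
println(eltype(Float64[]))   # Float64
```

Timing this against options A and A2 with `@time` shows how much of the difference is due to the element type and how much to `push!` itself.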

### Exercise 2: Plant reproduction

The `@profile` macro works by taking snapshots of the execution of your code every few milliseconds. At every snapshot, it records which function is currently executing, and which function(s) called this function. When your code is done, you thus get an insight into how much time the computer spends executing each subfunction.
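As a minimal sketch of this workflow, the following profiles a hypothetical function `busywork` (not part of the exercises) and prints a flat report. The `sortedby` and `mincount` keywords of `Profile.print` are used here to sort the report by sample count and to hide lines that were seen in fewer than five snapshots.

```julia
using Profile

# Hypothetical workload: a loop that is slow enough to be sampled many times
function busywork(n)
    total = 0.0
    for i in 1:n
        total += sqrt(i)
    end
    total
end

busywork(10)                    # warm-up call, so compilation is not profiled

Profile.clear()                 # discard samples from earlier runs
@profile busywork(100_000_000)  # collect snapshots while the call runs

# Flat report, sorted by sample count, omitting lines with fewer than 5 samples
Profile.print(format=:flat, sortedby=:count, mincount=5)
```

The exact counts differ from run to run, because the snapshots are taken at fixed time intervals rather than at fixed points in the code.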

Consider this mini-model of a plant population, in which only the largest plants can reproduce. Which part of the model is more computationally expensive - finding the largest plants or letting them reproduce? Which parameters influence this?

*(As before, execute this code several times to allow for compilation.)*

In [60]:
## A simple plant type
struct Plant
    id::Int
    size::Int
    seeds::Int
end

## Initialise 10 million plants
biome = [Plant(p, rand(1:10000), 1000) for p in 1:10000000]

## Return the indices of the largest plants
function findlargest(plants::Vector{Plant})
    maxsize = 0
    ids = []
    for p in eachindex(plants)
        if plants[p].size > maxsize
            maxsize = plants[p].size
            ids = [p]
        elseif plants[p].size == maxsize
            push!(ids, p)
        end
    end
    return ids
end

## Return a vector filled with one new plant for every seed produced by the input plants
function reproduce!(plants::Vector{Plant})
    offspring = []
    for p in plants
        for s in 1:p.seeds
            push!(offspring, Plant(length(plants)+length(offspring), p.size, p.seeds))
        end
    end
    offspring
end

## Find the largest plants and let only these reproduce
function updateplants(plants::Vector{Plant})
    largest = findlargest(plants)
    reproduce!(plants[largest])
end

## Finally, profile our mini-model and show the results
Profile.clear()
@profile updateplants(biome);
Profile.print(format=:flat)

 Count  Overhead File                    Line Function
     1         1 In[60]                     ? findlargest(plants::Vector{Plant})
    12         0 In[60]                    16 findlargest(plants::Vector{Plant})
     3         3 In[60]                    19 findlargest(plants::Vector{Plant})
    14         0 In[60]                    31 reproduce!(plants::Vector{Plant})
    16         0 In[60]                    39 updateplants(plants::Vector{Plant…
    14         0 In[60]                    40 updateplants(plants::Vector{Plant…
     8         8 @Base/array.jl          1026 __inbounds_setindex!
     5         5 @Base/array.jl          1072 _growend!
     1         0 @Base/array.jl           411 copy
     5         0 @Base/array.jl          1126 push!
     8         0 @Base/array.jl          1127 push!
    73        41 @Base/boot.jl            385 eval
     1         0 …ractinterpretation.jl  2162 abstract_call(interp::Core.Compil…
     1         0 …ractinterpretation.jl  2169 abst

## Run-time classes

*TODO*

## Code tuning

*TODO*