# Introduction to Julia

<img src="./julia_logo.png" align="center" width="400"/>

## Types of programming languages

* **Compiler languages**: C/C++, Fortran, ... 
  - Directly compiled to machine code that is executed by CPU 
  - Pros: fast, memory efficient
  - Cons: longer development time, hard to debug

* **Interpreter languages**: R, MATLAB, Python, SAS IML, JavaScript, ... 
  - Interpreted by interpreter
  - Pros: fast prototyping
  - Cons: excruciatingly slow for loops

* Mixed (dynamic) languages: Java
  - Compiled into *byte code* by the compiler, byte code is interpreted by the *virtual machine* (JVM). This scheme achieves architecture independence.
  - More and more interpreter languages are adopting JIT technology: R (version 3.4+), MATLAB (R2015b+), Python (PyPy), Julia, ...
      + functions will be compiled before execution on the first or second use. For subsequent uses (e.g., calling the function within a loop), the speedup is significant.

* Scripting languages: Linux shell scripts, Perl, ...
  - Interpreted languages that make it quick to get the computer to do simple tasks.
  - Extremely useful for some data preprocessing and manipulation

* Database languages: SQL, Hive (Hadoop).  
  - Data analysis *never* happens if we do not know how to retrieve data from databases  

## Messages

* To be versatile in the big data era, master at least one language in each category.

* To improve efficiency of interpreted languages such as R or Matlab, conventional wisdom is to avoid loops as much as possible, aka, **vectorize** code
> The only loop you are allowed to have is that for an iterative algorithm.

* When looping is unavoidable, one needs to code in C, C++, or Fortran.  
Success stories: the popular `glmnet` package in R is coded in Fortran; `tidyverse` packages make heavy use of Rcpp/C++.

* Modern languages such as Julia try to solve the **two language problem**:
    - Prototype code goes into a high-level language
    - Production code goes into a low-level language

## What's Julia?

> Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments

* History:
  - Project started in 2009. First public release in 2012 
  - Creators: Jeff Bezanson, Alan Edelman, Stefan Karpinski, Viral Shah
  - First major release v1.0 was released on Aug 8, 2018
  - Current stable release: v1.5.0

* Aims to solve the notorious **two language problem**: prototype code goes into high-level languages like R/Python, production code goes into low-level languages like C/C++. 

    Julia aims to:
> Walks like Python. Runs like C.

<img src="./julia_vs_otherlang.png" align="center" width="800"/>

See <https://julialang.org/benchmarks/> for details of the benchmarks.

* Write high-level, abstract code that closely resembles mathematical formulas
    - yet produces fast, low-level machine code that has traditionally only been generated by static languages.

* Julia is more than just "Fast R" or "Fast Matlab"
    - Performance comes from features that work well together.  
    - You can't just take the magic dust that makes Julia fast and sprinkle it on [language of choice]

## R is great, but...

* The language encourages operating on the whole object (i.e. vectorized code). However, some tasks (e.g. MCMC) are not easily vectorized.

* Unvectorized R code (`for` and `while` loops) is slow.
  - http://adv-r.had.co.nz/Performance.html
    - Section on performance starts with "Why is R slow?" 

* Techniques for large data sets – parallelization, memory mapping, database access, map/reduce – can be used but not easily. R is single threaded and most likely will stay that way.

* R functions should obey functional semantics (not modify arguments). Okay until you have very large objects on which small changes are made during parameter estimation.

* Sort-of object oriented using generic functions but implementation is casual. Does garbage collection but not based on reference counting.

* The real work is done in underlying C code and it is not easy to trace your way through it.

(by [Doug Bates](http://pages.stat.wisc.edu/~bates/), member of the R Core Team, `Matrix` and `lme4`)

* Deficiencies in the core language 
  - Many fixed with packages (`devtools`, `roxygen2`, `Matrix`)
  - Others harder to fix (R uses an old version of BLAS)
  - Some impossible to fix (clunky syntax, poor design choices)
 
* Doug Bates' [Julia package for mixed-effects models](https://github.com/dmbates/MixedModels.jl)
    - Getting Doug on board was a big win for statistics with Julia, as he brought a lot of knowledge about the history of R development and design choices
    
    > As some of you may know, I have had a (rather late) mid-life crisis and run off with another language called Julia.   
    >
    > -- <cite>Doug Bates (on the [`knitr` Google Group](https://groups.google.com/forum/#!msg/knitr/F78PBMIamwk/X-d-zUhrdrkJ), 2013)</cite>

## Gibbs sampler example by Doug Bates

* An example from Doug Bates' [Julia for R Programmers](http://www.stat.wisc.edu/~bates/JuliaForRProgrammers.pdf) slides.

* The task is to create a Gibbs sampler for the density  
$$
f(x, y) = k x^2 \exp(- x y^2 - y^2 + 2y - 4x), \quad x > 0
$$
using the conditional distributions
$$
\begin{eqnarray*}
  X | Y &\sim& \Gamma \left( 3, \frac{1}{y^2 + 4} \right) \quad \text{(shape, scale)}\\
  Y | X &\sim& N \left(\frac{1}{1+x}, \frac{1}{2(1+x)} \right).
\end{eqnarray*}
$$
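
As a check on the conditionals above, they can be read off the joint density by holding one variable fixed and keeping only the factors that involve the other (a sketch of the standard argument, with normalizing constants dropped):
$$
\begin{eqnarray*}
  f(x \mid y) &\propto& x^{2} \exp \left( - x (y^2 + 4) \right), \quad x > 0, \\
  f(y \mid x) &\propto& \exp \left( - (x + 1) y^2 + 2y \right) \propto \exp \left( - (x + 1) \left( y - \frac{1}{x + 1} \right)^2 \right).
\end{eqnarray*}
$$
The first is a Gamma kernel with shape 3 and rate $y^2 + 4$, i.e., scale $1 / (y^2 + 4)$. The second is a normal kernel with mean $1 / (1 + x)$ and variance $1 / (2(1 + x))$. Note that `rgamma` in R takes the rate as its third argument, whereas `Gamma` in `Distributions.jl` takes the scale, and both `rnorm` and `Normal` take the standard deviation rather than the variance.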

* R solution. The `RCall.jl` package allows us to execute R code without leaving the `Julia` environment. We first define an R function `Rgibbs()`.

In [None]:
using Pkg
Pkg.activate("../..")
Pkg.status()

In [None]:
using RCall

R"""
library(Matrix)
Rgibbs <- function(N, thin) {
  mat <- matrix(0, nrow=N, ncol=2)
  x <- y <- 0
  for (i in 1:N) {
    for (j in 1:thin) {
      x <- rgamma(1, 3, y * y + 4) # 3rd arg is rate
      y <- rnorm(1, 1 / (x + 1), 1 / sqrt(2 * (x + 1)))
    }
    mat[i,] <- c(x, y)
  }
  mat
}
"""

Generate a sample of size 10,000 with a thinning of 500. How long does it take?

In [None]:
R"""
system.time(Rgibbs(10000, 500))
"""

* This is a Julia function for the simple Gibbs sampler:

In [None]:
using Distributions

function jgibbs(N, thin)
    mat = zeros(N, 2)
    x = y = 0.0
    for i in 1:N
        for j in 1:thin
            x = rand(Gamma(3, 1 / (y * y + 4)))
            y = rand(Normal(1 / (x + 1), 1 / sqrt(2(x + 1))))
        end
        mat[i, 1] = x
        mat[i, 2] = y
    end
    mat
end

Generate the same number of samples. How long does it take?

In [None]:
jgibbs(100, 5); # warm-up
@elapsed jgibbs(10000, 500)

We see 40-80 fold speed up of `Julia` over `R` on this example, **with similar coding effort**!

## Learning resources

0. [Julia: A Fresh Approach to Numerical Computing](../../readings/BezansonEdelmanKarpinskiShah17Julia.pdf) by Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B. Shah, *SIAM REVIEW* Vol. 59, No. 1, pp. 65–98.

1. [Julia for R Programmers](http://www.stat.wisc.edu/~bates/JuliaForRProgrammers.pdf) by Doug Bates.

2. YouTube: [Intro to Julia](https://www.youtube.com/watch?v=8h8rQyEpiZA&t) (2h28m), by Jane Herriman. 

3. Cheat sheet: [The Fast Track to Julia](https://juliadocs.github.io/Julia-Cheat-Sheet/).  

4. Browse the Julia [documentation](https://docs.julialang.org/en).  

5. For R users, read [Noteworthy Differences From R](https://docs.julialang.org/en/v1/manual/noteworthy-differences/#Noteworthy-differences-from-R-1).  

    For Python users, read [Noteworthy Differences From Python](https://docs.julialang.org/en/v1/manual/noteworthy-differences/?highlight=matlab#Noteworthy-differences-from-Python-1).  

    For Matlab users, read [Noteworthy Differences From Matlab](https://docs.julialang.org/en/v1/manual/noteworthy-differences/#Noteworthy-differences-from-MATLAB-1).  


6. The [Learning page](http://julialang.org/learning/) on Julia's website has pointers to many other learning resources.  

## Julia REPL (Read-Eval-Print Loop)

The `Julia` REPL, or `Julia` shell, has at least five modes.

1. **Default mode** is the Julia prompt `julia>`. *Type backspace in other modes* to return to the default mode.    

2. **Help mode** `help?>`. Type `?` to enter help mode. `?search_term` does a fuzzy search for `search_term`.  

3. **Shell mode** `shell>`. Type `;` to enter shell mode.  

4. **Package mode** `(v1.1) pkg>`. Type `]` to enter package mode for managing Julia packages (install, uninstall, update, ...).

5. **Search mode** `(reverse-i-search)`. Press `ctrl+R` to enter search mode. 

6. With `RCall.jl` package installed, we can enter the **R mode** by typing `$` (shift+4) at Julia REPL.

Some survival commands in Julia REPL:  
1. `exit()` or `Ctrl+D`: exit Julia.

2. `Ctrl+C`: interrupt execution.

3. `Ctrl+L`: clear screen.

0. Append `;` (semi-colon) to suppress displaying output from a command. Same as Matlab.

0. `include("filename.jl")` to source a Julia code file.

## Seek help

* Online help from REPL: `?function_name`.

* Google (~~Naver~~).

* Julia documentation: <https://docs.julialang.org/en/>.

* Look up source code: `@edit fun(x)`.

* <https://discourse.julialang.org>.

* Friends.

## Which IDE?

* Julia homepage lists many choices: Juno, VS Code, Vim, ...

* Unfortunately, at the moment there is no mature RStudio- or Matlab-like IDE for Julia.

* For dynamic documents, e.g., homework, I recommend [Jupyter Notebook](https://jupyter.org/install.html) or [JupyterLab](http://jupyterlab.readthedocs.io/en/stable/). JupyterLab is supposed to replace Jupyter Notebook after it reaches v1.0.

* For extensive Julia coding, I myself use [vi](https://won-j.github.io/326_621a-2018fall/lectures/02-linux/linux2.html#vi). Use whatever editor you like.


## Julia package system

* Each Julia package is a Git repository. Each Julia package name ends with `.jl`. E.g., `Distributions.jl` package lives at <https://github.com/JuliaStats/Distributions.jl>.   
Google search with `PackageName.jl` usually leads to the package on github.com. 

* The package ecosystem is rapidly maturing; a complete list of **registered** packages (which are required to have a certain level of testing and documentation) is at [http://pkg.julialang.org/](http://pkg.julialang.org/).

* For example, the package called `Distributions.jl` is added with
```julia
# in Pkg mode
(v1.1) pkg> add Distributions
```
and "removed" (although not completely deleted) with
```julia
# in Pkg mode
(v1.1) pkg> rm Distributions
```
* The package manager provides a dependency solver that determines which packages are actually required to be installed.

* **Non-registered** packages are added by cloning the relevant Git repository. E.g.,
```julia
# in Pkg mode
(v1.1) pkg> add https://github.com/OpenMendel/SnpArrays.jl
```

* A package needs to be added only once, at which point it is downloaded into the `.julia/packages` directory in your home directory. 

In [None]:
using Pkg
Pkg.activate("../..")
Pkg.dependencies()

* Directory of a specific package can be queried by `pathof()`:

In [None]:
using Distributions

pathof(Distributions)

* If you start having problems with packages that seem to be unsolvable, you may try just deleting your .julia directory and reinstalling all your packages. 

* Periodically, one should run `update` in Pkg mode, which checks for, downloads and installs updated versions of all the packages you currently have installed.

* `status` lists the status of all installed packages.

* Using functions in package.
```julia
using Distributions
```
This pulls all of the *exported* functions in the module into your local namespace; `names(Distributions)` lists them. An alternative is
```julia
import Distributions
```
Now, the functions from the Distributions package are available only using 
```julia
Distributions.<FUNNAME>
```
All functions, not only exported functions, are always available like this.
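
A minimal sketch of the difference, using two standard libraries so that nothing needs to be installed (the choice of `Statistics` and `LinearAlgebra` is only for illustration):

```julia
using Statistics        # exported names such as `mean` can be called directly
import LinearAlgebra    # only the module name is brought in; qualify everything

m = mean([1, 2, 3])                  # exported by Statistics, no prefix needed
n = LinearAlgebra.norm([3.0, 4.0])   # must be qualified after `import`
```

Here `m` is `2.0` and `n` is `5.0`. Qualified names are more verbose, but they make it obvious which package each function comes from.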

## Calling R from Julia

* The [`RCall.jl`](https://github.com/JuliaInterop/RCall.jl) package allows us to embed R code inside of Julia.

* There are also `PyCall.jl`, `MATLAB.jl`, `JavaCall.jl`, `CxxWrap.jl` packages for interfacing with other languages.

In [None]:
using RCall

x = randn(1000)
R"""
hist($x, main="I'm plotting a Julia vector")
"""

In [None]:
R"""
library(ggplot2)
qplot($x)
"""

In [None]:
x = R"""
rnorm(10)
"""

In [None]:
# collect R variable into Julia workspace
y = collect(x)

* Access Julia variables in R REPL mode:
```julia
julia> x = rand(5) # Julia variable
R> y <- $x
```

* Pass Julia expression in R REPL mode:
```julia
R> y <- $(rand(5))
```

* Put Julia variable into R environment:
```julia
julia> @rput x
R> x
```

* Get R variable into Julia environment:
```julia
R> r <- 2
julia> @rget r
```

* If you want to call Julia within R, check out the [`XRJulia`](https://cran.r-project.org/web/packages/XRJulia/) package by John Chambers.

## Some basic Julia code

In [None]:
# an integer, same as int in R
y = 1
typeof(y) 

In [None]:
# a Float64 number, same as double in R
y = 1.0
typeof(y) 

In [None]:
# Greek letters:  `\pi<tab>`
π

In [None]:
typeof(π)

In [None]:
# Greek letters:  `\theta<tab>`
θ = y + π

In [None]:
# emoji! `\:kissing_cat:<tab>`
😽 = 5.0

In [None]:
# `\alpha<tab>\hat<tab>`
α̂ = π

In [None]:
# vector of Float64 0s
x = zeros(5)

In [None]:
# vector of Int64 0s
x = zeros(Int, 5)

In [None]:
# matrix of Float64 0s
x = zeros(5, 3)

In [None]:
# matrix of Float64 1s
x = ones(5, 3)

In [None]:
# define array without initialization
x = Matrix{Float64}(undef, 5, 3)

In [None]:
# fill a matrix with 0s
fill!(x, 0)

In [None]:
x

In [None]:
# initialize an array to be constant 2.5
fill(2.5, (5, 3))

In [None]:
# rational number
a = 3//5

In [None]:
typeof(a)

In [None]:
b = 3//7

In [None]:
a + b

In [None]:
# uniform [0, 1) random numbers
x = rand(5, 3)

In [None]:
# uniform random numbers (in Float16)
x = rand(Float16, 5, 3)

In [None]:
# random numbers from {1,...,5}
x = rand(1:5, 5, 3)

In [None]:
# standard normal random numbers
x = randn(5, 3)

In [None]:
# range
1:10

In [None]:
typeof(1:10)

In [None]:
1:2:10

In [None]:
typeof(1:2:10)

In [None]:
# integers 1-10
x = collect(1:10)

In [None]:
# or equivalently
[1:10...]

In [None]:
# Float64 numbers 1-10
x = collect(1.0:10)

In [None]:
# convert to a specific type
convert(Vector{Float64}, 1:10)

## Timing and benchmark

### Julia

`@time`, `@elapsed`, `@allocated` macros:

In [None]:
using Random # standard library
Random.seed!(123) # seed
x = rand(1_000_000) # 1 million random numbers in [0, 1)

@time sum(x) # first run includes compilation time

In [None]:
@time sum(x) # no compilation time after first run

In [None]:
# just the runtime
@elapsed sum(x)

In [None]:
# just the allocation
@allocated sum(x)

Use package `BenchmarkTools.jl` for more robust benchmarking. Analog of `microbenchmark` package in R.

In [None]:
using BenchmarkTools

bm = @benchmark sum($x)  # '$' to avoid problems with globals

In [None]:
using Statistics # standard library
benchmark_result = Dict() # a dictionary to store median runtime (in milliseconds)
benchmark_result["Julia builtin"] = median(bm.times) / 1e6

### C

We use low-level C code as the baseline for comparison. In Julia, we can easily run compiled C code using the `ccall` function. This is similar to `.C` in R.
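
As a warm-up before compiling our own library, `ccall` can be tried on a function that is already loaded into the Julia process. The sketch below assumes a Unix-like system where the C standard library function `strlen` can be found by name; the wrapper name `c_strlen` is made up for this illustration.

```julia
# ccall(function_name, return_type, (argument_types...,), arguments...)
c_strlen(s::String) = ccall(:strlen, Csize_t, (Cstring,), s)

n = c_strlen("Julia")   # length in bytes, returned as an unsigned integer
Int(n)                  # convert to a signed Int for display
```

The argument types are given as a tuple, so a single argument needs the trailing comma in `(Cstring,)`.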

In [None]:
using Libdl

C_code = """
#include <stddef.h>
double c_sum(size_t n, double *X) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += X[i];
    }
    return s;
}
"""

const Clib = tempname()   # make a temporary file

# compile to a shared library by piping C_code to gcc
# (works only if you have gcc installed):

open(`gcc -std=c99 -fPIC -O3 -msse3 -xc -shared -o $(Clib * "." * Libdl.dlext) -`, "w") do f
    print(f, C_code) 
end

# define a Julia function that calls the C function:
c_sum(X::Array{Float64}) = ccall(("c_sum", Clib), Float64, (Csize_t, Ptr{Float64}), length(X), X)

In [None]:
# make sure it gives same answer
c_sum(x)

In [None]:
bm = @benchmark c_sum($x)

In [None]:
# store median runtime (in milliseconds)
benchmark_result["C"] = median(bm.times) / 1e6

### R, builtin `sum`

Next we compare to the built-in `sum` function in R, which is implemented in C.

In [None]:
using RCall

R"""
library(microbenchmark)
y <- $x
rbm <- microbenchmark(sum(y))
"""

In [None]:
# store median runtime (in milliseconds)
@rget rbm # dataframe
benchmark_result["R builtin"] = median(rbm[!, :time]) / 1e6

### R, handwritten loop

Handwritten loop in R is much slower.

In [None]:
using RCall

R"""
sum_r <- function(x) {
  s <- 0
  for (xi in x) {
    s <- s + xi
  }
  s
}
library(microbenchmark)
y <- $x
rbm <- microbenchmark(sum_r(y))
"""

In [None]:
# store median runtime (in milliseconds)
@rget rbm # dataframe
benchmark_result["R loop"] = median(rbm[!, :time]) / 1e6

### Python, builtin `sum`

The built-in `sum` function in Python.

In [None]:
using PyCall
PyCall.pyversion

In [None]:
# get the Python built-in "sum" function:
pysum = pybuiltin("sum")
bm = @benchmark $pysum($x)

In [None]:
# store median runtime (in milliseconds)
benchmark_result["Python builtin"] = median(bm.times) / 1e6

### Python, handwritten loop

In [None]:
using PyCall

py"""
def py_sum(A):
    s = 0.0
    for a in A:
        s += a
    return s
"""

sum_py = py"py_sum"

bm = @benchmark $sum_py($x)

In [None]:
# store median runtime (in milliseconds)
benchmark_result["Python loop"] = median(bm.times) / 1e6

### Python, numpy

Numpy is the high-performance scientific computing library for Python.

In [None]:
# bring in sum function from Numpy 
numpy_sum = pyimport("numpy")."sum"

In [None]:
bm = @benchmark $numpy_sum($x)

In [None]:
# store median runtime (in milliseconds)
benchmark_result["Python numpy"] = median(bm.times) / 1e6

Numpy performance is on a par with the Julia built-in `sum` function. Both are about 3 times faster than C, probably because of insufficient optimization during compilation and the overhead of passing Julia objects and receiving C pointers.

### Summary

In [None]:
benchmark_result

* `C` and `R builtin` are the baseline C performance (gold standard).

* `Python builtin` and `Python loop` are 80-100 fold slower than C because the loop is interpreted.

* `R loop` is about 30 fold slower than C and indicates the performance of bytecode generated by its compiler package (turned on by default since R v3.4.0 (Apr 2017)). 

* `Julia builtin` and `Python numpy` are 3-4 fold faster than C.

## Matrices and vectors

### Dimensions

In [None]:
x = randn(5, 3)

In [None]:
size(x)

In [None]:
size(x, 1) # nrow() in R

In [None]:
size(x, 2) # ncol() in R

In [None]:
# total number of elements
length(x)

### Indexing

In [None]:
# 5 × 5 matrix of random Normal(0, 1)
x = randn(5, 5)

In [None]:
# first column
x[:, 1]

In [None]:
# first row
x[1, :]

In [None]:
# sub-array
x[1:2, 2:3]

In [None]:
# getting a subset of a matrix creates a copy, but you can also create "views"
z = view(x, 1:2, 2:3)

In [None]:
# same as
@views z = x[1:2, 2:3]

In [None]:
# change in z (view) changes x as well
z[2, 2] = 0.0
x

In [None]:
# y points to same data as x
y = x

In [None]:
# x and y point to same data
pointer(x), pointer(y)

In [None]:
# changing y also changes x
y[:, 1] .= 0  # Dot broadcasting: "vectorization" in Julia. More below
x

In [None]:
# create a new copy of data
z = copy(x)

In [None]:
pointer(x), pointer(z)  # they should be different now

In [None]:
a = 1.0  # Float64
b = a

In [None]:
a = 2.0
b

#### What's the difference?

- In Julia, everything is an object (see **Types** below). But there are *mutable* and *immutable* objects.
- In *assignment* of the form `x = ...`, the LHS is a variable name. Assignment changes which object the variable `x` refers to (called a *variable binding*). 
- After the statement `b = a`, both names are bound to the same object, so any *mutation* of that object through `a` would also be visible through `b`. However, the value bound to `a` is `1.0`, an immutable value. 
- You can't mutate an immutable object. The next statement `a = 2.0` does *not* mutate the value bound to `a` (`1.0`), but creates a new immutable object `2.0` and re-binds the variable `a` to it.
- Binding of `b` to the previous object (`1.0`) is not affected. Hence there's no way to tell if it was copied or referenced.
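
The same distinction can be checked directly with a mutable object. In this small sketch (the names `u` and `v` are arbitrary), `===` tests whether two names are bound to the very same object:

```julia
u = [1.0, 2.0, 3.0]
v = u                    # v is bound to the same array as u; nothing is copied
u[1] = 10.0              # mutation of the shared array ...
v[1]                     # ... is visible through v
same_before = (u === v)

u = [0.0, 0.0, 0.0]      # re-binding: u now refers to a brand-new array
same_after = (u === v)
v                        # v still refers to the old, mutated array
```

`same_before` is `true` and `same_after` is `false`; `v` remains `[10.0, 2.0, 3.0]`.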

In [None]:
# guess what will happen
x = randn(5, 5)
y

- On the other hand, `Array` is a mutable object.
- `y[:, 1] .= 0` is *not* an assignment, but a *mutation*.
- `x = x .+ 0.1` is an assignment, whereas `x .+= 0.1` is a mutation.

In [None]:
y = x

In [None]:
x .+= 0.1
y

In [None]:
(pointer(x), pointer(y))

In [None]:
x = x .+ 0.1
y

In [None]:
(pointer(x), pointer(y))

### Concatenate matrices

In [None]:
# 1-by-3 array
[1 2 3]

In [None]:
# 3-element vector (one-dimensional; behaves as a column)
[1, 2, 3]

In [None]:
# multiple assignment by tuple
x, y, z = randn(5, 3), randn(5, 2), randn(3, 5)

In [None]:
[x y] # 5-by-5 matrix

In [None]:
[x y; z] # 8-by-5 matrix
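
The bracket syntax is shorthand for the functions `hcat` and `vcat`. A small sketch with made-up matrices `A`, `B`, and `C`, chosen so that the shapes are easy to check by eye:

```julia
A = ones(2, 3)
B = zeros(2, 2)
C = fill(2.0, 1, 5)

H = hcat(A, B)     # same as [A B]: 2-by-5, row counts must agree
V = vcat(H, C)     # same as [A B; C]: 3-by-5, column counts must agree
(H == [A B], V == [A B; C], size(V))
```

The last line evaluates to `(true, true, (3, 5))`.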

### Dot operation

In Julia, any function `f(x)` can be applied elementwise to an array `X` with the “dot call” syntax `f.(X)`. 

In [None]:
x = randn(5, 3)

In [None]:
y = ones(5, 3)

In [None]:
x .* y # same as x * y in R

In [None]:
x .^ (-2) # same as x^(-2) in R

In [None]:
sin.(x)  # same as sin(x) in R
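
Dot calls can be chained and combined with the in-place assignment `.=`, and arrays of compatible shapes are expanded ("broadcast") against each other. A small sketch with made-up arrays:

```julia
X = [1.0 4.0; 9.0 16.0]
Y = similar(X)              # uninitialized array with the same size and element type
Y .= sqrt.(X) .+ 1          # element-wise; the result is written into Y in place

Z = X .* [10.0, 100.0]      # the 2-vector scales row 1 by 10 and row 2 by 100
```

Here `Y` is `[2.0 3.0; 4.0 5.0]` and `Z` is `[10.0 40.0; 900.0 1600.0]`.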

### Basic linear algebra

In [None]:
x = randn(5)

In [None]:
using LinearAlgebra
# vector L2 norm
norm(x)

In [None]:
# same as
sqrt(sum(abs2, x))

In [None]:
y = randn(5) # another vector
# dot product
dot(x, y) # x' * y

In [None]:
# same as
x'y

In [None]:
x, y = randn(5, 3), randn(3, 2)
# matrix multiplication, same as %*% in R
x * y

In [None]:
x = randn(3, 3)

In [None]:
# conjugate transpose
x'

In [None]:
b = rand(3)
x'b # same as x' * b

In [None]:
# trace
tr(x)

In [None]:
det(x)

In [None]:
rank(x)

### Sparse matrices

In [None]:
using SparseArrays

# 10-by-10 sparse matrix with density 0.1 (about 10% of entries are nonzero)
X = sprandn(10, 10, .1)

Question: why do we use `SparseArrays`?

In [None]:
# convert to dense matrix; be cautious when dealing with big data
Xfull = convert(Matrix{Float64}, X)

In [None]:
# convert a dense matrix to sparse matrix
sparse(Xfull)

In [None]:
# syntax for sparse linear algebra is the same as dense linear algebra
β = ones(10)
X * β

In [None]:
# many functions apply to sparse matrices as well
sum(X)

## Control flow and loops

* if-elseif-else-end

```julia
if condition1
    # do something
elseif condition2
    # do something
else
    # do something
end
```

* `for` loop

```julia
for i in 1:10
    println(i)
end
```

* Nested `for` loop:

```julia
for i in 1:10
    for j in 1:5
        println(i * j)
    end
end
```
Same as

```julia
for i in 1:10, j in 1:5
    println(i * j)
end
```

* Exit loop:

```julia
for i in 1:10
    # do something
    if condition1
        break # skip remaining loop
    end
end
```

* Exit iteration:  

```julia
for i in 1:10
    # do something
    if condition1
        continue # skip to next iteration
    end
    # do something
end
```

## Functions 

* Function definition
```julia
function func(req1, req2; key1=dflt1, key2=dflt2)
    # do stuff
    return out1, out2, out3
end
```
    - **Required arguments** are separated with a comma and use the positional notation.  
    - **Optional arguments** need a default value in the signature.  
    - **Semicolon** is not required in function call.  
    - **return** statement is optional (value of the last expression is the return value, like R).  
    - Multiple outputs can be returned as a **tuple**, e.g., `return out1, out2, out3`.  
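
A concrete sketch of the template above. The function `affine` and its keyword names are made up for illustration:

```julia
# two required positional arguments, two keyword arguments with defaults
function affine(x, a; shift=0.0, absval=false)
    y = a * x + shift
    absval && (y = abs(y))
    return y, typeof(y)      # multiple outputs are returned as a tuple
end

r1 = affine(2.0, 3.0)                                  # keyword defaults are used
r2 = affine(2.0, 3.0; shift=-10.0)                     # keyword given by name
val, T = affine(2.0, 3.0, shift=-10.0, absval=true)    # semicolon optional in the call
```

Here `r1` is `(6.0, Float64)`, `r2` is `(-4.0, Float64)`, and the last line unpacks the returned tuple into `val = 4.0` and `T = Float64`.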

* In Julia, all arguments to functions are [**passed by reference**](https://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_reference) (more precisely, *passed by sharing*: the function receives the caller's object itself, not a copy), in contrast to R and Matlab (which behave as pass by value).
    - Implication: mutable function arguments, such as arrays, can be **modified** inside the function.

* By convention, a function name ending with `!` indicates that the function mutates at least one of its arguments, typically the first.
```julia
sort!(x) # vs sort(x)
```

* There is a subtle binding issue (see the Indexing section above) in functions; see the "I passed an argument `x` to a function, modified it inside that function, but on the outside, the variable `x` is still unchanged. Why?" section of  https://docs.julialang.org/en/v1/manual/faq/
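
A small sketch of this distinction (the function names `zero_first!` and `rebind` are made up for this illustration): mutating the argument changes the caller's array, while assigning to the argument name inside the function only re-binds a local variable.

```julia
function zero_first!(v)
    v[1] = 0               # mutation: acts on the caller's array
    return nothing
end

function rebind(v)
    v = zeros(length(v))   # re-binding: the local name v now refers to a new array
    return nothing
end

w = [1.0, 2.0, 3.0]
zero_first!(w)    # w is now [0.0, 2.0, 3.0]
rebind(w)         # w is not affected by this call
w
```

After both calls `w` is `[0.0, 2.0, 3.0]`: only the mutation inside `zero_first!` is visible to the caller.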

* Anonymous functions, e.g., `x -> x^2`, are commonly used with collection functions or list comprehensions.
```julia
map(x -> x^2, y) # square each element in y
```

* Functions can be nested:

```julia
function outerfunction()
    # do some outer stuff
    function innerfunction()
        # do inner stuff
        # can access prior outer definitions
    end
    # do more outer stuff
end
```

* Functions can be vectorized using the "dot call" syntax:

In [None]:
function myfunc(x)
    return sin(x^2)
end

x = randn(5, 3)
myfunc.(x)

* **Collection function** (think of this as the family of `apply` functions in R).

    Apply a function to each element of a collection:

```julia
map(f, coll) # or
map(coll) do elem
    # do stuff with elem
    # the value of the last expression (or an explicit return) is the result
end
```

In [None]:
map(x -> sin(x^2), x)   # same as above

In [None]:
map(x) do elem   # long version of above
    elem = elem^2
    return sin(elem)
end

In [None]:
# Mapreduce
mapreduce(x -> sin(x^2), +, x)   # mapreduce(mapper, reducer, data)

In [None]:
# same as
sum(x -> sin(x^2), x)

* List **comprehension**

In [None]:
[sin(2i + j) for i in 1:5, j in 1:3] # similar to Python

## Type system

* Every value in Julia has a type.

* When thinking about types, think about sets.

* Everything is a subtype of the abstract type `Any`.

* An abstract type defines a set of types
    - Consider types in Julia that are a `Number`:
<img src="1280px-Type-hierarchy-for-julia-numbers.png" width="800" align="center"/>
    - source: https://en.wikibooks.org/wiki/Introducing_Julia/Types

* We can explore type hierarchy with `typeof()`, `supertype()`, and `subtypes()`.

In [None]:
typeof(1.0), typeof(1)

In [None]:
supertype(Float64)

In [None]:
subtypes(AbstractFloat)

In [None]:
# Is Float64 a subtype of AbstractFloat?
Float64 <: AbstractFloat
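
The calls above can be combined to walk up the type tree. A minimal sketch, where the helper name `supertype_chain` is made up for this illustration:

```julia
# hypothetical helper: collect a type and all of its ancestors up to Any
function supertype_chain(T::Type)
    chain = Type[T]
    S = T
    while S !== Any
        S = supertype(S)     # move one level up the hierarchy
        push!(chain, S)
    end
    return chain
end

supertype_chain(Float64)   # Float64, AbstractFloat, Real, Number, Any
supertype_chain(Int64)     # Int64, Signed, Integer, Real, Number, Any
```

Both chains pass through `Real` and `Number`, which is why a method written for `Number` applies to both.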

In [None]:
# On 64bit machine, Int == Int64
Int == Int64

In [None]:
# convert to Float64
convert(Float64, 1)

In [None]:
# same as
Float64(1)

In [None]:
# Float32 vector
x = randn(Float32, 5)

In [None]:
# convert to Float64
convert(Array{Float64}, x)

In [None]:
# same as
Float64.(x)

In [None]:
# convert Float64 to Int64
convert(Int, 1.0)

In [None]:
convert(Int, 1.5) # throws InexactError; use round(Int, 1.5) instead

In [None]:
round(Int, 1.5)

## Multiple dispatch

* [Multiple dispatch](https://en.wikipedia.org/wiki/Multiple_dispatch) is a feature of some programming languages in which a function or method can be dynamically dispatched based on the run time (dynamic) type or, in the more general case, some other attribute of more than one of its arguments.

* Multiple dispatch lies at the core of Julia's design. It allows built-in and user-defined functions to be overloaded for different combinations of argument types.

* In Julia, methods belong to functions, called **generic functions**.

* Let's consider a simple "doubling" function:

In [None]:
g(x) = x + x

In [None]:
g(1.5)

This definition is too broad, since some things, e.g., strings, can't be added:

In [None]:
g("hello world")

* This definition is correct but too restrictive, since any `Number` can be added.

In [None]:
g(x::Float64) = x + x

* This definition will automatically work on the entire type tree above!

In [None]:
g(x::Number) = x + x

This is a lot nicer than 
```julia
function g(x)
    if isa(x, Number)
        return x + x
    else
        throw(ArgumentError("x should be a number"))
    end
end
```

* The `methods(func)` function displays all methods defined for `func`.

In [None]:
methods(g)

* When calling a function with multiple definitions, Julia will search from the narrowest signature to the broadest signature.

* The `@which func(x)` macro tells which method is being used for argument signature `x`.

In [None]:
# an Int64 input
@which g(1)

In [None]:
@which g(1.0)

In [None]:
# a Vector{Float64} input
@which g(randn(5))

* R also makes use of generic functions and multiple dispatch (see http://adv-r.had.co.nz/OO-essentials.html#s3), but it is not fully optimized.
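
The methods of `g` above differ only in the type of a single argument. A sketch of dispatch on the types of *two* arguments (hypothetical function `describe_pair`, for illustration only):

```julia
# four methods of one generic function; the most specific match is chosen
describe_pair(x, y) = "something else"
describe_pair(x::Number, y::Number) = "two numbers"
describe_pair(x::Integer, y::Integer) = "two integers"
describe_pair(x::Number, y::AbstractString) = "a number and a string"

describe_pair(1, 2)        # both Integer: "two integers"
describe_pair(1, 2.0)      # Int64 and Float64: "two numbers"
describe_pair(1.5, "a")    # "a number and a string"
describe_pair("a", "b")    # falls back to "something else"
```

The order in which the methods are defined does not matter; Julia picks the method whose signature is the most specific one matching the types of all arguments.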

## Just-in-time compilation (JIT)

| <img src="./julia_toolchain.png" alt="Julia toolchain" style="width: 400px;"/> | <img src="./julia_introspect.png" alt="Julia toolchain" style="width: 500px;"/> |
|----------------------------------|------------------------------------|
|||

Source: [Introduction to Writing High Performance Julia](https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxibG9uem9uaWNzfGd4OjMwZjI2YTYzNDNmY2UzMmE) by Arch D. Robison

* `Julia`'s efficiency results from its capability to infer the types of **all** variables within a function and then call LLVM (compiler) to generate optimized machine code at run-time. 

Consider the `g` (doubling) function defined earlier. This function will work on **any** type which has a method for `+`.

In [None]:
g(2), g(2.0)

**Step 1**: Parse Julia code into [abstract syntax tree (AST)](https://en.wikipedia.org/wiki/Abstract_syntax_tree).

In [None]:
@code_lowered g(2)

**Step 2**: Type inference according to input type.

In [None]:
@code_warntype g(2)

In [None]:
@code_warntype g(2.0)

**Step 3**: Compile into **LLVM intermediate representation (IR)**, often loosely called LLVM bytecode (roughly analogous to the R bytecode generated by the compiler package).

In [None]:
@code_llvm g(2)

In [None]:
@code_llvm g(2.0)

We didn't provide a type annotation. But different LLVM code gets generated depending on the argument type!

In R or Python, `g(2)` and `g(2.0)` would use the same code for both.
 
In Julia, `g(2)` and `g(2.0)` dispatch to optimized code for `Int64` and `Float64`, respectively.

For integer input `x`, the LLVM compiler is smart enough to know that `x + x` simply shifts `x` left by 1 bit, which is faster than addition.
 
**Step 4**: The lowest level is the **assembly code**, which is machine dependent.

In [None]:
@code_native g(2)

The 1st instruction adds the content of the general-purpose 64-bit register RDI (a small memory inside the CPU) to itself and loads the result into another register, RAX. The addition here is integer arithmetic.

In [None]:
@code_native g(2.0)

The 1st instruction adds the content of the 128-bit register XMM0 to itself and writes the result back into XMM0. The addition here is floating-point arithmetic and uses an instruction from the "single instruction, multiple data" (SIMD) instruction set.

## Acknowledgment

This lecture note is based on [Dr. Hua Zhou](http://hua-zhou.github.io)'s 2019 Winter Statistical Computing course notes available at <http://hua-zhou.github.io/teaching/biostatm280-2019spring/index.html>.