# Setting up Julia for Multithreading

For best effect use Julia 1.2 or even better 1.3
I will use Julia 1.3-rc4 for this lecture since
multithreading support is under heavy development

In [1]:
NCPU = Sys.CPU_THREADS
using Base.Threads
@show NCPU
@show nthreads();

NCPU = 4
nthreads() = 4


```bash
export JULIA_NUM_THREADS=4
# export JULIA_NUM_THREADS=`nproc`
julia
```

In [2]:
using IJulia
for I in 0:1.0:log(2, NCPU)
    cpus = 2^round(Int, I)
    @show cpus
    installkernel("Julia ($cpus threads)", env=Dict("JULIA_NUM_THREADS"=>"$cpus"))
end
# Now restart Jupyter and switch kernel

cpus = 1


┌ Info: Installing Julia (1 threads) kernelspec in /home/vchuravy/.local/share/jupyter/kernels/julia-(1-threads)-1.3
└ @ IJulia /home/vchuravy/.julia/packages/IJulia/gI2uA/deps/kspec.jl:72


cpus = 2
cpus = 4


┌ Info: Installing Julia (2 threads) kernelspec in /home/vchuravy/.local/share/jupyter/kernels/julia-(2-threads)-1.3
└ @ IJulia /home/vchuravy/.julia/packages/IJulia/gI2uA/deps/kspec.jl:72
┌ Info: Installing Julia (4 threads) kernelspec in /home/vchuravy/.local/share/jupyter/kernels/julia-(4-threads)-1.3
└ @ IJulia /home/vchuravy/.julia/packages/IJulia/gI2uA/deps/kspec.jl:72


## Your hardware (on linux)

In [14]:
;lscpu

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              2
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           142
Model name:                      Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz
Stepping:                        9
CPU MHz:                         3438.326
CPU max MHz:                     4000.0000
CPU min MHz:                     400.0000
BogoMIPS:                        4993.00
Virtualization:                  VT-x
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        512 KiB
L3 cache:                       

```julia
det!(A) = det(lufact!(A))
det!(A) = det(LinearAlgebra.generic_lufact!(A))
```

`det!` originally was calling a `lufact!` from LAPACK,
which is overkill for the matrix size. First attempt switch to a pure Julia implementation.

In [None]:
using BenchmarkTools
using StaticArrays
using Base.Threads
println("Number of threads: ", nthreads())

In [None]:
function myfun(rng::MersenneTwister)
    s = 0.0
    N = 10000
    for i=1:N
        s += det(randn(rng, SMatrix{3, 3}))
    end
    s/N
end

In [None]:
rgi   = [MersenneTwister(abs(rand(Int))) for s in 1:nthreads()]

function bench(rgi)
    a  = zeros(1000)
    @threads for i=1:length(a)
        @inbounds a[i] = myfun(rgi[threadid()])
    end
end

In [46]:
result = @benchmark bench($rgi)
display(result)

BenchmarkTools.Trial: 
  memory estimate:  7.98 KiB
  allocs estimate:  2
  --------------
  minimum time:     553.192 ms (0.00% GC)
  median time:      748.534 ms (0.00% GC)
  mean time:        741.813 ms (0.00% GC)
  maximum time:     910.000 ms (0.00% GC)
  --------------
  samples:          7
  evals/sample:     1

Number of threads: 1


### The future is near! Partr is coming
- Julia 1.2 and 1.3
- https://github.com/JuliaLang/julia/pull/32600
- https://github.com/JuliaLang/julia/pull/32477
- https://github.com/NHDaly/CspExamples.jl/blob/master/src/CspExamples.jl

In [17]:
macro par(expr)
    thunk = esc(:(()->($expr)))
    quote
        local task = Task($thunk)
        task.sticky = false
        schedule(task)
        task
    end
end

@par (macro with 1 method)

In [56]:
@par println("Hello!")

ErrorException: type Task has no field sticky

In Julia 1.3 task can be executed on multiple worker-threads allowing fine-grained control. This is concurrency ala Go/CSP.

Our handy trick from above can then simply be written as:

```julia
tasks = Task[]
for tid in 1:batches:workitems
    task = @par begin
    ### some work on batch
    end
    push!(tasks, task)
end
for task in tasks
    wait(task)
end
```

Channel{Int64}(sz_max:1,sz_curr:0)

In [109]:
?take!

search: [0m[1mt[22m[0m[1ma[22m[0m[1mk[22m[0m[1me[22m[0m[1m![22m S[0m[1mt[22m[0m[1ma[22mc[0m[1mk[22mOv[0m[1me[22mrflowError is[0m[1mt[22m[0m[1ma[22ms[0m[1mk[22mdon[0m[1me[22m s[0m[1mt[22m[0m[1ma[22mc[0m[1mk[22mtrac[0m[1me[22m S[0m[1mt[22m[0m[1ma[22mc[0m[1mk[22mTrac[0m[1me[22ms is[0m[1mt[22m[0m[1ma[22ms[0m[1mk[22mstart[0m[1me[22md



```
take!(b::IOBuffer)
```

Obtain the contents of an `IOBuffer` as an array, without copying. Afterwards, the `IOBuffer` is reset to its initial state.

# Examples

```jldoctest
julia> io = IOBuffer();

julia> write(io, "JuliaLang is a GitHub organization.", " It has many members.")
56

julia> String(take!(io))
"JuliaLang is a GitHub organization. It has many members."
```

---

```
take!(c::Channel)
```

Remove and return a value from a [`Channel`](@ref). Blocks until data is available.

For unbuffered channels, blocks until a [`put!`](@ref) is performed by a different task.

---

```
take!(rr::RemoteChannel, args...)
```

Fetch value(s) from a [`RemoteChannel`](@ref) `rr`, removing the value(s) in the process.


In [None]:
ch = Channel{Int64}(1)
ch2 = Channel{Int64}(1)
@sync begin
  @async begin
        while isopen(ch2) 
            @show take!(ch)
        end
    end
  @async begin
        for i in 1:10
            put!(ch, i)
        end
        while !isempty(ch)
            yield()
        end
        close(ch)
    end
end
        

take!(ch) = 1
take!(ch) = 2
take!(ch) = 3
take!(ch) = 4
take!(ch) = 5
take!(ch) = 6
take!(ch) = 7
take!(ch) = 8
take!(ch) = 9


In [None]:
isempty(ch)