# Benchmarking Experiments

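This notebook benchmarks different ways to trace every ray of a projector through a CT volume: serial iteration, multithreading, and distributed processes.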

In [1]:
using BenchmarkTools

using DRRs

In [2]:
# Load the volume and get coordinate spacing as a CT object
ct = read_dicom("../data/cxr")

# Create the camera and detector plane
# We assume one corner of the volume is at (0,0,0)
# and the opposite (top) corner is at (360,360,330)
camera_center = [-10, -10, -10]
camera = Camera(camera_center)

detector_center = [400, 400, 400]
height, width = 301, 301
Δx, Δy = 1, 1
detector = Detector(detector_center, height, width, Δx, Δy);

In [3]:
# Make the rays in the projector
projector = @btime make_xrays(camera, detector);

  2.632 ms (453021 allocations: 17.97 MiB)


## Benchmark a single ray

In [4]:
# Time a single ray trace
@btime siddon(projector[1], ct);

  49.375 μs (2829 allocations: 64.62 KiB)
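The timings in this notebook reference `projector` and `ct` as non-constant globals. BenchmarkTools recommends interpolating such variables with `$` so the measurement excludes the cost of global-variable lookup. A minimal sketch on a toy array (`x` is a stand-in, not data from this notebook):

```julia
using BenchmarkTools

x = rand(1000)  # stand-in data, not from the notebook

# Non-interpolated: `x` is accessed as an untyped global inside the benchmark
t_global = @belapsed sum(x)

# Interpolated: `$x` splices the value into the benchmark expression
t_interp = @belapsed sum($x)

println("global: ", t_global, " s, interpolated: ", t_interp, " s")
```

Applied to the cell above, this would read `@btime siddon($(projector[1]), $ct);`. For a call that takes tens of microseconds the difference is probably small, but that has not been measured here.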


## Benchmark basic `for` loops

In [5]:
# Time an array comprehension over all rays
@btime [siddon(ray, ct) for ray in projector];

  6.966 s (306933421 allocations: 6.55 GiB)


In [6]:
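# Time broadcasting
import DRRs.siddon

# Passing `ct` as a keyword argument keeps it out of the broadcast,
# so only `projector` is iterated over
siddon(ray; ct) = siddon(ray, ct)

@btime siddon.(projector; ct);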

  7.030 s (306933447 allocations: 6.55 GiB)
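An alternative to adding a keyword-argument method is to wrap the shared argument in `Ref`, which broadcasting treats as a scalar. The sketch below uses a toy function `trace` and toy data as stand-ins; none of these names come from DRRs.

```julia
# Toy stand-in for a two-argument tracer: scales the sum of the volume by a ray weight
trace(ray::Float64, volume::Array{Float64,3}) = ray * sum(volume)

rays = [1.0, 2.0, 3.0]
volume = ones(2, 2, 2)

# Ref(volume) is treated as a scalar, so only `rays` is broadcast over
results = trace.(rays, Ref(volume))

println(results)
```

In this notebook the equivalent call would presumably be `siddon.(projector, Ref(ct))`, which avoids extending `DRRs.siddon` with a new method.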


In [7]:
# Time a plain for loop (results are discarded)
@btime for ray in projector
    siddon(ray, ct)
end

  7.021 s (307114108 allocations: 6.56 GiB)


## Benchmark parallelization

Try the following:
1. `Base.Threads`: run `export JULIA_NUM_THREADS=8` in the shell before launching the Jupyter notebook
2. `ThreadsX`: uses the same number of threads as `Base.Threads`
3. `Distributed`: **(the attempt below has not been run successfully)**
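A threaded loop that only calls the tracer discards its results, so in practice each iteration would write into a preallocated output. A minimal sketch of that pattern, where `trace` is a hypothetical stand-in for `siddon(ray, ct)`:

```julia
using Base.Threads

# Hypothetical stand-in for `siddon(ray, ct)`: any pure function of one ray
trace(ray) = sum(abs2, ray)

rays = [rand(3) for _ in 1:1000]
out = Vector{Float64}(undef, length(rays))

# Each iteration writes only to its own slot, so no locking is needed
@threads for i in eachindex(rays)
    out[i] = trace(rays[i])
end

println("threads: ", nthreads(), ", results: ", length(out))
```

Because every iteration touches a distinct index of `out`, the loop is free of data races regardless of how iterations are assigned to threads.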

In [8]:
using Base.Threads

@show nthreads()

@btime @threads for ray in projector
    siddon(ray, ct)
end

nthreads() = 8
  1.758 s (306933458 allocations: 6.55 GiB)


In [9]:
using ThreadsX

@btime ThreadsX.collect(siddon(ray, ct) for ray in projector);

  1.760 s (307207216 allocations: 6.56 GiB)


┌ Info: Precompiling ThreadsX [ac1d9e8a-700a-412c-b207-f0111f4b6c0d]
└ @ Base loading.jl:1423
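Both threaded runs are roughly 4× faster than the serial loop on 8 threads, which is about 50% parallel efficiency. The arithmetic, using the timings reported above:

```julia
# Timings copied from the cells above (seconds)
t_serial   = 7.021  # plain `for` loop
t_threads  = 1.758  # `@threads` with 8 threads
t_threadsx = 1.760  # `ThreadsX.collect`
n = 8               # number of threads

speedup(t) = t_serial / t
efficiency(t) = speedup(t) / n

for (name, t) in (("@threads", t_threads), ("ThreadsX", t_threadsx))
    println(name, ": speedup ", round(speedup(t), digits=2),
            ", efficiency ", round(efficiency(t), digits=2))
end
```

One plausible contributor to the sub-linear scaling, not verified here, is the roughly 6.55 GiB allocated per run, since garbage collection pauses affect all threads.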


In [None]:
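# Untested sketch of a likely fix: `@distributed` needs worker processes,
# DRRs loaded on each of them, and `@sync` to wait for the loop to finish
using Distributed

addprocs(8)  # assumption: 8 workers, matching the thread count above
@everywhere using DRRs

@btime @sync @distributed for ray in projector
    siddon(ray, ct)
end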
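For reference, a self-contained sketch of the usual `Distributed` workflow: start workers, define the function on every process, then map over the inputs. `trace` is a hypothetical stand-in for `siddon(ray, ct)`, and the worker count of 2 is an arbitrary assumption.

```julia
using Distributed

# Assumption: two local worker processes; adjust to the machine
addprocs(2)

# Definitions must exist on every process, hence @everywhere
@everywhere trace(ray) = sum(abs2, ray)  # hypothetical stand-in for siddon(ray, ct)

rays = [rand(3) for _ in 1:100]

# pmap blocks until all workers finish and returns results in input order
results = pmap(trace, rays)

println("workers: ", nworkers(), ", results: ", length(results))

# Shut the workers down again
rmprocs(workers())
```

Unlike `@distributed for` without a reducer, which returns immediately unless wrapped in `@sync`, `pmap` waits for completion and collects the results. Whether it beats threading for this workload would depend on how expensive it is to send the CT volume to each worker.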