# Benchmarking Experiments

Comparing serial and parallel schemes for tracing every ray of a projector with `siddon`.

In [1]:
using BenchmarkTools

using DRRs

In [2]:
# Load the volume and get coordinate spacing as a CT object
ct = read_dicom("../data/cxr")

# Create the camera (X-ray source) and the detector plane
# We assume the volume spans from (0,0,0) to its top corner at (360,360,330),
# so the camera and detector sit just outside opposite corners
center = [-10, -10, -10]
camera = Camera(center)

center = [400, 400, 400]
height, width = 101, 101   # detector size in pixels
Δx, Δy = 1, 1              # detector pixel spacing
detector = Detector(center, height, width, Δx, Δy);

In [3]:
# Make the rays in the projector
projector = @btime make_xrays(camera, detector);

  619.377 μs (51021 allocations: 2.02 MiB)


## Benchmark a single ray

In [4]:
# Time a single ray trace
@btime siddon(projector[1], ct);

  42.788 μs (5 allocations: 24.83 KiB)
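As a rough baseline, the single-ray timing can be extrapolated to the whole detector. This assumes one ray per pixel (101 × 101 rays) and that `projector[1]` is a typical ray. Siddon's method does work proportional to the number of voxels a ray crosses, so rays through the middle of the volume likely cost more and this is only a rough estimate.

```julia
# Back-of-the-envelope extrapolation; the numbers are copied from the outputs above
n_rays        = 101 * 101   # detector height × width (assumes one ray per pixel)
t_single_us   = 42.788      # @btime siddon(projector[1], ct), in microseconds
allocs_single = 5           # allocations reported for that single call

est_serial_ms = n_rays * t_single_us / 1000
est_allocs    = n_rays * allocs_single

println("rays:                  ", n_rays)
println("estimated serial time: ", round(est_serial_ms; digits = 1), " ms")
println("estimated allocations: ", est_allocs)
```

This predicts about 436.5 ms and 51,005 allocations, compared with the 51,008 allocations and roughly 606 ms measured for the full loops in the next section.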


## Benchmark serial loops

In [5]:
# Time a comprehension over all rays (keeps the results)
@btime [siddon(ray, ct) for ray in projector];

  605.596 ms (51008 allocations: 253.93 MiB)


In [6]:
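# Time broadcasting. `ct` is passed as a keyword argument so that broadcasting
# iterates over the rays only; the new method extends DRRs.siddon, hence the import.
import DRRs.siddon

siddon(ray; ct) = siddon(ray, ct)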

@btime siddon.(projector; ct);

  606.567 ms (51014 allocations: 253.93 MiB)
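In cell [6], `ct` is passed as a keyword so that broadcasting does not try to iterate over it. A more common idiom wraps the non-broadcast argument in `Ref`, which avoids adding a method to `DRRs.siddon`. The sketch below uses hypothetical `Ray`, `Volume`, and `trace` stand-ins because the real DRRs types are not shown here; in the notebook the call would presumably be `siddon.(projector, Ref(ct))` (untested).

```julia
# Hypothetical stand-ins for the DRRs ray and CT types; not the package's real definitions
struct Ray
    origin::NTuple{3,Float64}
    direction::NTuple{3,Float64}
end

struct Volume
    data::Array{Float64,3}
end

# Hypothetical kernel with the same call shape as siddon(ray, ct)
trace(ray::Ray, vol::Volume) = ray.direction[1] * sum(vol.data)

vol  = Volume(ones(4, 4, 4))
rays = [Ray((0.0, 0.0, 0.0), (Float64(i), 0.0, 0.0)) for i in 1:5]

# Ref(vol) wraps the volume in a zero-dimensional container, so broadcasting
# reuses it for every ray instead of trying to iterate over it
drr = trace.(rays, Ref(vol))
println(drr)
```

Another documented option is to define `Base.broadcastable(v::Volume) = Ref(v)` once, after which `trace.(rays, vol)` works directly.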


In [7]:
# Time a plain `for` loop (the results are discarded)
@btime for ray in projector
    siddon(ray, ct)
end

  608.206 ms (70897 allocations: 254.31 MiB)
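All three serial variants land near 606 ms, and every benchmark here reads the untyped globals `projector` and `ct`. The BenchmarkTools documentation recommends interpolating such globals with `$` (for example `@btime siddon($(projector[1]), $ct)`), because untyped globals defeat type inference. The stdlib-only sketch below shows the effect with a hypothetical `trace` kernel standing in for `siddon`: the loop that reads globals typically allocates on every iteration, while the same loop behind a function barrier does not. With roughly 43 μs of work per ray, that overhead is probably small here, which would be consistent with the three timings agreeing; it may also account for some of the extra allocations in the plain loop (70,897 versus 51,008).

```julia
# Hypothetical stand-in for siddon(ray, ct): any cheap, type-stable kernel
trace(ray, vol) = ray * sum(vol)

rays = collect(1.0:10_201.0)   # plays the role of `projector` (non-const global)
vol  = ones(16)                # plays the role of `ct` (non-const global)

# Reads the untyped globals directly, like the benchmarked expressions above
function loop_globals()
    s = 0.0
    for r in rays
        s += trace(r, vol)
    end
    return s
end

# Function barrier: inside, `rays` and `vol` have concrete, inferable types
function loop_barrier(rays, vol)
    s = 0.0
    for r in rays
        s += trace(r, vol)
    end
    return s
end

loop_globals(); loop_barrier(rays, vol)          # compile both before measuring
bytes_globals = @allocated loop_globals()
bytes_barrier = @allocated loop_barrier(rays, vol)
println("globals: ", bytes_globals, " bytes; barrier: ", bytes_barrier, " bytes")
```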


## Benchmark parallelization

Try the following: 
1. `Base.Threads`: run `export JULIA_NUM_THREADS=1024` before launching Jupyter Notebook (the thread count is fixed once Julia starts)
2. `ThreadsX` (uses the same number of threads as `Base.Threads`)
3. `Distributed` **(doesn't seem to be working?)**
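Because the thread count is fixed at startup, `JULIA_NUM_THREADS` (or, on Julia 1.5 and later, the `--threads`/`-t` command-line flag) has to be set before the kernel launches. A quick check like the one below compares it with the logical CPU threads the OS reports; for CPU-bound work, Julia threads beyond that number cannot run at the same time, so a value such as 1024 is unlikely to help past the core count.

```julia
using Base.Threads

n_julia = nthreads()          # threads this Julia session was started with
n_hw    = Sys.CPU_THREADS     # logical CPU threads reported by the OS

println("Julia threads:    ", n_julia)
println("Hardware threads: ", n_hw)
if n_julia > n_hw
    println("Oversubscribed: CPU-bound loops cannot use more than ", n_hw, " threads at once")
end
```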

In [8]:
using Base.Threads

@show nthreads()

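# Trace all rays across threads (results are discarded, as in the plain `for` loop)
function trace_threaded(projector, ct)
    @threads for ray in projector
        siddon(ray, ct)
    end
end

@btime trace_threaded(projector, ct)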

nthreads() = 1024
  57.125 ms (56126 allocations: 254.33 MiB)
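Against the roughly 606 ms serial loops, 57.125 ms is about a 10.6× speedup, far short of the 1024 threads requested, which suggests the run is limited by the available cores rather than the thread count. The threaded loop in cell [8] also discards each ray's value. To keep the image, each iteration can write to its own slot of a preallocated vector; no two threads touch the same index, so there is no data race. A sketch with a hypothetical `trace` stand-in for `siddon`, assuming `Float64` results:

```julia
using Base.Threads

# Hypothetical stand-in for siddon(ray, ct), assumed to return a Float64
trace(ray, vol) = ray * sum(vol)

function trace_all_threaded(rays, vol)
    out = Vector{Float64}(undef, length(rays))   # one output slot per ray
    @threads for i in eachindex(rays)
        out[i] = trace(rays[i], vol)             # each iteration writes only its own slot
    end
    return out
end

rays = collect(1.0:10_201.0)
vol  = ones(16)
drr  = trace_all_threaded(rays, vol)
```

With the real package, `trace` would be replaced by `siddon`; `reshape(drr, height, width)` could then turn the vector into an image, assuming the rays are stored in column-major pixel order.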


In [9]:
# using ThreadsX

# @btime ThreadsX.collect(siddon(ray, ct) for ray in projector);
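`ThreadsX` is a third-party package. A stdlib-only alternative splits the rays into one chunk per thread and runs each chunk as a task with `Threads.@spawn` (available since Julia 1.3); fetching the tasks in order keeps the results in ray order. As before, `trace` is a hypothetical stand-in for `siddon`.

```julia
using Base.Threads

# Hypothetical stand-in for siddon(ray, ct)
trace(ray, vol) = ray * sum(vol)

function trace_all_spawn(rays, vol; nchunks = nthreads())
    chunk_len = cld(length(rays), nchunks)                  # rays per task, rounded up
    chunks = Iterators.partition(eachindex(rays), chunk_len)
    # One task per chunk; each task traces its rays serially
    tasks = map(chunks) do idx
        @spawn [trace(rays[i], vol) for i in idx]
    end
    # fetch waits for each task; vcat keeps the original ray order
    return reduce(vcat, fetch.(tasks))
end

rays = collect(1.0:10_201.0)
vol  = ones(16)
drr  = trace_all_spawn(rays, vol)
```

Unlike `@threads`, the chunk count here is a parameter, so it can be tuned independently of `nthreads()`.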

In [10]:
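# # Doesn't seem to work as written. Likely causes (untested): without a reducer,
# # `@distributed` returns immediately, so `@btime` only times the task launch
# # (`@sync` waits for completion); and workers added with `addprocs`
# # also need `@everywhere using DRRs`.
# using Distributed
# addprocs(4)                 # worker count chosen arbitrarily
# @everywhere using DRRs

# @btime @sync @distributed for ray in projector
#     siddon(ray, ct)
# end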
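For reference, a minimal `Distributed` sketch that does wait for its results: `pmap` returns one value per input, and `@distributed` with a reducer such as `(+)` blocks until all workers finish. Every worker needs the kernel, hence `@everywhere`; in the notebook that would presumably be `@everywhere using DRRs` (untested). The worker count is arbitrary and `trace` is a hypothetical stand-in for `siddon`. Data moves between processes on every call, so for rays that take about 43 μs each, processes may well be slower than threads; `pmap` accepts a `batch_size` keyword to reduce that overhead.

```julia
using Distributed

addprocs(2)   # two local worker processes (count chosen arbitrarily)

# Define the kernel on the master and on every worker
@everywhere trace(ray, vol) = ray * sum(vol)   # hypothetical stand-in for siddon

rays = collect(1.0:100.0)
vol  = ones(16)

# pmap sends each ray to a worker and returns results in input order
drr = pmap(r -> trace(r, vol), rays)

# With a reducer, @distributed blocks and combines the per-worker partial results
total = @distributed (+) for r in rays
    trace(r, vol)
end

println(total)   # 80800.0
```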