# Multiprocessing with TARDIS

A user might want to run TARDIS many times, for example to investigate the effects of Poisson noise on a spectrum or to explore a grid in parameter space. Such repeated runs are often computationally intensive and can significantly delay the science.

Parallel computing offers a way to reduce the timescale of many-run simulations. Essentially, it allows multiple CPUs (central processing units) on a given system to independently perform computations at the same time, with the potential to drastically decrease total runtime. 

In ``Python``, parallel computing can be performed with the ``multiprocessing`` module. This approach naturally relies on the assumption that each computation is independent of the others.
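
As a minimal sketch of the idea (not specific to TARDIS), the snippet below maps a toy function over a list of inputs with a `Pool`. The names `square` and `parallel_squares` are hypothetical and chosen only for illustration. Each call depends only on its own input, which is the independence assumption mentioned above.

```python
from multiprocessing import Pool


def square(x):
    """A toy, independent computation: depends only on its own input."""
    return x * x


def parallel_squares(values, num_processes=2):
    """Apply `square` to every value using a pool of worker processes."""
    with Pool(num_processes) as pool:
        # map() splits `values` across the workers and returns the
        # results in the same order as the inputs.
        return pool.map(square, values)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block
    # on platforms that start workers by re-importing the script.
    print(parallel_squares([1, 2, 3, 4]))
```

The worker function is defined at module level so that it can be sent to the worker processes; functions defined inline (such as lambdas) generally cannot be used with `Pool.map`.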


## Further resources on multiprocessing
The above is a (very!) brief primer on this subject to motivate the subsequent code; for more detailed information, please see the resources below:
- [Python docs on the multiprocessing module](https://docs.python.org/3.4/library/multiprocessing.html?highlight=process)
- [A guide to the multiprocessing module](https://pymotw.com/2/multiprocessing/)
- [Medium article on multiprocessing](https://medium.com/@urban_institute/using-multiprocessing-to-make-python-code-faster-23ea5ef996ba)

## Example: multiprocessing with differently seeded models

A simpler example of multiprocessing involves only changing one parameter from model to model: in this case, its random number seed. In short, a random number seed initializes a random number generator — which, as it turns out, is never *truly* random, and will in fact use the seed as a jumping-off point for subsequent pseudo-randomness. So, each time you initialize a random number generator with the same seed, it will yield the exact same sequence of "random" numbers, while random number generators with different seeds will yield essentially independent sequences. As a Monte Carlo radiative transfer code, TARDIS incorporates a number of random processes, so the choice of seed will affect the final spectral output to a certain extent.
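
The reproducibility described above can be sketched with Python's standard-library `random` module. This is only an illustration of seeding in general, not the random number generator that TARDIS uses internally; the helper `draw_sequence` is a hypothetical name.

```python
import random


def draw_sequence(seed, n=5):
    """Return n pseudo-random floats from a generator initialized with `seed`."""
    rng = random.Random(seed)  # independent generator, seeded explicitly
    return [rng.random() for _ in range(n)]


same_a = draw_sequence(seed=42)
same_b = draw_sequence(seed=42)
other = draw_sequence(seed=43)

print(same_a == same_b)  # identical seeds reproduce the same sequence
print(same_a == other)   # a different seed gives a different sequence
```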


First, we set up our TARDIS run as per the [TARDIS quickstart](https://tardis-sn.github.io/tardis/quickstart/quickstart.html), importing functions and downloading files as needed. We also import the `Configuration` class so that we can make alterations to our model on a per-run basis. Finally, we import `Pool` and `cpu_count` from the `multiprocessing` module.

In [1]:
import numpy as np

from tardis import run_tardis
from tardis.io.atom_data.util import download_atom_data
from tardis.io.config_reader import Configuration
from tardis.simulation import Simulation
from multiprocessing import Pool, cpu_count



In [2]:
download_atom_data('kurucz_cd23_chianti_H_He')
!curl -O https://raw.githubusercontent.com/tardis-sn/tardis/master/docs/models/examples/tardis_example.yml


[tardis.io.atom_data.atom_web_download][INFO   ]  Downloading atomic data from https://media.githubusercontent.com/media/tardis-sn/tardis-refdata/master/atom_data/kurucz_cd23_chianti_H_He.h5 to /home/asavel/Downloads/tardis-data/kurucz_cd23_chianti_H_He.h5 (atom_web_download.py:47)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   980  100   980    0     0   5000      0 --:--:-- --:--:-- --:--:--  4974


Next, we define a function that will run a TARDIS model for a single seed.

In [3]:
def do_run_simple_seeding(seed):
    seed = int(seed) # a seed needs to be an integer!
    
    config = Configuration.from_yaml('tardis_example.yml')
    config.montecarlo.seed = seed  # this actually sets the seed
    
    sim = Simulation.from_config(config)
    sim.run()
    
    # next, we save the data that we care about
    spectrum = sim.runner.spectrum
    flux = spectrum.luminosity_density_lambda
    
    spectrum_virtual = sim.runner.spectrum_virtual
    virtual_flux = spectrum_virtual.luminosity_density_lambda
    
    np.savetxt(f'real_spectrum_{seed}.txt', flux)  # save the real spectrum
    np.savetxt(f'virtual_spectrum_{seed}.txt', virtual_flux)  # save the virtual spectrum
    # each run will cover the same wavelength range, so we only need
    # to save the wavelength array once.
    if seed == 0:
        np.savetxt('wavelength.txt', spectrum.wavelength)

Next, we set up a list of jobs to apply this function to. In our case, we can just generate an array of the integers from 0 to 9, each element of which will seed the random number generator for a different simulation run.

In [4]:
job_list = np.arange(10)

We then need to decide how many processes to run. A general rule of thumb is that you shouldn't have more processes than you have CPUs; otherwise, the point of multiprocessing is defeated, since each process is meant to run on its own CPU. We can quickly check how many CPUs we have and set the number of processes conservatively below that number:

In [5]:
num_CPUs = cpu_count()  # store the count; print() itself returns None
print(num_CPUs)
num_processes = 8  # a very conservative number for now!

192
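
Rather than hard-coding the number of processes, the rule of thumb above could be wrapped in a small helper. The following is a sketch under stated assumptions: `choose_num_processes`, `available_cpus`, and `reserve` are hypothetical names, and leaving one CPU free is an arbitrary choice rather than a TARDIS recommendation.

```python
from multiprocessing import cpu_count


def choose_num_processes(num_jobs, available_cpus=None, reserve=1):
    """Pick a process count that leaves `reserve` CPUs free.

    Never returns more processes than jobs, and never fewer than one.
    """
    if available_cpus is None:
        available_cpus = cpu_count()
    usable = max(1, available_cpus - reserve)
    return max(1, min(num_jobs, usable))


print(choose_num_processes(num_jobs=10))
```

Note that `cpu_count()` reports the number of CPUs in the machine, which on a shared system may be more than the number actually available to you, so a lower value may still be appropriate.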


Finally, we initialize a `Pool` object with the number of desired processes and use this object to map our function onto our list of jobs. Voilà!

In [6]:
with Pool(num_processes) as p:  # the worker processes are cleaned up on exit
    results = p.map(do_run_simple_seeding, job_list)
results

[tardis.plasma.standard_plasmas][INFO   ]  Reading Atomic Data from kurucz_cd23_chianti_H_He.h5 (standard_plasmas.py:74)
[tardis.plasma.standard_plasmas][INFO   ]  Reading Atomic Data from kurucz_cd23_chianti_H_He.h5 (standard_plasmas.py:74)
[tardis.plasma.standard_plasmas][INFO   ]  Reading Atomic Data from kurucz_cd23_chianti_H_He.h5 (standard_plasmas.py:74)
[tardis.plasma.standard_plasmas][INFO   ]  Reading Atomic Data from kurucz_cd23_chianti_H_He.h5 (standard_plasmas.py:74)
[tardis.plasma.standard_plasmas][INFO   ]  Reading Atomic Data from kurucz_cd23_chianti_H_He.h5 (standard_plasmas.py:74)
[tardis.io.atom_data.util][INFO   ]  Atom Data kurucz_cd23_chianti_H_He.h5 not found in local path. Exists in TARDIS Data repo /home/asavel/Downloads/tardis-data/kurucz_cd23_chianti_H_He.h5 (util.py:29)
[tardis.io.atom_data.util

[None, None, None, None, None, None, None, None, None, None]

The exact computation time spent will vary from system to system, and the exact output will depend on what version of TARDIS you're using; the above output is from an ongoing TARDIS development branch (`numba_montecarlo`).
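
Once the runs have finished, the saved spectra can be combined to estimate how much the output varies from seed to seed. The following is a minimal sketch, assuming the `real_spectrum_{seed}.txt` files written by the function above load as plain numeric arrays of equal length; `summarize_spectra` is a hypothetical helper, and the synthetic data at the end only stands in for real spectra.

```python
import numpy as np


def summarize_spectra(spectra):
    """Return the per-wavelength mean and standard deviation across runs.

    `spectra` is a sequence of equal-length 1D arrays, one per seed.
    """
    stacked = np.vstack(spectra)  # shape: (n_runs, n_wavelengths)
    return stacked.mean(axis=0), stacked.std(axis=0)


# Hypothetical usage with the files written by the runs above:
# spectra = [np.loadtxt(f'real_spectrum_{seed}.txt') for seed in range(10)]
# mean_flux, flux_scatter = summarize_spectra(spectra)

# Synthetic stand-in data so the sketch runs on its own:
demo_mean, demo_std = summarize_spectra([[1.0, 2.0], [3.0, 4.0]])
print(demo_mean, demo_std)
```

The standard deviation across seeds gives a rough handle on the Monte Carlo noise in each wavelength bin.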