# Using GeMMM

In this notebook we demonstrate how GeMMM can be used to sample origin-destination matrices for a specific set of MSOAs.

A table of available MSOAs is provided in the `tables` module. This includes additional information such as the Local Authority District (LAD), region and country that may be relevant when journey numbers are required for a larger area. For MSOAs in Wales and Scotland, the region and LAD columns are equivalent.

In [1]:
from gemmm.tables import gb_msoas
gb_msoas.head()


Suppose that we are interested in generating journey numbers for MSOAs located within the Cambridge LAD.

We can pull out the codes of MSOAs that satisfy this condition from the previous table.

In [2]:
LAD_NAME = 'Cambridge'
msoas = gb_msoas.query('lad_name==@LAD_NAME').msoa.to_numpy()

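The same pattern extends to an area that spans several LADs. The sketch below uses a small made-up table rather than `gb_msoas`: the column names `msoa` and `lad_name` follow the query above, while the MSOA codes are placeholders chosen purely for illustration.

```python
import pandas as pd

# Toy stand-in for gb_msoas; the codes below are placeholders, not real MSOAs.
toy_msoas = pd.DataFrame({
    'msoa': ['A001', 'A002', 'B001', 'C001'],
    'lad_name': ['Cambridge', 'Cambridge', 'South Cambridgeshire', 'Oxford'],
})

# Select every MSOA that belongs to any of the listed LADs.
lad_names = ['Cambridge', 'South Cambridgeshire']
selected = toy_msoas.query('lad_name in @lad_names').msoa.to_numpy()
print(selected)
```

An array built this way has the same form as `msoas` and could be used in its place.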

To sample journey numbers, we first pass this array of MSOAs to the `OriginDestination` class, together with a day type, which is either weekday or weekend.

When running this for the first time, GeMMM will download [data files](https://github.com/ukhsa-collaboration/Gemmm/tree/main/model_data/) required by the Fourier series and radiation models for each day type. These are cached for future use.

In [3]:
from gemmm import OriginDestination
sampler = OriginDestination(msoas=msoas, day_type='weekday')


With this, we can now sample journey numbers for a single hour of the day or for a list of hours. Here, `n_realizations` specifies the number of realizations that will be generated for each hour.

In [4]:
sampler.generate_sample(hours=[8, 12, 16], n_realizations=5)


Samples are stored as a dictionary in the `samples` attribute of the sampler (here, `sampler.samples`), with keys given by the tuple *(x, y)*, where *x* is the hour and *y* is the realization number. Note that the numbering of realizations begins at zero.
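The key structure can be sketched with a plain dictionary. The strings below are placeholders for the generated matrices, and the variable names are illustrative rather than part of GeMMM.

```python
# Hypothetical stand-in for the samples dictionary: keys are (hour, realization)
# tuples; the string values are placeholders for the sparse matrices.
hours = [8, 12, 16]
n_realizations = 5
samples = {
    (hour, realization): f'matrix for hour {hour}, realization {realization}'
    for hour in hours
    for realization in range(n_realizations)
}

# Realizations are numbered from zero, so the last one is n_realizations - 1.
first = samples[(8, 0)]
last = samples[(16, n_realizations - 1)]

# Collect all realizations generated for a single hour.
hour_8 = [samples[(8, r)] for r in range(n_realizations)]
```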

Since the origin-destination matrices contain a high proportion of zeros when considering MSOAs over a large area, the generated matrices are stored in sparse matrix format. Specifically, the COOrdinate format is used, where the `row` attribute contains the indices of the start MSOAs, the `col` attribute contains the indices of the end MSOAs, and the `data` attribute contains the number of journeys. The indices of the MSOAs refer to their position in the array originally provided to the `OriginDestination` class.
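As a rough illustration of this layout, the sketch below uses plain NumPy arrays to stand in for the `row`, `col` and `data` attributes. The MSOA codes and journey counts are made up, and no GeMMM objects are involved.

```python
import numpy as np

# Hypothetical MSOA codes, in the order they would be passed to the sampler.
msoas = np.array(['A001', 'A002', 'A003'])

# COO-style triplets: data[k] journeys go from msoas[row[k]] to msoas[col[k]].
row = np.array([0, 0, 2])
col = np.array([1, 2, 0])
data = np.array([4, 1, 7])

# Expand the triplets into a dense origin-destination matrix.
dense = np.zeros((len(msoas), len(msoas)), dtype=int)
dense[row, col] = data

# Map the indices back to MSOA codes for a readable listing.
journeys = [(msoas[i], msoas[j], int(n)) for i, j, n in zip(row, col, data)]
```

Only the three non-zero entries are stored, which is why this format suits matrices that are mostly zeros.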

To present the output in a more readable format, the matrices can be converted to pandas DataFrames.

In [5]:
sampler.to_pandas(hour=8, realization=0, wide=True)


Sometimes it may be useful to know average journey numbers rather than sampled values. These averages can be extracted from the Fourier series and radiation model data for a specific hour.

In [6]:
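import numpy as np

hour = 16

# Fourier series component: scatter the stored means into a dense matrix
fourier_mean = np.zeros((len(sampler.msoas),)*2)
fourier_mean[sampler.fourier_data.row, sampler.fourier_data.col] = sampler.fourier_data.mean[hour]

# Radiation model component: scale the model mean by the factor for this hour
radiation_mean = sampler.radiation_data.theta[hour] * sampler.radiation_data.mean

# Average journey numbers are the sum of the two components
model_mean = fourier_mean + radiation_mean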

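To make the shapes in this calculation concrete, here is a self-contained sketch with toy arrays. The array layouts are assumptions inferred from the indexing above (one Fourier mean per hour for each stored pair, and an hourly factor scaling a dense radiation matrix); the actual GeMMM data structures may differ.

```python
import numpy as np

n_msoas, n_hours = 3, 24
hour = 16

# Assumed layout: one mean per hour for each stored (row, col) pair.
row = np.array([0, 1])
col = np.array([1, 2])
fourier_hourly_mean = np.ones((n_hours, len(row)))  # placeholder values

# Scatter the sparse means for the chosen hour into a dense matrix.
fourier_mean = np.zeros((n_msoas,) * 2)
fourier_mean[row, col] = fourier_hourly_mean[hour]

# Assumed layout: an hourly scaling factor applied to a dense base matrix.
theta = np.full(n_hours, 0.5)
radiation_base = np.full((n_msoas, n_msoas), 2.0)
radiation_mean = theta[hour] * radiation_base

# The model mean is the element-wise sum of the two components.
model_mean = fourier_mean + radiation_mean
```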

## Saving and loading samples
In some instances, it is convenient to save the generated matrices for later use. In GeMMM, this can be achieved using the `save_sample` argument when generating the samples. Setting it to `True` saves the samples in the current directory; to save them elsewhere, pass the path to a directory instead. The filename is set automatically using a timestamp and the day type.

In [7]:
save_file = sampler.generate_sample(hours=[8, 12, 16], n_realizations=5, save_sample=True)


To load the samples, we again use the `OriginDestination` class, but this time provide the full path to the saved file (here `save_file`, the value assigned from `generate_sample` above).

In [8]:
loader = OriginDestination(file=save_file)


We can then load a matrix for one of the available hours. If a realization is not specified, one will be selected at random.

In [9]:
loader.load_sample(hour=16, realization=0, wide=True)

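The random fallback can be pictured with the sketch below. `pick_sample` is a hypothetical helper written for illustration only; it is not GeMMM's implementation, and the strings stand in for the stored matrices.

```python
import random

def pick_sample(samples, hour, realization=None):
    """Return (realization, sample) for an hour, choosing at random if unspecified."""
    if realization is None:
        # Gather the realization numbers that exist for this hour.
        available = [r for (h, r) in samples if h == hour]
        realization = random.choice(available)
    return realization, samples[(hour, realization)]

# Placeholder samples for a single hour with five realizations.
samples = {(16, r): f'matrix {r}' for r in range(5)}

chosen, _ = pick_sample(samples, hour=16)                    # random realization
fixed, value = pick_sample(samples, hour=16, realization=0)  # explicit realization
```

Passing `realization` explicitly, as in the cell above, keeps the result reproducible between runs.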