## Gemmm

A table of the Middle Super Output Areas (MSOAs) used to fit the telecoms model is provided in the `tables` module.

The table has columns for the code and name of each MSOA, together with the code and name of its Local Authority District (LAD) and region, and its country.

For MSOAs in Wales and Scotland, the region column is equal to the LAD column, because neither nation has an equivalent of the English regions.

```python
from gemmm.tables import gb_msoas

gb_msoas.head(5)
```

| | msoa | msoa_name | lad | lad_name | region | region_name | country |
|---|---|---|---|---|---|---|---|
| 0 | E02000001 | City of London 001 | E09000001 | City of London | E12000007 | London | England |
| 1 | E02000002 | Barking and Dagenham 001 | E09000002 | Barking and Dagenham | E12000007 | London | England |
| 2 | E02000003 | Barking and Dagenham 002 | E09000002 | Barking and Dagenham | E12000007 | London | England |
| 3 | E02000004 | Barking and Dagenham 003 | E09000002 | Barking and Dagenham | E12000007 | London | England |
| 4 | E02000005 | Barking and Dagenham 004 | E09000002 | Barking and Dagenham | E12000007 | London | England |


We can use this table to extract a list of MSOA codes for a certain area.

```python
LAD_NAME = 'Cambridge'
msoas = gb_msoas.query('lad_name == @LAD_NAME').msoa.values
```
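The same `query` pattern works for the other columns of the table. The sketch below runs on a small stand-in, `gb_msoas_demo` (a hypothetical name), built only from the five rows previewed above, so it needs pandas but not gemmm. With the real `gb_msoas` the same expressions should return many more codes.

```python
import pandas as pd

# Hypothetical stand-in for gemmm.tables.gb_msoas, built from the five rows
# shown in the preview above (only the columns used here are included).
gb_msoas_demo = pd.DataFrame({
    "msoa": ["E02000001", "E02000002", "E02000003", "E02000004", "E02000005"],
    "lad_name": ["City of London"] + ["Barking and Dagenham"] * 4,
    "region_name": ["London"] * 5,
})

# A single LAD, written with a literal instead of an @variable.
msoas_one = gb_msoas_demo.query("lad_name == 'City of London'").msoa.values

# Several LADs at once: `in` tests membership in a Python list.
LAD_NAMES = ["City of London", "Barking and Dagenham"]
msoas_lads = gb_msoas_demo.query("lad_name in @LAD_NAMES").msoa.values

# Every MSOA in a region.
msoas_region = gb_msoas_demo.query("region_name == 'London'").msoa.values

# Equivalent boolean-mask form, without query().
msoas_mask = gb_msoas_demo.loc[gb_msoas_demo["lad_name"].isin(LAD_NAMES), "msoa"].to_numpy()
```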

The `OriginDestination` class can then be used to sample the number of journeys between the Cambridge MSOAs in `msoas` at different hours of the day.

To do so, we pass it the array of MSOA codes and a day type, either weekday or weekend.

The model requires two data files to generate the samples. These files are downloaded from [Gemmm/model_data](https://github.com/ukhsa-collaboration/Gemmm/tree/main/model_data) and cached for future use. 

```python
from gemmm import OriginDestination

X = OriginDestination(msoas=msoas, day_type='weekday')
```

```
Downloading file 'fourier_data_weekday.hdf5' from 'https://api.github.com/repos/ukhsa-collaboration/Gemmm/contents/model_data/fourier_data_weekday.hdf5' to 'C:\Users\Jonathan.Carruthers\AppData\Local\gemmm-model-data\gemmm-model-data\Cache'.
100%|#############################################| 38.9M/38.9M [00:00<?, ?B/s]
Downloading file 'radiation_data_weekday.hdf5' from 'https://api.github.com/repos/ukhsa-collaboration/Gemmm/contents/model_data/radiation_data_weekday.hdf5' to 'C:\Users\Jonathan.Carruthers\AppData\Local\gemmm-model-data\gemmm-model-data\Cache'.
100%|#############################################| 95.4M/95.4M [00:00<?, ?B/s]
```


Next, we specify the hours of the day for which samples are required (from 0 to 23) and the number of realizations to draw for each hour.

```python
samples = X.generate_sample(hours=[8, 12, 16], n_realizations=5)
```

The samples are returned as a list ordered by hour: the first `n_realizations` entries correspond to the first hour, the next `n_realizations` entries to the second hour, and so on.
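Because the list is ordered hour by hour, the entry for a given hour and realization sits at position `hour_position * n_realizations + realization`, where `hour_position` is the position of the hour within the `hours` argument rather than the clock hour. A minimal sketch of that arithmetic, using a hypothetical helper `sample_index` and a stand-in list of `(hour, realization)` labels in place of the real sparse matrices:

```python
def sample_index(hour_position, realization, n_realizations):
    """Return the list position of one sample from generate_sample().

    hour_position is the position of the hour within the `hours` argument
    (0 for hour 8 when hours=[8, 12, 16]), not the clock hour itself.
    """
    if not 0 <= realization < n_realizations:
        raise ValueError("realization must be between 0 and n_realizations - 1")
    return hour_position * n_realizations + realization


hours = [8, 12, 16]
n_realizations = 5

# Stand-in for the returned list: an (hour, realization) label in each slot,
# ordered hour by hour as described above.
toy_samples = [(h, r) for h in hours for r in range(n_realizations)]

# Third realization (index 2) for 12:00.
idx = sample_index(hours.index(12), 2, n_realizations)
print(idx, toy_samples[idx])  # 7 (12, 2)

# Regroup the flat list into one sub-list of realizations per hour.
by_hour = {
    h: toy_samples[i * n_realizations:(i + 1) * n_realizations]
    for i, h in enumerate(hours)
}
```

The same slicing applied to the real `samples` list gives the five sparse matrices for each hour.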

Each sample is stored as a sparse matrix in coordinate (COO) format. Its `row` attribute holds the indices of the start MSOAs, its `col` attribute the indices of the end MSOAs, and its `data` attribute the corresponding numbers of journeys. These indices refer to positions in the list or NumPy array of MSOAs originally passed to `OriginDestination`.
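The sketch below turns one sample back into a long-format table of MSOA codes. `coo_to_frame` is a hypothetical helper, not part of gemmm; it relies only on the `row`, `col` and `data` attributes described above (the same attribute names a `scipy.sparse` COO matrix uses), and the toy sample and MSOA codes are made up.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def coo_to_frame(sample, msoas):
    """Convert one sparse sample into a DataFrame of MSOA codes and journeys.

    `sample` only needs `row`, `col` and `data` attributes. `msoas` must be
    the same list/array that was passed to OriginDestination, so that each
    row/col index maps straight back to an MSOA code.
    """
    msoas = np.asarray(msoas)
    return pd.DataFrame({
        "start_msoa": msoas[np.asarray(sample.row)],
        "end_msoa": msoas[np.asarray(sample.col)],
        "journeys": np.asarray(sample.data),
    })


# Hypothetical toy sample standing in for one entry of `samples`.
toy_msoas = np.array(["A", "B", "C"])
toy = SimpleNamespace(row=[0, 0, 2], col=[0, 1, 1], data=[10, 2, 5])
toy_df = coo_to_frame(toy, toy_msoas)
print(toy_df)
```

If the samples are `scipy.sparse` COO matrices, `sample.toarray()` should also give the dense matrix directly, with entry `[i, j]` holding the journeys from `msoas[i]` to `msoas[j]`.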

The samples can also be written to a NetCDF4 file using the `save_sample` argument. If `save_sample=True`, the file is saved in the current working directory; alternatively, a directory can be specified.

```python
samples = X.generate_sample(hours=[8, 12, 16], n_realizations=5, save_sample=True)
```

```
Saving samples to weekday_samples_2024-09-06--11-20-59.nc
```


To load the samples, we again use the `OriginDestination` class, but this time provide the path to the file.

```python
# Update this with the file name printed by the previous cell
Y = OriginDestination(file='weekday_samples_2024-09-06--11-20-59.nc')
```

```
Available hours: 8, 12, 16
Number of realizations: 5
```


We can load a specific realization for one of the available hours. If the `realization` argument is omitted, a realization will be chosen at random.

By default, this will return a pandas DataFrame containing the start MSOA, end MSOA, and the number of journeys between them. Pairs with zero journeys are not included.

```python
loaded_sample = Y.load_sample(hour=8, realization=0)
loaded_sample
```

| | start_msoa | end_msoa | journeys |
|---|---|---|---|
| 0 | E02003719 | E02003719 | 216 |
| 1 | E02003719 | E02003720 | 4 |
| 2 | E02003719 | E02003721 | 3 |
| 3 | E02003719 | E02003722 | 1 |
| 4 | E02003719 | E02003723 | 4 |
| ... | ... | ... | ... |
| 162 | E02003731 | E02003727 | 12 |
| 163 | E02003731 | E02003728 | 10 |
| 164 | E02003731 | E02003729 | 7 |
| 165 | E02003731 | E02003730 | 12 |
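Because the long-format DataFrame lists each MSOA pair at most once, origin and destination totals follow from a `groupby`. A minimal sketch on a made-up sample (hypothetical MSOA codes `A` to `D`, not model output) with the same three columns. Since pairs with zero journeys are omitted, the totals are reindexed over the full MSOA list so that an MSOA with no journeys still appears, with a total of zero.

```python
import pandas as pd

# Hypothetical sample with the same columns as the output of load_sample().
toy_sample = pd.DataFrame({
    "start_msoa": ["A", "A", "B", "B", "C"],
    "end_msoa": ["A", "B", "A", "B", "C"],
    "journeys": [50, 10, 5, 40, 20],
})
all_msoas = ["A", "B", "C", "D"]  # "D" has no journeys, so it never appears above

# Total journeys leaving (outbound) and arriving at (inbound) each MSOA,
# reindexed so that MSOAs absent from the sample get a total of 0.
outbound = toy_sample.groupby("start_msoa")["journeys"].sum().reindex(all_msoas, fill_value=0)
inbound = toy_sample.groupby("end_msoa")["journeys"].sum().reindex(all_msoas, fill_value=0)

# Share of each MSOA's outbound journeys that end in the same MSOA.
same_msoa = toy_sample["start_msoa"] == toy_sample["end_msoa"]
internal = (
    toy_sample[same_msoa]
    .set_index("start_msoa")["journeys"]
    .reindex(all_msoas, fill_value=0)
)
internal_share = internal / outbound.where(outbound > 0)  # NaN where nothing leaves
print(pd.DataFrame({"outbound": outbound, "inbound": inbound, "internal_share": internal_share}))
```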


Setting `as_pandas=False` returns a NumPy array instead, which contains the indices of the start and end MSOAs rather than their codes.
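The layout of this array is not shown here, so the following sketch rests on an assumption: one row per MSOA pair, with columns for the start index, end index and number of journeys, in the same order as the DataFrame. Under that assumption, the indices map back to codes by indexing the MSOA array passed to `OriginDestination`, and `np.add.at` scatters the journeys into a dense matrix. The array and codes below are made up.

```python
import numpy as np

# ASSUMPTION: one row per MSOA pair with columns (start index, end index, journeys),
# mirroring the DataFrame columns. Both arrays are hypothetical toy values.
toy_array = np.array([[0, 0, 12], [0, 2, 3], [1, 2, 7]])
toy_msoas = np.array(["A", "B", "C"])  # in the order passed to OriginDestination

# Cast the index columns to int in case the array is stored as floats.
start_idx = toy_array[:, 0].astype(int)
end_idx = toy_array[:, 1].astype(int)
journeys = toy_array[:, 2]

# Map indices back to MSOA codes.
start_codes = toy_msoas[start_idx]
end_codes = toy_msoas[end_idx]

# Scatter into a dense origin-destination matrix; pairs absent from the
# array stay at zero, and np.add.at accumulates any repeated pairs.
od_array = np.zeros((len(toy_msoas), len(toy_msoas)), dtype=journeys.dtype)
np.add.at(od_array, (start_idx, end_idx), journeys)
print(od_array)
```

Check the column order against a real `load_sample(..., as_pandas=False)` result before relying on this.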

Finally, converting the DataFrame into wide format provides output that better resembles an origin-destination matrix:

```python
import pandas as pd

pd.pivot_table(loaded_sample, columns='end_msoa', index='start_msoa', fill_value=0)
```

The columns are the `end_msoa` codes, all nested under a single top-level `journeys` label:

| start_msoa | E02003719 | E02003720 | E02003721 | E02003722 | E02003723 | E02003724 | E02003725 | E02003726 | E02003727 | E02003728 | E02003729 | E02003730 | E02003731 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| E02003719 | 216.0 | 4.0 | 3.0 | 1.0 | 4.0 | 12.0 | 43.0 | 13.0 | 5.0 | 21.0 | 13.0 | 42.0 | 8.0 |
| E02003720 | 2.0 | 276.0 | 13.0 | 2.0 | 9.0 | 19.0 | 46.0 | 12.0 | 7.0 | 25.0 | 9.0 | 47.0 | 20.0 |
| E02003721 | 3.0 | 5.0 | 357.0 | 3.0 | 6.0 | 6.0 | 53.0 | 2.0 | 4.0 | 22.0 | 8.0 | 29.0 | 18.0 |
| E02003722 | 2.0 | 6.0 | 4.0 | 408.0 | 1.0 | 3.0 | 14.0 | 1.0 | 0.0 | 13.0 | 10.0 | 14.0 | 20.0 |
| E02003723 | 6.0 | 2.0 | 6.0 | 1.0 | 441.0 | 6.0 | 22.0 | 5.0 | 2.0 | 13.0 | 6.0 | 40.0 | 16.0 |
| E02003724 | 10.0 | 8.0 | 7.0 | 1.0 | 7.0 | 502.0 | 70.0 | 3.0 | 3.0 | 18.0 | 12.0 | 45.0 | 38.0 |
| E02003725 | 17.0 | 5.0 | 16.0 | 1.0 | 13.0 | 24.0 | 962.0 | 17.0 | 8.0 | 35.0 | 21.0 | 59.0 | 35.0 |
| E02003726 | 2.0 | 4.0 | 15.0 | 1.0 | 1.0 | 5.0 | 22.0 | 402.0 | 1.0 | 2.0 | 10.0 | 9.0 | 6.0 |
| E02003727 | 3.0 | 1.0 | 6.0 | 0.0 | 11.0 | 2.0 | 22.0 | 1.0 | 354.0 | 3.0 | 5.0 | 11.0 | 11.0 |
| E02003728 | 7.0 | 5.0 | 15.0 | 12.0 | 19.0 | 15.0 | 41.0 | 6.0 | 3.0 | 359.0 | 10.0 | 32.0 | 24.0 |
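The pivot above reports floats under an extra `journeys` column level because `pivot_table` averages (its default `aggfunc` is `'mean'`) every column not used as the index or columns. A minimal sketch of a tidier variant, on a made-up sample with hypothetical MSOA codes: passing `values='journeys'` and `aggfunc='sum'` gives a flat matrix of counts, and reindexing over the full MSOA list restores any MSOA with no journeys in the sample, which would otherwise be missing because zero-journey pairs are dropped.

```python
import pandas as pd

# Hypothetical sample with the same columns as the output of load_sample().
toy_sample = pd.DataFrame({
    "start_msoa": ["A", "A", "B", "C"],
    "end_msoa": ["A", "B", "B", "A"],
    "journeys": [50, 10, 40, 6],
})
all_msoas = ["A", "B", "C", "D"]  # hypothetical; "D" has no journeys at all

od_matrix = (
    pd.pivot_table(
        toy_sample,
        index="start_msoa",
        columns="end_msoa",
        values="journeys",  # a single value column gives flat (non-MultiIndex) columns
        aggfunc="sum",      # each pair occurs once, so sum simply copies the count
        fill_value=0,
    )
    # Restore MSOAs with no journeys as both rows and columns (square matrix).
    .reindex(index=all_msoas, columns=all_msoas, fill_value=0)
    .astype("int64")
)
print(od_matrix)
```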
