## Gemmm

A table of the Middle Super Output Areas (MSOAs) used to fit the telecoms model is provided in the `tables` module.

This table includes columns for the code and name of each MSOA, as well as its corresponding Local Authority District (LAD), region and country.

For MSOAs in Wales and Scotland, the region column is equal to the LAD column, since these countries have no equivalent to regions.

In [10]:
from gemmm.tables import gb_msoas

In [9]:
gb_msoas.head(5)

| | msoa | msoa_name | lad | lad_name | region | region_name | country |
|---|---|---|---|---|---|---|---|
| 5808 | E02000001 | City of London 001 | E09000001 | City of London | E12000007 | London | England |
| 5809 | E02000002 | Barking and Dagenham 001 | E09000002 | Barking and Dagenham | E12000007 | London | England |
| 5810 | E02000003 | Barking and Dagenham 002 | E09000002 | Barking and Dagenham | E12000007 | London | England |
| 5811 | E02000004 | Barking and Dagenham 003 | E09000002 | Barking and Dagenham | E12000007 | London | England |
| 5812 | E02000005 | Barking and Dagenham 004 | E09000002 | Barking and Dagenham | E12000007 | London | England |


We can use this table to extract a list of MSOA codes for a certain area.

In [36]:
LAD_NAME = 'Cambridge'
msoas = gb_msoas.query('lad_name==@LAD_NAME').msoa.values
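
The same selection pattern extends to several areas at once. The sketch below rebuilds the five rows shown earlier as a small stand-in DataFrame, so that it runs without the package installed; the name `toy_msoas` is hypothetical, and only a subset of the real columns is reproduced.

```python
import pandas as pd

# Stand-in for gb_msoas, built from the five rows displayed earlier.
toy_msoas = pd.DataFrame({
    'msoa': ['E02000001', 'E02000002', 'E02000003', 'E02000004', 'E02000005'],
    'lad': ['E09000001'] + ['E09000002'] * 4,
    'lad_name': ['City of London'] + ['Barking and Dagenham'] * 4,
    'region_name': ['London'] * 5,
    'country': ['England'] * 5,
})

# Single LAD, as in the cell above; .tolist() gives a plain Python list
# instead of a NumPy array.
LAD_NAME = 'Barking and Dagenham'
single_lad = toy_msoas.query('lad_name == @LAD_NAME').msoa.tolist()

# Several LADs at once, using a boolean mask with isin.
LAD_NAMES = ['City of London', 'Barking and Dagenham']
several_lads = toy_msoas.loc[toy_msoas['lad_name'].isin(LAD_NAMES), 'msoa'].tolist()
```

Filtering on `region_name` or `country` works the same way, since those columns are part of the table.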

The `OriginDestination` class can then be used to sample the numbers of journeys between these MSOAs at different hours of the day.

To do so, we pass it the list of MSOAs (`msoas`) and a day type (`day_type`), either weekday or weekend.

The model requires two data files to generate the samples. These files are downloaded from [Gemmm/model_data](https://github.com/ukhsa-collaboration/Gemmm/tree/main/model_data) and cached for future use. 

In [32]:
from gemmm import OriginDestination

In [37]:
X = OriginDestination(msoas=msoas, day_type='weekday')

Downloading file 'radiation_data_weekday.hdf5' from 'https://api.github.com/repos/ukhsa-collaboration/Gemmm/contents/model_data/radiation_data_weekday.hdf5' to 'C:\Users\Jonathan.Carruthers\AppData\Local\gemmm-model-data\gemmm-model-data\Cache'.
100%|#####################################| 95.4M/95.4M [00:00<00:00, 95.3GB/s]


We now need to specify the hours for which we require samples (0-23), as well as the number of realizations for each hour.

In [42]:
samples = X.generate_sample(hours=[8, 12, 16], n_realizations=5)
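
Since hours must lie in the range 0-23, it can help to check a list before passing it on. The helper below, `check_hours`, is a hypothetical name and not part of `gemmm`; whether `generate_sample` performs its own validation is not stated here.

```python
def check_hours(hours):
    """Return the hours as a sorted list of unique values, or raise ValueError."""
    # Collect anything that is not an integer hour of the day.
    bad = [h for h in hours if not isinstance(h, int) or not 0 <= h <= 23]
    if bad:
        raise ValueError(f'hours must be integers in 0-23, got {bad}')
    # Drop duplicates and sort so the request is unambiguous.
    return sorted(set(hours))


hours = check_hours([16, 8, 12, 8])
```

The cleaned list could then be passed as the `hours` argument.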

We can also write the samples to a NetCDF4 file using the `save_sample` argument. If it is set to `True`, the file is saved in the current working directory; otherwise, we can specify a directory.

In [43]:
samples = X.generate_sample(hours=[8, 12, 16], n_realizations=5, save_sample=True)

Saving samples to weekday_samples_2024-09-06--09-44-16.nc
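
The saved file name appears to combine the day type with a timestamp. Assuming the pattern `<day_type>_samples_<YYYY-MM-DD--HH-MM-SS>.nc`, which is inferred from this single example rather than from the package documentation, the sketch below recovers both parts. This can be handy when choosing between several saved files. `parse_sample_filename` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def parse_sample_filename(path):
    """Split a sample file name into its day type and timestamp (assumed pattern)."""
    stem = Path(path).stem  # file name without the '.nc' suffix
    day_type, stamp = stem.split('_samples_')
    return day_type, datetime.strptime(stamp, '%Y-%m-%d--%H-%M-%S')


day_type, saved_at = parse_sample_filename('weekday_samples_2024-09-06--09-44-16.nc')
```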


To load the samples, we again use the `OriginDestination` class, but this time pass the path to the saved file via the `file` argument.

In [64]:
Y = OriginDestination(file='weekday_samples_2024-09-06--09-44-16.nc')

Available hours: 8, 12, 16
Number of realizations: 5


We can load a specific realization for one of the available hours. If the `realization` argument is omitted, a realization will be chosen at random.

By default, this will return a pandas DataFrame containing the start MSOA, end MSOA, and number of journeys between them. Pairs with zero journeys are not included.

In [68]:
loaded_sample = Y.load_sample(hour=8, realization=0)
loaded_sample

| | start_msoa | end_msoa | journeys |
|---|---|---|---|
| 0 | E02003719 | E02003719 | 247 |
| 1 | E02003719 | E02003720 | 4 |
| 2 | E02003719 | E02003721 | 3 |
| 3 | E02003719 | E02003722 | 1 |
| 4 | E02003719 | E02003723 | 1 |
| ... | ... | ... | ... |
| 163 | E02003731 | E02003727 | 2 |
| 164 | E02003731 | E02003728 | 6 |
| 165 | E02003731 | E02003729 | 4 |
| 166 | E02003731 | E02003730 | 13 |
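
The long format lends itself to quick summaries. As a minimal sketch, the block below copies the nine rows visible in the output into a stand-in DataFrame (`toy_sample`, a hypothetical name) and totals the journeys that leave each start MSOA, excluding journeys that start and end in the same MSOA.

```python
import pandas as pd

# Stand-in for loaded_sample, using only the rows visible in the output.
toy_sample = pd.DataFrame({
    'start_msoa': ['E02003719'] * 5 + ['E02003731'] * 4,
    'end_msoa': ['E02003719', 'E02003720', 'E02003721', 'E02003722', 'E02003723',
                 'E02003727', 'E02003728', 'E02003729', 'E02003730'],
    'journeys': [247, 4, 3, 1, 1, 2, 6, 4, 13],
})

# Keep only journeys between different MSOAs.
between = toy_sample[toy_sample['start_msoa'] != toy_sample['end_msoa']]

# Total outgoing journeys per start MSOA.
outflow = between.groupby('start_msoa')['journeys'].sum()
```

Grouping by `end_msoa` instead gives the corresponding inflow totals.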


Setting `as_pandas=False` returns a NumPy array that contains the indices of the start and end MSOAs rather than their codes.
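
The exact layout of that array is not spelled out here. The sketch below assumes one row per non-zero pair with three integer columns (start index, end index, journeys), and assumes the indices refer to positions in the MSOA list. Both are guesses to verify against the package. Under those assumptions, a dense origin-destination matrix could be assembled as follows; `msoa_codes` and `toy_array` are made-up stand-ins.

```python
import numpy as np

# Assumed lookup from index to MSOA code.
msoa_codes = ['E02003719', 'E02003720', 'E02003721']

# Assumed layout: (start index, end index, journeys) per row.
toy_array = np.array([
    [0, 0, 247],
    [0, 1, 4],
    [0, 2, 3],
    [1, 0, 5],
    [1, 1, 286],
    [1, 2, 10],
])

# Fill a square matrix; pairs absent from the array stay at zero.
n = len(msoa_codes)
od_matrix = np.zeros((n, n), dtype=int)
od_matrix[toy_array[:, 0], toy_array[:, 1]] = toy_array[:, 2]

# Translate one row of indices back into MSOA codes.
start_idx, end_idx, journeys = toy_array[1]
pair = (msoa_codes[start_idx], msoa_codes[end_idx], int(journeys))
```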

Finally, converting the DataFrame into wide format provides output that better resembles an origin-destination matrix:

In [69]:
import pandas as pd

pd.pivot_table(loaded_sample, columns='end_msoa', index='start_msoa', fill_value=0)

| journeys (start_msoa \ end_msoa) | E02003719 | E02003720 | E02003721 | E02003722 | E02003723 | E02003724 | E02003725 | E02003726 | E02003727 | E02003728 | E02003729 | E02003730 | E02003731 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| E02003719 | 247.0 | 4.0 | 3.0 | 1.0 | 1.0 | 16.0 | 31.0 | 12.0 | 3.0 | 24.0 | 5.0 | 41.0 | 19.0 |
| E02003720 | 5.0 | 286.0 | 10.0 | 7.0 | 8.0 | 9.0 | 45.0 | 12.0 | 5.0 | 26.0 | 11.0 | 48.0 | 33.0 |
| E02003721 | 5.0 | 4.0 | 375.0 | 4.0 | 6.0 | 5.0 | 40.0 | 1.0 | 3.0 | 8.0 | 9.0 | 41.0 | 20.0 |
| E02003722 | 4.0 | 4.0 | 4.0 | 394.0 | 0.0 | 4.0 | 14.0 | 1.0 | 4.0 | 10.0 | 9.0 | 28.0 | 14.0 |
| E02003723 | 2.0 | 2.0 | 10.0 | 1.0 | 410.0 | 4.0 | 27.0 | 9.0 | 1.0 | 15.0 | 2.0 | 22.0 | 21.0 |
| E02003724 | 7.0 | 16.0 | 6.0 | 4.0 | 9.0 | 475.0 | 64.0 | 7.0 | 5.0 | 17.0 | 14.0 | 32.0 | 34.0 |
| E02003725 | 16.0 | 14.0 | 22.0 | 4.0 | 10.0 | 23.0 | 1042.0 | 7.0 | 8.0 | 29.0 | 20.0 | 45.0 | 38.0 |
| E02003726 | 1.0 | 3.0 | 10.0 | 2.0 | 2.0 | 2.0 | 22.0 | 329.0 | 1.0 | 4.0 | 7.0 | 10.0 | 11.0 |
| E02003727 | 1.0 | 4.0 | 4.0 | 2.0 | 12.0 | 5.0 | 16.0 | 1.0 | 383.0 | 4.0 | 6.0 | 13.0 | 9.0 |
| E02003728 | 17.0 | 7.0 | 14.0 | 7.0 | 16.0 | 13.0 | 50.0 | 7.0 | 3.0 | 390.0 | 10.0 | 22.0 | 21.0 |
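
Because `values` is not specified in the call above, pandas pivots every remaining column and labels the result with an extra `journeys` column level. The default aggregation is the mean, which is harmless when each pair appears only once. As a minimal sketch on a made-up subset of the data (`toy_sample` and `codes` are hypothetical names), passing `values='journeys'` gives flat column labels. Reindexing against the full list of codes keeps the matrix square even when an MSOA has no recorded journeys.

```python
import pandas as pd

# Small stand-in for loaded_sample; counts are copied from the matrix above.
toy_sample = pd.DataFrame({
    'start_msoa': ['E02003719', 'E02003719', 'E02003720', 'E02003720', 'E02003721'],
    'end_msoa': ['E02003719', 'E02003720', 'E02003719', 'E02003720', 'E02003721'],
    'journeys': [247, 4, 5, 286, 375],
})

# Full list of MSOAs; E02003722 has no rows in the toy sample.
codes = ['E02003719', 'E02003720', 'E02003721', 'E02003722']

od = (
    toy_sample
    .pivot_table(index='start_msoa', columns='end_msoa', values='journeys',
                 aggfunc='sum', fill_value=0)
    # Add all-zero rows and columns for MSOAs missing from the sample.
    .reindex(index=codes, columns=codes, fill_value=0)
)
```

Row sums of `od` then give total journeys by origin, and column sums give totals by destination.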
