## SDP parametric model - `pandas` system sizing

The SDP parametric model produces a human-readable `.csv` file for its system sizing reports. This is convenient for eyeballing, but the file is difficult to parse programmatically. The parametric model ships with its own bespoke `lookup` code for wading through the generated reports, but that code is opaque and requires a thorough understanding of it. We therefore modify the system sizing code in `notebooks/SKA1_Export.ipynb` to use `pandas`, which keeps the processing transparent and readable. The result is a set of smaller `.csv` files targeted at the compute and data requirements needed to generate our configuration.
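
As a rough illustration of the "smaller, targeted `.csv`" idea, the sketch below keeps only a handful of compute and data columns from a wide, per-HPSO sizing table. It uses the standard-library `csv` module so it runs without the notebook's helpers. The helper name `slim_sizing_csv` and the choice of columns are assumptions for illustration, not the notebook's actual code. The column names are copied from the `head()` output below.

```python
import csv
import io

# Hypothetical subset of columns to keep; names copied from the sizing table headers.
KEEP = ["HPSO", "Tobs [h]", "Total RT [Pflop/s]", "Total Batch [Pflop/s]", "Ingest Rate [TB/s]"]


def slim_sizing_csv(src, dst, keep=KEEP):
    """Copy only the `keep` columns from the CSV stream `src` to the stream `dst`.

    Columns missing from a row are written as empty strings.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=keep)
    writer.writeheader()
    for row in reader:
        writer.writerow({col: row.get(col) or "" for col in keep})


# One row (hpso01) copied from the sizing output, with a few extra columns to drop.
wide = io.StringIO(
    "HPSO,Stations,Tobs [h],Ingest [Pflop/s],Total RT [Pflop/s],Total Batch [Pflop/s],Ingest Rate [TB/s]\n"
    "hpso01,512.0,5.0,0.632428,1.757601,17.15746,0.459025\n"
)
narrow = io.StringIO()
slim_sizing_csv(wide, narrow)
print(narrow.getvalue())
```

With `pandas`, the equivalent step would be along the lines of `df[KEEP].to_csv(path, index=False)`. Passing `index=False` avoids writing the row index, which otherwise returns as an `Unnamed: 0` column when the file is re-read with `read_csv`.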

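```python
from pandas_system_sizing import translate_sdp_hpso_reports_to_dataframe

# Location of the SDP parametric model repository
SDP_PAR_DIR = '../sdp-par-model'

# Create a dictionary of dataframes, one per telescope
df_dict = translate_sdp_hpso_reports_to_dataframe(
    f'{SDP_PAR_DIR}/data/csv/2019-06-20-2998d59_hpsos.csv'  # latest HPSO report CSV
)

# Preview the first rows for SKA1-Low
df_dict['SKA1_Low'].head()
```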

| Unnamed: 0 | HPSO | Stations | Total Time [s] | Tobs [h] | Ingest [Pflop/s] | RCAL [Pflop/s] | FastImg [Pflop/s] | ICAL [Pflop/s] | DPrepA [Pflop/s] | DPrepB [Pflop/s] | DPrepC [Pflop/s] | DPrepD [Pflop/s] | Total RT [Pflop/s] | Total Batch [Pflop/s] | Total [Pflop/s] | Ingest Rate [TB/s] | Total [Pflop/s | PSS [Pflop/s] | PST [Pflop/s] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | hpso01 | 512.0 | 18000000.0 | 5.0 | 0.632428 | 0.748045 | 0.377128 | 6.87878 | 2.354166 | 2.503984 | 5.119784 | 0.300742 | 1.757601 | 17.15746 |  | 0.459025 | 18.915056 |  |  |
| 1 | hpso02a | 512.0 | 18000000.0 | 5.0 | 0.632428 | 0.748045 | 0.377128 | 4.014338 | 2.354166 | 2.503984 | 5.119784 | 0.300742 | 1.757601 | 14.29301 |  | 0.459025 | 16.050615 |  |  |
| 2 | hpso02b | 512.0 | 18000000.0 | 5.0 | 0.632428 | 0.748045 | 0.377128 | 4.014338 | 2.354166 | 2.503984 | 5.119784 | 0.300742 | 1.757601 | 14.29301 |  | 0.459025 | 16.050615 |  |  |
| 3 | hpso04a | 512.0 | 45900000.0 | 0.666667 | 0.632367 | 0.216732 | 0.115188 |  |  |  |  |  | 0.964287 | 0.00050088 |  | 0.459025 | 0.964788 | 0.000501 |  |
| 4 | hpso05a | 512.0 | 15480000.0 | 0.666667 | 0.632367 | 0.216732 | 0.115188 |  |  |  |  |  | 0.964287 | 2.755e-07 |  | 0.459025 | 0.964287 |  | 2.755e-07 |
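
The totals in this output can be cross-checked against their components. In these rows, `Total RT` equals `Ingest + RCAL + FastImg`, and `Total Batch` equals the sum of the batch columns: `ICAL` and `DPrep*`, or `PSS`/`PST` for hpso04a and hpso05a. The grand total appears in the column labelled `Total [Pflop/s` (missing its closing bracket) rather than in `Total [Pflop/s]`, which is empty here. The sketch below recomputes the totals for a row copied from the table. The helper `check_totals` and its loose tolerance are assumptions chosen because the displayed values are rounded.

```python
import math

RT_COLS = ["Ingest [Pflop/s]", "RCAL [Pflop/s]", "FastImg [Pflop/s]"]
BATCH_COLS = ["ICAL [Pflop/s]", "DPrepA [Pflop/s]", "DPrepB [Pflop/s]",
              "DPrepC [Pflop/s]", "DPrepD [Pflop/s]", "PSS [Pflop/s]", "PST [Pflop/s]"]


def check_totals(row, rel_tol=1e-3):
    """Return (rt_ok, batch_ok, total_ok) for one sizing row (a dict).

    Missing columns or None values count as zero. The grand total is read from
    'Total [Pflop/s' because that is the populated column in the output above.
    The loose rel_tol allows for values rounded for display (e.g. PSS 0.000501).
    """
    def total(cols):
        return sum(row.get(c) or 0.0 for c in cols)

    rt = total(RT_COLS)
    batch = total(BATCH_COLS)
    return (
        math.isclose(rt, row["Total RT [Pflop/s]"], rel_tol=rel_tol),
        math.isclose(batch, row["Total Batch [Pflop/s]"], rel_tol=rel_tol),
        math.isclose(rt + batch, row["Total [Pflop/s"], rel_tol=rel_tol),
    )


# Values copied from the hpso01 row of the table.
hpso01 = {
    "Ingest [Pflop/s]": 0.632428, "RCAL [Pflop/s]": 0.748045, "FastImg [Pflop/s]": 0.377128,
    "ICAL [Pflop/s]": 6.87878, "DPrepA [Pflop/s]": 2.354166, "DPrepB [Pflop/s]": 2.503984,
    "DPrepC [Pflop/s]": 5.119784, "DPrepD [Pflop/s]": 0.300742,
    "Total RT [Pflop/s]": 1.757601, "Total Batch [Pflop/s]": 17.15746,
    "Total [Pflop/s": 18.915056,
}
print(check_totals(hpso01))  # → (True, True, True)
```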


## How to generate an observation schedule

Observation schedules require the following data. The nature of observations requires pipelines to be generated at runtime, so there is a 'cyclical' reference between the two: ultimately, the observation configuration file must point to the appropriate workflow file, which is where the reference comes into play.

### Telescope description
The telescope is defined by the following:

* Number of stations
* Number of channels
* Maximum ingest

These are necessary to generate an accurate observation schedule; the number of stations lets us determine how many observations we are able to schedule on the telescope at any given time. 
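
A minimal sketch of the telescope description follows, assuming it can be held in a small dataclass. The class name `Telescope`, its field names, and the rule that concurrent observations split the stations between them with no sharing are all assumptions for illustration. Only the 512-station figure comes from the sizing table above. The channel count and maximum ingest are left unset rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Telescope:
    """Hypothetical container for the telescope description."""
    stations: int
    channels: Optional[int] = None            # number of channels (unset here)
    max_ingest_tb_s: Optional[float] = None   # maximum ingest rate (unset here)

    def max_concurrent(self, stations_per_observation: int) -> int:
        """How many observations of this size can run at the same time.

        Assumes stations are divided between observations, never shared.
        """
        if stations_per_observation <= 0:
            raise ValueError("stations_per_observation must be positive")
        return self.stations // stations_per_observation


low = Telescope(stations=512)
print(low.max_concurrent(512))  # each HPSO in the head() output uses all 512 stations
print(low.max_concurrent(256))  # a hypothetical half-array observation
```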

### Observations

Each observation has:

* Count (the number of times that observation appears in the schedule)
* HPSO
* Pipeline (the pipeline being run)
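
The observation fields listed above can be sketched as a record with a count that is expanded into one entry per scheduled run. The `Observation` class and the `expand` helper are hypothetical names used only for illustration. The example pipeline names `ICAL` and `PSS` are taken from the sizing table's column headers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Observation:
    """Hypothetical record for one entry in the observation list."""
    count: int      # number of times this observation appears in the schedule
    hpso: str       # HPSO identifier, e.g. 'hpso01'
    pipeline: str   # pipeline to run, e.g. 'ICAL'


def expand(observations: List[Observation]) -> List[Observation]:
    """Flatten counted observations into one entry per scheduled run."""
    return [obs for obs in observations for _ in range(obs.count)]


plan = [Observation(2, "hpso01", "ICAL"), Observation(1, "hpso04a", "PSS")]
runs = expand(plan)
print([(o.hpso, o.pipeline) for o in runs])
```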

### Observation pipelines

Each pipeline requires information from its observation:

* the HPSO and pipeline, which determine its compute (FLOP/s) and data (PB/s) requirements.
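
This sketch assumes the sizing data are held as a nested dictionary keyed by HPSO and then by column name. That layout is hypothetical and is not necessarily what `convert_systemsizing_csv_to_dict` returns. Under this assumption, a pipeline's requirements can be looked up from its observation's HPSO and pipeline name. The `'<pipeline> [Pflop/s]'` naming and the ingest rate (reported in TB/s in the table) follow the output above. The helper name and its `KeyError` behaviour are assumptions.

```python
# Values copied from the sizing table; only a few columns are included.
SIZING = {
    "hpso01": {"ICAL [Pflop/s]": 6.87878, "DPrepA [Pflop/s]": 2.354166,
               "Ingest Rate [TB/s]": 0.459025},
    "hpso04a": {"PSS [Pflop/s]": 0.000501, "Ingest Rate [TB/s]": 0.459025},
}


def pipeline_requirements(sizing, hpso, pipeline):
    """Return (Pflop/s for the pipeline, ingest rate in TB/s) for one observation.

    Raises KeyError if the HPSO or the pipeline column is missing.
    """
    columns = sizing[hpso]
    flops = columns[f"{pipeline} [Pflop/s]"]
    ingest = columns["Ingest Rate [TB/s]"]
    return flops, ingest


print(pipeline_requirements(SIZING, "hpso01", "ICAL"))
```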

The steps are as follows. 

## Generate Observations using SDP parametric model data

```python
from hpso_to_observation import *

# Assumes the pandas system sizing has been run and has written this CSV
SYSTEM_SIZING = "csv/SKA1_Low_hpso_pandas.csv"
system_sizing = convert_systemsizing_csv_to_dict(SYSTEM_SIZING)
df_pipeline_products
```
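
`convert_systemsizing_csv_to_dict` is defined in `hpso_to_observation`, and its internals are not shown here. As a hedged guess at the kind of structure it might produce, the standard-library sketch below reads a per-HPSO CSV into a dictionary keyed by HPSO. It converts numeric cells to `float` and empty cells to `None`. The function name `csv_to_hpso_dict` and the output layout are hypothetical. The sample rows are copied from the table above.

```python
import csv
import io
from typing import Dict, Optional


def _to_number(cell: Optional[str]) -> Optional[float]:
    """Convert a numeric CSV cell to float, mapping empty or missing cells to None."""
    return float(cell) if cell and cell.strip() else None


def csv_to_hpso_dict(stream) -> Dict[str, Dict[str, Optional[float]]]:
    """Read a sizing CSV into {hpso: {column: value}}.

    Drops the unnamed index column that pandas writes when index=True.
    """
    result = {}
    for row in csv.DictReader(stream):
        hpso = row.pop("HPSO")
        row.pop("", None)  # unnamed index column, if present
        result[hpso] = {col: _to_number(val) for col, val in row.items()}
    return result


sample = io.StringIO(
    ",HPSO,Tobs [h],Total RT [Pflop/s],PSS [Pflop/s]\n"
    "3,hpso04a,0.666667,0.964287,0.000501\n"
    "4,hpso05a,0.666667,0.964287,\n"
)
sizing = csv_to_hpso_dict(sample)
print(sizing["hpso05a"])
```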