# NetCDFSC path handling

*Aka data reference syntax handling*

For CMIP experiments, the paths in which data are saved are defined by a data reference syntax. These syntaxes can be hard to remember. However, for any given subclass of `SCMCube`, the expected path can easily be queried in two ways:

1. Look at the default output of the cube's `get_filepath_from_load_data_from_identifiers_args` method
    - To see how the time period is included, look at the output of the same method when the relevant 'time period/range' argument is passed (this argument is `None` by default).
1. Look at the cube's docstring

Note:

- the root directory is `.` by default, i.e. the current working directory.
- the file extension is used exactly as given and no `.` is added for you, so you must include it yourself (e.g. `file_ext=".nc"`).
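Because the leading `.` is easy to forget, a small helper can normalise the extension before it is passed on. The sketch below is only an illustration: `with_leading_dot` is a hypothetical name and is not part of netcdf-scm.

```python
def with_leading_dot(file_ext):
    """Return ``file_ext`` with exactly one leading dot.

    Hypothetical helper for illustration only (not part of netcdf-scm).
    """
    return "." + file_ext.lstrip(".")


# Both spellings end up in the form expected in a filename.
print(with_leading_dot("nc"))
print(with_leading_dot(".nc"))
```

The returned value could then be passed as the `file_ext` argument used in the examples in this notebook.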

The meaning of these arguments is partly explained by the docstring of `get_filepath_from_load_data_from_identifiers_args`, but [pull requests](https://github.com/znicholls/netcdf-scm/pulls) to improve it are always welcome!

In the following cells, we give a few examples for some of the available cubes.

The full list of available cubes is given by

In [1]:
from netcdf_scm import iris_cube_wrappers
print([
    el for el in dir(iris_cube_wrappers)
    if el.endswith("Cube") and not el.startswith("_")
])

['CMIP6Input4MIPsCube', 'CMIP6OutputCube', 'MarbleCMIP5Cube', 'SCMCube']


## CMIP6Input4MIPsCube

In [2]:
from netcdf_scm.iris_cube_wrappers import CMIP6Input4MIPsCube
cube_instance = CMIP6Input4MIPsCube()
cube_instance.get_filepath_from_load_data_from_identifiers_args()

'./activity-id/mip-era/target-mip/institution-id/source-id-including-institution-id/realm/frequency/variable-id/grid-label/version/variable-id_activity-id_dataset-category_target-mip_source-id-including-institution-id_grid-labelfile-ext'

Including a `.` in the file extension makes the filename look as expected.

In [3]:
cube_instance.get_filepath_from_load_data_from_identifiers_args(
    file_ext=".nc"
)

'./activity-id/mip-era/target-mip/institution-id/source-id-including-institution-id/realm/frequency/variable-id/grid-label/version/variable-id_activity-id_dataset-category_target-mip_source-id-including-institution-id_grid-label.nc'

Including the time range too completes the picture.

In [4]:
cube_instance.get_filepath_from_load_data_from_identifiers_args(
    time_range="YYYY-YYYY", 
    file_ext=".nc"
)

'./activity-id/mip-era/target-mip/institution-id/source-id-including-institution-id/realm/frequency/variable-id/grid-label/version/variable-id_activity-id_dataset-category_target-mip_source-id-including-institution-id_grid-label_YYYY-YYYY.nc'
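To make the structure of this path explicit, the sketch below rebuilds it from its parts using only the standard library. It is not netcdf-scm's implementation: the function name, the key lists and the `.nc` default are assumptions read off the output above, and the identifier names are used as their own placeholder values.

```python
import posixpath

# Directory levels and filename fields, in the order shown in the output above.
INPUT4MIPS_DIR_KEYS = [
    "activity-id", "mip-era", "target-mip", "institution-id",
    "source-id-including-institution-id", "realm", "frequency",
    "variable-id", "grid-label", "version",
]
INPUT4MIPS_FILE_KEYS = [
    "variable-id", "activity-id", "dataset-category", "target-mip",
    "source-id-including-institution-id", "grid-label",
]


def sketch_input4mips_path(ids, root_dir=".", time_range=None, file_ext=".nc"):
    """Illustrative re-creation of the path layout (hypothetical helper)."""
    directory = posixpath.join(root_dir, *(ids[k] for k in INPUT4MIPS_DIR_KEYS))
    name_parts = [ids[k] for k in INPUT4MIPS_FILE_KEYS]
    if time_range is not None:
        # The time range, when given, is the last underscore-separated field.
        name_parts.append(time_range)
    return posixpath.join(directory, "_".join(name_parts) + file_ext)


# Use each identifier's name as its own value to reproduce the template.
placeholders = {k: k for k in INPUT4MIPS_DIR_KEYS + INPUT4MIPS_FILE_KEYS}
print(sketch_input4mips_path(placeholders, time_range="YYYY-YYYY"))
```

Directory levels are joined with `/` and filename fields with `_`, which is why the same identifier (e.g. `variable-id`) appears in both parts of the path.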

## MarbleCMIP5Cube

In [5]:
from netcdf_scm.iris_cube_wrappers import MarbleCMIP5Cube
cube_instance = MarbleCMIP5Cube()
cube_instance.get_filepath_from_load_data_from_identifiers_args()

'./activity/experiment/modeling-realm/variable-name/model/ensemble-member/variable-name_modeling-realm_model_experiment_ensemble-member.nc'
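The same template can also be read in the other direction, to check that a file on disk sits where this syntax expects it. The sketch below is a hedged illustration using only the standard library: `sketch_parse_marble_path` is a hypothetical helper (not part of netcdf-scm), the key lists are read off the output above, and the identifier values in the example path are made up for demonstration.

```python
import posixpath

# Directory levels and filename fields, in the order shown in the output above.
MARBLE_DIR_KEYS = [
    "activity", "experiment", "modeling-realm",
    "variable-name", "model", "ensemble-member",
]
MARBLE_FILE_KEYS = [
    "variable-name", "modeling-realm", "model", "experiment", "ensemble-member",
]


def sketch_parse_marble_path(path):
    """Map the trailing directory levels of ``path`` onto identifier names."""
    parts = posixpath.normpath(path).split("/")
    needed = len(MARBLE_DIR_KEYS) + 1  # directory levels plus the filename
    if len(parts) < needed:
        raise ValueError("path has too few levels: {}".format(path))
    *dirs, filename = parts[-needed:]
    ids = dict(zip(MARBLE_DIR_KEYS, dirs))
    # The filename should repeat the directory identifiers, joined by "_".
    expected_stem = "_".join(ids[k] for k in MARBLE_FILE_KEYS)
    if not filename.startswith(expected_stem):
        raise ValueError("filename does not match its directories: {}".format(path))
    return ids


# Illustrative values only.
example = "./cmip5/rcp26/Amon/tas/bcc-csm1-1/r1i1p1/tas_Amon_bcc-csm1-1_rcp26_r1i1p1.nc"
print(sketch_parse_marble_path(example))
```

The filename check uses `startswith` rather than equality so that a trailing time period and the extension do not cause a mismatch.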

## CMIP6OutputCube

In [6]:
from netcdf_scm.iris_cube_wrappers import CMIP6OutputCube
cube_instance = CMIP6OutputCube()
cube_instance.get_filepath_from_load_data_from_identifiers_args()

'./mip-era/activity-id/institution-id/source-id/experiment-id/member-id/variable-id/grid-label/version/variable-id_table-id_source-id_experiment-id_member-id_grid-labelfile-ext'

Printing the docstring provides extra information, including the link to the data reference syntax page (between the angle brackets, i.e. between the `<` and `>`).

In [7]:
# in a notebook, putting a question mark after an object/function/method
# is a nice shortcut to see the docstring
cube_instance.get_filepath_from_load_data_from_identifiers_args?
print(cube_instance.get_filepath_from_load_data_from_identifiers_args.__doc__)

Get the full filepath of the data to load from the arguments passed to ``self.load_data_from_identifiers``.

        Full details about the meaning of each identifier is given in Table 1 of the
        `CMIP6 Data Reference Syntax <https://goo.gl/v1drZl>`_.

        Parameters
        ----------
        root_dir : str, optional
            The root directory of the database i.e. where the cube should start its
            path from e.g. ``/home/users/usertim/cmip6_data``.

        mip_era : str, optional
            The mip_era for which we want to load data e.g. ``CMIP6``.

        activity_id : str, optional
            The activity for which we want to load data e.g. ``DCPP``.

        institution_id : str, optional
            The institution for which we want to load data e.g. ``CNRM-CERFACS``

        source_id : str, optional
            The source_id for which we want to load data e.g. ``CNRM-CM6-1``. This was
            known as model in CMIP5.

        experiment_id : str, o

In [8]:
CMIP6OutputCube?
print(CMIP6OutputCube.__doc__)

Subclass of ``SCMCube`` which can be used with CMIP6 model output data

    The data must match the CMIP6 data reference syntax as specified in the 'File name
    template' and 'Directory structure template' sections of the
    `CMIP6 Data Reference Syntax <https://goo.gl/v1drZl>`_.
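Printing `__doc__` directly keeps the indentation of the source code, as the outputs above show. Outside a notebook, the standard library's `inspect.getdoc` returns the same text with that indentation removed, and a regular expression can pull out the link between the angle brackets. The sketch below uses a stand-in function, `example_method`, which is hypothetical and only mimics the layout of the docstrings above.

```python
import inspect
import re


def example_method():
    """Get the full filepath of the data to load.

        Full details about the meaning of each identifier are given in the
        `CMIP6 Data Reference Syntax <https://goo.gl/v1drZl>`_.
    """


# getdoc strips the common leading indentation from the docstring.
doc = inspect.getdoc(example_method)
print(doc)

# Links in this reStructuredText style look like `label <url>`_ ;
# capture whatever sits between "<" and ">".
links = re.findall(r"`[^`<]*<([^>]+)>`_", doc)
print(links)
```

The same two calls should work on any object with a docstring, for example a cube class or one of its methods.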
    
