# CMIP6 Deltas

This notebook downloads monthly CMIP6 data and computes deltas. It supports an ad hoc project to generate useful CMIP6 summaries for Alaska while we await infrastructure upgrades for accessing and working with larger amounts of data. The downloaded data may also be used in other efforts, such as statistical downscaling.

# Download monthly CMIP6 data

### Copernicus API

This notebook will use the [Copernicus API](https://cds.climate.copernicus.eu/api-how-to#install-the-cds-api-key) to access data.

The `cdsapi` Python package must be installed. It is available via `pip` (`pip install cdsapi`) and on conda-forge (`conda install -c conda-forge cdsapi`).

Credentials need to be placed in `$HOME/.cdsapirc` per the instructions at the link above.
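
As a quick sanity check before sending requests, the credentials file can be parsed to confirm that it contains the `url` and `key` entries that `cdsapi` reads. The helper `parse_cdsapirc` below is hypothetical (it is not part of `cdsapi`) and the sample values are placeholders; this is only a minimal sketch:

```python
def parse_cdsapirc(text):
    """Parse the `name: value` lines of a .cdsapirc file into a dict."""
    config = {}
    for line in text.splitlines():
        # split on the first colon only, since URLs and keys contain colons too
        name, sep, value = line.partition(":")
        if sep:  # skip blank lines and lines without a colon
            config[name.strip()] = value.strip()
    return config


# placeholder contents; a real file holds your own URL and key
sample = "url: https://example.invalid/api\nkey: 00000:placeholder-key\n"
config = parse_cdsapirc(sample)
missing = [name for name in ("url", "key") if name not in config]
print(missing or "credentials file has url and key")
```

For the real file, the text could be read with `Path.home().joinpath(".cdsapirc").read_text()`.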

In [101]:
# Setup - define all imports and paths

import os
from pathlib import Path
import cdsapi


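# required environment variables; os.environ[...] raises a KeyError
# naming the variable if it is not set
base_dir = Path(os.environ["BASE_DIR"])
out_dir = Path(os.environ["OUTPUT_DIR"])
scratch_dir = Path(os.environ["SCRATCH_DIR"])

anc_dir = base_dir.joinpath("ancillary")
anc_dir.mkdir(parents=True, exist_ok=True)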

# tracking CSV filepath
tracker_fp = anc_dir.joinpath("download_tracker.csv")

### Targets

This section describes the subsets of CMIP6 data we are interested in. There are three main groups:
1. meteorological data at monthly resolution (temperature, precipitation, sea level pressure)
2. land surface data, i.e. fixed (time-invariant) data about the models themselves
3. water balance data at monthly resolution

#### Meteorological data
**Temporal resolution**: Monthly  
**Experiment (4)**: Historical, SSP1-2.6, SSP2-4.5, SSP5-8.5  
**Variable (3)**: Near-surface air temperature, Precipitation, Sea level pressure  
**Model (12)**: ACCESS-CM2, CESM2, CNRM-CM6-1-HR, EC-Earth3-Veg-LR, GFDL-ESM4, HadGEM3-GC31-LL, HadGEM3-GC31-MM, KACE-1-0-G, MIROC6, MPI-ESM1-2-LR, MRI-ESM2-0, NorESM2-MM  


#### Land surface data
**Temporal resolution**: Fixed (no temporal resolution)  
**Experiment (1)**: Historical  
**Variable (3)**: Surface altitude, Percentage of the grid cell occupied by land including lakes, Sea area percentage  
**Model (?)**: The subset of the models above for which these variables are available  

#### Water balance data
**Temporal resolution**: Monthly  
**Experiment (3)**: Historical, SSP2-4.5, SSP5-8.5  
**Variable (5)**: Evaporation including sublimation and transpiration, Moisture in upper portion of soil column, Snowfall flux, Surface snow amount, Total runoff  
**Model (6)**: CESM2, EC-Earth3-Veg-LR, GFDL-ESM4, MIROC6, MRI-ESM2-0, NorESM2-MM  

#### Fixed parameters
For all of these groups, the following parameters will be constant:  
**Level (1)**: single levels   
**Geographical Area**: Sub-region extraction, North: 74, South: 50, West: 174, East: -120   
**Temporal subset**: Whole available temporal range  


### Notes for downloading

Here are some notes for downloading based on preliminary explorations:  

* The CDS API does not support multiple values for a given field, so requests should be broken up by model, scenario (experiment), and variable.
* The longitude coordinates for "sea area percentage" seem to be on a strictly positive axis (0, 360). Both the standard bbox and its positive equivalent (74, 174, 50, 240) cause errors, so we might need to grab the whole grid for this variable.

See the Appendix section for some examples of the above issues.
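
To illustrate the first note, the combinations can be enumerated so that every request carries exactly one model, one scenario, and one variable. This is a sketch with shortened example lists; the `example_*` names are illustrative and are not used elsewhere in this notebook:

```python
from itertools import product

# shortened lists for illustration; the full lists are given under Targets
example_models = ["cesm2", "miroc6"]
example_scenarios = ["historical", "ssp2_4_5"]
example_variables = ["near_surface_air_temperature", "precipitation"]

# one request per (model, scenario, variable) combination
single_requests = [
    {"model": model, "experiment": scenario, "variable": variable}
    for model, scenario, variable in product(
        example_models, example_scenarios, example_variables
    )
]
print(len(single_requests))
```

The number of requests is the product of the list lengths, so the request count grows quickly as models, scenarios, or variables are added.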

### Define targets in a tracking table

Make a table for tracking the progress of downloads with the following columns:

* `data_group`: which of the above groups the download is for, either `"met_vars"`, `"land_surface"`, or `"water_balance"`
* `model`
* `scenario`
* `variable`
* `t_res`: temporal resolution, either `"monthly"` or `"fixed"`
* `bbox`: spatial extent, in form (N, W, S, E), should be either `(74, 174, 50, -120)` or `None` (for sea area percentage only)
* `result`: download result, either `"pass"`, `"fail"`, or `None`
* `fail_reason`: fail message where applicable, or `None`
* `zip_path`: path to where downloaded `.zip` data in `$SCRATCH_DIR` should be written
* `base_path`: path to where data should be unzipped and placed in `$BASE_DIR`
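
Because the tracker is stored as a CSV, a list or tuple written to the `bbox` column comes back as a string when the file is read again, and an empty cell comes back as a missing value. A hypothetical helper (`parse_bbox` is not defined elsewhere in this notebook) could recover the original value; this is a sketch, not necessarily how the tracker is meant to be read:

```python
import ast


def parse_bbox(value):
    """Convert a bbox cell read back from the CSV into a list, or None if empty."""
    # empty cells are read back by pandas.read_csv as NaN (a float), not a string
    if not isinstance(value, str) or not value.strip():
        return None
    parsed = ast.literal_eval(value)
    return None if parsed is None else list(parsed)


print(parse_bbox("[74, 174, 50, -120]"))
print(parse_bbox("(74, 174, 50, -120)"))
print(parse_bbox(float("nan")))
```

With a tracker loaded via `pd.read_csv`, this could be applied as `df["bbox"].map(parse_bbox)`.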

In [129]:
# reset tracker - set to True to reset tracking spreadsheet
reset_tracker = True

import pandas as pd

# make the tracking dataframe
column_names = [
    "data_group",
    "model",
    "scenario",
    "variable",
    "t_res",
    "bbox",
    "result",
    "fail_reason",
    "zip_path",
    "base_path",
]
df = pd.DataFrame(columns=column_names)

if reset_tracker:
    df.to_csv(tracker_fp, index=False)

Start specifying the options for populating the table.

##### Models:

In [105]:
# initialize a lookup dict for CDS API
api_lu = {}

# full names of models (for reference at this point)
models = [
    "ACCESS-CM2",
    "CESM2",
    "CNRM-CM6-1-HR",
    "EC-Earth3-Veg-LR",
    "GFDL-ESM4",
    "HadGEM3-GC31-LL",
    "HadGEM3-GC31-MM",
    "KACE-1-0-G",
    "MIROC6",
    "MPI-ESM1-2-LR",
    "MRI-ESM2-0",
    "NorESM2-MM",
]
# api names for models are simply lower case and underscores instead of hyphens
api_lu["models"] = {model: model.lower().replace("-", "_") for model in models}

# subset of models for the water balance data
wb_models = [
    "CESM2",
    "EC-Earth3-Veg-LR",
    "GFDL-ESM4",
    "MIROC6",
    "MRI-ESM2-0",
    "NorESM2-MM",
]

##### Scenarios (experiments):

In [106]:
scenarios = [
    "historical",
    "SSP1-2.6",
    "SSP2-4.5",
    "SSP5-8.5",
]

# water balance scenarios
wb_scenarios = [
    "historical",
    "SSP2-4.5",
    "SSP5-8.5",
]

# lookups for API
api_lu["scenarios"] = {
    "historical": "historical",
    "SSP1-2.6": "ssp1_2_6",
    "SSP2-4.5": "ssp2_4_5",
    "SSP5-8.5": "ssp5_8_5",
}

##### Variables:

In [107]:
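# groupings follow the Targets section: three land surface (fixed) variables
# and five monthly water balance variables
met_varnames = ["tas", "pr", "slp"]
land_varnames = ["orog", "sftlf", "sftof"]
wb_varnames = ["evspsbl", "mrsos", "prsn", "snw", "mrro"]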

# variable names
api_lu["varnames"] = {
    "tas": "near_surface_air_temperature",
    "pr": "precipitation",
    "slp": "sea_level_pressure",
    "orog": "surface_altitude",
    "sftlf": "percentage_of_the_grid_cell_occupied_by_land_including_lakes",
    "sftof": "sea_area_percentage",
    "evspsbl": "evaporation_including_sublimation_and_transpiration",
    "mrsos": "moisture_in_upper_portion_of_soil_column",
    "prsn": "snowfall_flux",
    "snw": "surface_snow_amount",
    "mrro": "total_runoff",
}

##### Constants:

In [108]:
# these will stay constant over all model combinations
level = "single_levels"
bbox = [74, 174, 50, -120]  # N, W, S, E
api_format = "zip"
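
As a sketch of how these constants might combine with one row of the tracking table, the hypothetical helper below (`build_request` is not defined elsewhere in this notebook) assembles a request dictionary using the same keys as the requests shown in the Appendix. It assumes that leaving out `area` requests the whole grid, which has not been verified here:

```python
def build_request(row, level="single_levels", api_format="zip"):
    """Build a CDS request dict from one tracker row (a dict or pandas Series)."""
    request = {
        "format": api_format,
        "temporal_resolution": row["t_res"],
        "experiment": row["scenario"],
        "level": level,
        "variable": row["variable"],
        "model": row["model"],
    }
    # assumption: omitting "area" requests the whole grid (for sea area percentage)
    if row["bbox"] is not None:
        request["area"] = list(row["bbox"])
    return request


example_row = {
    "model": "cesm2",
    "scenario": "historical",
    "variable": "precipitation",
    "t_res": "monthly",
    "bbox": [74, 174, 50, -120],
}
print(build_request(example_row))
```

Keeping the request construction in one function means every field is single-valued by design, which matches the one-variable-per-request restriction.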

Start populating the table:

##### Met vars:

In [130]:
# load the df
df = pd.read_csv(tracker_fp)

monthly_zip_fn = "{}_monthly_mean_{}_{}.zip"
rows = []
for model in models:
    for scenario in scenarios:
        for varname in met_varnames:
            row = {
                "data_group": "met_vars",
                "model": api_lu["models"][model],
                "scenario": api_lu["scenarios"][scenario],
                "variable": api_lu["varnames"][varname],
                "t_res": "monthly",
                "bbox": bbox,
                "result": None,
                "fail_reason": None,
            }
            # make zip download path
            row["zip_path"] = scratch_dir.joinpath(
                monthly_zip_fn.format(varname, model, scenario)
            )
            row["base_path"] = None
            rows.append(row)

# DataFrame.append was removed in pandas 2.0; concatenate the new rows instead
df = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)

# write the df
df.to_csv(tracker_fp, index=False)

##### Land surface vars

In [131]:
# load the df
df = pd.read_csv(tracker_fp)

# include the model in the file name so that models do not overwrite each other
land_zip_fn = "{}_fx_{}_historical.zip"
rows = []
for model in models:
    for varname in land_varnames:
        row = {
            "data_group": "land_surface",
            "model": api_lu["models"][model],
            "scenario": "historical",
            "variable": api_lu["varnames"][varname],
            "t_res": "fixed",
            "bbox": bbox,
            "result": None,
            "fail_reason": None,
        }
        # set the bbox to None if varname is sftof (sea area percentage)
        if varname == "sftof":
            row["bbox"] = None
        # make zip download path
        row["zip_path"] = scratch_dir.joinpath(
            land_zip_fn.format(varname, model)
        )
        row["base_path"] = None
        rows.append(row)

# DataFrame.append was removed in pandas 2.0; concatenate the new rows instead
df = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)

# write the df
df.to_csv(tracker_fp, index=False)

##### Water balance vars

In [132]:
# load the df
df = pd.read_csv(tracker_fp)

monthly_zip_fn = "{}_monthly_mean_{}_{}.zip"
rows = []
for model in wb_models:
    for scenario in wb_scenarios:
        for varname in wb_varnames:
            row = {
                "data_group": "water_balance",
                "model": api_lu["models"][model],
                "scenario": api_lu["scenarios"][scenario],
                "variable": api_lu["varnames"][varname],
                "t_res": "monthly",
                "bbox": bbox,
                "result": None,
                "fail_reason": None,
            }
            # make zip download path
            row["zip_path"] = scratch_dir.joinpath(
                monthly_zip_fn.format(varname, model, scenario)
            )
            row["base_path"] = None
            rows.append(row)

# DataFrame.append was removed in pandas 2.0; concatenate the new rows instead
df = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)

# write the df
df.to_csv(tracker_fp, index=False)

In [None]:
# define an idempotent function for updating the tracking
# dataframe based on presence of files in $SCRATCH_DIR and $BASE_DIR

def update_tracker(base_dir, scratch_dir):
    """Update the tracking table based on presence of files
    
    Args:
        base_dir (pathlib.PosixPath): path to base directory that should have an
            ancillary/ folder with the tracking spreadsheet in it
        scratch_dir(pathlib.PosixPath): path to the scratch directory, where API requests
            are downloaded to
            
    Returns:
        Nothing, prints the number of records updated
    """

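The function above has no body yet. One possible shape for such an update is sketched below under several assumptions: the helper name `update_tracker_sketch` is hypothetical, it takes the tracker path directly instead of `base_dir` and `scratch_dir`, it only checks the `zip_path` column, and it uses the standard-library `csv` module in place of pandas so that it stands alone:

```python
import csv
from pathlib import Path


def update_tracker_sketch(tracker_fp):
    """Mark rows as "pass" when their zip file exists; return the number changed."""
    tracker_fp = Path(tracker_fp)
    with tracker_fp.open(newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = list(reader)

    n_updated = 0
    for row in rows:
        zip_path = row.get("zip_path", "")
        # only touch rows that are not already marked, which keeps reruns idempotent
        if zip_path and Path(zip_path).exists() and row.get("result") != "pass":
            row["result"] = "pass"
            row["fail_reason"] = ""
            n_updated += 1

    with tracker_fp.open("w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    print(f"{n_updated} record(s) updated")
    return n_updated
```

Running it a second time without any new files changes nothing and reports zero updated records, which is the idempotent behavior the stub describes.
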
## Appendix

This section shows some issues to look out for when downloading data.

### Longitude restrictions for "sea area percentage"

The longitude coordinates for "sea area percentage" seem to be on a strictly positive axis (0, 360). Both the standard bbox and its positive equivalent (74, 174, 50, 240) cause errors, so we might need to grab the whole grid for this variable.
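
The positive bbox is the original bbox with its eastern edge wrapped onto the (0, 360) axis. A minimal sketch of that arithmetic, using a hypothetical helper `to_360`:

```python
def to_360(lon):
    """Wrap a longitude in degrees onto the [0, 360) axis."""
    return lon % 360


example_bbox = [74, 174, 50, -120]  # N, W, S, E
north, west, south, east = example_bbox
positive_bbox = [north, to_360(west), south, to_360(east)]
print(positive_bbox)
```

Only the longitudes (W and E) are wrapped; the latitudes are left unchanged.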

##### bbox with negative value:

In [76]:
fp = scratch_dir.joinpath("test.zip")

c = cdsapi.Client()

try:
    c.retrieve(
        'projections-cmip6',
        {
            'format': api_format,
            'temporal_resolution': "fixed",
            'experiment': "historical",
            'level': 'single_levels',
            'variable': "sea_area_percentage",
            'model': "noresm2_lm",
            "area": [74, 174, 50, -120]
        },
        fp
    )
except Exception as exc:
    print(exc.args)

2022-02-14 13:29:11,313 INFO Welcome to the CDS
2022-02-14 13:29:11,314 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/projections-cmip6
2022-02-14 13:29:11,501 INFO Request is queued
2022-02-14 13:29:12,681 INFO Request is running
2022-02-14 13:29:16,795 INFO Request is failed
2022-02-14 13:29:16,796 ERROR Message: an internal error occurred processing your request
2022-02-14 13:29:16,796 ERROR Reason:  Process error: The requested longitude subset -120.0, 174.0 is not within the longitude bounds of this dataset and the data could not be converted to this longitude frame successfully. Please re-run your request with longitudes within the bounds of the dataset: 0.00, 359.99
2022-02-14 13:29:16,797 ERROR   Traceback (most recent call last):
2022-02-14 13:29:16,798 ERROR     File "/usr/local/lib/python3.6/site-packages/rooki/results.py", line 33, in url
2022-02-14 13:29:16,799 ERROR       return self.response.get()[0]
2022-02-14 13:29:16,800 ERROR     File "/u

('an internal error occurred processing your request. Process error: The requested longitude subset -120.0, 174.0 is not within the longitude bounds of this dataset and the data could not be converted to this longitude frame successfully. Please re-run your request with longitudes within the bounds of the dataset: 0.00, 359.99.',)


##### bbox with corresponding positive value:

In [77]:
fp = scratch_dir.joinpath("test.zip")

c = cdsapi.Client()

try:
    c.retrieve(
        'projections-cmip6',
        {
            'format': api_format,
            'temporal_resolution': "fixed",
            'experiment': "historical",
            'level': 'single_levels',
            'variable': "sea_area_percentage",
            'model': "noresm2_lm",
            "area": [74, 174, 50, 240]
        },
        fp
    )
except Exception as exc:
    print(exc.args)

2022-02-14 13:31:50,746 INFO Welcome to the CDS
2022-02-14 13:31:50,747 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/projections-cmip6
2022-02-14 13:31:50,935 INFO Request is queued
2022-02-14 13:31:52,115 INFO Request is running
2022-02-14 13:31:56,228 INFO Request is failed
2022-02-14 13:31:56,229 ERROR Message: an internal error occurred processing your request
2022-02-14 13:31:56,229 ERROR Reason:  execution failed
2022-02-14 13:31:56,230 ERROR   Traceback (most recent call last):
2022-02-14 13:31:56,231 ERROR     File "/usr/local/lib/python3.6/site-packages/birdy/client/base.py", line 347, in _execute
2022-02-14 13:31:56,232 ERROR       pid, inputs=wps_inputs, output=wps_outputs, mode=mode
2022-02-14 13:31:56,233 ERROR     File "/usr/local/lib/python3.6/site-packages/owslib/wps.py", line 359, in execute
2022-02-14 13:31:56,234 ERROR       response = execution.submitRequest(request)
2022-02-14 13:31:56,234 ERROR     File "/usr/local/lib/python3.6/site-

('an internal error occurred processing your request. execution failed.',)


### No support for multiple field values

The CDS API does not support multiple values for a given field:

In [79]:
# example with two variables supplied
fp = scratch_dir.joinpath("test.zip")

c = cdsapi.Client()

try:
    c.retrieve(
        'projections-cmip6',
        {
            'format': api_format,
            'temporal_resolution': "monthly",
            'experiment': "historical",
            'level': 'single_levels',
            'variable': ["precipitation", "near_surface_air_temperature"],
            'model': "cesm2",
            "area": [74, 174, 50, -120],
            "date": "1850-01-15/1851-12-31",
        },
        fp
    )
except Exception as exc:
    print(exc.args)

2022-02-14 13:36:23,676 INFO Welcome to the CDS
2022-02-14 13:36:23,677 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/projections-cmip6
2022-02-14 13:36:23,866 INFO Request is queued
2022-02-14 13:36:25,049 INFO Request is running
2022-02-14 13:36:29,161 INFO Request is failed
2022-02-14 13:36:29,162 ERROR Message: an internal error occurred processing your request
2022-02-14 13:36:29,163 ERROR Reason:  CMIP6 requests are limited to a single variable per request; please split your request into multiple requests.
2022-02-14 13:36:29,164 ERROR   Traceback (most recent call last):
2022-02-14 13:36:29,165 ERROR     File "/opt/cdstoolbox/cdscompute/cdscompute/cdshandlers/services/handler.py", line 55, in handle_request
2022-02-14 13:36:29,166 ERROR       result = cached(context.method, proc, context, context.args, context.kwargs)
2022-02-14 13:36:29,167 ERROR     File "/opt/cdstoolbox/cdscompute/cdscompute/caching.py", line 108, in cached
2022-02-14 13:36:29,168

('an internal error occurred processing your request. CMIP6 requests are limited to a single variable per request; please split your request into multiple requests..',)
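
Following the advice in the error message, a request that lists several variables can be split into one request per variable before it is submitted. The helper `split_request` below is hypothetical and only manipulates dictionaries; it does not call the API:

```python
def split_request(request, field="variable"):
    """Return one copy of `request` per value of `field`."""
    values = request[field]
    if isinstance(values, str):
        values = [values]  # already a single value
    return [{**request, field: value} for value in values]


multi = {
    "format": "zip",
    "temporal_resolution": "monthly",
    "experiment": "historical",
    "level": "single_levels",
    "variable": ["precipitation", "near_surface_air_temperature"],
    "model": "cesm2",
}
for single in split_request(multi):
    print(single["variable"])
```

Each resulting dictionary could then be passed to `c.retrieve` with its own output path.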
