# CONUS 404 diagnostic plots

### Data Access

- This notebook illustrates how to make diagnostic plots using the CONUS 404 dataset hosted on NCAR's GLADE storage.
- Dataset page: https://rda.ucar.edu/datasets/d559000/
- The data are open access and can be reached via three protocols:
  1) POSIX (if you have access to NCAR's HPC systems, Casper or Derecho)
  2) HTTPS
  3) OSDF
- Each protocol has its own intake-ESM catalog file. Learn about intake-ESM catalogs: https://intake-esm.readthedocs.io/en/stable/
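As a rough sketch of how the choice between access routes could be automated, the snippet below prefers the POSIX catalog and falls back to HTTPS when the GLADE path is not visible from the current machine. The three locations are the ones used in the catalog cell of this notebook; the `CATALOGS` dictionary and the `pick_catalog` helper are hypothetical names introduced only for illustration.

```python
import os

# Catalog locations as used in this notebook's catalog cell.
CATALOGS = {
    "posix": "/glade/campaign/collections/rda/data/d559000/catalogs/d559000_catalog.json",
    "https": "https://data.rda.ucar.edu/d559000/catalogs/d559000_catalog-http.json",
    "osdf": "https://data-osdf.rda.ucar.edu/ncar/rda/d559000/catalogs/d559000_catalog-osdf.json",
}


def pick_catalog(preferred="posix", fallback="https"):
    """Hypothetical helper: return the preferred catalog location.

    If the POSIX path is requested but does not exist on this machine,
    return the fallback catalog instead.
    """
    location = CATALOGS[preferred]
    if preferred == "posix" and not os.path.exists(location):
        return CATALOGS[fallback]
    return location


cat_url = pick_catalog()
print(cat_url)
```

On Casper or Derecho this should print the GLADE path; elsewhere it should print the HTTPS catalog URL.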

In [1]:
import warnings
warnings.filterwarnings("ignore")
import intake
import numpy as np
import pandas as pd
import xarray as xr
import seaborn as sns
import matplotlib.pyplot as plt
import os

In [2]:
# import fsspec.implementations.http as fshttp
# from pelicanfs.core import PelicanFileSystem, PelicanMap, OSDFFileSystem 

In [3]:
import dask 
from dask_jobqueue import PBSCluster
from dask.distributed import Client
from dask.distributed import performance_report

In [4]:
# Catalog URLs
cat_url     = '/glade/campaign/collections/rda/data/d559000/catalogs/d559000_catalog.json' # POSIX access on NCAR
# cat_url     = 'https://data.rda.ucar.edu/d559000/catalogs/d559000_catalog-http.json' # HTTPS access
# cat_url     = 'https://data-osdf.rda.ucar.edu/ncar/rda/d559000/catalogs/d559000_catalog-osdf.json' #OSDF access
print(cat_url)

/glade/campaign/collections/rda/data/d559000/catalogs/d559000_catalog.json


In [5]:
# Get your scratch folder (falls back to your username if $SCRATCH is not set)
from getpass import getuser
scratch = os.environ.get("SCRATCH") or getuser()
print(scratch)

/glade/derecho/scratch/harshah


## Create a PBS cluster

In [6]:
# Create a PBS cluster object
cluster = PBSCluster(
    job_name = 'wcrp-hackathon25',
    account= 'UCIS0005',
    cores = 1,
    memory = '10GiB',
    processes = 1,
    local_directory = scratch,
    log_directory = scratch,
    resource_spec = 'select=1:ncpus=1:mem=10GB',
    queue = 'casper',
    walltime = '5:00:00',
    interface = 'ext'
)

client = Client(cluster)

In [None]:
# Scale the cluster and display cluster dashboard URL
n_workers = 5
cluster.scale(n_workers)
client.wait_for_workers(n_workers = n_workers)
cluster

## Load CONUS 404 data from RDA using an intake catalog

In [None]:
col = intake.open_esm_datastore(cat_url)
col

- col.df returns the catalog as a pandas DataFrame.
- (Strictly speaking, it accesses the df attribute of the catalog object; no conversion takes place.)

In [None]:
col.df

## Select data and plot

#### What if you don't know the variable names?
- Use pandas column selection to print the variable (short name) and long_name columns

In [None]:
col.df[['variable','long_name']]

- We notice that long_name is not available for some variables like 'V'
- In such cases, please look at the wrfout_datadictionary file on this page https://rda.ucar.edu/datasets/d559000/documentation/#
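To see at a glance which variables need a lookup in the data dictionary, one option is to filter the catalog DataFrame for missing long_name values. The sketch below uses a small made-up DataFrame in place of col.df; only the column names 'variable' and 'long_name' and the variable names 'T2' and 'V' come from this notebook, while the remaining row and the description strings are placeholders.

```python
import pandas as pd

# Toy stand-in for col.df. The rows and descriptions are illustrative only.
df = pd.DataFrame(
    {
        "variable": ["T2", "V", "EXAMPLE_VAR"],
        "long_name": ["example description", None, "another example description"],
    }
)

# Keep the rows whose long_name is missing and list their short names.
missing = df.loc[df["long_name"].isna(), "variable"].tolist()
print(missing)
```

With the real catalog, the same filter would be applied to col.df instead of the toy df.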

### Temperature
- Plot temperature for a random date

In [None]:
cat_temp = col.search(variable='T2')
cat_temp.df.head()

- The data are organized in (virtual) Zarr stores, with one water year's worth of data per store.
- Select a water year by setting the start time to Oct 1 (the first day of the water year) or the end time to Sep 30 (its last day).
- This also means that to request data for another day, say Jan 1 of year YYYY, you first have to load the water year that contains it (the one starting on Oct 1 of YYYY-1) and then select that particular day. This example is discussed below.
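The water-year bookkeeping described above can be sketched as a small helper that maps any calendar day to the Oct 1 that opens its water year. It assumes the Oct 1 to Sep 30 convention stated in the bullets; `water_year_start` is a hypothetical name and not part of intake-ESM.

```python
from datetime import date


def water_year_start(day):
    """Return the Oct 1 that opens the water year containing `day`.

    Assumes water years run from Oct 1 through Sep 30.
    """
    start_year = day.year if day.month >= 10 else day.year - 1
    return date(start_year, 10, 1)


# Jan 1, 2021 falls in the water year that starts on Oct 1, 2020.
print(water_year_start(date(2021, 1, 1)).isoformat())
```

The resulting string has the same YYYY-MM-DD form as the start_time value used in the search in the next cell.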


In [None]:
date = "2020-10-01"
# year = "2021"
cat_temp_subset = cat_temp.search(start_time = date)
cat_temp_subset

### Load data into xarray

In [None]:
# Load catalog entries for the subset into a dictionary of xarray datasets.
dsets = cat_temp_subset.to_dataset_dict(zarr_kwargs={"consolidated": True})
print(f"\nDataset dictionary keys:\n {dsets.keys()}")

In [None]:
# Pick the first dataset, keep only the T2 variable, and display a summary.
dataset_key = list(dsets.keys())[0]
# store_name = dataset_key + ".zarr"
print(dsets.keys())
ds = dsets[dataset_key]
ds = ds.T2
ds

In [None]:
%%time
# Plot the time step nearest to the requested time
desired_time = "2021-01-01T00"
ds.sel(Time=desired_time, method='nearest').plot(cmap='inferno')

In [None]:
cluster.close()