# 4km WRF Simulation of Current Climate over South America by SAAG: diagnostic plots

### Data Access

- This notebook illustrates how to make diagnostic plots from the dataset produced by the South America Affinity Group (SAAG), which is hosted on NCAR's GLADE storage.
- Dataset page: https://rda.ucar.edu/datasets/d616000/#
- The data is open access and can be reached through intake-ESM catalogs via three protocols:
    1) POSIX (if you have access to NCAR's HPC systems, such as Casper or Derecho),
    2) HTTPS, or
    3) OSDF.
- Learn about intake-ESM: https://intake-esm.readthedocs.io/en/stable/

In [1]:
import warnings
warnings.filterwarnings("ignore")
import intake
import numpy as np
import pandas as pd
import xarray as xr
import seaborn as sns
import matplotlib.pyplot as plt
import os

In [2]:
# import fsspec.implementations.http as fshttp
# from pelicanfs.core import PelicanFileSystem, PelicanMap, OSDFFileSystem 

In [3]:
import dask 
from dask_jobqueue import PBSCluster
from dask.distributed import Client
from dask.distributed import performance_report

In [4]:
cat_url     = '/glade/campaign/collections/rda/data/d616000/catalogs/d616000_catalog.json' #POSIX access on NCAR
# cat_url     = 'https://data.rda.ucar.edu/d616000/catalogs/d616000_catalog-http.json' #HTTPS access
# cat_url     = 'https://data-osdf.rda.ucar.edu/ncar/rda/d616000/catalogs/d616000_catalog-osdf.json' #OSDF access
print(cat_url)

/glade/campaign/collections/rda/data/d616000/catalogs/d616000_catalog.json
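The choice between the three catalog locations can also be made at run time. The following is a minimal sketch, not part of the original notebook: `CATALOGS` and `pick_catalog` are hypothetical names, and the only logic is "use the POSIX path if it exists on this machine, otherwise fall back to a remote catalog". The URLs are copied from the cell above.

```python
import os

# Catalog locations copied from the cell above, keyed by access protocol.
CATALOGS = {
    "posix": "/glade/campaign/collections/rda/data/d616000/catalogs/d616000_catalog.json",
    "https": "https://data.rda.ucar.edu/d616000/catalogs/d616000_catalog-http.json",
    "osdf": "https://data-osdf.rda.ucar.edu/ncar/rda/d616000/catalogs/d616000_catalog-osdf.json",
}


def pick_catalog(catalogs=CATALOGS, fallback="https"):
    """Return the POSIX catalog path if it exists locally, else the fallback URL."""
    posix_path = catalogs["posix"]
    if os.path.exists(posix_path):
        return posix_path
    return catalogs[fallback]


cat_url = pick_catalog()
print(cat_url)
```

On Casper or Derecho this should print the GLADE path; elsewhere it prints the HTTPS catalog URL. Pass `fallback="osdf"` to prefer OSDF instead.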


In [5]:
# Get your scratch folder (fall back to the user name if SCRATCH is not set)
from getpass import getuser
scratch = os.environ.get("SCRATCH") or getuser()
print(scratch)

/glade/derecho/scratch/harshah


## Create a PBS cluster

In [6]:
# Create a PBS cluster object
cluster = PBSCluster(
    job_name = 'wcrp-hackathon25',
    account= 'UCIS0005',
    cores = 1,
    memory = '10GiB',
    processes = 1,
    local_directory = scratch,
    log_directory = scratch,
    resource_spec = 'select=1:ncpus=1:mem=10GB',
    queue = 'casper',
    walltime = '5:00:00',
    interface = 'ext'
)

client = Client(cluster)
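In the cell above, `cores` and `memory` describe each Dask worker, while `resource_spec` is the request handed to PBS, so the two need to stay consistent. One way to avoid a mismatch is to derive the PBS string from the same numbers. This is only a sketch: `pbs_resource_spec` is a hypothetical helper, not part of dask-jobqueue.

```python
def pbs_resource_spec(ncpus: int, mem_gb: int, nodes: int = 1) -> str:
    """Build a PBS select statement such as 'select=1:ncpus=1:mem=10GB'."""
    return f"select={nodes}:ncpus={ncpus}:mem={mem_gb}GB"


cores = 1      # CPUs per worker job, as in the cell above
mem_gb = 10    # memory per worker job, in GB

print(pbs_resource_spec(cores, mem_gb))  # select=1:ncpus=1:mem=10GB
```

The result can be passed as `resource_spec = pbs_resource_spec(cores, mem_gb)` when constructing the cluster.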

In [7]:
# Scale the cluster and display cluster dashboard URL
n_workers = 3
cluster.scale(n_workers)
client.wait_for_workers(n_workers = n_workers)
cluster

Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/harshah/proxy/46111/status,Workers: 3
Total threads: 3,Total memory: 30.00 GiB

Comm: tcp://128.117.208.95:39985,Workers: 3
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/harshah/proxy/46111/status,Total threads: 3
Started: Just now,Total memory: 30.00 GiB

Comm: tcp://128.117.208.174:46759,Total threads: 1
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/harshah/proxy/45219/status,Memory: 10.00 GiB
Nanny: tcp://128.117.208.174:42075,
Local directory: /glade/derecho/scratch/harshah/dask-scratch-space/worker-rg808dmf
Tasks executing:,Tasks in memory:
Tasks ready:,Tasks in flight:
CPU usage: 0.0%,Last seen: Just now
Memory usage: 50.91 MiB,Spilled bytes: 0 B
Read bytes: 3.79 GiB,Write bytes: 89.79 MiB

Comm: tcp://128.117.208.174:46665,Total threads: 1
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/harshah/proxy/36727/status,Memory: 10.00 GiB
Nanny: tcp://128.117.208.174:36185,
Local directory: /glade/derecho/scratch/harshah/dask-scratch-space/worker-79_3bdwd
Tasks executing:,Tasks in memory:
Tasks ready:,Tasks in flight:
CPU usage: 0.0%,Last seen: Just now
Memory usage: 50.91 MiB,Spilled bytes: 0 B
Read bytes: 4.92 GiB,Write bytes: 62.12 MiB

Comm: tcp://128.117.208.174:44489,Total threads: 1
Dashboard: https://jupyterhub.hpc.ucar.edu/stable/user/harshah/proxy/38861/status,Memory: 10.00 GiB
Nanny: tcp://128.117.208.174:36451,
Local directory: /glade/derecho/scratch/harshah/dask-scratch-space/worker-2znb88o1
Tasks executing:,Tasks in memory:
Tasks ready:,Tasks in flight:
CPU usage: 0.0%,Last seen: Just now
Memory usage: 50.88 MiB,Spilled bytes: 0 B
Read bytes: 3.42 GiB,Write bytes: 5.72 MiB


## Load SAAG data from RDA using an intake catalog

In [8]:
col = intake.open_esm_datastore(cat_url)
col

Unnamed: 0,unique
path,44
variable,214
format,1
short_name,214
long_name,97
units,26
start_time,22
end_time,44
level,0
level_units,0


- col.df returns the catalog contents as a pandas DataFrame.
- (Nothing is converted: df is an attribute of the catalog object that already holds the DataFrame.)

In [9]:
col.df

Unnamed: 0,path,variable,format,short_name,long_name,units,start_time,end_time,level,level_units,frequency
0,/glade/campaign/collections/rda/data/d616000/k...,ACDEWC,reference,ACDEWC,"Accumulated canopy dew rate, accumulated over ...",mm,1999-12-31,1999-12-31 23:00:00,,,0 days 01:00:00
1,/glade/campaign/collections/rda/data/d616000/k...,ACDRIPR,reference,ACDRIPR,"Accumulated canopy precipitation drip rate, ac...",mm,1999-12-31,1999-12-31 23:00:00,,,0 days 01:00:00
2,/glade/campaign/collections/rda/data/d616000/k...,ACDRIPS,reference,ACDRIPS,"Accumulated canopy snow drip rate, accumulated...",mm,1999-12-31,1999-12-31 23:00:00,,,0 days 01:00:00
3,/glade/campaign/collections/rda/data/d616000/k...,ACECAN,reference,ACECAN,Accumulated net evaporation of canopy water (e...,mm,1999-12-31,1999-12-31 23:00:00,,,0 days 01:00:00
4,/glade/campaign/collections/rda/data/d616000/k...,ACEDIR,reference,ACEDIR,Accumulated net soil evaporation or snowpack s...,mm,1999-12-31,1999-12-31 23:00:00,,,0 days 01:00:00
...,...,...,...,...,...,...,...,...,...,...,...
4901,/glade/campaign/collections/rda/data/d616000/k...,V,reference,V,,m s-1,2020-01-01,2020-12-31 21:00:00,,,0 days 03:00:00
4902,/glade/campaign/collections/rda/data/d616000/k...,W,reference,W,,m s-1,2020-01-01,2020-12-31 21:00:00,,,0 days 03:00:00
4903,/glade/campaign/collections/rda/data/d616000/k...,Z,reference,Z,,m,2020-01-01,2020-12-31 21:00:00,,,0 days 03:00:00
4904,/glade/campaign/collections/rda/data/d616000/k...,ilev,reference,ilev,vertical stagger levels,Dimensionless,2020-01-01,2020-12-31 21:00:00,,,0 days 03:00:00


## Select data and plot

#### What if you don't know the variable names?
- Use pandas indexing to list the variable and long_name columns

In [10]:
col.df[['variable','long_name']]

Unnamed: 0,variable,long_name
0,ACDEWC,"Accumulated canopy dew rate, accumulated over ..."
1,ACDRIPR,"Accumulated canopy precipitation drip rate, ac..."
2,ACDRIPS,"Accumulated canopy snow drip rate, accumulated..."
3,ACECAN,Accumulated net evaporation of canopy water (e...
4,ACEDIR,Accumulated net soil evaporation or snowpack s...
...,...,...
4901,V,
4902,W,
4903,Z,
4904,ilev,vertical stagger levels


- We notice that long_name is not available for some variables like 'V'
- In such cases, please look at the dataset documentation for additional information: https://rda.ucar.edu/datasets/d616000/documentation/#
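When the catalog is too long to read by eye, a keyword search on long_name narrows it down. The sketch below is an illustration only: `find_variables` is a hypothetical helper, and the small DataFrame is a stand-in for col.df built from rows shown in the output above, with the long names left truncated exactly as displayed. Rows without a long_name (such as V) are skipped by `na=False`.

```python
import pandas as pd

# Stand-in for col.df: a few rows copied from the output above, with the
# long names left truncated exactly as they are displayed there.
df = pd.DataFrame(
    {
        "variable": ["ACDEWC", "ACDRIPR", "ACDRIPS", "ACECAN", "ACEDIR", "V"],
        "long_name": [
            "Accumulated canopy dew rate, accumulated over ...",
            "Accumulated canopy precipitation drip rate, ac...",
            "Accumulated canopy snow drip rate, accumulated...",
            "Accumulated net evaporation of canopy water (e...",
            "Accumulated net soil evaporation or snowpack s...",
            None,  # long_name is missing for V
        ],
    }
)


def find_variables(frame, keyword):
    """Return unique (variable, long_name) rows whose long_name contains keyword."""
    mask = frame["long_name"].str.contains(keyword, case=False, na=False, regex=False)
    return frame.loc[mask, ["variable", "long_name"]].drop_duplicates()


print(find_variables(df, "snow"))
```

With the real catalog, call `find_variables(col.df, "snow")`. The `drop_duplicates()` step matters there because each variable appears once per yearly store.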

### Temperature
- Plot the 2 m temperature (T2) for a single date

In [11]:
cat_temp = col.search(variable='T2')
cat_temp.df.head()

Unnamed: 0,path,variable,format,short_name,long_name,units,start_time,end_time,level,level_units,frequency
0,/glade/campaign/collections/rda/data/d616000/k...,T2,reference,T2,,K,1999-12-31,1999-12-31 23:00:00,,,0 days 01:00:00
1,/glade/campaign/collections/rda/data/d616000/k...,T2,reference,T2,,K,2000-01-01,2000-12-31 23:00:00,,,0 days 01:00:00
2,/glade/campaign/collections/rda/data/d616000/k...,T2,reference,T2,,K,2001-01-01,2001-12-31 23:00:00,,,0 days 01:00:00
3,/glade/campaign/collections/rda/data/d616000/k...,T2,reference,T2,,K,2002-01-01,2002-12-31 23:00:00,,,0 days 01:00:00
4,/glade/campaign/collections/rda/data/d616000/k...,T2,reference,T2,,K,2003-01-01,2003-12-31 23:00:00,,,0 days 01:00:00


- The data is organized in (virtual) zarr stores, each holding one year's worth of data (the first entry, for 1999-12-31, covers a single day).
- Select a year by searching for a start time of Jan 1st of that year or an end time of Dec 31st of the same year.
- To request data for another day, say Oct 1 of year YYYY, first load the store for year YYYY and then select that day from it. This example is discussed below.
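The mapping from a day of interest to the store that contains it can be written down explicitly. This is a small sketch with a hypothetical helper name, `store_start_time`; it assumes every store starts on Jan 1, which holds for the yearly entries listed above but not for the single-day 1999-12-31 entry.

```python
from datetime import date


def store_start_time(desired_date: str) -> str:
    """Return the Jan 1 start_time string of the yearly store that holds desired_date."""
    year = date.fromisoformat(desired_date).year
    return f"{year}-01-01"


print(store_start_time("2020-10-01"))  # 2020-01-01
```

The returned string can be used directly in the catalog search, for example `cat_temp.search(start_time=store_start_time("2020-10-01"))`.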

In [12]:
date = "2020-01-01"
# year = "2021"
cat_temp_subset = cat_temp.search(start_time = date)
cat_temp_subset

Unnamed: 0,unique
path,1
variable,1
format,1
short_name,1
long_name,0
units,1
start_time,1
end_time,1
level,0
level_units,0


### Load data into xarray

In [13]:
# Load catalog entries for the subset into a dictionary of xarray datasets.
dsets = cat_temp_subset.to_dataset_dict(zarr_kwargs={"consolidated": True})
print(f"\nDataset dictionary keys:\n {dsets.keys()}")


--> The keys in the returned dictionary of datasets are constructed as follows:
	'variable.short_name'



Dataset dictionary keys:
 dict_keys(['T2.T2'])


In [14]:
# Load the first dataset and display a summary.
dataset_key = list(dsets.keys())[0]
# store_name = dataset_key + ".zarr"
print(dsets.keys())
ds = dsets[dataset_key]
ds = ds.T2
ds

dict_keys(['T2.T2'])


Unnamed: 0,Array,Chunk
Bytes,97.57 GiB,2.85 MiB
Shape,"(8784, 2027, 1471)","(1, 1014, 736)"
Dask graph,35136 chunks in 2 graph layers,35136 chunks in 2 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,68.62 kiB,8 B
Shape,"(8784,)","(1,)"
Dask graph,8784 chunks in 2 graph layers,8784 chunks in 2 graph layers
Data type,datetime64[ns] numpy.ndarray,datetime64[ns] numpy.ndarray
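The figures in the summary above can be checked with a few lines of arithmetic. The sketch below uses only the shape, chunk shape, and data type reported there; nothing is read from disk.

```python
import math

shape = (8784, 2027, 1471)   # array shape from the summary above (time first)
chunks = (1, 1014, 736)      # chunk shape from the summary above
itemsize = 4                 # bytes per float32 value

total_gib = math.prod(shape) * itemsize / 2**30
chunk_mib = math.prod(chunks) * itemsize / 2**20
# Chunks per dimension are rounded up, since the last chunk may be partial.
n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))

print(f"{total_gib:.2f} GiB in total")      # 97.57 GiB in total
print(f"{chunk_mib:.2f} MiB per chunk")     # 2.85 MiB per chunk
print(f"{n_chunks} chunks")                 # 35136 chunks
print(shape[0] == 366 * 24)                 # True: hourly steps in leap year 2020
```

Because each chunk holds a single time step, selecting one time step touches only 4 of the 35136 chunks, which is why the selection in the next cell is cheap.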


In [15]:
%%time
desired_date = "2020-10-01"
ds_subset = ds.sel(Time=desired_date,method='nearest')
ds_subset

CPU times: user 2.07 ms, sys: 7.23 ms, total: 9.3 ms
Wall time: 19 ms


Unnamed: 0,Array,Chunk
Bytes,11.37 MiB,2.85 MiB
Shape,"(2027, 1471)","(1014, 736)"
Dask graph,4 chunks in 3 graph layers,4 chunks in 3 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray

Unnamed: 0,Array,Chunk
Bytes,8 B,8 B
Shape,(),()
Dask graph,1 chunks in 3 graph layers,1 chunks in 3 graph layers
Data type,datetime64[ns] numpy.ndarray,datetime64[ns] numpy.ndarray


In [None]:
%%time
ds_subset.plot(cmap='inferno')

In [None]:
cluster.close()