# ESGF Holdings summary - WRF subdaily variables

The goal here is to summarize the holdings on ESGF nodes (LLNL only for now) to see exactly which model-scenario-variable combinations have data and which do not, specifically for the WRF variables at subdaily frequencies. We assume that variants can be treated interchangeably. In the holdings table(s) generated with `esgf_holding.py`, any of the `grid_type`, `version`, `n_files`, or `filenames` columns can be used to determine whether files were found for a given combination of attributes (NaN indicates that no files were found).

We will focus on daily data first, because this is the most useful for our purposes. Monthly data often exists wherever daily data exists for a given combination, and where it does not, monthly data can be generated from the daily data.
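
As a rough illustration of generating monthly data from daily data, the sketch below resamples a daily series to monthly means with pandas. The series is synthetic (the name `tas` and the values are placeholders, not ESGF data), and a plain mean is only an assumption about the appropriate aggregation; some variables may call for a sum, minimum, or maximum instead.

```python
import numpy as np
import pandas as pd

# Synthetic daily series for illustration only (not real ESGF output).
days = pd.date_range("2000-01-01", "2000-03-31", freq="D")
daily = pd.Series(np.arange(len(days), dtype=float), index=days, name="tas")

# Aggregate to monthly means, labelling each month by its first day.
monthly = daily.resample("MS").mean()
print(monthly)
```

Real model output would more likely be handled as gridded NetCDF, but the resampling idea is the same.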

In [1]:
import pandas as pd
import numpy as np

pd.set_option("display.max_rows", 500)
pd.set_option("display.max_columns", 500)

In [2]:
holdings = pd.read_csv("llnl_esgf_holdings_wrf.csv")

In [3]:
holdings["valid"] = ~holdings.n_files.isnull()

Because multiple "Table IDs" are used for each frequency (6hr and 3hr), we rename them all to a consistent label so that we get a complete view of what is available at each frequency.

In [4]:
holdings.frequency.unique()

array(['6hrLev', '6hrPlev', '6hrPlevPt', '3hr', 'E3hr', 'E3hrPt', 'CF3hr'],
      dtype=object)

In [5]:
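# Map every table ID onto its underlying frequency label
replace_dict = {k: "6hr" for k in ["6hrLev", "6hrPlev", "6hrPlevPt"]}
replace_dict.update({k: "3hr" for k in ["3hr", "E3hr", "E3hrPt", "CF3hr"]})
# Restrict the replacement to the frequency column so other columns are untouched
holdings["frequency"] = holdings["frequency"].replace(replace_dict)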
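
One way to confirm that no table ID slipped through the rename is to apply the mapping with `Series.map`, which returns NaN for any value missing from the dictionary. The sketch below uses a small stand-in series rather than the real holdings table, and `unknownTable` is a made-up ID included only to show what an unmapped value looks like.

```python
import pandas as pd

# Same table-ID-to-frequency mapping as used for the holdings table.
table_id_to_frequency = {k: "6hr" for k in ["6hrLev", "6hrPlev", "6hrPlevPt"]}
table_id_to_frequency.update({k: "3hr" for k in ["3hr", "E3hr", "E3hrPt", "CF3hr"]})

# Stand-in data; "unknownTable" is a hypothetical ID that is not in the mapping.
raw = pd.Series(["6hrLev", "E3hrPt", "CF3hr", "unknownTable"], name="frequency")

# map() yields NaN for IDs missing from the dictionary, so they are easy to list.
mapped = raw.map(table_id_to_frequency)
unmapped_ids = sorted(raw[mapped.isna()].unique())
print(unmapped_ids)
```

An empty list would indicate that every table ID present in the data has been assigned a frequency.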

Check out only 6hr first:

In [6]:
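# Count rows with files for each model/scenario/variable at 6hr
tmp = (
    holdings.query("frequency == '6hr'")
    .groupby(["model", "scenario", "variable"])
    .valid.sum()
    .reset_index(name="valid")
)
# Show a check mark where at least one row has files, otherwise leave blank
tmp["valid"] = ["\u2713" if x > 0 else "" for x in tmp["valid"]]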

tmp.pivot_table(
    values="valid",
    columns="variable",
    index=["model", "scenario"],
    aggfunc=lambda x: "".join(x),
)

model,scenario,hus,huss,mrsol,ps,psl,ta,tas,ts,tsl,ua,uas,va,vas,zg
ACCESS-CM2,historical,,,,,✓,✓,,,,,,✓,,
ACCESS-CM2,ssp126,,,,,✓,✓,,,,✓,,✓,,
ACCESS-CM2,ssp245,,,,,,✓,,,,,,✓,,
ACCESS-CM2,ssp370,,,,,✓,✓,,,,✓,,✓,,
ACCESS-CM2,ssp585,,,,,✓,✓,,,,✓,,✓,,
CESM2,historical,✓,,,,,,,,,✓,,,,
CESM2,ssp126,,,,,✓,,,,,,,,,
CESM2,ssp245,,,,,✓,,,,,,,,,
CESM2,ssp370,,,,,✓,,,,,,,,,
CESM2,ssp585,,,,,✓,,,,,,,,,


Now 3hr:

In [15]:
tmp = (
    holdings.query("frequency == '3hr'")
    .groupby(["model", "scenario", "variable"])
    .valid.sum()
    .reset_index(name="valid")
)
tmp["valid"] = ["\u2713" if x > 0 else "" for x in tmp["valid"]]

tmp.pivot_table(
    values="valid",
    columns="variable",
    index=["model", "scenario"],
    aggfunc=lambda x: "".join(x),
)

model,scenario,hur,hurs,hus,huss,mrsos,ps,psl,ta,tas,ts,tsl,ua,uas,va,vas,zg
ACCESS-CM2,historical,,,,✓,,,,,✓,,,,✓,,✓,
ACCESS-CM2,ssp126,,,,✓,,,,,✓,,,,✓,,✓,
ACCESS-CM2,ssp245,,,,✓,,,,,✓,,,,✓,,✓,
ACCESS-CM2,ssp370,,,,✓,,,,,✓,,,,✓,,✓,
ACCESS-CM2,ssp585,,,,✓,,,,,✓,,,,✓,,✓,
CESM2,historical,,,,,,,,,,,,,,,,
CESM2,ssp126,,,,,,,,,,,,,,,,
CESM2,ssp245,,,,,,,,,,,,,,,,
CESM2,ssp370,,,,✓,,,,,✓,,,,,,,
CESM2,ssp585,,,,,,,,,,,,,,,,


Now either 3hr or 6hr (i.e., any frequency in the holdings table):

In [8]:
tmp = (
    holdings.groupby(["model", "scenario", "variable"])
    .valid.sum()
    .reset_index(name="valid")
)
tmp["valid"] = ["\u2713" if x > 0 else "" for x in tmp["valid"]]

tmp.pivot_table(
    values="valid",
    columns="variable",
    index=["model", "scenario"],
    aggfunc=lambda x: "".join(x),
)

model,scenario,hus,huss,mrsol,ps,psl,ta,tas,ts,tsl,ua,uas,va,vas,zg
ACCESS-CM2,historical,,✓,,,✓,✓,✓,,,,✓,✓,✓,
ACCESS-CM2,ssp126,,✓,,,✓,✓,✓,,,✓,✓,✓,✓,
ACCESS-CM2,ssp245,,✓,,,,✓,✓,,,,✓,✓,✓,
ACCESS-CM2,ssp370,,✓,,,✓,✓,✓,,,✓,✓,✓,✓,
ACCESS-CM2,ssp585,,✓,,,✓,✓,✓,,,✓,✓,✓,✓,
CESM2,historical,✓,,,,,,,,,✓,,,,
CESM2,ssp126,,,,,✓,,,,,,,,,
CESM2,ssp245,,,,,✓,,,,,,,,,
CESM2,ssp370,,✓,,,✓,,✓,,,,,,,
CESM2,ssp585,,,,,✓,,,,,,,,,
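
The three summaries above repeat the same group-and-pivot steps, so they could be folded into one helper. The sketch below is one possible way to do that, not the code used to produce the tables above: `availability_table` is a hypothetical name, it assumes the holdings table has `model`, `scenario`, `frequency`, `variable`, and `n_files` columns, and it is exercised on a tiny made-up table.

```python
import numpy as np
import pandas as pd


def availability_table(holdings, frequency=None):
    """Return a (model, scenario) x variable grid with a check mark where files exist.

    Hypothetical helper: pass frequency=None to consider all frequencies together.
    """
    subset = holdings if frequency is None else holdings[holdings["frequency"] == frequency]
    # A combination counts as available if any of its rows has a non-NaN n_files.
    has_files = (
        subset.assign(valid=subset["n_files"].notna())
        .groupby(["model", "scenario", "variable"])["valid"]
        .any()
    )
    # Variables become columns; combinations absent from the subset are filled with False.
    grid = has_files.unstack("variable", fill_value=False)
    marks = np.where(grid.to_numpy(dtype=bool), "\u2713", "")
    return pd.DataFrame(marks, index=grid.index, columns=grid.columns)


# Made-up holdings rows for illustration only.
demo = pd.DataFrame(
    {
        "model": ["A", "A", "A", "B"],
        "scenario": ["historical"] * 4,
        "frequency": ["6hr", "3hr", "6hr", "6hr"],
        "variable": ["ta", "tas", "ua", "ta"],
        "n_files": [12, 3, np.nan, np.nan],
    }
)
table_6hr = availability_table(demo, frequency="6hr")
table_any = availability_table(demo)
print(table_any)
```

With a helper like this, the 6hr, 3hr, and combined views differ only in the `frequency` argument.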
