# ESGF holdings summaries

The goal here is to summarize the holdings on ESGF nodes (LLNL only for now) to see exactly which model-scenario-variable combinations have data and which do not. We assume that variants can be treated interchangeably. Using the holdings table(s) generated with `esgf_holding.py`, any of the `grid_type`, `version`, `n_files`, or `filenames` columns can be used to determine whether files were found for a given combination of attributes (NaN indicates no files found).

We will focus on daily data first, because it is the most useful for our purposes. When daily data exists for a given combination, monthly data often exists as well; if it does not, monthly data can be generated from the daily data.
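
As a rough illustration of that last point, monthly means can be derived from a daily series by resampling. The sketch below uses a synthetic pandas series; the `daily` name and its values are made up for illustration, and real model output would more likely be aggregated with a NetCDF-aware tool than with plain pandas.

```python
import pandas as pd

# Hypothetical daily series for Jan-Feb 2000; each value is the day of the month
days = pd.date_range("2000-01-01", "2000-02-29", freq="D")
daily = pd.Series(days.day.astype(float), index=days)

# Aggregate to monthly means, labelled by the first day of each month
monthly = daily.resample("MS").mean()
print(monthly)
```

Whether a mean, sum, maximum, or minimum is the appropriate monthly aggregate depends on the variable.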

In [1]:
import pandas as pd
import numpy as np

pd.set_option("display.max_rows", 500)
pd.set_option("display.max_columns", 500)

## LLNL Node

We only have the LLNL node summarized so far:

In [2]:
holdings = pd.read_csv("llnl_esgf_holdings.csv")

## Daily data

These are the daily frequencies we are interested in so far:

In [3]:
day_freqs = ["day", "Eday", "Oday"]
# .copy() avoids a SettingWithCopyWarning when a column is added to the subset later
holdings = holdings.query("frequency in @day_freqs").copy()

Use the `n_files` column to determine if a given combination was found:

In [4]:
holdings["valid"] = ~holdings.n_files.isnull()

Group by the combinations and summarize:

In [5]:
tmp = (
    holdings.groupby(["model", "scenario", "variable"])
    .valid.sum()
    .reset_index(name="valid")
)
tmp["valid"] = ["\u2713" if x > 0 else "" for x in tmp["valid"]]
tmp

,model,scenario,variable,valid
0,ACCESS-CM2,historical,clt,✓
1,ACCESS-CM2,historical,evspsbl,
2,ACCESS-CM2,historical,hfls,✓
3,ACCESS-CM2,historical,hfss,✓
4,ACCESS-CM2,historical,hus,✓
...,...,...,...,...
1975,TaiESM1,ssp585,ts,
1976,TaiESM1,ssp585,ua,✓
1977,TaiESM1,ssp585,uas,
1978,TaiESM1,ssp585,va,✓
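
The flag-then-aggregate logic used above can be checked on a toy table. This is a minimal sketch with invented rows (the `toy` frame is not the real holdings table); it assumes, as stated in the introduction, that a combination counts as found when at least one of its rows has a non-NaN `n_files`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the holdings table; the rows are invented for illustration
toy = pd.DataFrame(
    {
        "model": ["A", "A", "A"],
        "variable": ["pr", "pr", "tas"],
        "n_files": [3.0, np.nan, np.nan],
    }
)

# A row counts as found when n_files is not NaN
toy["valid"] = toy["n_files"].notna()

# A combination counts as found when at least one of its rows was found
found = toy.groupby(["model", "variable"])["valid"].any()
print(found)
```

Here `any()` plays the same role as the `sum() > 0` test in the cell above: `pr` is marked as found because one of its two rows has files, while `tas` is not.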


Unexpectedly, the length of the DataFrame does not match the number of all possible combinations. This is probably because a model-scenario pair was omitted from the holdings table entirely when no data was found for any of its variables.

In [6]:
print(
    "combos:",
    len(tmp.variable.unique()) * len(tmp.model.unique()) * len(tmp.scenario.unique()),
)
print("Number of rows:", len(tmp))

combos: 2100
Number of rows: 1980
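
The gap is 2100 - 1980 = 120 rows. If every model-scenario pair that is present carries a row for each of 30 variables, that would correspond to four missing pairs, though this is an inference and not something checked here. One way to list the missing pairs is to compare the observed pairs against the full product of models and scenarios. The sketch below does this on a toy frame with invented rows; `toy`, `present`, `expected`, and `missing` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for `tmp`; names and rows are invented for illustration
toy = pd.DataFrame(
    {
        "model": ["A", "A", "B"],
        "scenario": ["historical", "ssp585", "historical"],
        "variable": ["pr", "pr", "pr"],
    }
)

# Model-scenario pairs that actually occur in the table
present = pd.MultiIndex.from_frame(toy[["model", "scenario"]].drop_duplicates())

# Every pair that could occur, given the observed models and scenarios
expected = pd.MultiIndex.from_product(
    [toy["model"].unique(), toy["scenario"].unique()],
    names=["model", "scenario"],
)

# Pairs with no rows at all
missing = expected.difference(present)
print(list(missing))
```

Applied to the real table, the same comparison would show whether the missing rows are indeed whole model-scenario pairs.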


Pivot so that each row is a model-scenario pair and each column is a variable:

In [7]:
tmp.pivot_table(
    values="valid",
    columns="variable",
    index=["model", "scenario"],
    aggfunc=lambda x: "".join(x),
)

model,scenario,clt,evspsbl,hfls,hfss,hus,huss,mrro,mrsol,mrsos,pr,prsn,psl,rlds,rls,rsds,rss,sfcWind,sfcWindmax,snd,snw,ta,tas,tasmax,tasmin,tos,ts,ua,uas,va,vas
ACCESS-CM2,historical,✓,,✓,✓,✓,✓,✓,,✓,✓,✓,✓,✓,,✓,,✓,✓,,✓,✓,✓,✓,✓,✓,,✓,✓,✓,✓
ACCESS-CM2,ssp126,✓,,✓,✓,✓,✓,✓,,✓,✓,✓,✓,✓,,✓,,✓,✓,,✓,✓,✓,✓,✓,✓,,✓,✓,✓,✓
ACCESS-CM2,ssp245,✓,,✓,✓,✓,✓,✓,,✓,✓,✓,✓,✓,,✓,,✓,✓,,✓,✓,✓,✓,✓,✓,,✓,✓,✓,✓
ACCESS-CM2,ssp370,✓,,✓,✓,✓,✓,✓,,✓,✓,✓,✓,✓,,✓,,✓,✓,,✓,✓,✓,✓,✓,✓,,✓,✓,✓,✓
ACCESS-CM2,ssp585,✓,,✓,✓,✓,✓,✓,,✓,✓,✓,✓,✓,,✓,,✓,✓,,✓,✓,✓,✓,✓,✓,,✓,✓,✓,✓
CESM2,historical,✓,,✓,✓,✓,✓,✓,✓,✓,✓,✓,✓,✓,,✓,,✓,,✓,✓,✓,✓,,,✓,,✓,,✓,
CESM2,ssp126,✓,,✓,✓,✓,✓,✓,✓,✓,✓,✓,✓,✓,,✓,,✓,,,✓,✓,✓,✓,✓,✓,,✓,,✓,
CESM2,ssp245,✓,,✓,✓,✓,✓,✓,✓,✓,✓,✓,✓,✓,,✓,,✓,,,✓,✓,✓,✓,✓,✓,,✓,,✓,
CESM2,ssp370,✓,,✓,✓,✓,✓,✓,✓,✓,✓,✓,✓,✓,,✓,,✓,,,✓,✓,✓,✓,✓,✓,,✓,,✓,
CESM2,ssp585,✓,,✓,✓,✓,✓,✓,✓,✓,✓,✓,✓,✓,,✓,,✓,,,✓,✓,✓,✓,✓,✓,,✓,,✓,
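
A table like this can also be reduced to a per-variable count of model-scenario pairs that have data, which makes sparsely covered variables easier to spot. The following is a minimal sketch on a toy pivot with invented entries; `toy` and `coverage` are hypothetical names, and the check mark is assumed to be the same `"\u2713"` character used above.

```python
import pandas as pd

# Toy stand-in for the pivoted table; entries are invented for illustration
toy = pd.DataFrame(
    {"pr": ["\u2713", "\u2713", ""], "tas": ["\u2713", "", ""]},
    index=pd.MultiIndex.from_tuples(
        [("A", "historical"), ("A", "ssp585"), ("B", "historical")],
        names=["model", "scenario"],
    ),
)

# Count the check marks in each variable column, most covered first
coverage = (toy == "\u2713").sum().sort_values(ascending=False)
print(coverage)
```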
