# Explore ESGF holdings

Use this notebook to explore the ESGF holdings and determine which specific simulation variants we want to mirror on the ACDN.

In [159]:
import pandas as pd

Read a CSV containing info on holdings for an ESGF node (LLNL only for now):

In [160]:
df = pd.read_csv("llnl_esgf_holdings.csv")
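The cells below rely on six columns of this table: `model`, `scenario`, `variant`, `frequency`, `variable`, and `grid_type`. A quick check along these lines can catch a schema mismatch early. This is only a sketch on a made-up stand-in table: `toy_df` and its rows are hypothetical and are not taken from the actual holdings file.

```python
import pandas as pd

# Columns that the rest of this notebook references
required_cols = ["model", "scenario", "variant", "frequency", "variable", "grid_type"]

# Hypothetical stand-in for the holdings table; real values will differ
toy_df = pd.DataFrame(
    {
        "model": ["ModelA", "ModelA"],
        "scenario": ["historical", "ssp585"],
        "variant": ["r1i1p1f1", "r1i1p1f1"],
        "frequency": ["day", "mon"],
        "variable": ["tas", "pr"],
        "grid_type": ["gn", None],
    }
)

# Any required column that is absent from the table
missing_cols = [col for col in required_cols if col not in toy_df.columns]
print("missing columns:", missing_cols)

# Count of missing values per column (grid_type is filtered on later)
print(toy_df.isna().sum())
```

Running the same two checks on `df` would show whether the real CSV has the expected columns and how many rows lack a `grid_type`.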

Do any variants have more data than the others for a given model and scenario?

Determine the data availability for each model, scenario, and variant, in terms of number of variables x temporal frequencies. We will combine those two fields to get an idea of representation across both daily and monthly frequencies.

In [187]:
# create a column that concatenates temporal frequency and variable name to simplify counting
df["freq_var"] = df["frequency"] + "_" + df["variable"]
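To make the combined key concrete, here is a small sketch of the same concatenation on a hypothetical three-row table (`toy_df` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical rows: the same variable at two frequencies, plus a second variable
toy_df = pd.DataFrame(
    {
        "frequency": ["day", "mon", "mon"],
        "variable": ["tas", "tas", "pr"],
    }
)

# Same element-wise string concatenation as in the cell above
toy_df["freq_var"] = toy_df["frequency"] + "_" + toy_df["variable"]
print(toy_df["freq_var"].tolist())  # ['day_tas', 'mon_tas', 'mon_pr']
```

Because the frequency is part of the key, daily and monthly versions of the same variable count as separate entries.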

Next, group by model, scenario, and variant and tally the number of unique variable-frequency combinations:

In [242]:
rep_df = pd.DataFrame(
    df[df["grid_type"].notna()].groupby(["model", "scenario", "variant"])["freq_var"].nunique()
)
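The tally can be illustrated on a small made-up table. In this sketch, `toy_df` and `toy_rep_df` are hypothetical names and the rows are invented. It shows that duplicate `freq_var` values within a group count once, and that rows with a missing `grid_type` are dropped before counting.

```python
import pandas as pd

# Hypothetical holdings rows; the last one has no grid_type and is filtered out
toy_df = pd.DataFrame(
    {
        "model": ["ModelA"] * 6,
        "scenario": ["historical", "historical", "historical", "ssp585", "ssp585", "ssp585"],
        "variant": ["r1i1p1f1", "r1i1p1f1", "r2i1p1f1", "r1i1p1f1", "r1i1p1f1", "r2i1p1f1"],
        "freq_var": ["day_tas", "mon_tas", "day_tas", "day_tas", "day_tas", "mon_pr"],
        "grid_type": ["gn", "gn", "gn", "gn", "gn", None],
    }
)

# Same pattern as the cell above: filter, group, count distinct freq_var values
toy_rep_df = pd.DataFrame(
    toy_df[toy_df["grid_type"].notna()]
    .groupby(["model", "scenario", "variant"])["freq_var"]
    .nunique()
)
print(toy_rep_df)
```

The result is indexed by (model, scenario, variant), which is why the next cell can select one model's rows with `rep_df.loc[model]`.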

Then, for each model, see if there are any variants that have the max representation for all desired scenarios:

In [246]:
models = df.model.unique()

# unique sorted list of scenarios represented for each variant should be this if all desired scenarios are present
target_scenarios = ["historical", "ssp126", "ssp245", "ssp370", "ssp585"]

for model in models:
    model_df = rep_df.loc[model]
    max_rep = model_df.max()

    # First check for the ideal case: a variant with max representation in all 5 target scenarios.
    # Keep only the (scenario, variant) rows that reach this model's max representation.
    # mrv_df: max-representation variants DataFrame
    mrv_df = model_df[model_df >= max_rep].dropna().reset_index()
    
    best_variants = mrv_df.groupby("variant")["scenario"].unique().apply(sorted).isin([target_scenarios]).index.values
    print(model, best_variants, "\n")

ACCESS-CM2 ['r1i1p1f1' 'r4i1p1f1' 'r5i1p1f1'] 

CESM2 ['r11i1p1f1' 'r2i1p1f1'] 

CNRM-CM6-1-HR ['r1i1p1f2'] 

EC-Earth3-Veg ['r10i1p1f1' 'r12i1p1f1' 'r14i1p1f1' 'r1i1p1f1' 'r2i1p1f1' 'r3i1p1f1'
 'r4i1p1f1' 'r5i1p1f1' 'r6i1p1f1'] 

GFDL-ESM4 ['r1i1p1f1'] 

HadGEM3-GC31-LL ['r1i1p1f3' 'r2i1p1f3' 'r3i1p1f3' 'r4i1p1f3' 'r5i1p1f3'] 

HadGEM3-GC31-MM ['r1i1p1f3' 'r4i1p1f3'] 

KACE-1-0-G ['r1i1p1f1'] 

MIROC6 ['r10i1p1f1' 'r1i1p1f1' 'r2i1p1f1' 'r3i1p1f1' 'r4i1p1f1' 'r5i1p1f1'
 'r6i1p1f1' 'r7i1p1f1' 'r8i1p1f1' 'r9i1p1f1'] 

MPI-ESM1-2-LR ['r10i1p1f1' 'r11i1p1f1' 'r12i1p1f1' 'r13i1p1f1' 'r14i1p1f1' 'r15i1p1f1'
 'r16i1p1f1' 'r17i1p1f1' 'r18i1p1f1' 'r19i1p1f1' 'r1i1p1f1' 'r1i2000p1f1'
 'r20i1p1f1' 'r21i1p1f1' 'r22i1p1f1' 'r23i1p1f1' 'r24i1p1f1' 'r25i1p1f1'
 'r26i1p1f1' 'r27i1p1f1' 'r28i1p1f1' 'r29i1p1f1' 'r2i1p1f1' 'r30i1p1f1'
 'r3i1p1f1' 'r4i1p1f1' 'r5i1p1f1' 'r6i1p1f1' 'r7i1p1f1' 'r8i1p1f1'
 'r9i1p1f1'] 

NorESM2-MM ['r1i1p1f1' 'r2i1p1f1' 'r3i1p1f1'] 



Wow, it looks like every model has at least one variant with the most variable-frequency combinations within that model available for all scenarios. This is **nice**.
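One caveat when reading the output above: in the loop, `.index.values` is taken from the boolean Series that `isin` returns, so the printed list contains every variant in `mrv_df`, whether or not its flag is True. The conclusion is therefore worth re-checking with the mask actually applied. Below is a minimal sketch of one way to do that, using a hypothetical stand-in for `mrv_df` (`toy_mrv_df`, `scenarios_by_variant`, and `has_all` are illustrative names, and the rows are made up):

```python
import pandas as pd

target_scenarios = ["historical", "ssp126", "ssp245", "ssp370", "ssp585"]

# Hypothetical stand-in for mrv_df: r1 reaches max representation in every
# target scenario, r2 only in two of them
rows = [(scenario, "r1i1p1f1") for scenario in target_scenarios]
rows += [("historical", "r2i1p1f1"), ("ssp585", "r2i1p1f1")]
toy_mrv_df = pd.DataFrame(rows, columns=["scenario", "variant"])

# Sorted list of scenarios present for each variant
scenarios_by_variant = toy_mrv_df.groupby("variant")["scenario"].unique().apply(sorted)

# Boolean flag per variant: True only when all target scenarios are present
has_all = scenarios_by_variant.apply(lambda scenarios: scenarios == target_scenarios)

# Apply the flag as a mask before reading off the index
best_variants = has_all[has_all].index.values
print(best_variants)
```

In this toy case only `r1i1p1f1` survives the filter, whereas reading the index without the mask would list both variants.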