This notebook computes the frequency, total days, and intensity of heat waves (HWs) from **CESM-LE** data.

The input data sets are read from:
```bash
/glade/scratch/zhonghua/CESM-LE-members-gridcell-temp-extracted-csv/
```

The results are saved to:
```
/glade/scratch/zhonghua/uhws/HWs_CESM/
```

Note:  
**2006**: The percentile threshold is computed from the 2006 data itself and then used to calculate the frequency, total days, and intensity for 2006.  
**2061**: The percentile threshold from **2006** is reused to calculate the frequency, total days, and intensity for 2061.  
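
The baseline-threshold idea in the note can be sketched on synthetic data as follows. This is only an illustration, not the `util` implementation: the helper `baseline_threshold`, the column name `temp`, and the generated temperatures are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def baseline_threshold(df, q=0.98):
    """Per-gridcell q-th quantile of daily temperature (hypothetical helper)."""
    return (df.groupby(["lat", "lon"])["temp"]
              .quantile(q)
              .rename("quant")
              .reset_index())

# Synthetic stand-ins for one grid cell in a baseline and a future year
rng = np.random.default_rng(0)
days = 365
base = pd.DataFrame({"lat": 40.0, "lon": 255.0,
                     "temp": rng.normal(300.0, 5.0, days)})
future = pd.DataFrame({"lat": 40.0, "lon": 255.0,
                       "temp": rng.normal(303.0, 5.0, days)})

# The threshold comes from the baseline year only
thr = baseline_threshold(base, 0.98)

# Both years are compared against the same baseline threshold
base_hot = base.merge(thr, on=["lat", "lon"])
future_hot = future.merge(thr, on=["lat", "lon"])
n_base = int((base_hot["temp"] > base_hot["quant"]).sum())
n_future = int((future_hot["temp"] > future_hot["quant"]).sum())
print("days above baseline threshold:", n_base, "(baseline)", n_future, "(future)")
```

Because the threshold is fixed at the baseline year, any change in the number of exceedance days in the later year reflects a shift in the temperature distribution rather than a change in the threshold.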

In [1]:
import xarray as xr
import datetime
import pandas as pd
import numpy as np
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import time
import gc
import util
# from s3fs.core import S3FileSystem
# s3 = S3FileSystem()

save_dir = "/glade/scratch/zhonghua/uhws/HWs_CESM/"

## Step 1: Load 2006 and 2061 data

In [2]:
cesm_2006 = util.load_df("/glade/scratch/zhonghua/CESM-LE-members-gridcell-temp-extracted-csv"\
                         "/TREFHTMX_heat_LE_2006.csv")
cesm_2061 = util.load_df("/glade/scratch/zhonghua/CESM-LE-members-gridcell-temp-extracted-csv"\
                         "/TREFHTMX_heat_LE_2061.csv")

Start to load csv /glade/scratch/zhonghua/CESM-LE-members-gridcell-temp-extracted-csv/TREFHTMX_heat_LE_2006.csv
It takes 59.93881106376648 to load csv
Start to load csv /glade/scratch/zhonghua/CESM-LE-members-gridcell-temp-extracted-csv/TREFHTMX_heat_LE_2061.csv
It takes 59.19037079811096 to load csv


## Step 2: Use the 98th percentile of 2006 to compute the frequency (events/year), total days (days/year), and intensity (K) for 2006 and 2061
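
The exact definitions of the three metrics live in `util`, which is not shown here. As a rough sketch only, the example below assumes a heat wave is a run of at least `min_days` consecutive days above the threshold, and that intensity is the mean temperature over heat-wave days. The function `heat_wave_metrics` and both of these definitions are assumptions for illustration and may differ from what `util` does.

```python
import numpy as np

def heat_wave_metrics(temps, threshold, min_days=2):
    """Return (frequency, total_days, intensity) for one year at one grid cell.

    Assumed definitions (hypothetical, not taken from util):
      frequency  = number of runs of >= min_days consecutive hot days
      total_days = number of days inside those runs
      intensity  = mean temperature over those days (NaN if there are none)
    """
    temps = np.asarray(temps, dtype=float)
    hot = temps > threshold
    # Pad with False so every run of hot days has a start edge and an end edge
    padded = np.concatenate(([False], hot, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first hot day of each run
    ends = np.flatnonzero(edges == -1)    # one past the last hot day of each run
    runs = [(s, e) for s, e in zip(starts, ends) if e - s >= min_days]

    frequency = len(runs)
    total_days = int(sum(e - s for s, e in runs))
    if total_days == 0:
        intensity = float("nan")
    else:
        intensity = float(np.mean(np.concatenate([temps[s:e] for s, e in runs])))
    return frequency, total_days, intensity

# Ten made-up daily maxima (K) with a 310 K threshold
temps = [300, 311, 312, 300, 313, 300, 311, 312, 314, 300]
freq, days, inten = heat_wave_metrics(temps, threshold=310.0)
print(freq, days, inten)
```

In this toy series the single hot day at 313 K is not counted, because it does not reach the assumed minimum run length of two days.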

In [3]:
# build the column names for ensemble members 002 through 033, e.g. "002_max"
member_ls = [str(i).zfill(3) + "_max" for i in range(2, 34)]

In [4]:
frequency_2006_ls=[]
duration_2006_ls=[]
intensity_2006_ls=[]
quantile_avail_2006_ls=[]

frequency_2061_ls=[]
duration_2061_ls=[]
intensity_2061_ls=[]
quantile_avail_2061_ls=[]
for member in member_ls:
    print("start member:",member)
    
    # start 2006
    start_time_2006=time.time()
    cesm_2006_hw, quantile_avail_2006=util.get_heat_waves_df(cesm_2006[["lat","lon","time",member]], 0.98, 2, "cesm", None)
    
    frequency_2006_ls.append(util.get_frequency(cesm_2006_hw,member))
    duration_2006_ls.append(util.get_duration(cesm_2006_hw,member))
    intensity_2006_ls.append(util.get_intensity(cesm_2006_hw,member))
    quantile_avail_2006_ls.append(quantile_avail_2006.copy().rename(columns={"quant": member}).set_index(["lat","lon"]))
    print("It took",time.time()-start_time_2006,"to deal with",member,"for year 2006")
    
    # start 2061
    start_time_2061=time.time()
    cesm_2061_hw, quantile_avail_2061=util.get_heat_waves_df(cesm_2061[["lat","lon","time",member]], None, 2, "cesm", quantile_avail_2006)
    
    frequency_2061_ls.append(util.get_frequency(cesm_2061_hw,member))
    duration_2061_ls.append(util.get_duration(cesm_2061_hw,member))
    intensity_2061_ls.append(util.get_intensity(cesm_2061_hw,member))
    quantile_avail_2061_ls.append(quantile_avail_2061.copy().rename(columns={"quant": member}).set_index(["lat","lon"]))
    print("It took",time.time()-start_time_2061,"to deal with",member,"for year 2061")
    print("\n")
    
    del cesm_2006_hw, cesm_2061_hw, quantile_avail_2006, quantile_avail_2061
    gc.collect()

start member: 002_max
The quantile is: 0.98
The duration threshold is: 2


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_with_quantile["HW"][df_with_quantile["mean"]> df_with_quantile["quant"]] = 0


It took 28.524385452270508 to deal with 002_max for year 2006
The quantile is: None
The duration threshold is: 2
It took 24.11390709877014 to deal with 002_max for year 2061


start member: 003_max
The quantile is: 0.98
The duration threshold is: 2
It took 25.669853687286377 to deal with 003_max for year 2006
The quantile is: None
The duration threshold is: 2
It took 20.799935340881348 to deal with 003_max for year 2061


start member: 004_max
The quantile is: 0.98
The duration threshold is: 2
It took 25.819986581802368 to deal with 004_max for year 2006
The quantile is: None
The duration threshold is: 2
It took 20.786059856414795 to deal with 004_max for year 2061


start member: 005_max
The quantile is: 0.98
The duration threshold is: 2
It took 26.138872861862183 to deal with 005_max for year 2006
The quantile is: None
The duration threshold is: 2
It took 20.775070428848267 to deal with 005_max for year 2061


start member: 006_max
The quantile is: 0.98
The duration threshold is: 2
I

## Merge into DataFrames and save as CSV

In [5]:
frequency_2006 = pd.concat(frequency_2006_ls, axis=1)
duration_2006 = pd.concat(duration_2006_ls, axis=1)
intensity_2006 = pd.concat(intensity_2006_ls, axis=1)
quantile_avail_2006 = pd.concat(quantile_avail_2006_ls, axis=1)

frequency_2061 = pd.concat(frequency_2061_ls, axis=1)
duration_2061 = pd.concat(duration_2061_ls, axis=1)
intensity_2061 = pd.concat(intensity_2061_ls, axis=1)
quantile_avail_2061 = pd.concat(quantile_avail_2061_ls, axis=1)
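
`pd.concat(..., axis=1)` joins the per-member tables side by side and aligns rows by index label, not by position. The sketch below shows this on two tiny made-up frames. It assumes each per-member table is indexed by `(lat, lon)`, as the percentile tables are after `set_index`; whether the `util.get_*` outputs share that index is an assumption here.

```python
import pandas as pd

# Two made-up per-member tables covering the same grid cells
idx = pd.MultiIndex.from_tuples([(40.0, 255.0), (41.0, 256.0)],
                                names=["lat", "lon"])
m002 = pd.DataFrame({"002_max": [3, 1]}, index=idx)
# Same cells, but stored in reversed row order
m003 = pd.DataFrame({"003_max": [2, 4]}, index=idx[::-1])

# Rows are matched on (lat, lon), so the differing row order does not matter
merged = pd.concat([m002, m003], axis=1)
print(merged)
```

If the member tables did not cover the same grid cells, the default outer join would keep every cell and fill the gaps with NaN.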

In [6]:
# the 2006 and 2061 percentile tables should be identical, because 2061 reuses the 2006 thresholds
frequency_2006.to_csv(save_dir+"2006_frequency.csv")
duration_2006.to_csv(save_dir+"2006_totaldays.csv")
intensity_2006.to_csv(save_dir+"2006_intensity.csv")
quantile_avail_2006.to_csv(save_dir+"2006_percentile.csv")

frequency_2061.to_csv(save_dir+"2061_frequency.csv")
duration_2061.to_csv(save_dir+"2061_totaldays.csv")
intensity_2061.to_csv(save_dir+"2061_intensity.csv")
quantile_avail_2061.to_csv(save_dir+"2061_percentile.csv")
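
The expectation that the two percentile tables match can be checked after saving. The sketch below is one possible check, run here on small stand-in files in a temporary directory. The helper `percentiles_match` and the demo values are hypothetical, and it assumes the CSVs carry `lat` and `lon` as index columns.

```python
import os
import tempfile
import pandas as pd

def percentiles_match(path_a, path_b):
    """Return True if two percentile CSVs hold the same values (hypothetical helper)."""
    a = pd.read_csv(path_a, index_col=["lat", "lon"]).sort_index()
    b = pd.read_csv(path_b, index_col=["lat", "lon"]).sort_index()
    try:
        # check_exact=False tolerates tiny floating-point differences
        pd.testing.assert_frame_equal(a, b, check_exact=False)
    except AssertionError:
        return False
    return True

# Stand-in percentile table with made-up values
demo = pd.DataFrame({"lat": [40.0, 41.0],
                     "lon": [255.0, 256.0],
                     "002_max": [310.2, 309.7]}).set_index(["lat", "lon"])

with tempfile.TemporaryDirectory() as tmp:
    p06 = os.path.join(tmp, "2006_percentile.csv")
    p61 = os.path.join(tmp, "2061_percentile.csv")
    demo.to_csv(p06)
    demo.to_csv(p61)
    same = percentiles_match(p06, p61)

    # Overwrite one file with a changed value to show a mismatch being detected
    demo.assign(**{"002_max": [311.0, 309.7]}).to_csv(p61)
    different = percentiles_match(p06, p61)

print(same, different)
```

On the real outputs, the same helper could be pointed at `save_dir+"2006_percentile.csv"` and `save_dir+"2061_percentile.csv"`.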