The goal of this notebook is to reproduce graphs from relevant papers to make sure that our process matches or closely matches other output. 

## Castle, et. al (2014)

In this paper, the authors focus on groundwater depletion in the Colorado River Basin. They have several charts relevant for comparisons, including one that has terrestrial water storage anomalies using GRACE data. To verify we are processing our data correctly, I will recreate the TWSA chart they have for the whole basin, pictured below: 

<p align="center">
    <img src="../notebooks-and-markdowns/whole_basin_castlepaper.png" width="800" />
</p>

A few technical things to note for their methods: 

+ They use GRACE RL05 from CSR
+ Their deviations are calculated as deviations from the mean during the study period of Jan 2003-Nov 2013
+ Since they use spherical harmonics data, their pixel size is 1 degree by 1 degree

Our process: 

+ load in GRACE
+ combine into a dataframe 
+ calculate deviation from the mean for Jan 2003-Nov 2013
+ average over the basin 
+ graph 

It's hard to find the right files in the archives to download, let's start with the version of GRACE spherical harmonics we have downloaded. 

In [7]:
# Defining path to GRACE data
p = "/home/kmk58/remoteData/GRACE/data/TELLUS_GRAC_L3_CSR_RL06_LND_v04/"

In [8]:
import geopandas as gpd 
import matplotlib.pyplot as plt

shpfl = gpd.read_file("/home/kmk58/remoteData/shapefiles/Colorado_River_Basin_Hydrological_Boundaries_with_Areas_served_by_Colorado_River.shp")
# Code to filter to shapefile

# extract a dataframe of the coordinates from the shapefile
coords = shpfl.get_coordinates()
# find the maximum and minimum lat/longs, corresponding to the red points on the figure above
lon_min = min(coords['x'])
lon_max = max(coords['x'])
lat_min = min(coords['y'])
lat_max = max(coords['y'])

In [30]:
import xarray as xr
import os
import pandas as pd
import numpy as np

os.environ['HDF5_USE_FILE_LOCKING']='FALSE'

crb_df = pd.DataFrame()

#Iterating through files in path
for filename in os.listdir(p):
    if filename.endswith(".nc"):
    #Reading in data as xarray then converting to DataFrame
        xd = xr.open_dataset(p+str(filename))
        xd_df = xd.to_dataframe()
        xd_df.reset_index(inplace=True)
        xd_df["lon"] = xd_df["lon"] - 180
    
        #Extracting only needed columns 
        int_df = xd_df[['lon','lat','time','lwe_thickness','uncertainty']]    
    
        #Index where CRB Basin is 
        #df_slice = int_df[(int_df.lon.isin(list(np.arange(-90.5,-100,-0.5)))) & (int_df.lat.isin(list(np.arange(30.5,40,0.5))))]
        df_slice = int_df[(int_df.lon> lon_min) & (int_df.lon < lon_max)]
        df_slice = df_slice[(df_slice.lat> lat_min) & (df_slice.lat < lat_max)]
        crb_df = pd.concat([crb_df, df_slice], axis=0)
    


In [32]:
crb_df = crb_df.drop_duplicates()

In [33]:
crb_df

Unnamed: 0,lon,lat,time,lwe_thickness,uncertainty
22202,-118.5,31.5,2017-06-10 12:00:00,-0.034839,0.025026
22204,-118.5,32.5,2017-06-10 12:00:00,-0.042626,0.024973
22206,-118.5,33.5,2017-06-10 12:00:00,-0.051571,0.024918
22208,-118.5,34.5,2017-06-10 12:00:00,-0.051942,0.024864
22210,-118.5,35.5,2017-06-10 12:00:00,-0.038551,0.024813
...,...,...,...,...,...
27976,-102.5,38.5,2002-04-18 00:00:00,-0.019572,0.022868
27978,-102.5,39.5,2002-04-18 00:00:00,0.003483,0.022870
27980,-102.5,40.5,2002-04-18 00:00:00,0.026293,0.022892
27982,-102.5,41.5,2002-04-18 00:00:00,0.042169,0.022930


In [None]:
# There is no scale factor file I can find, let's do without first. 

