# Organizing, Subsetting and Processing Data

Now that you've successfully downloaded the GRACE and GLDAS data, you will need to read in and process the data to get groundwater anomaly estimates. Both datasets store their raw data in .nc or .nc4 files. These are NetCDF files, a format for storing multi-dimensional data. We can use the `xarray` package to read them in.

### Reading in GRACE Data

The GRACE MASCON data is stored in a single .nc file that covers land areas globally. We start by using `xarray` to read in the data.

In [2]:
import xarray as xr
import os

os.environ['HDF5_USE_FILE_LOCKING']='FALSE'

In [5]:
# Need to change this to relative path later
grace_df = xr.open_dataset("/home/kmk58/remoteData/GRACE_MASCON/data/TELLUS_GRAC-GRFO_MASCON_CRI_GRID_RL06.1_V3/GRCTellus.JPL.200204_202304.GLO.RL06.1M.MSCNv03CRI.nc") 

Let's begin by inspecting the data. Running the cell below displays an interactive summary that you can expand to explore the dataset's dimensions, coordinates, and variables.

In [7]:
grace_df
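
The interactive summary can also be queried programmatically. The sketch below builds a small synthetic stand-in for `grace_df` so that it runs without the downloaded file. The variable name `lwe_thickness`, the 0 to 360 longitude convention, and the three-month time axis are assumptions made for illustration (check the real file's summary for the actual names), and the `bounds` dimension is left out for brevity.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for grace_df: cell centers of a 0.5-degree global grid.
# ASSUMPTION: longitudes run from 0 to 360; the real file may differ.
lat = np.arange(-89.75, 90, 0.5)   # 360 latitude centers
lon = np.arange(0.25, 360, 0.5)    # 720 longitude centers
time = np.arange("2002-04", "2002-07", dtype="datetime64[M]").astype("datetime64[ns]")

demo = xr.Dataset(
    data_vars={
        # ASSUMPTION: "lwe_thickness" is a hypothetical variable name.
        "lwe_thickness": (
            ("time", "lat", "lon"),
            np.zeros((time.size, lat.size, lon.size)),
        )
    },
    coords={"time": time, "lat": lat, "lon": lon},
)

# The same calls work on grace_df once it has been opened.
print(dict(demo.sizes))              # length of every dimension
print(list(demo.data_vars))          # names of the data variables
print(demo["lwe_thickness"].dims)    # dimension order of one variable
print(demo["time"].values.min(), demo["time"].values.max())  # time span
```

Replacing `demo` with `grace_df` gives the same information as the interactive view, in a form that is easy to use in later processing steps.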

The first thing you will note is that the dataset has several dimensions: `lon`, `lat`, `time`, and `bounds`. This is because the GRACE data come at the pixel level for each month. In our situation, a pixel is the smallest geographic unit of analysis. Because collecting and processing GRACE satellite data is technical and computationally expensive, GRACE measurements are given as 0.5-degree by 0.5-degree squares. The pixels cover the Earth's entire surface, and each pixel has one GRACE measurement per month; the file loaded above spans April 2002 to April 2023, per the `200204_202304` in its name. A visual of this is shown below, where each square in the grid corresponds to a pixel (Sharma, Patnaik, Biswal, Reager, 2020). Note that the yellow dots are gauging stations for comparison.

![GRACE Data Grids and Gauging Stations][def]
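
To make the 0.5-degree grid concrete, here is a minimal sketch of the arithmetic behind it: how many pixels cover the globe, and which pixel center a given location falls into. The helper `pixel_center` is hypothetical and not part of the notebook. It assumes that cell edges sit on multiples of 0.5 degrees and that longitudes are expressed from 0 to 360; verify both against the `lat` and `lon` coordinates of the real file.

```python
import math

CELL_DEG = 0.5  # pixel size in degrees, as described in the text

# Number of pixels along each axis and in total for a global grid.
n_lon = int(360 / CELL_DEG)
n_lat = int(180 / CELL_DEG)
n_pixels = n_lon * n_lat


def pixel_center(lat, lon):
    """Return the (lat, lon) center of the 0.5-degree cell containing a point.

    Hypothetical helper. ASSUMPTIONS: cell edges fall on multiples of
    CELL_DEG, longitudes are reported in [0, 360), and lat is in [-90, 90).
    """
    lat_c = math.floor(lat / CELL_DEG) * CELL_DEG + CELL_DEG / 2
    lon_c = math.floor((lon % 360) / CELL_DEG) * CELL_DEG + CELL_DEG / 2
    return lat_c, lon_c


print(n_lon, n_lat, n_pixels)
print(pixel_center(47.6, -122.3))  # a point given with a negative (west) longitude
```

In practice, `xarray` can do this lookup directly with `grace_df.sel(lat=..., lon=..., method="nearest")`; the arithmetic above only illustrates what a pixel is.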




[def]: /DSSG2023-Groundwater/notebooks-and-markdowns/GRACE_grids.png

### GLDAS Data 

Next, we will read in the GLDAS data, which provides information on snowpack and soil moisture.

## Citations

Sharma D, Patnaik S, Biswal B, Reager JT. Characterization of Basin-Scale Dynamic Storage–Discharge Relationship Using Daily GRACE Based Storage Anomaly Data. Geosciences. 2020; 10(10):404. https://doi.org/10.3390/geosciences10100404