# Download AORC Version 1.1 Dataset from AWS S3

This notebook can be used to download AORC Version 1.1 data from the AWS S3 repository (https://registry.opendata.aws/noaa-nws-aorc/) to netCDF files. Additional details on the AORC dataset can be found in Fall et al. (2023) (https://doi.org/10.1111/1752-1688.13143).

The code was modified from Michael AuCoin's tutorial (https://nbviewer.org/github/NOAA-OWP/AORC-jupyter-notebooks/blob/master/jupyter_notebooks/AORC_Zarr_notebook.ipynb) to download the .zarr data for a specific area of interest and save it as netCDF files.

Additionally, wind speed (`wind_speed`) is calculated from the U and V wind vector components provided in the .zarr files and exported along with the other variables.
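Wind speed is the magnitude of the horizontal wind vector, `sqrt(u**2 + v**2)`, where `u` is the eastward and `v` the northward component. A minimal NumPy sketch with made-up values (not AORC data); `np.hypot` computes the same quantity as `np.sqrt(u**2 + v**2)`:

```python
import numpy as np

def wind_speed(u, v):
    """Return horizontal wind speed from eastward (u) and northward (v) components.

    np.hypot computes sqrt(u**2 + v**2) element-wise; the output has the
    same units as the inputs.
    """
    return np.hypot(u, v)

# Hypothetical component values, not taken from AORC
u = np.array([3.0, 0.0, -6.0])
v = np.array([4.0, 2.0, 8.0])
print(wind_speed(u, v))  # expected speeds: 5, 2 and 10
```

NumPy ufuncs such as `np.hypot` generally also accept xarray `DataArray` objects, so `np.hypot(u, v)` should work as an alternative to the explicit square root in the cells below.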

Created: 7/11/2024

In [None]:
import xarray as xr
import fsspec
import numpy as np
import s3fs
import zarr
import dask

# If running locally, point Dask to a writable directory for its temporary (spill) files
dask.config.set(temporary_directory='/dask-worker-space')

# URL to the S3 bucket (no f-string needed: there are no placeholders)
base_url = 's3://noaa-nws-aorc-v1-1-1km'

# Local directory where the netCDF file(s) will be saved (update this path)
download_dir = 'path/to/dir'
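`'/dask-worker-space'` sits at the root of the filesystem, which ordinary user accounts often cannot write to. One alternative, sketched here with the standard library only, is a folder inside the operating system's temporary directory; the folder name is an arbitrary choice, not something Dask requires:

```python
import os
import tempfile

# Hypothetical location: a 'dask-worker-space' folder inside the OS temp directory
dask_tmp = os.path.join(tempfile.gettempdir(), 'dask-worker-space')
os.makedirs(dask_tmp, exist_ok=True)  # create the folder if it does not exist yet
print(dask_tmp)
```

The resulting path can then be passed to Dask with `dask.config.set(temporary_directory=dask_tmp)`.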

## Download a Single Year of AORC Data

In the following code cells we will download data for a specified area for a single year. In this example, we will download multiple variables at once. A complete list of variable names and descriptions is available on the AWS data repository page linked above.

In [None]:
# Set the year of interest
year = '1996'

# Identify the variables of interest
var=['APCP_surface', 'TMP_2maboveground', 'DLWRF_surface', 'DSWRF_surface', 'UGRD_10maboveground', 'VGRD_10maboveground']

In [None]:
#%%time
# Open the Zarr store for the specified year and keep only the variables of interest
single_year_url = f'{base_url}/{year}.zarr/'
ds_single = xr.open_zarr(fsspec.get_mapper(single_year_url, anon=True), consolidated=True)
ds = ds_single[var]
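`ds_single[var]` raises a `KeyError` if any name in `var` does not match a variable in the store, for example because of a typo. A minimal pre-check sketch; the helper name `missing_variables` is hypothetical, and the example lists are made up:

```python
def missing_variables(requested, available):
    """Return the requested variable names that are not in `available`, in order."""
    available = set(available)
    return [name for name in requested if name not in available]

# Hypothetical example: a misspelled name in the requested list
requested = ['APCP_surface', 'TMP_2maboveground', 'DSWRF_surfce']
available = ['APCP_surface', 'TMP_2maboveground', 'DSWRF_surface']
print(missing_variables(requested, available))  # ['DSWRF_surfce']
```

In the notebook this could be called as `missing_variables(var, ds_single.data_vars)`, since iterating over `data_vars` yields the variable names.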

#### Calculate Wind Speed from AORC Vector Components

In [None]:
# Select the wind vector components by their position in the var list
u = ds_single[var[4]] # U (eastward) wind component; update the index if var changes
v = ds_single[var[5]] # V (northward) wind component; update the index if var changes

wind_speed = np.sqrt(u**2 + v**2)
ds['wind_speed'] = wind_speed

#### View Dataset and Size

In [None]:
print(ds)
print(f'Dataset size: {ds.nbytes/1e12:.1f} TB')
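`nbytes` is the uncompressed size implied by the array shapes and data types (plus the small coordinate arrays); computing it downloads no data, and the netCDF file written later can differ in size depending on compression. The same arithmetic as a sketch with hypothetical dimensions: one leap year of hourly steps (366 * 24 = 8784), a 180 x 168 cell subset (roughly what the bounding box used below would give, assuming a 1/120 degree grid spacing), and 7 variables (the six in `var` plus `wind_speed`), assuming float32 storage:

```python
import numpy as np

def estimated_nbytes(n_time, n_lat, n_lon, n_vars, dtype='float32'):
    """Uncompressed size in bytes of n_vars arrays with shape (time, lat, lon)."""
    return n_time * n_lat * n_lon * n_vars * np.dtype(dtype).itemsize

# Hypothetical dimensions, not read from the AORC store
size = estimated_nbytes(366 * 24, 180, 168, 7)
print(f'{size / 1e9:.1f} GB')  # 7.4 GB
```

Checking the actual shapes and dtypes with `print(ds)` before downloading is the reliable way to confirm these assumptions.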

#### Filter Dataset to an Area of Interest

In [None]:
# Set the bounding box in decimal degrees (longitudes west of Greenwich are negative)
minlon = -120.5
maxlon = -119.1
minlat = 38.5
maxlat = 40

ds_bbox = ds.sel(latitude=slice(minlat, maxlat), longitude=slice(minlon, maxlon))

# Print the uncompressed size of the subset to be downloaded
print(f'Subset size: {ds_bbox.nbytes/1e9:.1f} GB')
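`.sel` with a `slice` follows the stored order of the coordinate: if a coordinate were stored in descending order, `slice(minlat, maxlat)` would return an empty selection rather than an error. A sketch of a hypothetical helper that orders the bounds to match the coordinate, shown with plain NumPy arrays:

```python
import numpy as np

def ordered_slice(coord_values, lo, hi):
    """Return a slice from lo to hi that matches the coordinate's sort order.

    Assumes coord_values is monotonic (either ascending or descending).
    """
    coord_values = np.asarray(coord_values)
    ascending = coord_values[0] <= coord_values[-1]
    return slice(lo, hi) if ascending else slice(hi, lo)

# Hypothetical descending latitude values
lat_desc = np.array([40.5, 40.0, 39.5, 39.0, 38.5, 38.0])
print(ordered_slice(lat_desc, 38.5, 40))  # slice(40, 38.5, None)
```

Usage in the notebook would look like `ds.sel(latitude=ordered_slice(ds.latitude.values, minlat, maxlat), longitude=ordered_slice(ds.longitude.values, minlon, maxlon))`; a subset size of 0.0 GB is a quick sign that the bounds and coordinate order disagree.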

#### Download as a netCDF

In [None]:
# Set a name for the output netCDF file (without the .nc extension)
file_name = 'file_name'

ds_bbox.to_netcdf(f'{download_dir}/{file_name}.nc')
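`to_netcdf` typically writes variables uncompressed unless compression is requested through its `encoding` argument. With the netCDF4 backend, the per-variable `zlib` and `complevel` options enable compression; a sketch that builds that encoding dictionary (the helper name and the level 4 default are arbitrary choices):

```python
def compression_encoding(var_names, complevel=4):
    """Map each variable name to netCDF4 zlib compression settings."""
    return {name: {'zlib': True, 'complevel': complevel} for name in var_names}

encoding = compression_encoding(['APCP_surface', 'wind_speed'])
print(encoding)
```

It could be applied as `ds_bbox.to_netcdf(f'{download_dir}/{file_name}.nc', encoding=compression_encoding(ds_bbox.data_vars))`. Higher levels trade write time for smaller files.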

## Download Multiple Years of AORC Data

In the following code cells we will download data for a specified area for a range of years. In this example, we will download multiple variables at once. A complete list of variable names and descriptions is available on the AWS data repository page linked above.

In [None]:
# Set the range of years to download (the end year is excluded: this covers 1996 and 1997)
dataset_years = list(range(1996, 1998))

# Identify the variables of interest
var=['APCP_surface', 'TMP_2maboveground', 'DLWRF_surface', 'DSWRF_surface', 'UGRD_10maboveground', 'VGRD_10maboveground']

In [None]:

s3_out = s3fs.S3FileSystem(anon=True)
# Build one S3 mapper per year; base_url already includes the s3:// prefix
fileset = [s3fs.S3Map(root=f'{base_url}/{dataset_year}.zarr', s3=s3_out, check=False)
           for dataset_year in dataset_years]

ds_multi_year = xr.open_mfdataset(fileset, engine='zarr')
ds_multi = ds_multi_year[var]
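The `fileset` list assumes each year is stored as its own `{year}.zarr` store directly under `base_url`. A hypothetical helper that builds those store URLs for an inclusive range of years (unlike `range`, whose end value is excluded), without repeating the `s3://` prefix:

```python
def aorc_year_urls(base_url, first_year, last_year):
    """Build Zarr store URLs for every year from first_year to last_year, inclusive.

    Assumes the '{base_url}/{year}.zarr' layout used in this notebook.
    """
    base = base_url.rstrip('/')
    return [f'{base}/{year}.zarr' for year in range(first_year, last_year + 1)]

urls = aorc_year_urls('s3://noaa-nws-aorc-v1-1-1km', 1996, 1997)
print(urls)
```

The mappers could then be built as `[s3fs.S3Map(root=url, s3=s3_out, check=False) for url in aorc_year_urls(base_url, 1996, 1997)]`.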

#### Calculate Wind Speed from AORC Vector Components

In [None]:
# Select the wind vector components by their position in the var list
u = ds_multi_year[var[4]] # U (eastward) wind component; update the index if var changes
v = ds_multi_year[var[5]] # V (northward) wind component; update the index if var changes

wind_speed = np.sqrt(u**2 + v**2)
ds_multi['wind_speed'] = wind_speed

#### View Dataset and Size

In [None]:
print(ds_multi)
print(f'Dataset size: {ds_multi.nbytes/1e12:.1f} TB')

#### Filter Dataset to an Area of Interest

In [None]:
# Set the bounding box in decimal degrees (longitudes west of Greenwich are negative)
minlon = -120.5
maxlon = -119.1
minlat = 38.5
maxlat = 40

ds_multi_bbox = ds_multi.sel(latitude=slice(minlat, maxlat), longitude=slice(minlon, maxlon))

# Print the uncompressed size of the subset to be downloaded
print(f'Subset size: {ds_multi_bbox.nbytes/1e9:.1f} GB')

#### Download as a netCDF

In [None]:
# Set a name for the output netCDF file (without the .nc extension)
file_name = 'file_name'

ds_multi_bbox.to_netcdf(f'{download_dir}/{file_name}.nc')