# Exercise 1a: Introduction to xarray


## Aim: Learn about what xarray is and how to create and look at a `DataArray`.

Find the teaching resources here: https://tutorial.xarray.dev/fundamentals/01_data_structures.html and https://tutorial.xarray.dev/fundamentals/01_datastructures.html.

### Issues Covered:
- Importing `xarray`
- Loading a dataset using `xr.open_dataset()`
- Creating a `DataArray`

## 1. Introduction to multidimensional arrays

- Unlabelled N-dimensional arrays of numbers are the most widely used data structure in scientific computing.
- These arrays carry no meaningful metadata, so users must keep track of what each axis and index means themselves.

<img src="../images/multidimensional_array.png" width="800"/>
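As a minimal sketch of the difference between an unlabelled array and a labelled one, the cell below wraps synthetic NumPy values in a `DataArray`. The dimension names, coordinate values, variable name and `units` attribute are illustrative assumptions and do not come from the exercise dataset.

```python
import numpy as np
import xarray as xr

# Synthetic values only: 2 time steps x 3 latitudes x 4 longitudes.
values = np.arange(24, dtype=float).reshape(2, 3, 4)

# Unlabelled: the reader has to remember that axis 0 is time,
# axis 1 is latitude and axis 2 is longitude.
first_step_numpy = values[0, :, :]

# Labelled: the same values with named dimensions, coordinate values
# and attributes (all names here are made up for illustration).
da = xr.DataArray(
    values,
    dims=("time", "lat", "lon"),
    coords={
        "time": [0, 1],
        "lat": [-45.0, 0.0, 45.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
    name="example_temperature",
    attrs={"units": "degC"},
)

# Select by dimension name and position, or by coordinate label.
first_step_xarray = da.isel(time=0)
equator = da.sel(lat=0.0)

print(da.dims)
print(equator.shape)
```

Selecting with `isel(time=0)` returns the same numbers as `values[0, :, :]`, but the code no longer depends on remembering the axis order.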

Q1. Can you think of any reasons why xarray might be preferred to pandas when working with multi-dimensional data like climate models?
(Hint: how many dimensions does a pandas dataframe have?)

In [1]:
# xarray is designed to handle data with multiple dimensions.
# pandas is designed for 1D (Series) and 2D (DataFrame) structures.
# xarray allows you to work with labelled dimensions and coordinates.
# pandas can only label additional dimensions through a 'MultiIndex'.
# xarray is built on the netCDF data model and understands CF conventions.
# pandas doesn't natively support netCDF or CF conventions.
# xarray supports metadata attached to datasets.
# pandas metadata support is minimal.
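To make the `MultiIndex` point concrete, here is a small sketch that converts a synthetic 3D `DataArray` to pandas with `DataArray.to_series()`. The array contents and coordinate values are made up for illustration; only the dimension names echo those in the exercise dataset.

```python
import numpy as np
import xarray as xr

# Synthetic 3D array: 2 depths x 3 latitudes x 4 longitudes.
da = xr.DataArray(
    np.zeros((2, 3, 4)),
    dims=("depth", "latitude", "longitude"),
    coords={
        "depth": [5.0, 15.0],
        "latitude": [-45.0, 0.0, 45.0],
        "longitude": [0.0, 90.0, 180.0, 270.0],
    },
    name="temp",
)

# In pandas the three dimensions are flattened into one long Series
# whose index is a three-level MultiIndex.
series = da.to_series()

print(da.dims)
print(list(series.index.names))
print(len(series))
```

The xarray object keeps three separate, named axes, while the pandas version has a single axis of 24 rows whose index encodes all three dimensions.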

## 2. Opening and Exploring Datasets

Q2. Open the `'../data/xbhubo.pgc0apr.nc'` dataset and load it into an xarray `Dataset` called `ds`.
(Hint: Don't forget to import any packages you need).
This file contains output from a HadCM3 model run carried out as part of the RAPID study: https://catalogue.ceda.ac.uk/uuid/6bbab8394124b252f8b1b036f9eb6b6b/

In [2]:
import xarray as xr
ds = xr.open_dataset('../data/xbhubo.pgc0apr.nc')

Q3. Look at the parameters of the dataset.

In [3]:
ds

Q4. What are the dimensions and variables in this dataset? What does each represent? 

In [4]:
# There are four data variables: temp, salinity, ucurr and vcurr.
# t (time) and depth are dimensions used by all of these variables.
# temp and salinity are on a different grid from ucurr and vcurr.
# This is why there are both longitude and longitude_1 (and latitude and latitude_1):
# each pair of dimensions belongs to a different set of variables.
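One way to check which dimensions each variable uses is to loop over `data_vars`. The sketch below does this on a tiny synthetic `Dataset` called `toy` (a hypothetical stand-in, not the exercise file) with two variables on two offset grids; the same loop should work on `ds`.

```python
import numpy as np
import xarray as xr

# Toy dataset with two offset grids; all values are synthetic.
toy = xr.Dataset(
    data_vars={
        "temp": (("latitude", "longitude"), np.zeros((3, 4))),
        "ucurr": (("latitude_1", "longitude_1"), np.zeros((2, 4))),
    },
    coords={
        "latitude": [-1.0, 0.0, 1.0],
        "longitude": [0.0, 1.0, 2.0, 3.0],
        "latitude_1": [-0.5, 0.5],
        "longitude_1": [0.5, 1.5, 2.5, 3.5],
    },
)

# Map each data variable to the dimensions it is defined on.
dims_by_variable = {name: var.dims for name, var in toy.data_vars.items()}
for name, dims in dims_by_variable.items():
    print(name, dims)

# Sizes of every dimension in the dataset.
print(dict(toy.sizes))
```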

Q5. Find the name of the temperature data variable, and use it to extract a `DataArray` called `temperature`.

In [5]:
# temp = ocean temperature.
temperature = ds["temp"]

Q6. Take a look at the `temperature` data array and inspect its dimensions, coordinates and attributes. What are the specific dimensions and coordinates associated with it? What metadata (attributes) is provided?

In [6]:
# It is a 4D array with dimensions (t, depth, latitude, longitude)
temperature

Q7. Find out what dimensions and coordinates exist in your dataset. Which latitude and longitude variables are associated with the ocean temperature variable?

In [7]:
ds.coords

Coordinates:
  * longitude    (longitude) float32 1kB 0.0 1.25 2.5 3.75 ... 356.2 357.5 358.8
  * latitude     (latitude) float32 576B -89.38 -88.12 -86.88 ... 88.12 89.38
  * depth        (depth) float32 80B 5.0 15.0 25.0 ... 4.577e+03 5.192e+03
  * t            (t) object 8B 1920-04-16 00:00:00
  * longitude_1  (longitude_1) float32 1kB 0.625 1.875 3.125 ... 358.1 359.4
  * latitude_1   (latitude_1) float32 572B -88.75 -87.5 -86.25 ... 87.5 88.75

In [8]:
temperature.dims

('t', 'depth', 'latitude', 'longitude')

The `latitude` and `longitude` coordinates (rather than `latitude_1` and `longitude_1`) are associated with temperature.
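The same conclusion can be reached from the `DataArray` itself, because a variable extracted from a `Dataset` keeps only the coordinates that match its own dimensions. The sketch below shows this on a synthetic dataset; `toy`, its values and the `units` attribute are assumptions for illustration, not the contents of the exercise file.

```python
import numpy as np
import xarray as xr

# Synthetic dataset with two grids; "temp" carries an example attribute.
toy = xr.Dataset(
    data_vars={
        "temp": (("latitude", "longitude"), np.zeros((2, 3)), {"units": "degC"}),
        "ucurr": (("latitude_1", "longitude_1"), np.zeros((1, 3))),
    },
    coords={
        "latitude": [-1.0, 1.0],
        "longitude": [0.0, 1.0, 2.0],
        "latitude_1": [0.0],
        "longitude_1": [0.5, 1.5, 2.5],
    },
)

temp = toy["temp"]

# The dataset lists every coordinate; the extracted array lists only its own.
dataset_coords = set(toy.coords)
temp_coords = set(temp.coords)

print(sorted(dataset_coords))
print(sorted(temp_coords))
print(temp.attrs)
```

Applied to the exercise data, inspecting `temperature.coords` in this way would be expected to list `t`, `depth`, `latitude` and `longitude`, matching `temperature.dims`.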