# CSS 120: Environmental Data Science

## XArray Split-Apply-Combine Operations to Study Terrestrial Temperature and Rainfall

### Umberto Mignozzetti (UCSD)

(Based on [Climatematch Academy](https://comptools.climatematch.io/tutorials/W1D1_ClimateSystemOverview/student/W1D1_Tutorial5.html))

# Packages

In [None]:
!pip install nc-time-axis

In [None]:
# imports
from datetime import timedelta
import numpy as np
import pandas as pd
import xarray as xr
from matplotlib import pyplot as plt
from pythia_datasets import DATASETS

##  Some Environmental Sciences

### Terrestrial Temperature and Rainfall

So far, we have been averaging our measures over time. This is important for understanding the stability of climate processes.

However, some processes are seasonal.

1. **Temperature Variations**: As the Earth orbits the sun, its axial tilt causes different regions to receive varying amounts of solar radiation at different times of the year. 
    + This results in seasonal temperature changes, such as warmer summers and colder winters in temperate regions.

2. **Precipitation Patterns**: Many areas experience seasonal changes in precipitation.
    + Monsoon climates in South Asia and parts of Africa have distinct wet and dry seasons driven by wind patterns that change with the seasons.

3. **Plant Phenology**: The life cycles of plants, including budding, flowering, fruiting, and shedding leaves, often follow seasonal patterns that respond to changes in temperature and daylight hours.

4. **Animal Behavior**: Many animals exhibit seasonal behaviors such as migration, hibernation, and breeding, which are synchronized with environmental conditions favorable for survival or reproduction.

5. **Oceanic Processes**: Seasonal changes also affect ocean currents and marine ecosystems. 
    + Seasonal wind patterns can drive upwelling processes that bring nutrients to the ocean surface, supporting high biological productivity during certain times of the year.

Understanding seasonal processes is crucial for predicting and managing various aspects of human activity, including agriculture, water resource management, and preparation for weather-related disasters.

##  Some Environmental Sciences

### Terrestrial Temperature and Rainfall

![](https://climatereanalyzer.org/clim/animations/scycle/World_ERAI_T2_scycle.gif)

##  Some Environmental Sciences

### Land and Sea Surface Temperature Variations

Close to the equator lies the [Intertropical Convergence Zone (ITCZ)](https://earthobservatory.nasa.gov/images/703/the-intertropical-convergence-zone), a belt of low pressure that circles the Earth where the trade winds of the Northern and Southern Hemispheres come together.

**Location:** The ITCZ is not a fixed location. Its position varies seasonally, shifting north or south with the latitude at which the sun is directly overhead: the equator during the equinoxes, moving toward the Tropic of Cancer in June and the Tropic of Capricorn in December.

**Weather Patterns:** The convergence of northern and southern hemisphere trade winds leads to rising warm air that results in cloud formation and precipitation. This makes the ITCZ a major belt of rainfall, which is particularly pronounced over oceans.

##  Some Environmental Sciences

### Land and Sea Surface Temperature Variations

![](https://upload.wikimedia.org/wikipedia/commons/b/b7/Precipitation_longterm_mean.gif)

## Split-Apply-Combine

Simple aggregations can give a useful summary of our dataset, but often we would prefer to aggregate conditionally on some coordinate labels or groups.

Xarray provides the so-called `groupby` operation which enables the **split-apply-combine** workflow on Xarray DataArrays and Datasets. 

The split-apply-combine operation is illustrated in this figure from [Project Pythia](https://foundations.projectpythia.org/core/xarray/computation-masking.html):

- The **split** step involves breaking up and grouping an xarray Dataset or DataArray depending on the value of the specified group key.
- The **apply** step involves computing some function, usually an aggregate, transformation, or filtering, within the individual groups.
- The **combine** step merges the results of these operations into an output xarray Dataset or DataArray.
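
The three steps above can be sketched on a small synthetic series. This is only an illustrative sketch: the `toy` array and its values are made up for the example and are not part of the course dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: 3 years of monthly values, where each value
# simply equals its calendar month number (1..12, repeated 3 times)
time = pd.date_range("2000-01-01", periods=36, freq="MS")
values = np.tile(np.arange(1, 13), 3).astype(float)
toy = xr.DataArray(values, coords={"time": time}, dims="time", name="toy")

# Split: one group per calendar month (12 groups, 3 time steps each)
groups = toy.groupby("time.month")
n_groups = len(groups.groups)

# Apply + combine: average within each group; xarray assembles the
# per-group results along a new "month" dimension
monthly_mean = groups.mean()

print(n_groups)
print(monthly_mean.dims)
```

The `time` dimension is replaced by a `month` dimension of length 12, because each group has been reduced to a single value.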

We are going to use `groupby` to remove the seasonal cycle ("climatology") from our dataset, which will allow us to better observe long-term trends in the data. 

See the [xarray `groupby` user guide](https://xarray.pydata.org/en/stable/user-guide/groupby.html) for more examples of what `groupby` can take as an input.
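
As a rough sketch of what removing the seasonal cycle looks like, the example below builds a synthetic monthly series (a sine-wave seasonal cycle plus a small linear trend, both assumptions made only for illustration) and subtracts its monthly climatology. The same pattern should carry over to `ds.tos`, but the numbers here are not real data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical series: 10 years of monthly values with a strong seasonal
# cycle (amplitude 5) and a weak linear trend (0.01 per month)
time = pd.date_range("2000-01-01", periods=120, freq="MS")
t = np.arange(120)
seasonal = 5.0 * np.sin(2 * np.pi * t / 12)
trend = 0.01 * t
series = xr.DataArray(20.0 + seasonal + trend, coords={"time": time}, dims="time")

# Climatology: the mean for each calendar month across all years
climatology = series.groupby("time.month").mean()

# Anomalies: subtract each month's climatological mean from every
# time step that falls in that month
anomalies = series.groupby("time.month") - climatology

print(float(abs(series - series.mean()).max()))  # spread dominated by the seasonal cycle
print(float(abs(anomalies).max()))               # much smaller once the cycle is removed
```

Because the seasonal swing is subtracted out, what remains in `anomalies` is mostly the slow trend, which is the part of the signal that is otherwise hard to see.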

## Split-Apply-Combine

![](l12img01.png)

##  Split-Apply-Combine

In [None]:
filepath = DATASETS.fetch("CESM2_sst_data.nc")
ds = xr.open_dataset(filepath)
ds

##  Split-Apply-Combine

In [None]:
ds.tos.sel(
    lon=310, lat=50, method="nearest"
).plot()  # time range is 2000-01-15 to 2014-12-15

##  Split-Apply-Combine

### Split: Group Data By Month

In [None]:
# Or: ds.tos.groupby("time.month")
ds.tos.groupby(ds.time.dt.month)

##  Split-Apply-Combine

### Apply and Combine

Now that we have groups defined, it’s time to “apply” a calculation to the group. These calculations can either be:

- aggregation: reduces the size of the group
- transformation: preserves the group’s full size

At the end of the apply step, xarray will automatically combine the aggregated/transformed groups back into a single object.
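
To make the difference between the two kinds of calculation concrete, here is a minimal sketch on random synthetic data. The `standardize` helper is a hypothetical function written for this example; `groupby(...).map` is used here as one way to apply a per-group transformation.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data: 4 years of monthly random values
time = pd.date_range("2000-01-01", periods=48, freq="MS")
rng = np.random.default_rng(0)
data = xr.DataArray(rng.normal(size=48), coords={"time": time}, dims="time")
gb = data.groupby("time.month")

# Aggregation: each group collapses to one value, so 48 time steps
# become 12 monthly values
agg = gb.max()

# Transformation: each group keeps its size, so the result still has
# 48 time steps
def standardize(group):
    """Hypothetical helper: z-score the values within one group."""
    return (group - group.mean()) / group.std()

trans = gb.map(standardize)

print(agg.sizes)
print(trans.sizes)
```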

In [None]:
# Mean SST for each calendar month across all years (the monthly climatology)
tos_clim = ds.tos.groupby("time.month").mean()
tos_clim

##  Split-Apply-Combine

After computing the climatology, we can plot it at a single grid point:

In [None]:
tos_clim.sel(lon=310, lat=50, method="nearest").plot()

##  Split-Apply-Combine

We can now add a spatial dimension to this plot and look at the zonal mean climatology (the monthly mean SST at different latitudes):

In [None]:
# Average over longitude, then transpose so latitude is on the y-axis and month on the x-axis
tos_clim.mean(dim="lon").transpose().plot.contourf(levels=12, cmap="turbo")

##  Split-Apply-Combine

Or we can compare the climatology between two reference months. Here, we plot the difference between the January and July means:

In [None]:
# January minus July; robust=True sets the color limits from the 2nd and 98th percentiles
(tos_clim.sel(month=1) - tos_clim.sel(month=7)).plot(size=6, robust=True)

##  Split-Apply-Combine

**Your turn**: What happens if you change the reference point?

In [None]:
# Your answer here

##  Questions?

##  See you next class!