# CSS 120: Environmental Data Science

## Long-term climate processes, Xarray operations, and anomalies

### Umberto Mignozzetti

(Based on [Climatematch Academy](https://comptools.climatematch.io/tutorials/W1D1_ClimateSystemOverview/student/W1D1_Tutorial4.html))

# Packages

In [None]:
# imports
from datetime import timedelta
import numpy as np
import pandas as pd
import xarray as xr
from matplotlib import pyplot as plt
from pythia_datasets import DATASETS

##  Some Environmental Sciences

### Milankovitch Orbital Cycles

To understand our impact on the planet's temperature, we need a benchmark.

We usually use long-term temperature variations to judge whether a given change is anomalous.

Over long time scales, however, the climate responds to many different forcings.

One important long-term process is the set of orbital cycles.

##  Some Environmental Sciences

### Milankovitch Orbital Cycles

![](l13img01.jpg)

##  Some Environmental Sciences

### Milankovitch Orbital Cycles

![](l13img02.jpg)

##  Some Environmental Sciences

### Milankovitch Orbital Cycles

![](l13img03.png)

##  Some Environmental Sciences

### Milankovitch Orbital Cycles

![](l13img04.png)

##  Some Environmental Sciences

### Milankovitch Orbital Cycles

![](l13img05.png)

# Compute Anomaly

First, let us load the data:

In [None]:
filepath = DATASETS.fetch("CESM2_sst_data.nc")
ds = xr.open_dataset(filepath)
ds

# Compute Anomaly

Let us now:

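1. Split the dataset by calendar month
2. Compute the average for each calendar month (the monthly climatology)
3. Subtract each month's average from the data for that month

The result is the temperature anomaly: how far each value departs from what is typical for that time of year.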
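
To see what these three steps do to the numbers, here is a minimal sketch that performs them by hand with NumPy on a made-up series. The series, its seasonal amplitude, and its trend are hypothetical values chosen for illustration; they are not taken from the CESM2 data.

```python
import numpy as np

# Hypothetical toy series: 3 years of monthly values (36 points),
# built from a seasonal cycle plus a slow trend of 0.01 per month.
n_years = 3
months = np.tile(np.arange(1, 13), n_years)             # 1..12, repeated
seasonal = 2.0 * np.sin(2 * np.pi * (months - 1) / 12)  # repeats every year
trend = 0.01 * np.arange(months.size)
series = 15.0 + seasonal + trend

# Steps 1 and 2: split by calendar month and average each group.
climatology = np.array([series[months == m].mean() for m in range(1, 13)])

# Step 3: subtract from each value the mean of its own calendar month.
anomaly = series - climatology[months - 1]

# The seasonal cycle cancels; what is left is the trend, centered on zero.
print(anomaly.reshape(n_years, 12).round(2))
```

Each row of the printed array is one year. Because the seasonal cycle is identical every year, it drops out of the anomaly, and only the year-to-year change remains.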

# Compute Anomaly

Steps (1) to (3):

In [None]:
# group all data by month
gb = ds.tos.groupby("time.month")

# take the mean over time to get monthly averages
tos_clim = gb.mean(dim="time")

# subtract this mean from all data of the same month
tos_anom = gb - tos_clim

# Compute Anomaly

Results:

In [None]:
tos_anom

# Compute Anomaly

A plot may be more helpful here. For a given location, the anomaly time series looks like this:

In [None]:
tos_anom.sel(
    lon = 310, 
    lat = 50, 
    method = "nearest"
).plot()

# Compute Anomaly

**Your turn**: Change the location and plot again. Can you pick a point close to San Diego?

In [None]:
## Answers here

# Compute Anomaly

How about the global anomaly? We can compute it by averaging over `lat` and `lon`:

In [None]:
unweighted_mean_global_anom = tos_anom.mean(
    dim=["lat", "lon"]
)
unweighted_mean_global_anom.plot()

# Compute Anomaly

However, this result does not take into account that grid cells vary in size.

For instance, take a look at the grid below:

![](l13img06.png)

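Closer to the equator, the grid cells are larger, so a simple average over `lat` and `lon` gives the smaller high-latitude cells too much influence. We need to weight each cell by its area, which is why the mean above is called *unweighted*.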
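
As a rough illustration of how much cell size varies, the sketch below computes the area of a 1° by 1° cell at several latitudes. It assumes a regular latitude-longitude grid on a perfectly spherical Earth; the function `cell_area_km2` and the radius constant are hypothetical helpers written for this example, and the actual model grid may be laid out differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def cell_area_km2(lat_center_deg, dlat_deg=1.0, dlon_deg=1.0):
    """Area of a lat-lon cell on a sphere, centered at lat_center_deg."""
    lat_south = math.radians(lat_center_deg - dlat_deg / 2)
    lat_north = math.radians(lat_center_deg + dlat_deg / 2)
    dlon = math.radians(dlon_deg)
    # Spherical formula: R^2 * dlon * (sin(lat_north) - sin(lat_south))
    return EARTH_RADIUS_KM**2 * dlon * (math.sin(lat_north) - math.sin(lat_south))


equator_area = cell_area_km2(0)
for lat in (0, 30, 60, 85):
    area = cell_area_km2(lat)
    print(f"lat {lat:>2}: {area:8.0f} km^2, "
          f"ratio to equator = {area / equator_area:.3f}, "
          f"cos(lat) = {math.cos(math.radians(lat)):.3f}")
```

Under these assumptions the ratio to the equatorial cell matches the cosine of the latitude, so a cell at 60° covers half the area of a cell at the equator.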

# Compute Anomaly

Xarray's [`.weighted()`](https://xarray.pydata.org/en/stable/user-guide/computation.html#weighted-array-reductions) method helps with this.

First, let us load the grid cell area data:

In [None]:
filepath2 = DATASETS.fetch("CESM2_grid_variables.nc")
areacello = xr.open_dataset(filepath2).areacello
areacello

# Compute Anomaly

Then, let us compute the weighted mean using the `areacello` data:

In [None]:
weighted_mean_global_anom = tos_anom.weighted(
    areacello
).mean(
    dim = ["lat", "lon"]
)
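
The arithmetic behind a weighted mean is the sum of weight times value divided by the sum of the weights, taken over cells that have data. The sketch below does this by hand with NumPy on a tiny made-up grid. The anomaly values are hypothetical, and the weights are assumed to be cos(latitude) rather than the `areacello` areas used in the lecture.

```python
import numpy as np

# Hypothetical 3 x 2 grid of anomalies (rows are latitudes 0, 45, 80 degrees).
# NaN marks a cell with no data, for example a land cell.
lats = np.array([0.0, 45.0, 80.0])
anom = np.array([[0.2, 0.4],
                 [0.6, np.nan],
                 [2.0, 2.0]])

# Assumed weights: cos(latitude), identical for every longitude in a row.
weights = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(anom)

# Only cells with data contribute to either mean.
valid = ~np.isnan(anom)

unweighted = anom[valid].mean()
weighted = (anom[valid] * weights[valid]).sum() / weights[valid].sum()

print(f"unweighted: {unweighted:.3f}")
print(f"weighted:   {weighted:.3f}")
```

In this toy grid the largest anomalies sit in the small high-latitude cells, so the weighted mean comes out lower than the unweighted one.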

# Compute Anomaly

Now we can plot the weighted and the unweighted means together and compare them:

In [None]:
unweighted_mean_global_anom.plot(size=7)
weighted_mean_global_anom.plot()
plt.legend(["unweighted", "weighted"])

# Compute Anomaly

**Your turn**: Why do the calculations change when weighting by area?

*Hint*: Think about how area, lat, and lon correlate.

## Questions?

## See you next week!