<a href="https://colab.research.google.com/github/trchudley/GEOG2462/blob/main/Week_4_Climate_Data/1_Download_Climate_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Download and visualise ERA5 climate data

## Import GEE and relevant tools

As usual, we begin by importing the relevant packages and initialising our connection to Google Earth Engine:

In [None]:
import ee
import geemap
import pandas as pd
import matplotlib.pyplot as plt
from google.colab import drive

ee.Authenticate()  # Trigger the authentication flow.
ee.Initialize(project='ee-trchudley')    # Change to your own default project name.

## ERA5 data

This week, we will be doing something slightly different. To supplement our Landsat observational data, we will be downloading a time-series of monthly climate data. The climate dataset we will be using is ERA5: the fifth generation atmospheric reanalysis of the global climate by the European Centre for Medium-Range Weather Forecasts (ECMWF). This is a _reanalysis_ dataset, combining contemporaneous observations and numerical modelling to provide a historical set of 'maps without gaps' from January 1940 to present. If you haven't encountered reanalysis data before, I would recommend reading the following short websites (including a short video!) to get a handle on things:

 - https://climate.copernicus.eu/climate-reanalysis
 - https://climatedataguide.ucar.edu/climate-data/atmospheric-reanalysis-overview-comparison-tables

It is worth keeping in mind the pros and cons of using reanalysis data. It is useful because it provides a consistent dataset over any region of interest you like, even where observational data is sparse. However, the further you get from observational data, the more you are relying on model accuracy. Global reanalyses are often run on the scale of tens of kilometres, meaning that they will not properly account for local topography and other fine-scale aspects of weather and climate. Generally, local observational data or high-resolution model runs might be preferable for local studies; however, this approach will work for now as it will provide a consistent baseline dataset for our projects. If you can easily find local data for your project though, all the better!

We will be using the [ERA5-Land](https://www.ecmwf.int/en/era5-land) monthly aggregated dataset, which has a spatial resolution of 9 km. It is handily [also provided by Google Earth Engine](https://developers.google.com/earth-engine/datasets/catalog/ECMWF_ERA5_LAND_MONTHLY_AGGR), meaning we can use the same principles and interface to interact with the data as we have already been doing.
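To get a feel for what this resolution means on the ground, here is a rough sketch that estimates the footprint of a single grid cell. It assumes the data are served on a regular 0.1° latitude/longitude grid and treats the Earth as a sphere (about 111.32 km per degree); both are simplifying assumptions, and the function name is made up for illustration. The latitude is the example location used later in this notebook.

```python
import math

# Assumptions: a regular 0.1-degree grid and a spherical Earth.
KM_PER_DEGREE = 111.32
GRID_SPACING_DEG = 0.1


def grid_cell_size_km(latitude_deg, spacing_deg=GRID_SPACING_DEG):
    """Return the approximate (east-west, north-south) size of one cell in km."""
    north_south = spacing_deg * KM_PER_DEGREE
    # Lines of longitude converge towards the poles, so cells get narrower.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south


ew, ns = grid_cell_size_km(41.017)
print(f"Approximate cell size: {ew:.1f} km (E-W) x {ns:.1f} km (N-S)")
```

Whatever the exact numbers, a single value represents an area of many tens of square kilometres, so a valley floor and the ridge above it can share the same temperature value.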

A quick example from the Earth Engine documentation shows how we can visualise the data (note the spatial resolution) - however, for the rest of the notebook we will focus on extracting time-series at a single point.


In [None]:
dataset = ee.ImageCollection('ECMWF/ERA5_LAND/MONTHLY_AGGR').first()

visualization = {
  'bands': ['temperature_2m'], 'min': 250, 'max': 320,
  'palette': [
    '000080', '0000d9', '4000ff', '8000ff', '0080ff', '00ffff',
    '00ff80', '80ff00', 'daff00', 'ffff00', 'fff500', 'ffda00',
    'ffb000', 'ffa400', 'ff4f00', 'ff2500', 'ff0a00', 'ff00ff',
  ]
}

Map = geemap.Map()  # Create empty map
Map.setCenter(70, 45, 3)
Map.addLayer(
    dataset, visualization, 'Air temperature [K] at 2m height', True, 0.8
    )
Map

You might have noticed that we needed to scale the temperature data between 250 and 320! This is because the temperature data is provided in Kelvin (0 K = −273.15 °C).
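As a quick check of what that colour range means in more familiar units, the small sketch below converts the two limits to Celsius. The function name is just an illustrative choice.

```python
KELVIN_OFFSET = 273.15  # 0 degrees C expressed in Kelvin


def kelvin_to_celsius(kelvin):
    """Convert a temperature from Kelvin to degrees Celsius."""
    return kelvin - KELVIN_OFFSET


# The 'min' and 'max' values used in the visualisation above
for k in (250, 320):
    print(f"{k} K = {kelvin_to_celsius(k):.2f} degrees C")
```

In other words, the map stretches its colours between roughly −23 °C and 47 °C.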

## Download ERA5 time series at a point

We will start in a familiar way: by setting a desired location in latitude and longitude. For now, we'll use the same location as last week, in case comparing time-series reveals something interesting...

In [None]:
# Location - editable
latitude = 41.017           # Degrees of latitude
longitude = -123.611        # Degrees of longitude
location_name = 'hoopa'     # recognisable name, to create a useful file name

# Create a point geometry at the specified location
point = ee.Geometry.Point(longitude, latitude)


We will also have to decide what variables we are interested in. For now, I will select only the 2 m air temperature (generally considered the 'surface' air temperature in meteorology), and the total precipitation summed over the month. There are a great many other options you could add to this list if you would like (e.g. for an NDSI project, it might be useful to have snowfall...). A complete catalogue of variables, including useful descriptions of what the data actually represent (units, meanings, etc.), can be found in the [Earth Engine documentation](https://developers.google.com/earth-engine/datasets/catalog/ECMWF_ERA5_LAND_MONTHLY_AGGR#bands).

> **Tasks:**
> 1. Take a look at the documentation linked above and read the descriptions of 'temperature_2m' and 'total_precipitation_sum' (search the page using `ctrl+f` to save yourself time). What are the weaknesses in using a 9 km resolution dataset for this assessment?
> 2. Are there any other variables that might be useful for the project you have in mind?

In [None]:
# Names of the variables we want to download -- add more if you like!
variables = [
    'temperature_2m',
    'total_precipitation_sum',
    ]

Now that we have our variables, we can search through the dataset in a way that might look vaguely familiar. We are once again selecting an image collection (this time, the ERA5 option), and filtering to chosen dates and variables. Note that, even though this is a modelling product, Earth Engine still stores and thinks of the dataset as an image (hence, the documentation above referring to 'bands' rather than 'variables'). This is quite a handy way to take advantage of Earth Engine, which is nominally for remote sensing data and image analysis, for modelling purposes! To make things easy, I haven't given you an option for dates - I've set (or 'hard-coded') it to return data from January 2013 to December 2023, seeing as that is our period of interest over the Landsat 8 record. (The end date of an Earth Engine date filter is exclusive, so the last monthly image returned is December 2023.)

Finally, we 'map' a function over the collection, which uses `reduceRegion` to find the variable values at the pixel that overlaps our study point. It stores these values (along with the year and month) in a dictionary attached to each image as a property called `era5` - so, just as a Landsat image carries a metadata number for 'cloud cover', each image will now carry one for temperature at our point.

In [None]:

# Define the dataset as the ERA5-Land monthly aggregated data
dataset = ee.ImageCollection('ECMWF/ERA5_LAND/MONTHLY_AGGR')

# Filter the dataset to the desired time range
dataset = dataset.filter(ee.Filter.date('2013-01-01', '2023-12-31'))

# Select only the relevant variables
dataset = dataset.select(variables)

# Go through the collection, finding the nearest values to the set lat/lon
dataset = dataset.map(
    lambda image: image.set(
        'era5',
        image.reduceRegion(
            reducer=ee.Reducer.first(),
            geometry=point,
            scale=image.projection().nominalScale()  # Use native resolution
        ).set('year', image.date().get('year')).set('month', image.date().get('month'))
    )
)


We can now extract this data and turn it into a `Pandas` dataframe/table (recall last week). Don't worry too much about what is going on in this cell - just note that once you run it, it produces a nice and sensible table with all of our data!
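If you are curious, the idea can be shown in miniature without Earth Engine at all. The sketch below uses a few made-up records in the shape we expect to get back (one dictionary per month) and builds the same kind of table; the values and the `example` names are invented purely for illustration.

```python
import pandas as pd

# Made-up records: one dictionary per monthly image (values are not real data)
example_records = [
    {'year': 2013, 'month': 1, 'temperature_2m': 278.4, 'total_precipitation_sum': 0.120},
    {'year': 2013, 'month': 2, 'temperature_2m': 279.9, 'total_precipitation_sum': 0.085},
    {'year': 2013, 'month': 3, 'temperature_2m': 282.3, 'total_precipitation_sum': 0.101},
]

# A list of dictionaries becomes a table with one row per dictionary
example = pd.DataFrame(example_records)

# Build a mid-month date from the separate year and month columns
dates = pd.to_datetime(dict(year=example.year, month=example.month, day=15))

# Swap the year and month columns for a single date column at the front
example = example.drop(columns=['year', 'month'])
example.insert(0, 'date', dates)

print(example)
```

Each dictionary key becomes a column, which is why any extra variables you added earlier will appear in the table automatically.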

In [None]:
# Get this data as a dictionary, and turn it into a Pandas dataframe (table)
era5_dict = dataset.aggregate_array('era5').getInfo()
era5 = pd.DataFrame(era5_dict)

# Construct a proper 'date' column from the year and month data.
dates = pd.to_datetime(dict(year=era5.year, month=era5.month, day=15))

# Drop the 'year' and 'month' columns and insert the 'date' column instead as the first column.
era5.drop(labels=['year', 'month'], axis=1, inplace=True)
era5.insert(0, 'date', dates)

# Visualise table
era5

We're going to make a couple of changes to make the data more human-readable. First, we'll convert the temperature from Kelvin to Celsius by subtracting 273.15. Second, we will convert the total monthly precipitation from metres to mm by multiplying by 1000.

If you've included any additional variables, this is the point where you might want to introduce some corrections of your own, after consulting with the documentation.
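One possible way to organise such corrections, if you end up with several variables, is a small lookup of column names and conversion functions. This is only a sketch: the `CONVERSIONS` dictionary and `apply_conversions` helper are hypothetical names, and the units for any variable you add should be checked against the documentation first.

```python
import pandas as pd

# Hypothetical lookup: column name -> function converting to friendlier units.
# Extend this only after checking each variable's units in the documentation.
CONVERSIONS = {
    'temperature_2m': lambda kelvin: kelvin - 273.15,         # K -> degrees C
    'total_precipitation_sum': lambda metres: metres * 1000,  # m -> mm
}


def apply_conversions(table, conversions=CONVERSIONS):
    """Return a converted copy of the table, leaving the original untouched."""
    corrected = table.copy()
    for column, convert in conversions.items():
        if column in corrected.columns:  # skip variables that were not downloaded
            corrected[column] = convert(corrected[column])
    return corrected


# Tiny made-up table to show the helper in action
raw = pd.DataFrame({
    'temperature_2m': [273.15, 293.15],
    'total_precipitation_sum': [0.0, 0.05],
})
converted = apply_conversions(raw)
print(converted)
```

Keeping the conversions in one place makes it harder to forget a column, and returning a copy means the raw values are still available if you need to check something.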

In [None]:
# Create new 'corrected' table as a copy of the first.
era5_corr = era5.copy()

# Convert temperature from Kelvin to Celsius (0 °C = 273.15 K)
era5_corr['temperature_2m'] = era5_corr['temperature_2m'] - 273.15

# Convert total precipitation from m to mm
era5_corr['total_precipitation_sum'] = era5_corr['total_precipitation_sum'] * 1000

# Visualise new table
era5_corr

Let's now use the `matplotlib` package to visualise this data. As with last week, don't worry too much about what's going on here (unless you want to!) - we'll export the data to your Google Drive so you can create figures and perform analyses using your preferred software (Excel, etc.).

In [None]:
fig, axes = plt.subplots(nrows=2, layout='constrained', figsize=(8, 5))

# For the first axis...
ax = axes[0]
# Plot the temperature in red on the first axis
era5_corr.plot('date', 'temperature_2m', ax=ax, color='tab:red', legend=False)
# Set the y axis label
ax.set_ylabel('2 m Temperature [°C]')

# For the second axis...
ax = axes[1]
# Plot the total monthly precipitation in blue on the second axis
era5_corr.plot('date', 'total_precipitation_sum', ax=ax, color='tab:blue', legend=False)
# Set the y axis label ('\n' forces a line break to fit it on the axis)
ax.set_ylabel('Total Monthly\nPrecipitation [mm]')
# Set the y lower limit to zero
ax.set_ylim(0, None)

for ax in axes:  # Loop through both axes, applying the same rules to both

  # Apply a grid
  ax.grid()

  # Set the x axis limits between 2013 and 2024
  ax.set_xlim(pd.to_datetime('2013-01-01'), pd.to_datetime('2024-01-01'))

  # Set the x axis label to 'Date'
  ax.set_xlabel('Date')


Regardless of what software you choose to plot the data, it's clear we've got a good dataset to analyse and find potential explanations for the trends and events we've observed in our remote sensing data!

Last week, we saw potential high wildfire years in 2020 and 2023. It's interesting to compare this to our data here. Neither of those years seems to have particularly high mean monthly temperatures through the summer. However, there is quite low precipitation in the 2020-2023 period. A look at the literature may reveal studies that investigate which is the stronger control - and perhaps the solution might lie in other variables. For instance, the Google Earth Engine documentation shows there is a `temperature_2m_max` variable. Maximum temperature might be a more important control than mean temperature for initiating wildfires.
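If you would rather compare years with numbers than by eye, one option is to summarise the table per year. The sketch below does this for a synthetic stand-in table (the `demo` values are generated, not real ERA5 data), computing the mean summer (June-August) temperature and the total annual precipitation. The choice of June-August as 'summer' is an assumption that suits the Northern Hemisphere.

```python
import numpy as np
import pandas as pd

# Synthetic three-year table with the same column names as our corrected table
months = pd.date_range('2013-01-01', periods=36, freq='MS')
month_numbers = months.month.to_numpy()
seasonal_cycle = np.cos(2 * np.pi * (month_numbers - 1) / 12)
demo = pd.DataFrame({
    'date': months,
    'temperature_2m': 12 - 8 * seasonal_cycle,             # warm summers
    'total_precipitation_sum': 100 + 80 * seasonal_cycle,  # wet winters
})

# Mean temperature over June, July and August of each year
summer = demo[demo['date'].dt.month.isin([6, 7, 8])]
summer_mean_temp = summer.groupby(summer['date'].dt.year)['temperature_2m'].mean()

# Total precipitation over each calendar year
annual_precip = demo.groupby(demo['date'].dt.year)['total_precipitation_sum'].sum()

summary = pd.DataFrame({
    'summer_mean_temp_C': summer_mean_temp,
    'annual_precip_mm': annual_precip,
})
summary.index.name = 'year'
print(summary)
```

To try this on the real data, you could replace `demo` with `era5_corr`, and change the month list if a different season matters more for your site.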

## Export data to Google Drive

We can export the data to our Google Drive from Colaboratory in the same way as last week. First, we must 'mount' our Google Drive onto Colab:

In [None]:
drive.mount('/content/drive')

Then, we can select what top-level folder within our Google Drive we would like to store the csv in:

In [None]:
# You can edit this variable
folder = 'scires_project_2A'

Then, we automatically construct a file name based on the location name from earlier, and export a `csv` file there. This csv file can be opened in other software for analysis.

In [None]:
# Construct the filename automatically
filename = location_name + '_era5.csv'

# Print out filename for reference
print("The file will be saved to your Google Drive at:\n" + folder + '/' + filename + '\n')

# Export the pandas dataframe to a csv file
era5_corr.to_csv(f'/content/drive/My Drive/{folder}/{filename}', index=False)

print('Saved.')

Once again, this will now be in your Google Drive folder of choice. You can download, visualise, and perform further analysis to your liking. You may wish to compare extremes or changes in NDI identified from the approaches outlined in Weeks 2-3 to extremes or changes visible in the climate data.
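If you continue in Python rather than other software, one way to pick out climate extremes is to look at monthly anomalies: how far each month sits from the average for that calendar month. The sketch below is a minimal illustration using a synthetic table written to a temporary csv file (so the file path and values are stand-ins, not your exported data). It reads the file back with proper dates and finds the month with the largest warm anomaly.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Synthetic two-year table standing in for the exported csv
months = pd.date_range('2013-01-01', periods=24, freq='MS')
temps = 12 - 8 * np.cos(2 * np.pi * (months.month.to_numpy() - 1) / 12)
temps[18] += 3.0  # make July 2014 unusually warm
demo = pd.DataFrame({'date': months, 'temperature_2m': temps})

# Write and re-read the csv, asking pandas to parse the date column
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_era5.csv')
    demo.to_csv(path, index=False)
    table = pd.read_csv(path, parse_dates=['date'])

# Average for each calendar month (all Januaries, all Februaries, ...)
climatology = table.groupby(table['date'].dt.month)['temperature_2m'].transform('mean')

# Anomaly = value minus the average for that calendar month
table['temperature_anomaly'] = table['temperature_2m'] - climatology

warmest = table.loc[table['temperature_anomaly'].idxmax(), 'date']
print('Largest warm anomaly:', warmest.strftime('%B %Y'))
```

Working with anomalies removes the seasonal cycle, which makes unusual months easier to line up against unusual values in a remote sensing time-series.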