# Track cyclones with TempestExtremes within JASMIN Jupyterhub
TempestExtremes is a C++ software package for detecting and manipulating features in climate data.  
The software is described in two papers:
* Ullrich, P.A., C.M. Zarzycki, E.E. McClenny, M.C. Pinheiro, A.M. Stansfield and K.A. Reed (2021) "TempestExtremes v2.1: A community framework for feature detection, tracking and analysis in large datasets" Geosci. Model. Dev. 14, pp. 5023–5048, doi: 10.5194/gmd-14-5023-2021.
* Ullrich, P.A. and C.M. Zarzycki (2017) "TempestExtremes v1.0: A framework for scale-insensitive pointwise feature tracking on unstructured grids" Geosci. Model. Dev. 10, pp. 1069-1090, doi: 10.5194/gmd-10-1069-2017.

The documentation is available here: https://climate.ucdavis.edu/tempestextremes.php

### To do in terminal before the notebook
A. If you don't already have a dedicated environment.
1. Create a conda environment
```
conda create -n tempestextremes
conda init
bash
conda activate tempestextremes
```
2. Install kernels to work with the notebooks
```
conda install ipykernel
python -m ipykernel install --user --name=tempestextremes
conda install bash_kernel
python -m bash_kernel.install
```
3. Restart jupyterhub
4. (Still in terminal) Install TempestExtremes in the environment
```
conda init
bash
conda activate tempestextremes
conda install -c conda-forge tempest-extremes
```

B. If you already created a hackathon-specific conda environment (following e.g. https://digital-earths-global-hackathon-uk.github.io/#software-stack)
1. `conda activate hackathon`
2. `python -m ipykernel install --user --name=name-of-environment` (If not already done)
3. `conda install bash_kernel`

### This notebook
NB: Open this notebook with the bash kernel.

In [1]:
# Activate the conda environment in which TempestExtremes has been installed
conda activate tempestextremes
# If you get a message saying you need to initialize conda:
## create a new cell with the `conda init` command
## Run that cell
## Delete the cell
## Restart your kernel
## Run the notebook again
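
Once the environment is active, it can help to confirm that the TempestExtremes executables are visible before going further. The following is a minimal sketch, assuming the conda package places `DetectNodes` and `StitchNodes` on the `PATH`; the helper name `check_tools` is ours and is not part of TempestExtremes.

```shell
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 if all commands are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The two executables used in this notebook
check_tools DetectNodes StitchNodes \
    || echo "Activate the environment in which tempest-extremes is installed."
```

If either tool is reported missing, re-check the `conda activate` step above before running the remaining cells.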

To track cyclones with TempestExtremes, first run `DetectNodes` to find suitable "candidate nodes" (points in space and time that could be cyclones), then run `StitchNodes` to stitch the candidate nodes into tracks. This notebook shows some minimal examples of how that works, and then provides the code for the UZ tracker (Ullrich & Zarzycki, sometimes also referred to simply as "TempestExtremes").

## Tutorial

### Minimal `DetectNodes` command

In [2]:
# Here we only find SLP minima in one file
# The file contains ERA5 SLP for one time step
DetectNodes \
--in_data "/badc/ecmwf-era5/data/oper/an_sfc/1996/11/18/ecmwf-era5_oper_an_sfc_199611180000.msl.nc" \
--out nodes.txt \
--searchbymin "msl" \
--latname "latitude" --lonname "longitude" 
# The output will summarize command arguments, and give you information about the process.

Arguments:
  --in_data <string> ["/badc/ecmwf-era5/data/oper/an_sfc/1996/11/18/ecmwf-era5_oper_an_sfc_199611180000.msl.nc"] 
  --in_data_list <string> [""] 
  --in_connect <string> [""] 
  --diag_connect <bool> [false] 
  --out <string> ["nodes.txt"] 
  --out_file_list <string> [""] 
  --searchbymin <string> ["msl"] (default PSL)
  --searchbymax <string> [""] 
  --searchbythreshold <string> [""] 
  --minlon <double> [0.000000] (degrees)
  --maxlon <double> [0.000000] (degrees)
  --minlat <double> [0.000000] (degrees)
  --maxlat <double> [0.000000] (degrees)
  --minabslat <double> [0.000000] (degrees)
  --mergedist <double> [0.000000] (degrees)
  --closedcontourcmd <string> [""] [var,delta,dist,minmaxdist;...]
  --noclosedcontourcmd <string> [""] [var,delta,dist,minmaxdist;...]
  --thresholdcmd <string> [""] [var,op,value,dist;...]
  --outputcmd <string> [""] [var,op,dist;...]
  --timestride <integer> [1] 
  --timefilter <string> [""] 
  --latname <string> ["latitude"] 
  --lonname <str

In [3]:
# Visualize nodes.txt output file
head nodes.txt
# For each time step: a header line (year, month, day, number of candidates, hour), then the list of candidate nodes
# For each candidate node: lon. index, lat. index, lon. value and lat. value
# More columns are added if more info is requested through the `outputcmd` argument

1996	11	18	8299	0
	600	0	150.000000	90.000000
	601	0	150.250000	90.000000
	602	0	150.500000	90.000000
	603	0	150.750000	90.000000
	604	0	151.000000	90.000000
	605	0	151.250000	90.000000
	606	0	151.500000	90.000000
	607	0	151.750000	90.000000
	608	0	152.000000	90.000000
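
The node file layout can be checked with standard tools. The sketch below counts the candidates listed under each time step and compares that with the count given in the header line. It assumes the header fields are year, month, day, number of candidates and hour, and that node lines start with a tab, which matches the output above. The function name `count_nodes` and the file `example_nodes.txt` are illustrative; the sample is a shortened stand-in, not real `DetectNodes` output.

```shell
# Shortened stand-in for a DetectNodes output file (two time steps)
printf '1996\t11\t18\t3\t0\n'              >  example_nodes.txt
printf '\t600\t0\t150.000000\t90.000000\n' >> example_nodes.txt
printf '\t601\t0\t150.250000\t90.000000\n' >> example_nodes.txt
printf '\t602\t0\t150.500000\t90.000000\n' >> example_nodes.txt
printf '1996\t11\t18\t2\t1\n'              >> example_nodes.txt
printf '\t603\t0\t150.750000\t90.000000\n' >> example_nodes.txt
printf '\t604\t0\t151.000000\t90.000000\n' >> example_nodes.txt

# Print, for each time step: timestamp, count from the header, lines actually found
count_nodes() {
    awk -F '\t' '
        # Header lines start with the year; node lines have an empty first field
        $1 != "" {
            if (stamp != "") print stamp, declared, counted
            stamp = sprintf("%04d-%02d-%02d_%02dh", $1, $2, $3, $5)
            declared = $4
            counted = 0
            next
        }
        NF > 1 { counted++ }
        END { if (stamp != "") print stamp, declared, counted }
    ' "$1"
}

count_nodes example_nodes.txt
# 1996-11-18_00h 3 3
# 1996-11-18_01h 2 2

rm example_nodes.txt
```

A mismatch between the last two columns would suggest a truncated or hand-edited node file.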


In [4]:
# Clean 
rm nodes.txt

### `DetectNodes` over several files
#### Several files of the same variable at different times
If you want to search through several files at once, you need to create a text file listing the file paths, and then provide it to `--in_data_list`.

In [5]:
ls /badc/ecmwf-era5/data/oper/an_sfc/1996/11/18/*msl* > flist.txt
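
If the glob matches nothing, or a path is mistyped, the list file ends up empty or points at missing files. As a hedged sketch, the helper below (hypothetical name `check_flist`) checks that every path in a list file is readable before the list is handed to `DetectNodes`. It assumes one path per line, or several paths separated by `;`. Temporary empty files stand in for the ERA5 files so that the example runs anywhere.

```shell
# Hypothetical helper: succeed only if the list is non-empty
# and every path in it (split on newlines and ";") is readable.
check_flist() {
    if [ ! -s "$1" ]; then
        echo "empty or missing list: $1"
        return 1
    fi
    n_missing=0
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        if [ ! -r "$path" ]; then
            echo "not readable: $path"
            n_missing=$((n_missing + 1))
        fi
    done <<EOF
$(tr ';' '\n' < "$1")
EOF
    [ "$n_missing" -eq 0 ]
}

# Stand-in files, so the example does not depend on the JASMIN archive
tmpdir=$(mktemp -d)
touch "$tmpdir/a.msl.nc" "$tmpdir/b.msl.nc"
ls "$tmpdir"/*msl* > "$tmpdir/flist.txt"

check_flist "$tmpdir/flist.txt" && echo "all listed files found"

rm -rf "$tmpdir"
```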

In [6]:
# Create folder to store nodes in
if ! [ -e nodes ]
then
    mkdir nodes
else 
    echo "nodes folder already exists, watch out for potential collisions."
fi

In [7]:
DetectNodes \
--in_data_list flist.txt \
--out nodes/ \
--searchbymin "msl" \
--latname "latitude" --lonname "longitude" 
rm log* # Comment this if you need the log files for debugging

Arguments:
  --in_data <string> [""] 
  --in_data_list <string> ["flist.txt"] 
  --in_connect <string> [""] 
  --diag_connect <bool> [false] 
  --out <string> ["nodes/"] 
  --out_file_list <string> [""] 
  --searchbymin <string> ["msl"] (default PSL)
  --searchbymax <string> [""] 
  --searchbythreshold <string> [""] 
  --minlon <double> [0.000000] (degrees)
  --maxlon <double> [0.000000] (degrees)
  --minlat <double> [0.000000] (degrees)
  --maxlat <double> [0.000000] (degrees)
  --minabslat <double> [0.000000] (degrees)
  --mergedist <double> [0.000000] (degrees)
  --closedcontourcmd <string> [""] [var,delta,dist,minmaxdist;...]
  --noclosedcontourcmd <string> [""] [var,delta,dist,minmaxdist;...]
  --thresholdcmd <string> [""] [var,op,value,dist;...]
  --outputcmd <string> [""] [var,op,dist;...]
  --timestride <integer> [1] 
  --timefilter <string> [""] 
  --latname <string> ["latitude"] 
  --lonname <string> ["longitude"] 
  --regional <bool> [false] 
  --out_header <bool> [false] 
 

In [8]:
ls nodes

000000.dat  000004.dat	000008.dat  000012.dat	000016.dat  000020.dat
000001.dat  000005.dat	000009.dat  000013.dat	000017.dat  000021.dat
000002.dat  000006.dat	000010.dat  000014.dat	000018.dat  000022.dat
000003.dat  000007.dat	000011.dat  000015.dat	000019.dat  000023.dat


In [9]:
# Each file in nodes contains the list of candidate nodes for one input file
head nodes/000000.dat

1996	11	18	8299	0
	600	0	150.000000	90.000000
	601	0	150.250000	90.000000
	602	0	150.500000	90.000000
	603	0	150.750000	90.000000
	604	0	151.000000	90.000000
	605	0	151.250000	90.000000
	606	0	151.500000	90.000000
	607	0	151.750000	90.000000
	608	0	152.000000	90.000000


In [10]:
# Clean
rm -rf nodes flist.txt

#### Several files of different variables at the same time
The input file list must contain all the file paths for one time step on a single line, separated by `;`.

In [11]:
# Here we take the msl, 10u and 10v files for a single time step
msl_file="/badc/ecmwf-era5/data/oper/an_sfc/1996/11/18/ecmwf-era5_oper_an_sfc_199611180000.msl.nc"
u10_file="/badc/ecmwf-era5/data/oper/an_sfc/1996/11/18/ecmwf-era5_oper_an_sfc_199611180000.10u.nc"
v10_file="/badc/ecmwf-era5/data/oper/an_sfc/1996/11/18/ecmwf-era5_oper_an_sfc_199611180000.10v.nc"
echo ${msl_file}\;${u10_file}\;${v10_file} > flist.txt

In [12]:
# Note the new "outputcmd" argument, which specifies which information to add to the output
DetectNodes \
--in_data_list flist.txt \
--out nodes.txt \
--searchbymin "msl" \
--outputcmd "msl,min,0;_VECMAG(u10,v10),max,2" \
--mergedist "6.0" \
--latname "latitude" --lonname "longitude" 

Arguments:
  --in_data <string> [""] 
  --in_data_list <string> ["flist.txt"] 
  --in_connect <string> [""] 
  --diag_connect <bool> [false] 
  --out <string> ["nodes.txt"] 
  --out_file_list <string> [""] 
  --searchbymin <string> ["msl"] (default PSL)
  --searchbymax <string> [""] 
  --searchbythreshold <string> [""] 
  --minlon <double> [0.000000] (degrees)
  --maxlon <double> [0.000000] (degrees)
  --minlat <double> [0.000000] (degrees)
  --maxlat <double> [0.000000] (degrees)
  --minabslat <double> [0.000000] (degrees)
  --mergedist <double> [6.000000] (degrees)
  --closedcontourcmd <string> [""] [var,delta,dist,minmaxdist;...]
  --noclosedcontourcmd <string> [""] [var,delta,dist,minmaxdist;...]
  --thresholdcmd <string> [""] [var,op,value,dist;...]
  --outputcmd <string> ["msl,min,0;_VECMAG(u10,v10),max,2"] [var,op,dist;...]
  --timestride <integer> [1] 
  --timefilter <string> [""] 
  --latname <string> ["latitude"] 
  --lonname <string> ["longitude"] 
  --regional <bool> [false

In [13]:
# The nodes file now contains two more columns, with the SLP minimum and the 10m wind maximum
head nodes.txt

1996	11	18	907	0
	600	0	150.000000	90.000000	9.890202e+04	1.794951e+01
	601	0	150.250000	90.000000	9.890202e+04	1.794951e+01
	602	0	150.500000	90.000000	9.890202e+04	1.794951e+01
	603	0	150.750000	90.000000	9.890202e+04	1.794951e+01
	604	0	151.000000	90.000000	9.890202e+04	1.794951e+01
	605	0	151.250000	90.000000	9.890202e+04	1.794951e+01
	606	0	151.500000	90.000000	9.890202e+04	1.794951e+01
	607	0	151.750000	90.000000	9.890202e+04	1.794951e+01
	608	0	152.000000	90.000000	9.890202e+04	1.794951e+01
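
With the extra columns in place, the node file can be turned into a more readable table. This is a sketch only: it assumes node lines are tab-separated in the order lon. index, lat. index, lon., lat., followed by the two `outputcmd` values, and that ERA5 `msl` is in Pa and the 10m wind in m/s. The function name `nodes_to_table` and the file `example_nodes.txt` are illustrative; the sample rows reuse values shown in this notebook.

```shell
# Shortened stand-in for a DetectNodes output file with two outputcmd columns
printf '1996\t11\t18\t2\t0\n' > example_nodes.txt
printf '\t600\t0\t150.000000\t90.000000\t9.890202e+04\t1.794951e+01\n' >> example_nodes.txt
printf '\t604\t0\t151.000000\t90.000000\t9.890202e+04\t1.794951e+01\n' >> example_nodes.txt

# Print hour, lon, lat, SLP in hPa and wind speed for every candidate
nodes_to_table() {
    awk -F '\t' '
        BEGIN { print "hour lon lat slp_hPa wind_ms" }
        # Header line: remember the hour (fifth field)
        $1 != "" { hour = $5; next }
        # Node line: empty first field, then i, j, lon, lat, slp, wind
        NF >= 7 {
            printf "%02d %.2f %.2f %.1f %.1f\n", hour, $4, $5, $6 / 100, $7
        }
    ' "$1"
}

nodes_to_table example_nodes.txt
# hour lon lat slp_hPa wind_ms
# 00 150.00 90.00 989.0 17.9
# 00 151.00 90.00 989.0 17.9

rm example_nodes.txt
```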


In [14]:
# Clean
rm nodes.txt flist.txt

#### Several files of different variables over several times

In [15]:
# Create folder to store nodes in
if ! [ -e nodes ]; then mkdir nodes; 
else echo "nodes folder already exists, watch out for potential collisions."; 
fi
# Create folder to store logs in
if ! [ -e logs ]; then mkdir logs; 
else echo "logs folder already exists, watch out for potential collisions.";
fi

In [16]:
# Loop over time (in this case, the first five hours of one day)
for h in 00 01 02 03 04
do 
    echo $h
    # Define the files for the different variables
    msl_file="/badc/ecmwf-era5/data/oper/an_sfc/1996/11/18/ecmwf-era5_oper_an_sfc_19961118${h}00.msl.nc"
    u10_file="/badc/ecmwf-era5/data/oper/an_sfc/1996/11/18/ecmwf-era5_oper_an_sfc_19961118${h}00.10u.nc"
    v10_file="/badc/ecmwf-era5/data/oper/an_sfc/1996/11/18/ecmwf-era5_oper_an_sfc_19961118${h}00.10v.nc"
    # Concatenate them into flist.txt
    echo ${msl_file}\;${u10_file}\;${v10_file} > flist.txt
    
    # Run DetectNodes for this time step
    DetectNodes \
        --in_data_list flist.txt \
        --out nodes/${h}.txt \
        --searchbymin "msl" \
        --outputcmd "msl,min,0;_VECMAG(u10,v10),max,2" \
        --mergedist "6.0" \
        --latname "latitude" --lonname "longitude" > logs/${h}.txt
done

00
01
02
03
04
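
As an alternative to rewriting `flist.txt` on every iteration, the same information can be written once, with one line per time step. The sketch below only builds the list files. It assumes that `DetectNodes` treats each line of `--in_data_list` as one set of `;`-separated inputs and pairs it with the corresponding line of `--out_file_list` (an argument that appears in the listing above); that pairing is our reading and has not been verified here. The names `flist_all.txt` and `olist_all.txt` are arbitrary.

```shell
# Build one input list and one matching output list, one line per hour
data_dir="/badc/ecmwf-era5/data/oper/an_sfc/1996/11/18"
: > flist_all.txt
: > olist_all.txt
for h in 00 01 02 03 04; do
    stem="${data_dir}/ecmwf-era5_oper_an_sfc_19961118${h}00"
    # All variables for this hour on a single line, separated by ";"
    echo "${stem}.msl.nc;${stem}.10u.nc;${stem}.10v.nc" >> flist_all.txt
    # Matching output node file for this hour
    echo "nodes/${h}.txt" >> olist_all.txt
done

# Inspect the result
cat flist_all.txt
cat olist_all.txt

# A single call could then replace the loop (assumption, left commented out):
# DetectNodes \
#     --in_data_list flist_all.txt \
#     --out_file_list olist_all.txt \
#     --searchbymin "msl" \
#     --outputcmd "msl,min,0;_VECMAG(u10,v10),max,2" \
#     --mergedist "6.0" \
#     --latname "latitude" --lonname "longitude"
```

The loop used above has the advantage of keeping one log file per time step, so either layout may be preferable depending on how the run is monitored.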


In [17]:
ls nodes

00.txt	01.txt	02.txt	03.txt	04.txt


In [18]:
# Visualise file
head nodes/00.txt

1996	11	18	907	0
	600	0	150.000000	90.000000	9.890202e+04	1.794951e+01
	601	0	150.250000	90.000000	9.890202e+04	1.794951e+01
	602	0	150.500000	90.000000	9.890202e+04	1.794951e+01
	603	0	150.750000	90.000000	9.890202e+04	1.794951e+01
	604	0	151.000000	90.000000	9.890202e+04	1.794951e+01
	605	0	151.250000	90.000000	9.890202e+04	1.794951e+01
	606	0	151.500000	90.000000	9.890202e+04	1.794951e+01
	607	0	151.750000	90.000000	9.890202e+04	1.794951e+01
	608	0	152.000000	90.000000	9.890202e+04	1.794951e+01


In [19]:
# Clean
rm -rf logs flist.txt
# Do not remove nodes: we will use them next

### Minimal `StitchNodes`

In [20]:
# Create file with list of node files you want to use
ls nodes/*.txt > flist.txt

In [21]:
# Call minimal StitchNodes
StitchNodes \
--in_list flist.txt \
--in_fmt "lon,lat,slp,wind10" \
--out "tracks.csv" \
--out_file_format "csv"

Arguments:
  --in <string> [""] 
  --in_list <string> ["flist.txt"] 
  --in_connect <string> [""] 
  --out <string> ["tracks.csv"] 
  --in_fmt <string> ["lon,lat,slp,wind10"] 
  --range <double> [5.000000] (degrees)
  --mintime <string> ["3"] 
  --time_begin <string> [""] 
  --time_end <string> [""] 
  --prioritize <string> [""] 
  --min_endpoint_dist <double> [0.000000] (degrees)
  --min_path_dist <double> [0.000000] (degrees)
  --maxgap <string> ["0"] 
  --threshold <string> [""] [col,op,value,count;...]
  --caltype <string> ["standard"] (none|standard|noleap|360_day)
  --allow_repeated_times <bool> [false] 
  --add_velocity <bool> [false] 
  --out_file_format <string> ["csv"] (gfdl|csv|csvnohead)
  --out_seconds <bool> [false] 
------------------------------------------------------------
Loading candidate data
..File (1/5) "nodes/00.txt"
..File (2/5) "nodes/01.txt"
..File (3/5) "nodes/02.txt"
..File (4/5) "nodes/03.txt"
..File (5/5) "nodes/04.txt"
..Discrete times: 5 (1996-11-18 00:

In [22]:
# Visualize the tracks file
head tracks.csv
# Each identified track is given a track_id
# The csv contains the position in space and time of each point along the track

track_id, year, month, day, hour, i, j, lon, lat, slp, wind10
0, 1996, 11, 18, 0, 600, 0, 150.000000, 90.000000, 9.890202e+04, 1.794951e+01
0, 1996, 11, 18, 1, 603, 0, 150.750000, 90.000000, 9.909003e+04, 1.812507e+01
0, 1996, 11, 18, 2, 604, 0, 151.000000, 90.000000, 9.922524e+04, 1.839613e+01
0, 1996, 11, 18, 3, 606, 0, 151.500000, 90.000000, 9.934372e+04, 1.837108e+01
0, 1996, 11, 18, 4, 608, 0, 152.000000, 90.000000, 9.944643e+04, 1.804239e+01
1, 1996, 11, 18, 0, 604, 0, 151.000000, 90.000000, 9.890202e+04, 1.794951e+01
1, 1996, 11, 18, 1, 604, 0, 151.000000, 90.000000, 9.909003e+04, 1.812507e+01
1, 1996, 11, 18, 2, 604, 0, 151.000000, 90.000000, 9.922524e+04, 1.839613e+01
2, 1996, 11, 18, 0, 605, 0, 151.250000, 90.000000, 9.890202e+04, 1.794951e+01
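
The csv layout makes it straightforward to summarize tracks with standard tools. The sketch below prints, for each track, the number of points, the minimum SLP (converted to hPa) and the maximum 10m wind. It assumes the column order shown in the header above and SLP in Pa. The function name `summarize_tracks` and the file `example_tracks.csv` are illustrative; the sample rows are copied from the first two tracks shown above.

```shell
# First two tracks from the output above, used as sample input
cat > example_tracks.csv <<'EOF'
track_id, year, month, day, hour, i, j, lon, lat, slp, wind10
0, 1996, 11, 18, 0, 600, 0, 150.000000, 90.000000, 9.890202e+04, 1.794951e+01
0, 1996, 11, 18, 1, 603, 0, 150.750000, 90.000000, 9.909003e+04, 1.812507e+01
0, 1996, 11, 18, 2, 604, 0, 151.000000, 90.000000, 9.922524e+04, 1.839613e+01
0, 1996, 11, 18, 3, 606, 0, 151.500000, 90.000000, 9.934372e+04, 1.837108e+01
0, 1996, 11, 18, 4, 608, 0, 152.000000, 90.000000, 9.944643e+04, 1.804239e+01
1, 1996, 11, 18, 0, 604, 0, 151.000000, 90.000000, 9.890202e+04, 1.794951e+01
1, 1996, 11, 18, 1, 604, 0, 151.000000, 90.000000, 9.909003e+04, 1.812507e+01
1, 1996, 11, 18, 2, 604, 0, 151.000000, 90.000000, 9.922524e+04, 1.839613e+01
EOF

# Per track: number of points, minimum SLP (hPa), maximum 10m wind (m/s)
summarize_tracks() {
    awk -F ', *' '
        NR == 1 { next }    # skip the csv header
        {
            id = $1
            if (!(id in n)) {
                order[++k] = id          # remember first-seen order of tracks
                min_slp[id] = $10
                max_wind[id] = $11
            }
            n[id]++
            if ($10 + 0 < min_slp[id] + 0) min_slp[id] = $10
            if ($11 + 0 > max_wind[id] + 0) max_wind[id] = $11
        }
        END {
            print "track_id n_points min_slp_hPa max_wind_ms"
            for (m = 1; m <= k; m++) {
                id = order[m]
                printf "%d %d %.1f %.1f\n", id, n[id], min_slp[id] / 100, max_wind[id]
            }
        }
    ' "$1"
}

summarize_tracks example_tracks.csv
# track_id n_points min_slp_hPa max_wind_ms
# 0 5 989.0 18.4
# 1 3 989.0 18.4

rm example_tracks.csv
```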
