# Flux Data Demo

This short demo uses the `terndata.flux` library to access Flux Data directly from TERN Data services.

* Documentation: https://terndata-flux.readthedocs.io
* GitHub: https://github.com/ternaustralia/terndata.flux/
* PyPi: https://pypi.org/project/terndata.flux/

## Data links

* Metadata: https://portal.tern.org.au/results?theme=flux
* Thredds: https://dap.tern.org.au/thredds/catalog/ecosystem_process/ozflux/catalog.html
* Download: https://data.tern.org.au/EcosystemProcesses/ozflux/

⚠️ Important: To access data programmatically, you need to create a TERN API key: https://account.tern.org.au/

Once you have created an API key, create a `.netrc` file in your home directory (for example `cd ~`, then `vim .netrc`) and copy the following into it, replacing `YOUR_REAL_API_KEY` with the key copied from https://account.tern.org.au/

    machine data.tern.org.au
        login apikey
        password YOUR_REAL_API_KEY
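
Before requesting data, it can help to confirm that the `.netrc` entry parses correctly. The sketch below uses only Python's standard `netrc` module. The helper name `has_tern_credentials` is hypothetical and is not part of `terndata.flux`, and the demo writes a throwaway file in a temporary directory instead of touching your real `~/.netrc`.

```python
import netrc
import os
import tempfile


def has_tern_credentials(netrc_path=None, host="data.tern.org.au"):
    """Return True if the netrc file (default: ~/.netrc) has an entry for host.

    Hypothetical helper for illustration; it never prints the key itself.
    """
    try:
        entry = netrc.netrc(netrc_path).authenticators(host)
    except (FileNotFoundError, netrc.NetrcParseError):
        # Missing file, malformed content, or unsafe permissions on ~/.netrc.
        return False
    return entry is not None


# Demo against a temporary file that mirrors the snippet above.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "netrc")
    with open(sample, "w") as fh:
        fh.write(
            "machine data.tern.org.au\n"
            "    login apikey\n"
            "    password YOUR_REAL_API_KEY\n"
        )
    sample_ok = has_tern_credentials(sample)
    other_ok = has_tern_credentials(sample, host="example.org")

print(sample_ok, other_ok)
```

Calling `has_tern_credentials()` without arguments checks the default `~/.netrc` in your home directory.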

In [16]:
# set to true to enable hvplots (disabled to keep .ipynb file size small)
USE_HV_PLOT = False
import terndata.flux as flux
if USE_HV_PLOT:
    import hvplot.xarray
import xarray as xr
import matplotlib.pyplot as plt

# Discovery

The library offers a few helper methods to discover available data sets.

In [2]:
# get all available sites and show on map
sites = flux.get_sites().set_crs("EPSG:4326")
sites

| | site | longitude | latitude | vegetation | canopy height | start | end | geometry |
|---|---|---|---|---|---|---|---|---|
| 0 | AdelaideRiver | 131.1178 | -13.0769 | Woody Savanna | 16.4m | 2007-10-17 11:30:00 | 2009-05-24 06:00:00 | POINT (131.1178 -13.0769) |
| 1 | AliceSpringsMulga | 133.249 | -22.283 | Canopy + three types of understorey (vegetatio... | 6.5m | 2010-09-03 00:00:00 | 2025-01-28 10:00:00 | POINT (133.249 -22.283) |
| 2 | AlpinePeatland | 147.320833 | -36.862222 | | 0.3m | 2017-04-12 18:30:00 | 2022-06-20 23:30:00 | POINT (147.32083 -36.86222) |
| 3 | Boyagin | 116.938559 | -32.477093 | Dry schlerophyll woodland | 13.0m | 2017-10-20 13:00:00 | 2025-02-02 00:00:00 | POINT (116.93856 -32.47709) |
| 4 | CalperumChowilla | 140.5877 | -34.0027 | Mallee | 3m | 2010-07-30 11:00:00 | 2025-02-02 00:00:00 | POINT (140.5877 -34.0027) |
| 5 | CapeTribulation | 145.446922 | -16.103219 | lowland complex mesophyll vine forest | 25m | 2010-01-01 00:00:00 | 2018-11-02 12:00:00 | POINT (145.44692 -16.10322) |
| 6 | Collie | 116.237 | -33.42 | Dry schlerophyll woodland | 12.0m | 2017-08-04 00:30:00 | 2019-11-11 13:30:00 | POINT (116.237 -33.42) |
| 7 | CowBay | 145.42715 | -16.238189 | lowland complex mesophyll vine forest | 26m | 2009-01-01 09:00:00 | 2025-02-02 00:00:00 | POINT (145.42715 -16.23819) |
| 8 | CumberlandPlain | 150.723611 | -33.615278 | Dry sclerophyll forest | 24 m | 2014-01-01 00:30:00 | 2025-01-01 00:00:00 | POINT (150.72361 -33.61528) |
| 9 | DalyPasture | 131.3181 | -14.0633 | Tropical improved pasture | 0.5m | 2008-01-01 00:00:00 | 2013-09-08 13:30:00 | POINT (131.3181 -14.0633) |


In [10]:
# Filter the list of sites by name and plot on map
my_sites = ["AliceSpringsMulga", "CalperumChowilla", "HowardSprings"]
sites[sites.site.isin(my_sites)].explore(
    marker_kwds=dict(radius=5, fill=True)
)

# Versioning

Let's pick one site, and find a few more details about available data.

Flux Data is released quarterly and is usually fully reprocessed. The version number indicates the quarterly release date.

Older versions are kept for reproducibility.

In [11]:
# Which versions are available for Howard Springs?
my_site = "HowardSprings"
versions = flux.get_versions(my_site)
versions

['2020',
 '2021_v1',
 '2022_v1',
 '2022_v2',
 '2023_v1',
 '2023_v2',
 '2024_v1',
 '2024_v2',
 '2025_v1']
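
Picking the newest release by list position relies on the list being returned in ascending order. A minimal sketch of an explicit alternative follows, assuming every label looks like the ones in the listing above (`YYYY` or `YYYY_vN`). The helpers `version_key` and `latest_version` are hypothetical names and are not part of `terndata.flux`.

```python
import re

# Assumed label format, based only on the listing above: "2020" or "2024_v2".
_VERSION_RE = re.compile(r"^(\d{4})(?:_v(\d+))?$")


def version_key(version):
    """Turn a label such as '2024_v2' into a sortable (year, release) tuple."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognised version label: {version!r}")
    year, release = match.groups()
    # A label without a "_vN" suffix is treated as release 0 of that year.
    return int(year), int(release or 0)


def latest_version(versions):
    """Return the newest label regardless of the order of the input list."""
    return max(versions, key=version_key)


example_versions = ["2022_v2", "2020", "2025_v1", "2024_v2", "2021_v1"]
latest = latest_version(example_versions)
print(latest)
```

Labels that do not match the assumed pattern raise a `ValueError` so that an unexpected format is noticed instead of being sorted silently.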

# Processing Levels

Flux data is published at various processing levels. 

* L3: extensive quality control and formatting
* L4: gap filling for meteorological data using external sources
* L5: gap filling for flux data using artificial neural networks
* L6: partitioning gap-filled data into Gross Primary Productivity (GPP) and Ecosystem Respiration (ER)

In [12]:
# pick the latest version and list its available processing levels
version = versions[-1]
p_levels = flux.get_processing_levels(my_site, version)
p_levels

['L3', 'L4', 'L5', 'L6']
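
The level descriptions above can be kept next to the level codes in a small lookup table, which also makes it easy to choose the highest level a site offers. This is only a sketch: `LEVEL_DESCRIPTIONS`, `highest_level` and `describe_level` are hypothetical names that are not provided by `terndata.flux`, and the code assumes each code is the letter `L` followed by a number.

```python
# Descriptions copied from the list of processing levels above.
LEVEL_DESCRIPTIONS = {
    "L3": "extensive quality control and formatting",
    "L4": "gap filling for meteorological data using external sources",
    "L5": "gap filling for flux data using artificial neural networks",
    "L6": (
        "partitioning gap-filled data into Gross Primary Productivity (GPP) "
        "and Ecosystem Respiration (ER)"
    ),
}


def highest_level(levels):
    """Return the level with the largest numeric suffix, e.g. 'L6'."""
    return max(levels, key=lambda level: int(level[1:]))


def describe_level(level):
    """Return a one-line summary such as 'L3: extensive quality control ...'."""
    return f"{level}: {LEVEL_DESCRIPTIONS[level]}"


best = highest_level(["L4", "L6", "L3", "L5"])
summary = describe_level(best)
print(summary)
```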

# Access Data

The `flux.get_dataset` method loads data for a specific site, version and processing level into an xarray `Dataset`.

The returned dataset provides all global and per-variable metadata for the selected data.

In [13]:
# pick the highest processing level (L6)
p_level = p_levels[-1]
ds = flux.get_dataset(my_site, version, p_level)
ds

# Drop some dimensions

The data is stored in a CF-compliant format and has latitude and longitude dimensions. However, a flux tower is stationary and has only a single location value, so we can drop these extra dimensions to make the data easier to work with (there is no need to address the coordinates every time).

In [14]:
# remove single value dimensions
ds = ds.squeeze(drop=True)
ds

# Absolute Humidity

Let's select some variables we may be interested in.

In [15]:
# ahs = ds.filter_by_attrs(long_name="Absolute humidity")
ahs = ds.filter_by_attrs(standard_name="mass_concentration_of_water_vapor_in_air")
ahs

In [17]:
# keep only values > 0 (everything else becomes NaN) and average daily
ahs = ahs.where(ahs > 0).resample(time="1D").mean()

In [18]:
with plt.ioff():
    ahs.to_array().plot.line(figsize=(10,5), hue="variable")
    plt.savefig("images/flux_ahs.png");
    plt.close()

![AHS Plot](./images/flux_ahs.png)

In [19]:
# interactive hvplot
if USE_HV_PLOT:
    ahs.hvplot(width=1000, height=600)

# Air Temperature

In [20]:
# Temperature
tas = ds.filter_by_attrs(standard_name="air_temperature")
tas = tas.where(tas > -20).resample(time="1D").mean()

In [21]:
with plt.ioff():
    tas.to_array().plot.line(figsize=(10,5), hue="variable")
    plt.savefig("images/flux_tas.png");
    plt.close()

![Ta Plot](./images/flux_tas.png)

In [22]:
# interactive hvplot
if USE_HV_PLOT:
    tas.hvplot(width=1000, height=600)

# Gross Primary Productivity

In [23]:
gpp = ds.filter_by_attrs(long_name="Gross Primary Productivity")
gpp = gpp.where(gpp >= 0).resample(time="1D").mean()

In [24]:
with plt.ioff():
    gpp.to_array().plot.line(figsize=(10,5), hue="variable")
    plt.savefig("images/flux_gpp.png");
    plt.close()

![GPP Plot](./images/flux_gpp.png)

In [25]:
# interactive hvplot
if USE_HV_PLOT:
    gpp.hvplot(width=1000, height=600)

# Compare across sites

We can also load data from multiple sites in one go and compare similar variables.

One has to be careful though, as not every site has data published for the same version.
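
One way to guard against this is to work out which versions all selected sites share before loading anything. The sketch below does that with plain Python. The helper `common_versions` is a hypothetical name, not part of `terndata.flux`, and the site names and version lists in the demo are made up for illustration.

```python
def common_versions(versions_by_site):
    """Return the versions available for every site.

    The result keeps the order of the first site's list.
    """
    sites = list(versions_by_site)
    if not sites:
        return []
    shared = set(versions_by_site[sites[0]])
    for site in sites[1:]:
        shared &= set(versions_by_site[site])
    return [v for v in versions_by_site[sites[0]] if v in shared]


# Made-up example data, not real TERN listings.
example = {
    "SiteA": ["2023_v1", "2023_v2", "2024_v1"],
    "SiteB": ["2023_v2", "2024_v1", "2024_v2"],
    "SiteC": ["2022_v1", "2023_v2", "2024_v1"],
}
shared = common_versions(example)
print(shared)
```

With the library, the input mapping could be built from the calls shown earlier, for example `{site: flux.get_versions(site) for site in my_sites}`.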

In [26]:
# load the same version for all selected sites
dss = flux.get_datasets(my_sites, version)
# get air temperature (Ta) for all sites and drop single-value dimensions
dss = {key: value.Ta.squeeze(drop=True) for key, value in dss.items()}
dss

{'AliceSpringsMulga': <xarray.DataArray 'Ta' (time: 252549)> Size: 2MB
 [252549 values with dtype=float64]
 Coordinates:
   * time     (time) datetime64[ns] 2MB 2010-09-03 ... 2025-01-28T10:00:00
 Attributes:
     coverage_L3:     87%
     height:          11.6m
     instrument:      HMP45c
     long_name:       Air temperature
     standard_name:   air_temperature
     statistic_type:  average
     units:           degC
     valid_range:     [-5. 50.],
 'CalperumChowilla': <xarray.DataArray 'Ta' (time: 255810)> Size: 2MB
 [255810 values with dtype=float64]
 Coordinates:
   * time     (time) datetime64[ns] 2MB 2010-07-30T11:00:00 ... 2025-03-02T19:...
 Attributes:
     coverage_L3:     97%
     height:          2m
     instrument:      HMP45C,HMP60
     long_name:       Air temperature
     standard_name:   air_temperature
     statistic_type:  average
     units:           degC
     valid_range:     [ 0. 50.],
 'HowardSprings': <xarray.DataArray 'Ta' (time: 406314)> Size: 3MB
 [406314

In [27]:
# filter all variables by valid value range (remove outliers)
dss = {key: value.where((value > value.attrs["valid_range"][0]) & (value < value.attrs["valid_range"][1])) for key,value in dss.items()}

In [28]:
# create an xarray dataset combining all datasets
da = xr.Dataset(dss)
da

In [29]:
# plot air temperature for all sites as monthly averages
with plt.ioff():
    da.resample(time="1ME").mean().to_array().plot.line(figsize=(10,5), hue="variable")
    plt.savefig("images/flux_tac.png");
    plt.close()

![Ta comparison](images/flux_tac.png)

In [30]:
# plot air temperature for all sites as monthly averages - interactive hvplot
if USE_HV_PLOT:
    da.resample(time="1ME").mean().hvplot(width=1000, height=600)

In [31]:
# slice dataset over a given time range, and plot daily averages
with plt.ioff():
    da.sel(time=slice("2020-01-01", "2020-12-12")).resample(time="1D").mean().to_array().plot.line(figsize=(10,5), hue="variable")
    plt.savefig("images/flux_tac_2020.png");
    plt.close()

![Ta 2020 comparison](images/flux_tac_2020.png)

In [32]:
# slice dataset over a given time range, and plot daily averages - interactive hvplot
if USE_HV_PLOT:
    da.sel(time=slice("2020-01-01", "2020-12-12")).resample(time="1D").mean().hvplot(width=1000, height=600)