## Accessing, Analysing, and Visualising Above-Ground Biomass Data from TERN via GeoJSON API

This tutorial shows how to use TERN's API to access above-ground biomass data in GeoJSON format. The data describe individual plants and their locations at two monitoring sites. By the end of this guide, you will be able to fetch the data, visualise it on interactive maps, and carry out basic spatial exploratory data analysis (EDA).


<div style="border:3px solid #d3d3d3; padding: 15px; margin: 15px 0;">

**Note:** The interactive maps in this notebook are rendered with JavaScript and may not display when the notebook is viewed directly on GitHub. In place of a map, you may see the message "Make this Notebook Trusted to load map: File -> Trust Notebook". To view the interactive elements, run this notebook in a Jupyter environment with JavaScript support.

</div>

### What is GeoJSON?

GeoJSON is an open, JSON-based format for representing geographical features together with their attributes. It is human-readable and compact, which makes it easy to share and use. It is commonly used in web applications for transferring geographical data, such as the locations of plants, animals, or other points of interest.

**Why is GeoJSON important for ecologists?**

1. **Standardised Format**: GeoJSON is a standardised format. This means that if you have geographical data from one research project and another set of data from a different project or institution, both can be presented in GeoJSON format and easily combined or compared.
2. **Interoperability**: Given its widespread acceptance, many geospatial tools and software support GeoJSON. This ensures that data can be imported, visualised, and analysed without any format conversion hassles.
3. **Web Friendly**: Its foundation on JSON makes it inherently suitable for web applications. This is particularly valuable for ecologists looking to share or visualise their findings online.
4. **Rich Feature Set**: GeoJSON is not just about points on a map. It can represent more complex features like polygons (e.g., areas of forest) or lines (e.g., migration paths).
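
To make the structure concrete, here is a minimal sketch of a GeoJSON `FeatureCollection` built and read with Python's standard `json` module. The coordinates and property values are made up for illustration and are not taken from the TERN data.

```python
import json

# A hypothetical FeatureCollection: one point (a single tree) and one polygon (a plot outline).
# GeoJSON positions are ordered [longitude, latitude].
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [145.4293, -16.2377]},
            "properties": {"label": "example tree"},
        },
        {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                # A polygon is a list of rings; each ring starts and ends on the same position.
                "coordinates": [[
                    [145.429, -16.238], [145.430, -16.238],
                    [145.430, -16.237], [145.429, -16.237],
                    [145.429, -16.238],
                ]],
            },
            "properties": {"label": "example plot"},
        },
    ],
}

# Round-trip through text, as happens when the data travels over the web
text = json.dumps(feature_collection)
parsed = json.loads(text)

# List the geometry type of each feature
geometry_types = [feature["geometry"]["type"] for feature in parsed["features"]]
print(geometry_types)
```

Each feature pairs a `geometry` with a free-form `properties` dictionary, which is where attributes such as site names or measurements are carried.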


### Obtaining an API Key

Before fetching data from TERN's API, you'll need an API key. Instructions on how to obtain this key can be found at [this link](https://ternaus.atlassian.net/wiki/spaces/TERNSup/pages/2353496065/Create+and+Use+API+Key+to+Access+TERN+Data+Services).
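
Because an API key is a credential, it is safer not to paste it into a notebook that may be shared. The sketch below reads the key from an environment variable instead. The variable name `TERN_API_KEY` and the helper `build_headers` are hypothetical choices for this example; only the `X-Api-Key` and `Content-Type` header names come from the request code used later in this tutorial.

```python
import os


def build_headers(api_key):
    """Return the request headers used in this tutorial for a given API key."""
    if not api_key:
        raise ValueError("No API key provided; set TERN_API_KEY or pass a key explicitly.")
    return {
        "X-Api-Key": api_key,
        "Content-Type": "application/json",
    }


# TERN_API_KEY is a hypothetical variable name; "YOUR_API_KEY" is a placeholder fallback
api_key = os.environ.get("TERN_API_KEY") or "YOUR_API_KEY"
headers = build_headers(api_key)
```

The resulting `headers` dictionary can be passed to `conn.request(...)` in place of a hard-coded one.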


Additionally, in our data retrieval, the feature type we're targeting is "plant individual." This helps us fetch specific data related to individual plants from the TERN database.


We will be exploring the data from two sites:



#### Daintree Rainforest, Cow Bay
**Description:** The dataset from Daintree Rainforest, Cow Bay, offers stem diameter, height measurements, and above-ground living biomass calculations for an Australian tropical rainforest. Both diameter and height measurements for stems ≥ 10 cm diameter at breast height were sampled within a 1 ha plot in 2012 and 2018. 

**Purpose:** This data plays a pivotal role in mapping and monitoring alterations in plant growth, carbon storage, and terrestrial energy fluxes.  

**Lineage:** The 1 ha plot was sectioned into 25 subplots (20 x 20 m). Each individual stem ≥ 10 cm diameter at breast height was mapped within each subplot. Above-ground biomass was ascertained using the allometric equation from Chave et al. 2014.

#### Samford Peri-Urban
**Description:** This dataset reveals stem diameter, height measurement, and above-ground living biomass calculations for an open Eucalypt and notophyll vine forest within a 1 ha plot at the Samford Peri-Urban site.  

**Purpose:** Analogous to the Daintree site, this data is invaluable for grasping changes in plant growth, carbon retention, and terrestrial energy fluxes.  

**Lineage:** All stems ≥ 10 cm diameter at breast height were mapped within each 20 x 20 m subplot. Heights for discernible and larger stems were measured using a Nikon laser range finder. Biomass was determined via the allometric equation from Chave et al. 2014.

After a short note on how biomass was calculated, we will access the data straight from TERN's API by importing the requisite libraries and sending a query.


**Biomass Calculation Methodology**:

For stems with a diameter of at least 10 cm at breast height, the above-ground biomass was calculated using the allometric equation of Chave et al. (2014):

> Reference: Chave, J. et al. (2014). Improved allometric models to estimate the aboveground biomass of tropical trees. Glob Change Biol, 20: 3177-3190. [Link to study](https://doi.org/10.1111/gcb.12629).
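
Chave et al. (2014) present more than one model, and this tutorial does not say which one was applied to the TERN data. As an illustration only, the sketch below implements the pantropical model that uses wood density, diameter, and height (Equation 4 in the paper): AGB = 0.0673 × (ρD²H)^0.976, with AGB in kg, ρ in g/cm³, D in cm, and H in m. The function name and the example inputs are hypothetical and are not values from the datasets.

```python
def chave_2014_agb_kg(wood_density_g_cm3, diameter_cm, height_m):
    """Estimate above-ground biomass (kg) for one tree.

    Uses the height-based pantropical model of Chave et al. (2014):
        AGB = 0.0673 * (rho * D^2 * H) ^ 0.976
    Whether this is the exact model used for the TERN data is an assumption.
    """
    if min(wood_density_g_cm3, diameter_cm, height_m) <= 0:
        raise ValueError("Wood density, diameter and height must all be positive.")
    return 0.0673 * (wood_density_g_cm3 * diameter_cm ** 2 * height_m) ** 0.976


# Hypothetical tree: wood density 0.6 g/cm3, diameter 30 cm, height 25 m
example_agb = chave_2014_agb_kg(0.6, 30.0, 25.0)
print(f"Estimated above-ground biomass: {example_agb:.1f} kg")
```

Because diameter enters the equation squared, large stems contribute disproportionately to plot-level biomass, which is worth keeping in mind when comparing species by mean biomass later on.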



In [1]:
import http.client
import json
import pandas as pd
import geopandas as gpd
from shapely.geometry import shape

# Setting up the connection and headers
conn = http.client.HTTPSConnection("ecoplots-test.tern.org.au")
headers = {
  'X-Api-Key': 'YOUR_API_KEY',  # replace 'YOUR_API_KEY' with your actual API key
  'Content-Type': 'application/json'
}

# Defining the payload for the query
payload = json.dumps({
  "query": {
    "dataset": [
      "http://linked.data.gov.au/dataset/tern-ecosystem-processes"
    ],
    "observed_property": [
      "http://linked.data.gov.au/def/tern-cv/c3d26c6f-91b7-4627-91e6-2147fa44ad03"
    ],
    "feature_type": [
      "http://linked.data.gov.au/def/tern-cv/60d7edf8-98c6-43e9-841c-e176c334d270"
    ],
    "site_id": [
      "https://w3id.org/tern/resources/5febb758-e33e-43dd-985e-11802fd7ab42",
      "https://w3id.org/tern/resources/48396239-7335-422b-9f13-a31054306d71"
    ]
  }
})

# Making the request to the API
conn.request("POST", "/api/v1.0/data/tern-ecosystem-processes?dformat=geojson", payload, headers)
res = conn.getresponse()
data_geojson = json.loads(res.read().decode("utf-8"))

# Converting the GeoJSON to a Geopandas DataFrame
gdf = gpd.GeoDataFrame.from_features(data_geojson['features'])
gdf.head()


|   | geometry | dataset | site | siteVisit | observations |
|---|---|---|---|---|---|
| 0 | POINT (145.42934 -16.23772) | {'dataset.title': 'TERN Ecosystem Processes'} | {'siteName': 'Daintree Rainforest, Cow Bay, co... | {'siteVisitName': '20120607', 'siteVisitDate':... | [{'featureId': 'http://linked.data.gov.au/data... |
| 1 | POINT (145.42934 -16.23772) | {'dataset.title': 'TERN Ecosystem Processes'} | {'siteName': 'Daintree Rainforest, Cow Bay, co... | {'siteVisitName': '20181119', 'siteVisitDate':... | [{'featureId': 'http://linked.data.gov.au/data... |
| 2 | POINT (152.88080 -27.38890) | {'dataset.title': 'TERN Ecosystem Processes'} | {'siteName': 'Samford Peri-urban, core1ha', 'p... | {'siteVisitName': '20120911', 'siteVisitDate':... | [{'featureId': 'http://linked.data.gov.au/data... |
| 3 | POINT (152.88080 -27.38890) | {'dataset.title': 'TERN Ecosystem Processes'} | {'siteName': 'Samford Peri-urban, core1ha', 'p... | {'siteVisitName': '20170901', 'siteVisitDate':... | [{'featureId': 'http://linked.data.gov.au/data... |
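
The request above assumes that the call succeeds. A minimal sketch of a more defensive parser is shown here. The function name `parse_geojson_response` is hypothetical, and the checks (HTTP status 200 and a top-level `features` list) are assumptions about what a successful response looks like, based on how `data_geojson['features']` is used above.

```python
import json


def parse_geojson_response(status, body):
    """Decode a response body and return its list of GeoJSON features.

    status: HTTP status code (res.status for an http.client response)
    body:   raw response bytes (the result of res.read())
    """
    text = body.decode("utf-8")
    if status != 200:
        # Include the start of the body, which often carries the server's error message
        raise RuntimeError(f"Request failed with HTTP {status}: {text[:200]}")
    data = json.loads(text)
    features = data.get("features") if isinstance(data, dict) else None
    if not isinstance(features, list):
        raise ValueError("Response does not contain a 'features' list")
    return features
```

With the connection code above, this could be called as `features = parse_geojson_response(res.status, res.read())`, and the result passed to `gpd.GeoDataFrame.from_features(features)`. A missing or invalid API key would then surface as a clear error rather than a `KeyError` on `'features'`.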


In [3]:
# Fetching, Processing, and Displaying TERN GeoJSON Data from API

# Importing the necessary libraries
import http.client
import json
import pandas as pd
import geopandas as gpd
from shapely.geometry import shape

# Setting up the connection and headers to the TERN API
conn = http.client.HTTPSConnection("ecoplots-test.tern.org.au")
headers = {
  'X-Api-Key': 'YOUR_API_KEY',  # Remember to replace 'YOUR_API_KEY' with your actual API key
  'Content-Type': 'application/json'
}

# Defining the payload for the query
payload = json.dumps({
  "query": {
    "dataset": [
      "http://linked.data.gov.au/dataset/tern-ecosystem-processes"
    ],
    "observed_property": [
      "http://linked.data.gov.au/def/tern-cv/c3d26c6f-91b7-4627-91e6-2147fa44ad03"
    ],
    "feature_type": [
      "http://linked.data.gov.au/def/tern-cv/60d7edf8-98c6-43e9-841c-e176c334d270"
    ],
    "site_id": [
      "https://w3id.org/tern/resources/5febb758-e33e-43dd-985e-11802fd7ab42",
      "https://w3id.org/tern/resources/48396239-7335-422b-9f13-a31054306d71"
    ]
  }
})

# Making the request to the TERN API
conn.request("POST", "/api/v1.0/data/tern-ecosystem-processes?dformat=geojson", payload, headers)
res = conn.getresponse()
data_geojson = json.loads(res.read().decode("utf-8"))

# Once we've fetched the data, we'll convert the GeoJSON data to a Geopandas DataFrame
tern_data = gpd.GeoDataFrame.from_features(data_geojson['features'])

# Extracting the specific information from the DataFrame
tern_data['siteName'] = tern_data['site'].apply(lambda x: x.get('siteName'))
tern_data['siteVisitName'] = tern_data['siteVisit'].apply(lambda x: x.get('siteVisitName'))
tern_data['Latitude'] = tern_data['geometry'].apply(lambda x: x.y)
tern_data['Longitude'] = tern_data['geometry'].apply(lambda x: x.x)

# Displaying the selected columns to view the processed data
tern_data_display = tern_data[['siteName', 'siteVisitName', 'Latitude', 'Longitude']]
print(tern_data_display.head())



                                siteName siteVisitName   Latitude   Longitude
0  Daintree Rainforest, Cow Bay, core1ha      20120607 -16.237715  145.429343
1  Daintree Rainforest, Cow Bay, core1ha      20181119 -16.237715  145.429343
2            Samford Peri-urban, core1ha      20120911 -27.388897  152.880795
3            Samford Peri-urban, core1ha      20170901 -27.388897  152.880795


Before visualising our data on an interactive map, we'll extract the site names from our dataset. This will allow us to label and differentiate data points based on their respective site names, making our map more informative.

In [2]:
# Create a new column 'siteName' in the dataframe 'gdf'
# The 'apply' function is used to go through each row in the 'site' column of 'gdf'
# For each row, it extracts the 'siteName' value from the 'site' dictionary and assigns it to the 'siteName' column

gdf['siteName'] = gdf['site'].apply(lambda x: x['siteName'])



### Interactive Map with Site Names
Now, let's create an interactive map using the folium library. The map will display the locations of the TERN sites, and when you click on a site, it will show the site's name.



In [8]:
import folium

import warnings
# It's generally a good practice to address the root cause of warnings in code.
# However, for the purpose of this tutorial and to ensure a clean presentation,
# we are suppressing them. In a real-world scenario, it's advisable to investigate 
# and address warnings appropriately rather than just suppressing them.
warnings.filterwarnings('ignore')


# Initialise the map centered around Australia with a low zoom level
m = folium.Map(location=[-25, 135], zoom_start=4)

# Add points to the map
for idx, row in gdf.iterrows():
    # Use folium.Marker for each site in the GeoDataFrame
    folium.Marker(
        location=[row['geometry'].y, row['geometry'].x], 
        popup=row['siteName'],  # display siteName when the marker is clicked
        icon=folium.Icon(icon="tree", prefix="fa"),  # "tree" is a Font Awesome icon, so the "fa" prefix is needed
    ).add_to(m)

# Display the map
m
# Note: interactive maps may not render on GitHub

### Visualising Plant Individuals from Species with Highest Biomass at Samford Site
In this section, we aim to visualise the positions of plant individuals from the species with the highest biomass within the Samford site. By doing this, we can gain insights into the spatial distribution and concentration of these dominant species.

We will first narrow down our dataset to only include records from the Samford site. This will make our subsequent analysis more focused and efficient.

In [20]:
# Convert the GeoJSON data to a Geopandas DataFrame
data_gdf = gpd.GeoDataFrame.from_features(data_geojson['features'])

# Filter the dataframe for only the Samford site based on the 'siteName' within the 'site' column
samford_data = data_gdf[data_gdf['site'].apply(lambda x: x['siteName']).str.contains("Samford Peri-urban")]

# Display the first few rows of the filtered dataset
samford_data.head()



|   | geometry | dataset | site | siteVisit | observations |
|---|---|---|---|---|---|
| 2 | POINT (152.88080 -27.38890) | {'dataset.title': 'TERN Ecosystem Processes'} | {'siteName': 'Samford Peri-urban, core1ha', 'p... | {'siteVisitName': '20120911', 'siteVisitDate':... | [{'featureId': 'http://linked.data.gov.au/data... |
| 3 | POINT (152.88080 -27.38890) | {'dataset.title': 'TERN Ecosystem Processes'} | {'siteName': 'Samford Peri-urban, core1ha', 'p... | {'siteVisitName': '20170901', 'siteVisitDate':... | [{'featureId': 'http://linked.data.gov.au/data... |


Now let's identify the species with the highest mean biomass that also have location data.
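
The cell that follows groups a table named `biomass_location_df`, which is not created anywhere in this section. One way such a table might be assembled is sketched here, assuming each observation is a dictionary with the same keys that the mapping cells further down read (`feature.scientificName`, `featureId.attributes`, and `feature.observations`). The function name `flatten_observations` is hypothetical, and biomass and coordinate values are assumed to be numbers or numeric strings.

```python
def flatten_observations(observation_lists):
    """Collect species, biomass and location for each individual that has all of them."""
    records = []
    for observations in observation_lists:
        for obs in observations:
            attributes = obs.get("featureId.attributes", {})
            latitude = attributes.get("featureLatitude", {}).get("value")
            longitude = attributes.get("featureLongitude", {}).get("value")
            biomass_entries = obs.get("feature.observations", {}).get("aboveGroundBiomass", [])
            biomass = biomass_entries[0].get("value") if biomass_entries else None
            species = obs.get("feature.scientificName")
            # Skip individuals with no species, biomass, or coordinates
            if None in (species, biomass, latitude, longitude):
                continue
            records.append({
                "species": species,
                "biomass": float(biomass),
                "latitude": float(latitude),
                "longitude": float(longitude),
            })
    return records
```

With the Samford subset from the previous cell, `biomass_location_df = pd.DataFrame(flatten_observations(samford_data['observations']))` would give a table with the `species` and `biomass` columns that the grouping step expects.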

In [45]:
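# `biomass_location_df` is not created in this section; it is assumed to hold one row per
# plant individual that has location data, with 'species' and 'biomass' columns.
# Group by species and calculate the mean biomass, then sort in descending order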
top_species_location_mean = biomass_location_df.groupby('species')['biomass'].mean().sort_values(ascending=False)

# Select the top 5 species with the highest mean biomass that also have location data
top_5_species_location_mean = top_species_location_mean.head(5).index.tolist()
top_5_species_location_mean


['Eucalyptus siderophloia',
 'Eucalyptus tereticornis subsp. tereticornis',
 'Corymbia intermedia',
 'Angophora subvelutina',
 'Melaleuca quinquenervia']

Now we can filter the 'samford_data' for these top species with location data and visualise their positions.

In [46]:
# Filter the data for top species with location data based on mean biomass
top_species_location_mean_data = samford_data[samford_data['observations'].apply(lambda x: any(obs['feature.scientificName'] in top_5_species_location_mean for obs in x))]

# Initialize an interactive map centered around Samford's average coordinates
m = folium.Map(location=[top_species_location_mean_data.geometry.y.mean(), top_species_location_mean_data.geometry.x.mean()], zoom_start=12)

# Add points to the map
for _, row in top_species_location_mean_data.iterrows():
    for observation in row['observations']:
        if observation['feature.scientificName'] in top_5_species_location_mean:
            folium.Marker(
                location=[observation['featureId.attributes']['featureLatitude']['value'], observation['featureId.attributes']['featureLongitude']['value']],
                popup=f"Species: {observation['feature.scientificName']}<br>Biomass: {observation['feature.observations']['aboveGroundBiomass'][0]['value']}",
                icon=folium.Icon(icon="leaf")
            ).add_to(m)

# Display the map
m


We can also visualise locations of individuals for the single species with the highest mean biomass.

In [47]:
# Select the species with the highest mean biomass
top_species_by_mean = top_5_species_location_mean[0]

# Filter the data for this species
top_species_data = samford_data[samford_data['observations'].apply(lambda x: any(obs['feature.scientificName'] == top_species_by_mean for obs in x))]

# Initialize an interactive map centered around Samford's average coordinates
m = folium.Map(location=[top_species_data.geometry.y.mean(), top_species_data.geometry.x.mean()], zoom_start=12)

# Add points to the map
for _, row in top_species_data.iterrows():
    for observation in row['observations']:
        if observation['feature.scientificName'] == top_species_by_mean:
            folium.Marker(
                location=[observation['featureId.attributes']['featureLatitude']['value'], observation['featureId.attributes']['featureLongitude']['value']],
                popup=f"Species: {observation['feature.scientificName']}<br>Biomass: {observation['feature.observations']['aboveGroundBiomass'][0]['value']}",
                icon=folium.Icon(icon="leaf")
            ).add_to(m)

# Display the map
m


### Conclusion
You have now accessed the TERN above-ground biomass dataset through TERN's API and visualised it on interactive maps. By identifying the species with the highest mean biomass and mapping their individuals within the Samford site, you have seen how biomass is distributed spatially across the plot.

Using these techniques, you can further explore other spatial datasets, perform more detailed analyses, and even combine multiple datasets to derive deeper ecological insights.