# Water detection with Sentinel-1

**Adapted from https://github.com/digitalearthafrica/deafrica-sandbox-notebooks/blob/main/Real_world_examples/Radar_water_detection.ipynb**

## Background

Over 40% of the world’s population lives within 100 km of the coastline.
However, coastal environments are constantly changing, with erosion and coastal change presenting a major challenge to valuable coastal infrastructure and important ecological habitats.
Updating data on the position of the coastline is essential for coastal managers to be able to identify and minimise the impacts of coastal change and erosion. 
Coastal regions are also home to many wetlands. Monitoring water extent helps to understand and protect these dynamic and productive ecosystems.

While coastlines and water extents can be mapped using optical data (demonstrated in the [Coastal Erosion notebook](../Real_world_examples/Coastal_erosion.ipynb)), these images can be strongly affected by the weather, especially through the presence of clouds, which obscure the land and water below.
This can be a particular problem in cloudy regions or areas where clouds in the wet season prevent optical satellites from taking clear images for many months of the year.

Radar observations are largely unaffected by cloud cover.
The two Sentinel-1 satellites, operated by ESA as part of the Copernicus program, provide all-weather observations every 6 to 12 days over Africa.
By developing a process to classify the observed pixels as either water or land, it is possible to identify the shoreline and map the dynamic water extents using radar data.
For more information, see the [Sentinel-1](../Datasets/Sentinel_1.ipynb) notebook.

## Description

In this example, we use data from the Sentinel-1 satellites to build a classifier that can determine whether a pixel is water or land.
Specifically, this notebook uses analysis-ready radar backscatter, which describes the strength of the signal received by the satellite.

The notebook contains the following steps:

1. Load Sentinel-1 backscatter data for an area of interest and visualize the returned data
2. Apply a speckle filter and convert the linear backscatter values to dB for analysis
3. Use histogram analysis to determine the threshold for water classification
4. Design a classifier to distinguish land and water
5. Apply the classifier to the area of interest and interpret the results

***

## Getting started

To run this analysis, run all the cells in the notebook, starting with the "Load packages" cell. 

### Load packages
Import Python packages that are used for the analysis.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from ipyleaflet import basemaps
from odc.geo.geom import point
from odc.stac import load
from planetary_computer import sign_url
from pystac_client import Client
from scipy.ndimage import uniform_filter, variance
from skimage.filters import threshold_minimum

## Find and load data

In this example, we're using Sentinel-1 radiometrically terrain corrected
data from the Microsoft Planetary Computer. This is freely available, but
you need to use a special Python function `sign_url` to authorise access.

In [None]:
# Microsoft Planetary Computer STAC Catalog URL
catalog = "https://planetarycomputer.microsoft.com/api/stac/v1"

# Create a STAC Client
client = Client.open(catalog)

In [None]:
# Set study area name for outputs
study_area = "lake_toba"

# Create an aoi
lat, lon = 2.7965, 98.6331
aoi = point(lon, lat, crs="epsg:4326")
# Buffer the point by 0.2 degrees to create the region of interest
region = aoi.buffer(0.2).to_crs("epsg:4326")

region.explore(tiles=basemaps.Esri.WorldImagery)

In [None]:
datetime = "2024-06/2024-07"

items = client.search(
    collections=["sentinel-1-rtc"],
    intersects=region,
    datetime=datetime,
).item_collection()

print(f"Found {len(items)} items")

In [None]:
data = load(
    items,
    geopolygon=region,
    measurements=["vv", "vh"],
    groupby="solar_day",
    patch_url=sign_url,
    chunks={"x": 2048, "y": 2048}
).compute()

data

### Plot data

Do some simple plots of the data, so we know what we're working with.

In [None]:
# Plot VV polarisation
data.isel(time=range(0,6)).vv.plot(cmap="Greys_r", robust=True, col="time", col_wrap=3)

In [None]:
# Plot VH polarisation
data.isel(time=range(0,6)).vh.plot(cmap="Greys_r", robust=True, col="time", col_wrap=3)

Backscatter measurements can be combined in visualization to highlight the different polarization signatures. 
For the RGB visualization below, the ratio between VH and VV is added as a third measurement band.

In [None]:
# VH/VV is a potentially useful third feature after VV and VH 
data['vh/vv'] = data.vh/data.vv

# Median values are used to scale the measurements so they have a similar range for visualization
medians = data.median(dim=["time"])

# Get scaled values so we can plot an RGB image for selected timesteps
scaled = data / medians

In [None]:
# Do the plotting
scaled.isel(time=range(0,6)).to_array().plot.imshow(robust=True, col="time", col_wrap=3)
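
The median scaling used above can be illustrated on a tiny synthetic array. This is a minimal sketch in plain NumPy rather than on the xarray dataset; the backscatter values are made up, and the array layout (time along the first axis, pixels along the second) is an assumption chosen for brevity.

```python
import numpy as np

# Hypothetical linear backscatter for 3 timesteps and 2 pixels (illustrative values only)
vv = np.array([[0.20, 0.02], [0.10, 0.01], [0.30, 0.03]])
vh = np.array([[0.04, 0.004], [0.02, 0.002], [0.06, 0.006]])
ratio = vh / vv

# Stack into (band, time, pixel) and divide each band by its per-pixel median over time
bands = np.stack([vv, vh, ratio])
medians = np.median(bands, axis=1, keepdims=True)
scaled = bands / medians

# After scaling, every band has a temporal median of 1 at every pixel,
# so the three bands share a comparable range for an RGB composite
print(np.median(scaled, axis=1))
```

Because each band is divided by its own median, bands whose raw values differ by an order of magnitude end up centred on the same value, which is what lets them share one colour stretch.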

## Apply speckle filtering

Radar observations appear speckled due to random interference of coherent signals from target scatterers.
The speckle noise can be reduced by averaging pixel values over an area or over time.
However, averaging over a fixed window smooths out real local spatial variation and leads to reduced spatial resolution.
An adaptive approach that takes into account local homogeneity is therefore preferred.

Below, we apply the Lee filter, one of the popular adaptive speckle filters.


In [None]:
# Define a function to apply lee filtering on S1 image 
def lee_filter(da, size):
    """
    Apply lee filter of specified window size.
    Adapted from https://stackoverflow.com/questions/39785970/speckle-lee-filter-in-python

    """
    da_notime = da.squeeze()
    img = da_notime.values
    img_mean = uniform_filter(img, size)
    img_sqr_mean = uniform_filter(img**2, size)
    img_variance = img_sqr_mean - img_mean**2

    overall_variance = variance(img)

    img_weights = img_variance / (img_variance + overall_variance)
    img_output = img_mean + img_weights * (img - img_mean)

    # Convert the numpy array back to xarray, reusing the input dims and coords
    output = xr.DataArray(img_output, dims=da_notime.dims, coords=da_notime.coords)
    
    return output

# The lee filter above doesn't handle null values
# We therefore set null values to 0 before applying the filter
valid = np.isfinite(data)
masked = data.where(valid, 0)

# Create a new entry in dataset corresponding to filtered VV and VH data
data["filtered_vv"] = masked.vv.groupby("time").map(lee_filter, size=7)
data["filtered_vh"] = masked.vh.groupby("time").map(lee_filter, size=7)

# Null pixels should remain null
data['filtered_vv'] = data.filtered_vv.where(valid.vv)
data['filtered_vh'] = data.filtered_vh.where(valid.vh)

In [None]:
# Images appear smoother after speckle filtering
fig, ax = plt.subplots(1, 2, figsize=(15,5))
data["vv"].isel(time=3).plot(ax = ax[0],robust=True)
data["filtered_vv"].isel(time=3).plot(ax = ax[1],robust=True);
ax[0].set_title('vv')
ax[1].set_title('filtered vv')
plt.tight_layout();
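
To see why this filter is called adaptive, it can help to look at the weights it computes. The sketch below applies the same weighting formula as `lee_filter` to a made-up one-dimensional signal with a sharp edge. It uses NumPy only, computes the local statistics with `sliding_window_view` instead of `uniform_filter`, and, like the function above, uses the overall variance of the signal as the noise estimate. The signal, noise level and window size are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

# Synthetic 1-D "image": a dark flat region followed by a bright flat region, plus noise
signal = np.concatenate([np.full(50, 1.0), np.full(50, 10.0)])
noisy = signal + rng.normal(0, 0.3, signal.size)

size = 7
# Local mean and variance over a sliding window (edges padded by reflection)
padded = np.pad(noisy, size // 2, mode="reflect")
windows = sliding_window_view(padded, size)
local_mean = windows.mean(axis=1)
local_var = windows.var(axis=1)

# Lee weighting: a small weight where the window is homogeneous (output close to
# the local mean), a larger weight where local variance is high (more of the
# original pixel value is kept)
overall_var = noisy.var()
weights = local_var / (local_var + overall_var)
filtered = local_mean + weights * (noisy - local_mean)

print("weight in flat region:", weights[25].round(3))
print("weight at the edge:   ", weights[50].round(3))
```

In the flat regions the weight is close to zero, so the output is essentially a moving average. Near the edge the weight rises, so less smoothing is applied and the boundary is better preserved than with a plain moving average.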

### Convert the digital numbers to dB

While Sentinel-1 backscatter is provided as linear intensity, it is often useful to convert the backscatter to decibels (dB) for analysis.
Backscatter in dB has a more symmetric noise profile and a less skewed value distribution, which makes statistical evaluation easier.

In [None]:
data['filtered_vv'] = 10 * np.log10(data.filtered_vv)
data['filtered_vh'] = 10 * np.log10(data.filtered_vh)
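
As a quick worked example of the conversion, the sketch below applies the same formula, `10 * log10(x)`, to a few illustrative linear values. The helper name `to_db` and the input values are made up for this example.

```python
import math

def to_db(linear):
    """Convert a linear backscatter intensity to decibels."""
    return 10 * math.log10(linear)

# Each factor of 10 in linear intensity corresponds to a step of 10 dB
for linear in (1.0, 0.1, 0.01, 0.001):
    print(f"{linear:>6} -> {to_db(linear):6.1f} dB")
```

The logarithm is only defined for positive inputs, so pixels with a linear value of zero or below do not produce a usable dB value.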

## Histogram analysis for Sentinel-1

Backscatter distributions are plotted below as histograms.

In [None]:
fig = plt.figure(figsize=(12, 3))
data.filtered_vh.plot.hist(bins=1000, label="VH filtered")
data.filtered_vv.plot.hist(bins=1000, label="VV filtered",alpha=0.5)
plt.xlim(-40,-1)
plt.legend()
plt.xlabel("Backscatter (dB)")
plt.title("Comparison of Lee filtered VH and VV polarisation bands");

## Build and apply the classifier 

The histogram for VH backscatter shows a bimodal distribution with low values over water and high values over land.
The VV histogram has multiple peaks and less obvious separation between water and land.

A bimodal histogram like this makes VH a natural basis for a classifier. We choose a threshold to separate land and water: pixels with values below the threshold are water, and pixels with values above the threshold are not water (land).
Note, however, that the code cells below compute and apply the threshold on the filtered VV band (`data.filtered_vv`, `threshold_vv`); to build the classifier on VH instead, substitute `data.filtered_vh` in those cells.

There are several ways to determine the threshold.
Here, we use the `threshold_minimum` function implemented in the `skimage` package to determine the threshold from the histogram automatically.
This method computes the histogram for all backscatter values, smooths it until there are only two maxima, and finds the minimum in between as the threshold.

In [None]:
vv_no_nans = data.filtered_vv.values[~np.isnan(data.filtered_vv.values)]
threshold_vv = threshold_minimum(vv_no_nans)

print(threshold_vv)
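
To make the idea behind the minimum method more concrete, here is a simplified re-implementation in NumPy. It is a sketch of the technique described above, not the `skimage` source code, and it may return slightly different values from `threshold_minimum`. The function names, the number of bins and the synthetic backscatter values are all assumptions made for illustration.

```python
import numpy as np

def local_maxima(hist):
    """Return indices where the histogram switches from rising to falling."""
    peaks, rising = [], False
    for i in range(1, len(hist)):
        if hist[i] > hist[i - 1]:
            rising = True
        elif hist[i] < hist[i - 1] and rising:
            peaks.append(i - 1)
            rising = False
    return peaks

def minimum_threshold(values, bins=64, max_iter=10000):
    """Simplified minimum method: smooth the histogram with a 3-bin moving
    average until two maxima remain, then return the bin centre of the
    lowest point between them."""
    counts, edges = np.histogram(values, bins=bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    smooth = counts.astype(float)
    kernel = np.ones(3) / 3
    for _ in range(max_iter):
        peaks = local_maxima(smooth)
        if len(peaks) <= 2:
            break
        # Repeat the edge bins so smoothing does not pull the ends towards zero
        smooth = np.convolve(np.pad(smooth, 1, mode="edge"), kernel, mode="valid")
    if len(peaks) != 2:
        raise RuntimeError("Could not reduce the histogram to two maxima")
    first, second = peaks
    return centres[first + np.argmin(smooth[first:second + 1])]

# Synthetic backscatter in dB: a "water" mode and a "land" mode (made-up values)
rng = np.random.default_rng(42)
values = np.concatenate([rng.normal(-24, 1.5, 20000), rng.normal(-12, 1.5, 20000)])
threshold = minimum_threshold(values)
print(f"threshold: {threshold:.1f} dB")
```

The method relies on the histogram being clearly bimodal. If the two modes overlap heavily or one class is very rare, the smoothing can end with fewer than two maxima, which this sketch reports as an error.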

### Visualise threshold

To check if our chosen threshold reasonably divides the two distributions, we can add the threshold to the histogram plots we made earlier. 

In [None]:
fig, ax = plt.subplots(figsize=(15, 3))
data.filtered_vv.plot.hist(bins=1000, label="VV filtered", color="gray")
plt.xlim(-40,-5)
ax.axvspan(xmin=-40.0, xmax=threshold_vv, alpha=0.25, color="blue", label="Water")
ax.axvspan(xmin=threshold_vv,
           xmax=-5,
           alpha=0.25,
           color="green",
           label="Not Water")
plt.legend()
plt.xlabel("VV (dB)")
plt.title("Effect of the classifier")
plt.show()

### Define the classifier

This threshold is used to write a function to only return the pixels that are classified as water. The basic steps that the function will perform are:

1. Find all pixels that have filtered values lower than the threshold; these are the `water` pixels.
2. Return a data set containing the `water` pixels.


In [None]:
def S1_water_classifier(da, threshold=threshold_vv):
    water_data_array = da < threshold
    return water_data_array.to_dataset(name="s1_water")

Now that we have defined the classifier function, we can apply it to the data. After running the classifier, we will be able to view the classified data product by running `print(data.water)`.

In [None]:
data['water'] = S1_water_classifier(data.filtered_vv).s1_water
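
One detail worth knowing about a threshold comparison like this is how it treats missing values. The sketch below uses plain NumPy with made-up dB values and a hypothetical threshold to show that a comparison with NaN evaluates to `False`, so missing pixels are labelled "not water" unless they are masked explicitly.

```python
import numpy as np

threshold = -18.0  # hypothetical threshold in dB, for illustration only
backscatter_db = np.array([-25.0, -10.0, np.nan, -19.5])

# A comparison with NaN is always False, so missing pixels end up labelled "not water"
water = backscatter_db < threshold
print(water)

# One option: keep missing pixels as NaN so they are excluded from later statistics
water_masked = np.where(np.isnan(backscatter_db), np.nan, water.astype(float))
print(water_masked)
```

Whether to keep missing pixels as NaN depends on how the result is used. For a water frequency computed as a mean over time, keeping them as NaN avoids counting missing observations as land.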

### Assessment with mean

We can now view the image with our classification.
The classifier returns either `True` or `False` for each pixel.
To detect the boundaries of water features, we want to check which pixels are always water and which are always land.
Conveniently, Python encodes `True = 1` and `False = 0`.

If we plot the average classified pixel value, pixels that are always water will have an average value of `1` and pixels that are always land will have an average of `0`.
Pixels that are sometimes water and sometimes land will have an average between these values. In this case study, these pixels are associated with seasonally inundated wetland areas. 

The following cell plots the average classified pixel value, or the frequency of water detection, over time.

In [None]:
# Plot the mean of each classified pixel value
water_summary = (data.water.mean(dim='time') * 100).to_dataset(name="water_percentage")
water_summary["water_stdev"] = data.water.std(dim='time') * 100

water_summary.water_percentage.odc.explore(
    cmap="RdBu",
    vmin=0, 
    vmax=100,
    name="Water percentage",
    tiles=basemaps.Esri.WorldImagery,
    attr="ESRI WorldImagery",
)

You can see that the selected threshold has done a good job of separating the water pixels (in blue) and land pixels (in red) as well as ephemeral water features in between. 

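You should be able to see that the shoreline takes on a mix of values between `0` and `100`, highlighting pixels that are sometimes land and sometimes water.
On a coast, this is often due to the effect of rising and falling tides, with some radar observations being captured at low tide, and others at high tide. For an inland water body such as the Lake Toba study area used here, changing water levels, wind roughening of the water surface, or residual speckle are more likely explanations.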



### Assessment with standard deviation

Given that we've identified the shoreline as the pixels that are classified sometimes as land and sometimes as water, we can also see if the standard deviation of each pixel over time is a reasonable way to determine if a pixel is a shoreline or not.

Similar to how we calculated and plotted the mean above, you can calculate and plot the standard deviation by using the `std` function in place of the `mean` function.

If you'd like to see the results using a different colour scheme, you can also try passing `cmap="Greys"` or `cmap="Blues"` to `explore`, in the same way that `cmap="RdBu"` was passed for the mean above.


In [None]:
water_summary.water_stdev.odc.explore(
    tiles=basemaps.Esri.WorldImagery,
)

The standard deviation we calculated above gives us an idea of how variable a pixel has been over the entire period of time that we looked at. From the image above, you should be able to see that the land and water pixels almost always have a standard deviation of `0`, meaning they didn't change over the time we sampled. The shoreline and wetlands however have a higher standard deviation, indicating that they change frequently between water and non-water.

An important thing to recognise is that the standard deviation might not be able to detect the difference between noise, tides, and ongoing change, since a pixel that frequently alternates between land and water (noise) could have the same standard deviation as a pixel that is land for some time, then becomes water for the remaining time (ongoing change or tides).
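
This limitation is easy to reproduce with two made-up pixel histories. The sketch below is an illustration in plain NumPy; the transition count shown at the end is not part of the analysis above, just one possible complementary statistic.

```python
import numpy as np

# Two hypothetical pixels observed 6 times (1 = water, 0 = land)
alternating = np.array([1, 0, 1, 0, 1, 0])   # flips at every observation (noise-like)
step_change = np.array([0, 0, 0, 1, 1, 1])   # land at first, then persistently water

# Both pixels have the same standard deviation
print(alternating.std(), step_change.std())

def n_transitions(series):
    """Count how often the class changes between consecutive observations."""
    return int(np.count_nonzero(np.diff(series)))

# The number of transitions does tell the two histories apart
print(n_transitions(alternating), n_transitions(step_change))
```

Both series are water in half of the observations, so their mean and standard deviation are identical; only a statistic that takes the order of the observations into account can separate them.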

## Next steps

When you are done, return to the "Find and load data" section, modify some values (e.g. `lat`, `lon` or `datetime`) and rerun the analysis. You can use the interactive map displayed by `region.explore(...)` in that section to find new central latitude and longitude values by panning and zooming. You can also use Google Maps to search for a location you know, then return the latitude and longitude values by clicking the map.

---

## Additional information

**License:** The code in this notebook is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). 
Digital Earth Africa data is licensed under the [Creative Commons by Attribution 4.0](https://creativecommons.org/licenses/by/4.0/) license.

**Contact:** If you need assistance, please post a question on the [Open Data Cube Slack channel](http://slack.opendatacube.org/) or on the [GIS Stack Exchange](https://gis.stackexchange.com/questions/ask?tags=open-data-cube) using the `open-data-cube` tag (you can view previously asked questions [here](https://gis.stackexchange.com/questions/tagged/open-data-cube)).
If you would like to report an issue with this notebook, you can file one on [Github](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks).

**Compatible datacube version:**