
## Wherobots Inference - Segmentation 

This example demonstrates query inference using a segmentation model with Wherobots Inference to identify solar farms in satellite imagery. We will use a machine-learning model from [Satlas](https://satlas.allen.ai/ai)<sup>1</sup>, which was trained using imagery from the European Space Agency’s Sentinel-2 satellites.

**Note: This notebook requires the Wherobots Inference functionality to be enabled and a GPU runtime selected in Wherobots Cloud. Please [contact us](https://wherobots.com/contact/) to enable these features.**


### Step 1: Set Up The WherobotsDB Context

Here we configure WherobotsDB to enable access to the necessary cloud object storage buckets with sample data and to enable the WherobotsAI features in WherobotsDB. 

In [None]:
import warnings
warnings.filterwarnings('ignore')

from wherobots.inference.data.io import read_raster_table
from sedona.spark import SedonaContext
from pyspark.sql.functions import expr

config = SedonaContext.builder().appName('segmentation-batch-inference')\
    .getOrCreate()

sedona = SedonaContext.create(config)

### Step 2: Load Satellite Imagery

Next, we load the satellite imagery that we will run inference over. These GeoTIFF images are loaded as *out-db* rasters in WherobotsDB, where each row represents a different scene.

In [None]:
tif_folder_path = 's3a://wherobots-benchmark-prod/data/ml/satlas/'
files_df = read_raster_table(tif_folder_path, sedona, limit=1000)
df_raster_input = files_df.withColumn(
        "outdb_raster", expr("RS_FromPath(path)")
    )

df_raster_input.cache().show(truncate=False)

df_raster_input.createOrReplaceTempView("df_raster_input")

### Step 3: Run Predictions And Visualize Results

To run predictions, we specify the model we wish to use. Some models are pre-loaded and made available in Wherobots Cloud. We can also load our own models. Predictions can be run using Wherobots' Spatial SQL functions, in this case `RS_Segment`.

Here we generate a prediction for every scene loaded above (up to 1000, per the `limit` set when reading the raster table) using `RS_Segment`.

In [None]:
model_id = 'solar-satlas-sentinel2'

sedona.sql(f"""
 CREATE TEMP VIEW segment_fields AS (
    SELECT
    outdb_raster,
    RS_SEGMENT('{model_id}', outdb_raster) AS segment_result
  FROM
    df_raster_input)
""")

predictions_df = sedona.sql(f"""
SELECT
  outdb_raster,
  segment_result.*
FROM segment_fields
""")

predictions_df.cache().show()

predictions_df.createOrReplaceTempView("predictions_df")

Now that we've generated predictions using our model over our satellite imagery, we can use the `RS_Segment_To_Geoms` function to extract the geometries that the model has identified as possible solar farms. We'll specify the following:

* a raster column to use for georeferencing our results
* the prediction result from the previous step
* the category label `1`, which the model returns for solar farms, and the class map used to assign labels to the predictions
* a confidence threshold between 0 and 1.

In [None]:
df_multipolys = sedona.sql("""
    WITH t AS (
        SELECT RS_SEGMENT_TO_GEOMS(outdb_raster, confidence_array, array(1), class_map, 0.65) result
        FROM predictions_df
    )
    SELECT result.* FROM t
""")

df_multipolys.cache().show()
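
To build intuition for the confidence threshold passed to `RS_Segment_To_Geoms`, the sketch below applies a threshold to a small grid of per-pixel confidence scores in plain Python. It is only an illustration of the idea and not how Wherobots implements the function: the helper `threshold_mask`, the sample scores, and the choice of an inclusive comparison (`>=`) are all assumptions made for this example.

```python
from typing import List


def threshold_mask(confidence: List[List[float]], threshold: float) -> List[List[int]]:
    """Return a binary mask: 1 where the score meets the threshold, else 0.

    Hypothetical helper for illustration only; the inclusive comparison
    is an assumption, not documented behavior of RS_Segment_To_Geoms.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [[1 if score >= threshold else 0 for score in row] for row in confidence]


# Made-up 3x4 grid of confidence scores for a single class (e.g. solar farms).
confidence = [
    [0.10, 0.70, 0.80, 0.20],
    [0.05, 0.66, 0.90, 0.64],
    [0.00, 0.30, 0.60, 0.10],
]

mask = threshold_mask(confidence, 0.65)
positive_pixels = sum(sum(row) for row in mask)

for row in mask:
    print(row)
print("pixels above threshold:", positive_pixels)
```

Raising the threshold keeps fewer pixels, so the extracted polygons would tend to be smaller and fewer; lowering it has the opposite effect.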

Now we can save the multipolygon results out to Parquet.

In [None]:
import os
USER_S3_URL = os.environ.get("USER_S3_PATH")
output_path = USER_S3_URL + "semantic_segmentation_multipoly_small100_results.parquet"
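
The path construction above fails with a `TypeError` if `USER_S3_PATH` is not set, and it relies on the variable ending with a slash. A slightly more defensive variant might look like the sketch below. The helper name `build_output_path`, its `env` parameter, and the example bucket URL are hypothetical and exist only for this illustration.

```python
import os
from typing import Mapping, Optional


def build_output_path(
    filename: str,
    env: Optional[Mapping[str, str]] = None,
    env_var: str = "USER_S3_PATH",
) -> str:
    """Join the base path stored in `env_var` with `filename`.

    Hypothetical helper: raises a clear error when the variable is missing
    and inserts exactly one slash between the base path and the file name.
    """
    env = os.environ if env is None else env
    base = env.get(env_var)
    if not base:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return base.rstrip("/") + "/" + filename


# Example with a made-up base path passed explicitly instead of os.environ.
example_path = build_output_path(
    "results.parquet", env={"USER_S3_PATH": "s3://example-bucket/user/"}
)
print(example_path)
```

In the notebook, calling `build_output_path("semantic_segmentation_multipoly_small100_results.parquet")` without the `env` argument would read `USER_S3_PATH` from the real environment.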

In [None]:
df_multipolys.write.parquet(output_path, mode="overwrite")

df_multipolys = sedona.read.format("parquet").load(output_path)

In [None]:
df_multipolys.show(10)

Since we ran inference across the state of Arizona, many scenes don't contain solar farms and don't have positive detections. Let's filter out scenes without detections so that we can plot the results.

In [None]:
df_multipolys.createOrReplaceTempView("df_multipolys")

df_collected = sedona.sql("""
    SELECT
        class_name, average_pixel_confidence_score, ST_Collect(geometry) AS collected_geom
    FROM
        df_multipolys
""")

In [None]:
df_multipolys = df_collected.filter("ST_IsEmpty(collected_geom) = False")

This leaves us with a few predicted solar farm polygons for our 1000 satellite image samples.

In [None]:
df_multipolys.count()

We'll plot these with SedonaKepler. Compare the satellite basemap with the predictions and see if there's a match!

In [None]:
df_multipolys.show()

In [None]:
from sedona.maps.SedonaKepler import SedonaKepler
config = {
    'version': 'v1',
    'config': {
        'mapStyle': {
            'styleType': 'light',
            'topLayerGroups': {},
            'visibleLayerGroups': {},
            'mapStyles': {}
        },
    }
}
# Use a name other than `map` to avoid shadowing the Python builtin.
kepler_map = SedonaKepler.create_map(config=config)

SedonaKepler.add_df(kepler_map, df=df_multipolys, name="Solar Farm Detections")
kepler_map

### wherobots.inference Python API

If you prefer Python, `wherobots.inference` offers a module that registers the SQL inference functions as Python functions. Below we run the same inference as before with `RS_SEGMENT`.

In [None]:
from wherobots.inference.engine.register import create_semantic_segmentation_udfs
from pyspark.sql.functions import col
rs_segment = create_semantic_segmentation_udfs(batch_size=10, sedona=sedona)
df = df_raster_input.withColumn("segment_result", rs_segment(model_id, col("outdb_raster"))).select(
                               "outdb_raster",
                               col("segment_result.confidence_array").alias("confidence_array"),
                               col("segment_result.class_map").alias("class_map")
                           )
df.show(3)

### References

1. Bastani, Favyen, Wolters, Piper, Gupta, Ritwik, Ferdinando, Joe, and Kembhavi, Aniruddha. "SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding." *arXiv preprint arXiv:2211.15660* (2023). [https://doi.org/10.48550/arXiv.2211.15660](https://doi.org/10.48550/arXiv.2211.15660)