![](https://wherobots.com/wp-content/uploads/2023/12/Inline-Blue_Black_onWhite@3x.png)

# WherobotsAI Raster Inference - Bring Your Own Model

WherobotsAI Raster Inference lets you run your own machine learning models on raster images and gather insights from the results. Models are described using the [Machine Learning Model Extension Specification](https://github.com/stac-extensions/mlm) (MLM), a standard for discovering, sharing, and running machine learning models for geospatial data.

Generally, bringing your own model involves the following steps:

* Saving your model checkpoint using TorchScript (through either scripting or tracing).
* Choosing an S3 bucket to store your model.
* Uploading your TorchScript model to your S3 bucket.
* Filling out two [MLM Specification](https://github.com/stac-extensions/mlm) forms (the Asset form and the MLM form) for your model.
* Uploading the MLM JSON file to your S3 bucket.
* Executing raster inference.
* Analyzing your model inference results.

## Capabilities

WherobotsAI Raster Inference currently supports:

* The following computer vision tasks:
    * Single-label scene classification
    * Object detection
    * Semantic segmentation
    * Segment Anything 2 (text prompt to polygons)
* Workloads with a single input tensor and a single output tensor
* NVIDIA GPU acceleration
* PyTorch export formats: TorchScript models, ExportedPrograms, and AOTInductor models

### Job Runs

You can run raster inference with WherobotsAI in a Job Run or in a Wherobots Notebook.

This example runs raster inference in a Wherobots Notebook. To run it as a Job Run instead, combine the code samples from the following sections into a single Python file and execute that file as a Job Run.
For more information on creating Job Runs in Wherobots, see [WherobotsRunOperator](https://docs.wherobots.com/latest/develop/run-operator/).

## Before You Start

Before attempting to use your own machine learning model in WherobotsAI Raster Inference, ensure that you have the following:

* A Professional Edition Wherobots Organization.
    * [Log in to Wherobots Cloud](https://cloud.wherobots.com) to follow along in an interactive Wherobots notebook or complete these steps for your own model in a new notebook.
* A PyTorch model file.
* An [Amazon S3 Bucket](https://docs.wherobots.com/latest/develop/storage-management/s3-storage-integration/) or [Wherobots Managed Storage](https://docs.wherobots.com/latest/develop/storage-management/storage/#wherobots-managed-storage) for storing your model and MLM JSON file.

## Save and Upload Your Model

Save your model checkpoint as a TorchScript model or as a PyTorch export archive. For more information, see [Saving and Loading Models](https://pytorch.org/tutorials/beginner/saving_loading_models.html) in the PyTorch documentation.

The following model artifact types are supported:

| Artifact Type | Description |
| ----- | ----- |
| `torch.jit.script` | A model artifact obtained by [`TorchScript`](https://pytorch.org/docs/stable/jit.html). |
| `torch.export.save` | A PyTorch model archive containing an artifact of type [`AOTInductor`](https://docs.pytorch.org/tutorials/recipes/torch_export_aoti_python.html#when-to-use-aotinductor-with-a-python-runtime) or [`ExportedProgram`](https://docs.pytorch.org/docs/2.7/export.html). |

!!! note
    WherobotsAI Raster Inference currently only supports PyTorch models.
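
As a rough illustration of the `torch.jit.script` path, the sketch below scripts a model, saves it, and reloads it to confirm the saved artifact reproduces the original outputs. `TinySegmenter`, its band and class counts, and the `model.pt` file name are hypothetical stand-ins for your own model, chosen only to match the single-input-tensor, single-output-tensor requirement.

```python
import os
import tempfile

import torch
from torch import nn


class TinySegmenter(nn.Module):
    """Hypothetical stand-in model: one image tensor in, one score tensor out."""

    def __init__(self, in_bands: int = 3, num_classes: int = 2):
        super().__init__()
        self.head = nn.Conv2d(in_bands, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x)


model = TinySegmenter().eval()

# Compile the module to TorchScript and write the artifact to disk.
scripted = torch.jit.script(model)
model_path = os.path.join(tempfile.mkdtemp(), "model.pt")
scripted.save(model_path)

# Reload the artifact and confirm it reproduces the original outputs.
reloaded = torch.jit.load(model_path)
batch = torch.rand(1, 3, 64, 64)  # (batch, bands, height, width)
with torch.no_grad():
    assert torch.allclose(model(batch), reloaded(batch))
```

Reloading the file before uploading it is a cheap way to catch scripting problems locally rather than during inference.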

1. Store your model in an S3 bucket. This S3 bucket needs to be accessible to Wherobots Cloud. You can choose to store your model in one of two ways:
    1. Directly in Wherobots Managed Storage. For more information, see [Wherobots storage and notebook guidance](https://docs.wherobots.com/latest/develop/storage-management/storage/#wherobots-notebook-and-data-storage-guidance).
    2. In your existing Amazon S3 storage, integrated with Wherobots. For more information on integrating a public or private S3 bucket with Wherobots Cloud, see [S3 storage integration](https://docs.wherobots.com/latest/develop/storage-management/s3-storage-integration/).

In this example, we’ll store our model using [Wherobots Managed Storage](https://docs.wherobots.com/latest/develop/storage-management/storage/#wherobots-managed-storage) and create a `data/customer-XXXX/bring-your-own-model` directory.

!!! note
    This example uploads the model to Wherobots Managed Storage but you can also use your model through integrated storage. For more information, see [S3 storage integration](https://docs.wherobots.com/latest/develop/storage-management/s3-storage-integration/) in the Wherobots documentation.

  ![Upload model](./assets/img/byom-model-pt.png)

The URI to this model is used to create an MLM JSON in the subsequent step.

## Create an MLM JSON for Your Model

### MLM specification overview

The [Machine Learning Model Extension Specification](https://github.com/stac-extensions/mlm) (MLM) is an extension of the [SpatioTemporal Asset Catalog](https://github.com/radiantearth/stac-spec) (STAC) specification. MLM defines a JSON format that specifies a model’s properties, its input and input processing requirements, and its output and output processing requirements.

MLM creates a standardized way to use your own models for inference. It accomplishes this by:

* Making custom models and their associated STAC datasets searchable.
* Recording all necessary bands, parameters, modeling artifact locations, and high-level processing steps to deploy an inference service.

### MLM specification forms

To create an MLM JSON for your model, first fill out the
Model Asset Form in the **Asset Form** tab and then fill out the Model Metadata form in the
**MLM Form** tab.

!!! info
    You must fill out the **Asset Form** before the **MLM Form**.

#### Fill out Asset Form

To fill out the Model Asset Form, do the following:

1. Go to the [Machine Learning Model Metadata Form](https://mlm-form.vercel.app/asset) site.
1. Go to the [Asset Form](https://mlm-form.vercel.app/asset) tab.
   ![Model asset form](assets/img/asset-form.png)
1. Fill in the MLM Model Asset Form with your model information in accordance with the following chart. For compatibility with Raster Inference, you only need to specify the URI to the model artifact. For additional information and metadata fields you may want to document for your model, see [Model Asset](https://github.com/stac-extensions/mlm?tab=readme-ov-file#model-asset) in the Machine Learning Model Extension Specification.

    | Field Name | Type | Required or optional | Description |
    | ----- | ----- | ----- | ----- |
    | Title | string | Optional | Name of model asset |
    | URI | string | Required | S3 URI to your saved TorchScript model. |

#### Fill out MLM form

To create the MLM JSON for your model, do the following:

1. Within the [Machine Learning Model Metadata Form](https://mlm-form.vercel.app/) site, go to the **MLM Form** tab.
    ![MLM form](assets/img/mlm-form.png)
    * This form validates your input formats so that they conform to the [MLM specification](https://github.com/stac-extensions/mlm). For clarity, we’ve specified a few fields for reference below. For a full breakdown of the inputs and definitions, see [Item Properties and Collection Fields](https://github.com/stac-extensions/mlm) in the Machine Learning Model Extension Specification.

    | MLM metadata form field | Expected Input | Example Input |
    | ----- | ----- | ----- |
    | Is it pretrained? | true or false | true |
    | Categories | List of classes for your model | “Solar panels”, “Wind farms”, “Forests” |

1. Click **Download JSON** to save the JSON file.

[Here is a reference MLM](https://huggingface.co/wherobots/mlm-stac/blob/2fd1a21026b0f80d4f7721605a3fc1f4ca389dfa/classification/landcover-eurosat-sentinel2/model-metadata.json#L134-L185) for the `landcover-eurosat-sentinel2` Wherobots hosted model.
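
Before uploading the downloaded JSON, it can help to run a quick local sanity check. The sketch below is a minimal example under stated assumptions: the property names and the `mlm:model` asset role reflect a reading of the MLM specification, and `find_mlm_problems` and the abbreviated `example_item` are hypothetical. The form's own validation and the specification remain the authority.

```python
import json

# Property names based on a reading of the MLM specification; treat this
# list as an assumption and defer to the specification if they differ.
REQUIRED_PROPERTIES = (
    "mlm:name",
    "mlm:architecture",
    "mlm:tasks",
    "mlm:input",
    "mlm:output",
)


def find_mlm_problems(item: dict) -> list:
    """Return human-readable problems found in an MLM item (empty if none)."""
    problems = []
    properties = item.get("properties", {})
    for key in REQUIRED_PROPERTIES:
        if key not in properties:
            problems.append(f"missing property: {key}")

    # Raster Inference needs the model artifact URI, stored as an asset href.
    model_assets = [
        asset
        for asset in item.get("assets", {}).values()
        if "mlm:model" in asset.get("roles", [])
    ]
    if not model_assets:
        problems.append("no asset with role 'mlm:model'")
    elif not all(asset.get("href") for asset in model_assets):
        problems.append("model asset has no href")
    return problems


# Hypothetical, heavily abbreviated item used only to exercise the check.
example_item = json.loads("""
{
  "type": "Feature",
  "properties": {
    "mlm:name": "solar-farm-segmentation",
    "mlm:architecture": "example-architecture",
    "mlm:tasks": ["semantic-segmentation"],
    "mlm:input": [],
    "mlm:output": []
  },
  "assets": {
    "model": {
      "href": "s3://example-bucket/bring-your-own-model/model.pt",
      "roles": ["mlm:model"]
    }
  }
}
""")

print(find_mlm_problems(example_item))
```

For your own file, replace `example_item` with the result of `json.load` on the downloaded JSON.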

## Upload your model’s MLM JSON

We created an MLM JSON for the TorchScript model by following the steps in [Create an MLM JSON for Your Model](#create-an-mlm-json-for-your-model).

1. Upload the JSON to the same S3 bucket as the TorchScript model in Wherobots.

    ![Upload json](assets/img/byom-storage.png)

The path to the MLM JSON will be the `user_mlm_uri` in the rest of the example.

## Run Inference Using Your Model on Raster Data

Currently, WherobotsAI Raster Inference supports running model inference on the following tasks:

* Single-label scene classification
* Object Detection
* Semantic Segmentation
* Text to Bounding Boxes
* Text to Instance Segments

The following chart details the WherobotsAI Raster Inference function calls to use for each Computer Vision task.

| Computer Vision Task | SQL API | Python API | Walk Through Example |
| ----- | ----- | ----- | ----- |
| Image Classification | `RS_Classify()` | `rs_classify()` | [Run inference for classification](https://docs.wherobots.com/latest/tutorials/wherobotsai/wherobots-inference/classification/?h=rs_classify) |
| Object Detection | `RS_Detect_BBoxes()` | `rs_detect()` | [Run inference for Object Detection](https://docs.wherobots.com/latest/tutorials/wherobotsai/wherobots-inference/object_detection/) |
| Semantic Segmentation | `RS_Segment()`, `RS_Segment_to_Geoms()` | `rs_segment()` | [Run inference for Semantic Segmentation](https://docs.wherobots.com/latest/tutorials/wherobotsai/wherobots-inference/segmentation/) |
| Instance Segmentation | `RS_Text_To_Segments()` | `rs_text_to_segments()` | [Run inference for Segment Anything 2](https://docs.wherobots.com/latest/tutorials/wherobotsai/wherobots-inference/raster-text-to-segments-airplanes/) |

## Semantic Segmentation example

In the following example, we’ll discuss how to use your own model for Raster Inference in
Wherobots by performing Semantic Segmentation (also referred to as pixel classification) to identify solar farms in Arizona.

This example uses:

* A [PyTorch archive](https://docs.pytorch.org/tutorials/recipes/torch_export_aoti_python.html#when-to-use-aotinductor-with-a-python-runtime) model fine-tuned from the [Satlas model](https://satlas.allen.ai/ai) <sup>1</sup> on Sentinel-2 multispectral satellite imagery to identify solar farms
* An MLM JSON derived from the [Satlas model documentation](https://satlas.allen.ai/ai) for this task
* A set of new Sentinel-2 multispectral satellite images sampled from the [Satlas dataset](https://satlas.allen.ai/ai)

!!! note
    This example is also available to walk through in `examples/Analyzing-Data/Bring_Your_Own_Model_Raster_Inference.ipynb` once you launch a Wherobots Notebook instance.

To use your model for Semantic Segmentation, follow the steps in the subsequent sections to configure the MLM path, load the satellite imagery, and run Raster Inference on the new dataset.

## Start a notebook

To start a notebook to run raster inference with WherobotsAI, do the following:

1. Log in to Wherobots Cloud.
2. Start a Wherobots instance. We recommend using the **Tiny-GPU** runtime. It can take several minutes for a runtime to load.
3. Open a Python notebook.
    1. To interact with this example yourself, open `examples/Analyzing-Data/Bring_Your_Own_Model_Raster_Inference.ipynb`.
    1. If you are incorporating your own model, create a new notebook and use the code samples in this tutorial as a guide.

    !!! note
        If you add the S3 storage integration after starting the notebook, you must restart the notebook to access the newly added storage integration.

For more information on starting a notebook, see [Notebook instance management](https://docs.wherobots.com/latest/develop/notebook-management/notebook-instance-management/)
and [Jupyter Notebook Management](https://docs.wherobots.com/latest/develop/notebook-management/jupyter-notebook-management/).

### Set Up The Sedona Context

The following code creates the `SedonaContext`:

```python
import warnings
warnings.filterwarnings('ignore')
import os

from sedona.spark import *
from pyspark.sql.functions import expr

# Build the Spark configuration, then create the Sedona context from it.
config = (
    SedonaContext.builder().appName('segmentation-batch-inference')
    .getOrCreate()
)

sedona = SedonaContext.create(config)
```

### Create the URI variable

Next, we need to set the `user_mlm_uri` path to the S3 URI of the MLM JSON that we created in [Upload your model's MLM JSON](#upload-your-models-mlm-json).

WherobotsAI Raster Inference uses `user_mlm_uri` to get the necessary processing information
for the model and know which model to use to run inference.

To get the S3 URI of the MLM JSON:

1. Navigate to the MLM JSON in Wherobots Cloud.
1. Copy the location of the file and assign it to `user_mlm_uri`.

![Copy MLM URI](assets/img/byom-storage-copy.png)

```python
# Replace the placeholder with the S3 URI of your MLM JSON.
user_mlm_uri = "[PATH-TO-YOUR-MLM-JSON]"
```
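
A quick check of the URI before running inference can catch copy and paste mistakes early. The sketch below is only an illustration: `check_mlm_uri` is a hypothetical helper, the accepted `s3` and `s3a` schemes are an assumption, and the example URI is made up.

```python
from urllib.parse import urlparse


def check_mlm_uri(uri: str) -> str:
    """Return the URI unchanged if it looks like an S3 path to a JSON file."""
    parsed = urlparse(uri)
    # Assumption: the MLM JSON lives in S3 and is addressed with s3:// or s3a://.
    if parsed.scheme not in ("s3", "s3a"):
        raise ValueError(f"expected an S3 URI, got scheme {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError("the URI is missing a bucket name")
    if not parsed.path.endswith(".json"):
        raise ValueError("the URI should point to the MLM JSON file")
    return uri


# Hypothetical URI used only to demonstrate the check.
example_uri = "s3://example-bucket/bring-your-own-model/model-metadata.json"
print(check_mlm_uri(example_uri))
```

The check only inspects the string. It does not confirm that the file exists or that the notebook can read it.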

### Load Satellite Imagery

Load the satellite imagery that we will be running inference over. These GeoTIFF images are
loaded as [out-db rasters in WherobotsDB](https://docs.wherobots.com/latest/tutorials/wherobotsdb/raster-data/raster-load/), where each row represents a different scene.

```python
tif_folder_path = 's3a://wherobots-benchmark-prod/data/ml/satlas/'

# Load a 5% sample of the GeoTIFF scenes as rasters.
df_raster_input = sedona.read.format("raster").load(f"{tif_folder_path}*.tif").sample(0.05)
df_raster_input.show(truncate=False)

# Register the DataFrame as a view so that SQL queries can reference it by name.
df_raster_input.createOrReplaceTempView("df_raster_input")
```

### Run Predictions And Visualize Results

#### Raster Inference SQL function RS_Segment

To run predictions, specify the MLM model metadata file we saved to `user_mlm_uri`.

Predictions can be run with the Raster Inference SQL function [`RS_Segment`](https://docs.wherobots.com/latest/api/wherobots-inference/pythondoc/inference/sql_functions/#rs_segment) or with the [Python API](#using-the-wherobotsinference-python-api).

Here we generate 400 raster predictions using `RS_Segment`.

```python
# Run the model on every raster and expand the result struct into columns.
predictions_df = sedona.sql(f"""
SELECT
  rast,
  segment_result.*
FROM (
  SELECT
    rast,
    RS_SEGMENT('{user_mlm_uri}', rast) AS segment_result
  FROM
    df_raster_input
) AS segment_fields
""")

# Cache and materialize the predictions so later steps reuse them.
predictions_df.cache().count()
predictions_df.show()
predictions_df.createOrReplaceTempView("predictions")
```

#### Using the wherobots.inference Python API

For those who prefer working with Python, `wherobots.inference` provides a module to register
SQL inference functions as Python functions.

To use this module, replace the code in [Raster Inference SQL function RS_Segment](#raster-inference-sql-function-rs_segment) with the following code sample:

```python
from pyspark.sql.functions import col
from wherobots.inference.engine.register import create_semantic_segmentation_udfs

# Register the segmentation function as a Python function.
rs_segment = create_semantic_segmentation_udfs(batch_size=9, sedona=sedona)

df = df_raster_input.withColumn(
    "segment_result", rs_segment(user_mlm_uri, col("rast"))
).select(
    "rast",
    col("segment_result.confidence_array").alias("confidence_array"),
    col("segment_result.class_map").alias("class_map"),
)
df.show(3)
```

### Extract insights

#### Initial results

Now that we've generated predictions using our model over our satellite imagery, we can use
the `RS_Segment_To_Geoms` function to extract geometries from the classified imagery pixels.

These geometries delineate the boundaries of possible solar farms and contain the average
model confidence scores of the pixels contained within them.

```python
df_multipolys = sedona.sql("""
    WITH t AS (
        SELECT RS_SEGMENT_TO_GEOMS(rast, confidence_array, array(1), class_map, 0.65) result
        FROM predictions
    )
    SELECT result.* FROM t
""")

# Cache and materialize the geometries, then expose them to SQL.
df_multipolys.cache().count()
df_multipolys.show()
df_multipolys.createOrReplaceTempView("multipolygon_predictions")
```

The query above passes the following arguments to `RS_Segment_To_Geoms`:

* `rast`: A raster column to use for georeferencing our results
* `confidence_array`: The prediction result from the previous step
* `array(1)`: Our category label "1" returned by the model representing Solar Farms
* `class_map`: Class map to use for assigning labels to the prediction
* `0.65`: A confidence threshold between 0 and 1 used to filter the pixels classified by the model.
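
To build intuition for the confidence threshold, the sketch below applies one to a tiny grid of per-pixel scores for a single class and averages the scores that pass. It is a conceptual illustration only, not the implementation of `RS_Segment_To_Geoms`: the helper name, the toy scores, and the inclusive comparison are all assumptions.

```python
def threshold_confidence(confidence, threshold):
    """Return a keep/discard mask and the mean score of the kept pixels.

    `confidence` is a 2D list of per-pixel scores for one class.
    """
    # Assumption: pixels scoring exactly at the threshold are kept.
    mask = [[score >= threshold for score in row] for row in confidence]
    kept = [score for row in confidence for score in row if score >= threshold]
    mean_score = sum(kept) / len(kept) if kept else None
    return mask, mean_score


# Toy 2x2 grid of scores for the solar farm class.
scores = [
    [0.125, 0.75],
    [0.875, 0.5],
]
mask, mean_score = threshold_confidence(scores, 0.65)
print(mask, mean_score)
```

Raising the threshold keeps fewer pixels and tends to produce smaller, higher-confidence geometries. Lowering it does the opposite.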

#### Filtered results

Since we ran inference across the *entire* state of Arizona, many scenes don't contain solar farms and as a result, don't have positive detections.

Let's filter out scenes without segmentation detections so that we only retain the positive results. First, merge the geometries predicted for each scene:

```python
df_merged_predictions = sedona.sql("""
    SELECT
        element_at(class_name, 1) AS class_name,
        cast(element_at(average_pixel_confidence_score, 1) AS double) AS average_pixel_confidence_score,
        ST_Collect(geometry) AS merged_geom
    FROM
        multipolygon_predictions
""")
```

Then keep only the rows whose merged geometry is not empty. This leaves us with a few predicted solar farm polygons for our 400 satellite image samples.

```python
df_filtered_predictions = df_merged_predictions.filter("ST_IsEmpty(merged_geom) = False")
df_filtered_predictions.cache().count()
df_filtered_predictions.show()
```

#### Visualize with SedonaKepler

We'll plot these filtered results with SedonaKepler. Compare the satellite basemap with the predictions and see if there's a match!

!!! note
    This basemap is compiled from images taken at different times. This means the features shown on the basemap might not match the imagery we just used for our analysis.

```python
from sedona.spark import *

# Kepler configuration that selects the dark basemap style.
config = {
    'version': 'v1',
    'config': {
        'mapStyle': {
            'styleType': 'dark',
            'topLayerGroups': {},
            'visibleLayerGroups': {},
            'mapStyles': {}
        },
    }
}

# Named kepler_map to avoid shadowing Python's built-in map().
kepler_map = SedonaKepler.create_map(config=config)

SedonaKepler.add_df(kepler_map, df=df_filtered_predictions, name="Solar Farm Detections")
kepler_map
```

1.  Bastani, Favyen, Wolters, Piper, Gupta, Ritwik, Ferdinando, Joe, and Kembhavi, Aniruddha. "SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding." *arXiv preprint arXiv:2211.15660* (2023). [https://doi.org/10.48550/arXiv.2211.15660](https://doi.org/10.48550/arXiv.2211.15660)
