![Geospatial Studio banner](../docs/images/banner.png)

# 🌍 Geospatial Exploration and Orchestration Studio - Getting Started Notebook

<table>
<tr>
  <td><strong>License</strong></td>
  <td>
    <img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" />
  </td>
</tr>
<tr>
  <td><strong>TerraStackAI</strong></td>
  <td>
    <img src="https://img.shields.io/badge/TerraTorch-a3b18a" />
    <img src="https://img.shields.io/badge/TerraKit-588157" />
    <img src="https://img.shields.io/badge/Iterate-3a5a40" />
  </td>
</tr>
<tr>
  <td><strong>Built With</strong></td>
  <td>
    <img src="https://img.shields.io/badge/Python-3.11-blue.svg?logo=python&logoColor=white" />
    <img src="https://img.shields.io/badge/code%20style-black-000000.svg" />
    <img src="https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white" />

  </td>
</tr>
<tr>
  <td><strong>Deployment</strong></td>
  <td>
    <img src="https://img.shields.io/badge/Helm-0F1689?style=flat&logo=helm" />
    <img src="https://img.shields.io/badge/-Red_Hat_OpenShift-EE0000?logo=redhatopenshift&logoColor=white" />
    <img src="https://img.shields.io/badge/kubernetes-326CE5?&logo=kubernetes&logoColor=white" />
    <img src="https://img.shields.io/badge/Auth-OAuth_2.0-purple" />
    <img src="https://img.shields.io/badge/PostgreSQL-316192?logo=postgresql&logoColor=white" />
    <img src="https://img.shields.io/badge/Keycloak-111921?logo=keycloak&logoColor=white" />
    <img src="https://img.shields.io/badge/-MinIO-C72E49?logo=minio&logoColor=white" />
  </td>
</tr>
</table>

[![Studio Documentation](https://img.shields.io/badge/Studio_Documentation-526CFE?style=for-the-badge&logo=MaterialForMkDocs&logoColor=white)](https://terrastackai.github.io/geospatial-studio)

---

### 1.0 Introduction

Now that you have a clean deployment of the studio, it is time to start using it. The steps below walk you through onboarding some initial artefacts before trying out the functionality.


### 1.1 Setup and Installation

#### 1.1.0 Prerequisites

Create a Python environment:
```bash
python -m venv venv
source venv/bin/activate
```
Install the geostudio SDK:
```bash
pip install geostudio
```

Set the Jupyter notebook kernel to point to the Python environment created above.

#### 1.1.1 Import required packages

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
# Import the required packages
import json
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
from geostudio import Client

### 1.2 Connecting to the platform
First, we set up the connection to the platform backend. To do this we need the base URL for the studio UI and an API key.

To get an API key:
1. Open your deployed Geospatial Studio UI (e.g. https://localhost:4180/) and follow the `Manage your API keys` link.
2. This opens a window where you can generate, access and delete your API keys. NB: each user is limited to a maximum of two active API keys at any one time.

Store the API key and the Geospatial Studio UI base URL in a local credentials file, for example /User/bob/.geostudio_config_file. You can do this with:

```bash
echo "GEOSTUDIO_API_KEY=<paste_api_key_here>" > .geostudio_config_file
echo "BASE_STUDIO_UI_URL=<paste_ui_base_url_here>" >> .geostudio_config_file
```

Pass the path to this credentials file as the `geostudio_config_file` argument in the call below. Make sure the file is readable and correctly configured. If you encounter any issues, verify the file path and the contents of `.geostudio_config_file`.
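Before creating the client, it can help to confirm that the credentials file contains both entries. The following is a minimal sketch, assuming the file holds plain `KEY=VALUE` lines as written by the `echo` commands above. `parse_config` and `missing_keys` are hypothetical helpers for this notebook, not part of the geostudio SDK, and they do not reflect how the SDK itself reads the file.

```python
# Hypothetical helpers (not part of the geostudio SDK) for checking the
# KEY=VALUE credentials file before handing its path to Client(...).
REQUIRED_KEYS = ("GEOSTUDIO_API_KEY", "BASE_STUDIO_UI_URL")


def parse_config(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values):
    """Return required keys that are absent, empty, or still placeholders."""
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("<paste_")
    ]
```

For example, `missing_keys(parse_config(open(".geostudio_config_file").read()))` returns an empty list when both entries have been filled in.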

In [None]:
#############################################################
# Initialize Geostudio client using a geostudio config file
#############################################################
gfm_client = Client(geostudio_config_file=".geostudio_config_file")

> In the follow-up steps that use `gfm_client`, if a call does not return a successful response (200 - 299), check the logs of the deployed pods to understand what went wrong. These include `geofm-gateway`, `geofm-gateway-celery-worker`, `postgresql-0`, `pipelines-xxx`, etc.

> In addition, if you need to restart any of the port-forwards, you can use the following commands:

```bash
kubectl port-forward -n $OC_PROJECT svc/keycloak 8080:8080 >> studio-pf.log 2>&1 &
kubectl port-forward -n $OC_PROJECT svc/postgresql 54320:5432 >> studio-pf.log 2>&1 &
kubectl port-forward -n $OC_PROJECT svc/geofm-geoserver 3000:3000 >> studio-pf.log 2>&1 &
kubectl port-forward -n $OC_PROJECT deployment/geofm-ui 4180:4180 >> studio-pf.log 2>&1 &
kubectl port-forward -n $OC_PROJECT deployment/geofm-gateway 4181:4180 >> studio-pf.log 2>&1 &
kubectl port-forward -n $OC_PROJECT deployment/geofm-mlflow 5000:5000 >> studio-pf.log 2>&1 &
kubectl port-forward -n $OC_PROJECT svc/minio 9001:9001 >> studio-pf.log 2>&1 &
kubectl port-forward -n $OC_PROJECT svc/minio 9000:9000 >> studio-pf.log 2>&1 &
```
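To see which port-forwards need restarting, you can probe the local ports from Python. This is a rough sketch using only the standard library. The port numbers are copied from the commands above, the dictionary labels are illustrative, and an open port only shows that something is listening locally, not that the service behind it is healthy.

```python
import socket

# Local ports taken from the port-forward commands above; labels are illustrative.
FORWARDED_PORTS = {
    "keycloak": 8080,
    "postgresql": 54320,
    "geofm-geoserver": 3000,
    "geofm-ui": 4180,
    "geofm-gateway": 4181,
    "geofm-mlflow": 5000,
    "minio (9001)": 9001,
    "minio (9000)": 9000,
}


def port_is_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def closed_ports(ports=FORWARDED_PORTS):
    """Return the labels of all forwards whose local port is not reachable."""
    return [name for name, port in ports.items() if not port_is_open(port)]
```

Calling `closed_ports()` returns the labels of the forwards that are currently down, so you only need to re-run the matching `kubectl port-forward` lines.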

### 1.3 Onboard an existing inference output (useful for loading precomputed inference examples)

Onboard one of the inference outputs. This starts a pipeline that pulls the data and sets it up in the platform. You should then be able to browse to the inferences page in the UI and view the example(s) you have added.

In [None]:
# Onboard the inference example for AGB data for Karen, Nairobi, Kenya
with open('payloads/inferences/inference-agb-karen.json', 'r') as f:
    onboard_example = json.load(f)
example_response=gfm_client.submit_inference(data=onboard_example)
display(example_response)

### 1.4 Onboard an existing tuned model and run inference

We will onboard a tuned model from a URL. This is initiated by an API call, which triggers the onboarding process, starting with a download in the backend. Once the download has completed, the tune should appear with a completed status on the models/tunes page in the UI.

First we ensure we have a tuning task template. Templates are outline configurations that make basic tuning tasks easier for users. The tuning task tells the model what type of task it is (segmentation, regression, etc.) and exposes a range of optional hyperparameters which the user can set. These all have reasonable defaults, but users can still configure the model training as they wish. Below, we onboard the segmentation task template to the studio, which makes it available to users in the UI.

In [None]:
# segmentation template
with open('payloads/templates/template-seg.json', 'r') as f:
    template_seg = json.load(f)
template_response=gfm_client.create_task(template_seg)
display(template_response)

Next we onboard an existing tuned model. This downloads the tune checkpoints/weights and the terratorch config YAML to the cluster from a presigned URL, and registers metadata for the tune in the database.

In [None]:
# Load a prithvi-eo-flood complete tune to the studio
with open('payloads/tunes/tune-prithvi-eo-flood.json','r') as f:
    complete_tune = json.load(f)
tune_response=gfm_client.upload_completed_tunes(complete_tune)
display(json.dumps(tune_response,indent=2))

Below we poll until the onboarding of the tune has completed. This usually takes 1 or 2 minutes, depending on your network connection. If you hit errors or the process takes long, check the logs of the deployed `geofm-gateway` and/or `geofm-gateway-celery-worker` pods. In some cases you might need to restart the pods. In addition, if you need to restart any of the port-forwards, you can use the port-forwarding commands defined in section 1.2 above.

In [None]:
gfm_client.poll_finetuning_until_finished(tune_response['tune_id'])
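For readers unfamiliar with polling, the general pattern is sketched below: call a status function at a fixed interval until it reports a terminal state or a timeout expires. This is an illustration only and not the SDK's implementation. `poll_until`, the `get_status` callable and the state names `"Finished"` and `"Failed"` are all assumptions made for the example.

```python
import time


def poll_until(get_status, done_states=("Finished", "Failed"),
               interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call get_status() every `interval` seconds until it returns a done state.

    The state names are hypothetical; substitute whatever your status
    function actually reports. Raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.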

Finally, we submit an inference run to try out the tune above. The payload below defines the spatial and temporal domain over which to run the inference.

In [None]:

tune_id = tune_response['tune_id']

# Define the inference payload
payload = {
    "model_display_name": "geofm-sandbox-models",
    "fine_tuning_id":tune_id,
    "location": "Dakhin Petbaha, Raha, Nagaon, Assam, India",
    "description": "Flood Assam local with sentinel aws",
    "spatial_domain": {
        "bbox": [
            [92.703396, 26.247896, 92.748087, 26.267903]
        ],
        "urls": [],
        "tiles": [],
        "polygons": []
    },
    "temporal_domain": [
        "2024-07-25_2024-07-28"
    ]
}

# Submit the inference request
gfm_client.try_out_tune(tune_id=tune_id, data=payload)
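If you adapt the payload to your own area and dates, a quick local check can catch swapped coordinates before you submit. The sketch below assumes, based only on the example values above, that each `bbox` entry is ordered `[min_lon, min_lat, max_lon, max_lat]` in degrees and that each `temporal_domain` entry is either a single `YYYY-MM-DD` date or a `start_end` pair joined by an underscore. Both helper functions are hypothetical and the service may accept other forms.

```python
from datetime import date


def check_bbox(bbox):
    """Validate one [min_lon, min_lat, max_lon, max_lat] box (assumed order)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    if not (-180 <= min_lon < max_lon <= 180):
        raise ValueError(f"bad longitudes: {min_lon}, {max_lon}")
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError(f"bad latitudes: {min_lat}, {max_lat}")
    return bbox


def parse_temporal_entry(entry):
    """Split 'YYYY-MM-DD' or 'YYYY-MM-DD_YYYY-MM-DD' into (start, end) dates."""
    start_text, _, end_text = entry.partition("_")
    start = date.fromisoformat(start_text)
    # A single date is treated as a one-day range.
    end = date.fromisoformat(end_text) if end_text else start
    if end < start:
        raise ValueError(f"end {end} is before start {start}")
    return start, end
```

Note that this simple check rejects boxes that cross the antimeridian, which is a limitation of the sketch rather than a statement about the studio.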

Navigate to the Geospatial Studio UI and click on the `Start fine-tuning` card. Under `Model & Tunes`, find and click on `geofm-sandbox-models`, then under `History` click on the entry named `Flood Assam local with sentinel aws`, i.e. the inference you just ran. This loads the inference page with the status of your inference; if results are available, they are loaded in the UI.

### 1.5 Tuning a model from a dataset (requires GPUs)

At the moment, successful tuning requires access to GPUs in your cluster or on your local machine (support for leveraging Mac GPUs is still a work in progress).

In order to run a fine-tuning task, you need to select the following items:

* tuning task type - what type of learning task are you attempting? segmentation, regression etc. This has already been onboarded in 1.4 above. You can also onboard a new task type if you need to.
* base foundation model - which geospatial foundation model will you use as the starting point for your tuning task?
* fine-tuning dataset - what dataset will you use to train the model for your particular application?



To start, let us onboard the base model weights from which we will fine-tune/customize a model for the downstream task of burn-scar identification. The base model is the foundation model (encoder), which has been pre-trained and has a basic understanding of the data. More information on the different models can currently be found in the documentation.

In [None]:
with open("payloads/backbones/backbone-Prithvi_EO_V2_300M.json","r") as f:
    backbone= json.load(f)
onboard_backbone_response=gfm_client.create_base_model(backbone)
display( json.dumps(onboard_backbone_response,indent=2))

Now that we have a backbone model registered in our studio instance, we need to onboard a sample of the labelled burn-scar dataset. This downloads the dataset to your cluster from a presigned link, which may take a few minutes depending on your network speed.

In [None]:
with open("payloads/datasets/dataset-burn_scars.json","r") as f:
    wild_fire_dataset= json.load(f)
onboard_dataset_response=gfm_client.onboard_dataset(data=wild_fire_dataset)
display(json.dumps(onboard_dataset_response,indent=2))

You can then monitor the status of the onboarding process through the polling function. Because downloading and curating this dataset is resource intensive, in an environment with limited CPUs and RAM other functionality of your deployment may be dropped to free resources for this task. For example, port-forwarding may be dropped, in which case polling will fail with errors. If that happens, give it some time, then port-forward your services again using the port-forwarding commands defined in section 1.2 above and re-run the polling.

> You may also want to monitor the logs for the onboarding job pod that pops up in your cluster.

In [None]:
gfm_client.poll_onboard_dataset_until_finished(onboard_dataset_response["dataset_id"])
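If polling fails because a port-forward was dropped, one option is to wrap the call in a simple retry loop so the notebook waits while you restore the forward. The following is a sketch under assumptions: `retry` is a hypothetical helper, and it assumes a dropped connection surfaces as an `OSError` subclass (which covers Python's built-in `ConnectionError`). If the SDK raises a different exception type, pass it through `retry_on`.

```python
import time


def retry(call, attempts=5, wait=30.0, retry_on=(OSError,), sleep=time.sleep):
    """Run call(); on a retryable error, wait and try again.

    Gives up and re-raises the last error after `attempts` tries.
    The default `retry_on` is an assumption about how failures surface.
    """
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on as error:
            if attempt == attempts:
                raise
            print(f"attempt {attempt} failed ({error}); retrying in {wait}s")
            sleep(wait)
```

For example, `retry(lambda: gfm_client.poll_onboard_dataset_until_finished(onboard_dataset_response["dataset_id"]))` would re-run the polling after each failure. The helper does not restart the port-forward itself; you still need to do that from a terminal.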

Note: Currently, for local deployments with access to non-NVIDIA GPUs (for example on a Mac), you need to run the fine-tuning outside of the local cluster; the resulting model can then be onboarded back to the local cluster for inference. This will be addressed in future, and is not an issue for cluster deployments with accessible GPUs. If this applies to you, jump to section [1.5.1 - Tuning a model from a dataset using Mac GPUs](#151-tuning-a-model-from-a-dataset-using-mac-gpus)

#### 1.5.0 Tuning a model from a dataset in a cluster deployment with accessible GPUs

Now that we have a backbone model and a dataset registered in our studio instance, we can trigger a fine-tuning job. We will use the Prithvi-EO-V2-300M backbone model, the burn scars dataset we onboarded earlier, and the default tuning template for this task. The tuning template is a configuration file that defines the hyperparameters for the fine-tuning job. You can find the default tuning template in the payloads/tune_templates directory. You can also create your own tuning template and register it in the studio instance.

The payload and script below trigger the fine-tuning job.

In [None]:
# Get the dataset id, base model id and tune template id
dataset_id = onboard_dataset_response['dataset_id']
base_model_id = onboard_backbone_response['id']
tune_template_id = template_response['id']

In [None]:
payload={
  "name": "burn-scars-demo",
  "description": "Segmentation",
  "dataset_id": dataset_id,
  "base_model_id": base_model_id,
  "tune_template_id": tune_template_id,
}

tune_submitted=gfm_client.submit_tune( payload,output='json')
display( json.dumps(tune_submitted,indent=2))

You can then monitor the status of your tune through the polling function, or alternatively monitor progress and view the dataset factory in the UI. Because tuning is resource intensive, in an environment with limited CPUs and RAM other functionality of your deployment may be dropped to free resources for this task. For example, port-forwarding may be dropped, in which case polling will fail with errors. If that happens, give it some time, then port-forward your services again using the port-forwarding commands defined in section 1.2 above and re-run the polling.

> You may also want to monitor the logs for the `geotune-xxx` job pod that pops up in your cluster. If you have a single GPU in your cluster, you need to scale the `terratorch-inference` deployment down to zero pods to release the GPU for tuning; otherwise the geotune pod will remain pending because it has no GPU to bind to.

In [None]:
gfm_client.poll_finetuning_until_finished(tune_id=tune_submitted['tune_id'])

After the tune above completes, we can trigger an inference run. This can be run through the SDK as below, where you define the spatial and temporal domain to run the inference.

> Below, we show an expanded payload for submitting the inference to demonstrate how you can override the different configurations.

In [None]:
tune_id = tune_submitted['tune_id']

# Define the inference payload
payload={
  "model_display_name":"geofm-sandbox-models",
  "location":"Red Bluff, California, United States",
  "description":"Park Fire Aug 2024",
  "spatial_domain":{
    "bbox":[
      
    ],
    "urls":[
      "https://geospatial-studio-example-data.s3.us-east.cloud-object-storage.appdomain.cloud/examples-for-inference/park_fire_scaled.tif"
    ],
    "tiles":[
      
    ],
    "polygons":[
      
    ]
  },
  "temporal_domain":[
    "2024-08-12"
  ],
  "pipeline_steps":[
    {
      "status":"READY",
      "process_id":"url-connector",
      "step_number":0
    },
    {
      "status":"WAITING",
      "process_id":"terratorch-inference",
      "step_number":1
    },
    {
      "status":"WAITING",
      "process_id":"postprocess-generic",
      "step_number":2
    },
    {
      "status":"WAITING",
      "process_id":"push-to-geoserver",
      "step_number":3
    }
  ],
  "post_processing":{
    "cloud_masking":"False",
    "ocean_masking":"False",
    "snow_ice_masking":None,
    "permanent_water_masking":"False"
  },
  "model_input_data_spec":[
    {
      "bands":[
        {
          "index":"0",
          "RGB_band":"B",
          "band_name":"Blue",
          "scaling_factor":"0.0001"
        },
        {
          "index":"1",
          "RGB_band":"G",
          "band_name":"Green",
          "scaling_factor":"0.0001"
        },
        {
          "index":"2",
          "RGB_band":"R",
          "band_name":"Red",
          "scaling_factor":"0.0001"
        },
        {
          "index":"3",
          "band_name":"NIR_Narrow",
          "scaling_factor":"0.0001"
        },
        {
          "index":"4",
          "band_name":"SWIR1",
          "scaling_factor":"0.0001"
        },
        {
          "index":"5",
          "band_name":"SWIR2",
          "scaling_factor":"0.0001"
        }
      ],
      "connector":"sentinelhub",
      "collection":"hls_l30",
      "file_suffix":"_merged.tif",
      "modality_tag":"HLS_L30"
    }
  ],
  "geoserver_push":[
    {
      "z_index":0,
      "workspace":"geofm",
      "layer_name":"input_rgb",
      "file_suffix":"",
      "display_name":"Input image (RGB)",
      "filepath_key":"model_input_original_image_rgb",
      "geoserver_style":{
        "rgb":[
          {
            "label":"RedChannel",
            "channel":1,
            "maxValue":255,
            "minValue":0
          },
          {
            "label":"GreenChannel",
            "channel":2,
            "maxValue":255,
            "minValue":0
          },
          {
            "label":"BlueChannel",
            "channel":3,
            "maxValue":255,
            "minValue":0
          }
        ]
      },
      "visible_by_default":"True"
    },
    {
      "z_index":1,
      "workspace":"geofm",
      "layer_name":"pred",
      "file_suffix":"",
      "display_name":"Model prediction",
      "filepath_key":"model_output_image",
      "geoserver_style":{
        "segmentation":[
          {
            "color":"#000000",
            "label":"ignore",
            "opacity":0,
            "quantity":"-1"
          },
          {
            "color":"#000000",
            "label":"no-data",
            "opacity":0,
            "quantity":"0"
          },
          {
            "color":"#ab4f4f",
            "label":"fire-scar",
            "opacity":1,
            "quantity":"1"
          }
        ]
      },
      "visible_by_default":"True"
    }
  ]
}
# Submit the inference request
gfm_client.try_out_tune(tune_id=tune_id, data=payload)
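When you only want to change a few fields of a large payload like the one above, it can be easier to keep the full payload as a base and apply a small dictionary of overrides. The helper below is a generic sketch and not part of the SDK. `with_overrides` is a hypothetical name. Nested dictionaries are merged key by key, while lists and scalar values are replaced whole.

```python
import copy


def with_overrides(base, overrides):
    """Return a deep copy of `base` with `overrides` merged in.

    Nested dicts are merged key by key; any other value, including
    lists, replaces the original value. `base` is left unchanged.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

For example, `with_overrides(payload, {"post_processing": {"cloud_masking": "True"}})` returns a new payload with only that one masking option changed, leaving `payload` itself untouched. Whether a given override is accepted still depends on the studio backend.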


Navigate to the Geospatial Studio UI and click on the `Start fine-tuning` card. Under `Model & Tunes`, find and click on `geofm-sandbox-models`, then under `History` click on the entry named `Park Fire Aug 2024`, i.e. the inference you just ran. This loads the inference page with the status of your inference; if results are available, they are loaded in the UI.

#### 1.5.1 Tuning a model from a dataset using Mac GPUs
If you have access to Mac GPUs, you can use the following code to tune a model from a dataset. This is useful for testing and development purposes.

TBA!!!