# Geospatial ETL with Wherobots and Databricks Unity Catalog

This notebook demonstrates a complete geospatial ETL workflow.

Learn how to read a **Delta table from Databricks Unity Catalog** in Wherobots, transform coordinates into `POINT` geometries, and write the results back to Unity Catalog.

This simple example provides the building blocks for you to perform more complex spatial analysis and processing on your own data.

The exercises in this guide use the `forecast_daily_calendar_imperial` sample table, which comes pre-loaded in your Databricks workspace.

## Prerequisites

To run this example notebook, you'll need:

- An existing **Databricks catalog and schema** governed by Unity Catalog.
    - Optionally, you can create a new [catalog](https://docs.databricks.com/aws/en/catalogs/create-catalog) and [schema](https://docs.databricks.com/aws/en/schemas/create-schema) in Databricks by following the linked Databricks documentation.

- A **Connection** between Wherobots and your Unity Catalog-governed schema and catalog.
    - For more information on connecting Unity Catalog to your Wherobots Organization, including the necessary Databricks catalog permissions, see [TODO](link).
    - If your Unity Catalog has been successfully connected to Wherobots, you will be able to see it in the [**Wherobots Data Hub**](https://cloud.wherobots.com/data-hub).


> A note on catalog syncing: Wherobots discovers Databricks catalogs only at runtime startup.
> If you created a new Databricks catalog _after_ the Wherobots runtime was started, that catalog won't be visible until you restart the Wherobots runtime.

> To make a catalog that was created after runtime startup visible in Wherobots, you must restart the runtime:

> 1. **Save active work:** Ensure any running jobs or SQL sessions are saved.
> 1. **Destroy runtime:** Stop the current Wherobots runtime in [Wherobots Cloud](https://cloud.wherobots.com/).
> 1. **Start a new runtime:** Start the runtime again.

## In Databricks

## Include your Databricks Resources

Update the `YOUR-CATALOG` and `YOUR-SCHEMA` placeholders (maintaining the backticks around each) in the command below to point to the resources in your Databricks environment where you have permission to create tables.

Run the following command in a **Databricks SQL editor** to create a new table from the built-in AccuWeather sample data.

```sql
CREATE OR REPLACE TABLE `YOUR-CATALOG`.`YOUR-SCHEMA`.`forecast_daily_calendar_imperial_wbc_demo`
USING DELTA
AS
SELECT *
FROM `samples`.`accuweather`.`forecast_daily_calendar_imperial`
LIMIT 10000;
```

Confirm that your table has been created successfully by running a quick query against it, for example by selecting a few rows from the new table. Once you've confirmed, you can delete or comment out this smoke test before proceeding to the next section.

## In a Wherobots Notebook

## Import libraries

In [None]:
from sedona.spark import *
from pyspark.sql.functions import expr, col, when, lit

## Set up Wherobots notebook variables

Define the variables you'll use throughout this notebook.

In [None]:
CATALOG = "kelly-uc" # Change this to your catalog
SCHEMA  = "weather" # Change this to your schema name
SOURCE_TABLE = "forecast_daily_calendar_imperial_wbc_demo"
OUTPUT_TABLE = "transformed_forecast_daily_calendar_imperial"

SOURCE_TABLE_FQN = f"`{CATALOG}`.`{SCHEMA}`.`{SOURCE_TABLE}`"
OUTPUT_TABLE_FQN = f"`{CATALOG}`.`{SCHEMA}`.`{OUTPUT_TABLE}`"

print("Target UC input Delta table:", SOURCE_TABLE_FQN)
print("Target UC output Delta table:", OUTPUT_TABLE_FQN)
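Backticks matter here because a name such as `kelly-uc` contains a hyphen, which would not parse as an unquoted SQL identifier. As a minimal sketch, a small helper can build the quoted name and double any backtick embedded in a part. `quote_fqn` is a hypothetical name introduced for illustration; it is not part of Sedona or PySpark.

```python
def quote_fqn(*parts: str) -> str:
    """Join identifier parts into a backtick-quoted, dot-separated name."""
    if not parts or any(not p for p in parts):
        raise ValueError("every identifier part must be a non-empty string")
    # Double any embedded backtick, then wrap each part in backticks.
    return ".".join("`" + p.replace("`", "``") + "`" for p in parts)


# Same catalog, schema, and table names as the variables cell above.
print(quote_fqn("kelly-uc", "weather", "forecast_daily_calendar_imperial_wbc_demo"))
```

The f-strings in the cell above are sufficient for fixed names; a helper like this is mainly useful when the names come from user input or configuration.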

## Create the SedonaContext

The following cell creates a `SedonaContext` object using the modules imported from the Sedona library above. You will use it to read from and write to Unity Catalog.

In [None]:
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

### Confirm that you can read data from the Unity Catalog table in your Wherobots Notebook

Read the table and confirm that it returns a DataFrame.

In [None]:
table_smoke_test = sedona.read.table(SOURCE_TABLE_FQN)
table_smoke_test.show(10)

## Running Spatial Operations

In this step, you will convert the `latitude` and `longitude` columns into a `POINT` geometry and add it to the table as a new column named `point`.

The cells later in this notebook read the source table, build the geometry with `ST_MakePoint`, and tag it with SRID 4326 (WGS 84), which turns the plain coordinate values into a spatially aware geometry column.

## Proximity Analysis: Calculate Distances to Key Locations
In this section, you will perform a proximity analysis to calculate the distance from each weather forecast in your dataset to a specific point of interest. This allows you to filter data based on location and answer questions like, "Which of these weather events is closest to my operations center?"

### A Practical Example
Imagine your business has major operations or supply chain dependencies in the **Tokyo metropolitan area**, where severe weather can disrupt logistics and public safety. Your raw data contains thousands of forecasts across the region but lacks the context of which ones pose a direct threat to the city.

By defining **Tokyo's coordinates**, you can calculate the distance from every weather event to the city center, saving the result in a new column like `distance_to_tokyo_meters`.

With this new column, your data becomes an early-warning system. You can now easily ask critical business questions like:

> "Show me all forecasts with **wind gusts over 60 mph** or **heavy precipitation** within a **500-kilometer radius** of Tokyo."

This analysis turns your spatial data into actionable intelligence, allowing you to focus only on the events that directly impact your operations.

In [None]:
# Load Data and Create Geometry
# Read the table from Unity Catalog and create the necessary geometry column for spatial analysis.

df = sedona.read.table(SOURCE_TABLE_FQN)

In [None]:
# Create a 'point' geometry column from the latitude and longitude columns.

df_w_geom = df.withColumn(
    "point",
    expr("ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)")
)
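Note the argument order in `ST_MakePoint(longitude, latitude)`: x (longitude) comes first, then y (latitude). The following plain-Python sketch shows the same ordering when building a WKT string by hand, with a range check that catches swapped arguments for many locations. `to_wkt_point` is a hypothetical helper for illustration, not a Sedona function.

```python
def to_wkt_point(longitude: float, latitude: float) -> str:
    """Return a WKT POINT string with x = longitude and y = latitude."""
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    return f"POINT ({longitude} {latitude})"


# Tokyo, using the same coordinates as the proximity analysis in this notebook.
print(to_wkt_point(139.6917, 35.6895))
```

Passing the Tokyo coordinates in the wrong order raises a `ValueError` here, because 139.6917 is not a valid latitude. The check cannot catch every swap, since many longitudes are also valid latitudes.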

In [None]:
# Proximity Analysis: Calculate Distance to Tokyo
# This is the core spatial operation. We calculate the distance from every weather
# forecast point to our point of interest, Tokyo.

# Define the point of interest (Tokyo) as a WKT string.

tokyo_geom_wkt = "POINT (139.6917 35.6895)"

# Wherobots efficiently calculates the spherical distance in meters for every row.
df_with_distance = df_w_geom.withColumn(
    "distance_to_tokyo_meters",
    expr(f"ST_DistanceSphere(point, ST_SetSRID(ST_GeomFromWKT('{tokyo_geom_wkt}'), 4326))")
)

print("Calculated distance to Tokyo for each forecast.")
df_with_distance.select("distance_to_tokyo_meters").show(5)
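To build intuition for the values in `distance_to_tokyo_meters`, the sketch below computes a great-circle distance in plain Python with the haversine formula. This is an illustration of spherical distance in general, not Sedona's implementation: the Earth radius constant and the Osaka coordinates are assumptions chosen for the example, and results may differ slightly from `ST_DistanceSphere`.

```python
import math

EARTH_RADIUS_M = 6_371_008.0  # assumed mean Earth radius, in meters


def haversine_m(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance in meters between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


tokyo = (139.6917, 35.6895)
osaka = (135.5023, 34.6937)  # approximate coordinates, for illustration only
print(round(haversine_m(*tokyo, *osaka) / 1000), "km")
```

A quick sanity check like this is useful for confirming that the column really is in meters before comparing it with a threshold expressed in kilometers.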


In [None]:
# Define thresholds for our alerts
# 40 mph is the minimum wind speed that qualifies as a "Severe Wind" by the Beaufort scale.
# 0.30 inches of precipitation per hour in a 24-hour period is considered "Heavy Rain" by the National Weather Service.

proximity_threshold_km = 500.0
# distance_to_tokyo_meters is in meters, so convert the threshold before comparing.
proximity_threshold_m = proximity_threshold_km * 1000
severe_wind_mph = 40
heavy_precipitation_in = 0.30

# Use a chained 'when' expression to build a descriptive alert string.
# The first matching condition wins, so the order of the clauses matters.
df_with_threats = df_with_distance.withColumn(
    "threat_description",
    when(col("distance_to_tokyo_meters") > proximity_threshold_m, lit("No Threat (Distance exceeds proximity threshold)"))
    .when(
        (col("wind_gust_max") >= severe_wind_mph) & (col("precipitation_lwe_total") >= heavy_precipitation_in),
        lit("High Wind & Flood Watch Near Tokyo")
    )
    .when(col("wind_gust_max") >= severe_wind_mph, lit("High Wind Warning Near Tokyo"))
    .when(col("precipitation_lwe_total") >= heavy_precipitation_in, lit("Flood Watch Near Tokyo"))
    .otherwise(lit("Normal Conditions Near Tokyo"))
)

print("Generated new 'threat_description' feature:")
df_with_threats.select("city_name", "wind_gust_max", "precipitation_lwe_total", "threat_description").show()
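The intended decision order of the alert logic can be checked without a Spark session. The following plain-Python sketch applies the same thresholds to a single row; `classify_threat` and its parameter names are hypothetical and exist only for this illustration. Unlike Spark, it does not handle missing (`None`) values.

```python
def classify_threat(distance_m, wind_gust_mph, precipitation_in,
                    proximity_km=500.0, severe_wind_mph=40.0, heavy_precip_in=0.30):
    """Return an alert label for one forecast row, checking distance first."""
    # The distance is in meters, so convert the kilometer threshold.
    if distance_m > proximity_km * 1000:
        return "No Threat (Distance exceeds proximity threshold)"
    windy = wind_gust_mph >= severe_wind_mph
    wet = precipitation_in >= heavy_precip_in
    if windy and wet:
        return "High Wind & Flood Watch Near Tokyo"
    if windy:
        return "High Wind Warning Near Tokyo"
    if wet:
        return "Flood Watch Near Tokyo"
    return "Normal Conditions Near Tokyo"


# A forecast 120 km from Tokyo with strong gusts and light rain.
print(classify_threat(120_000, 45.0, 0.10))
```

Testing the distance first means that a severe storm far from Tokyo is still labeled as no threat, which matches the purpose of the proximity filter.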

## Writing the Results

In this step, we will write the results back to an external Delta table governed by Unity Catalog.

> **Note:** Before storing the data in Databricks, convert the geometry column back into WKT if you need to keep it, and then drop the `point` column. This is because Databricks does not natively support geometries. The cell below simply leaves `point` out of the final selection. Also, keep in mind that after you write the data back to Databricks, your user may not have the necessary permissions to query it; you will need to grant those permissions explicitly in Unity Catalog.

In [None]:
final_df = df_with_threats.select(
    col("city_name"),
    col("date"),
    col("temperature_avg"),
    col("wind_gust_max"),
    col("precipitation_lwe_total"),
    col("distance_to_tokyo_meters"),
    col("threat_description") # This is our new, actionable feature!
)


print("\nFinal schema to be written to Unity Catalog:")
final_df.printSchema()

# Write the enriched data back to Unity Catalog

final_df.createOrReplaceTempView("temp_final_df_view")
# Change this to an external storage location that you have permission to write to.
OUTPUT_TABLE_LOCATION = 's3://databricks-workspace-stack-9216f-bucket/unity-catalog/1030888365966037/test_delta_table/'


# Now, execute the SQL command to create the table from the temporary view.
sedona.sql(f"""
  CREATE OR REPLACE TABLE {OUTPUT_TABLE_FQN}
  USING delta
  LOCATION '{OUTPUT_TABLE_LOCATION}'
  AS SELECT * FROM temp_final_df_view
""")

print(f"\nSuccess! The enriched data has been written to {OUTPUT_TABLE_FQN}")