# Loading Geospatial Data with Wherobots

## 📖 Introduction
In this notebook, we demonstrate how to load geospatial data into Wherobots from the following formats and sources:

1. **GeoParquet**
2. **GeoJSON and Shapefiles**
3. **Raster Data (GeoTIFF)**
4. **Overture Maps Data**
5. **Data from S3** (the file-based examples in Steps 1 to 3 all read directly from `s3://` paths)

Each section walks through the necessary steps with annotated code and links to the relevant Wherobots documentation.


## 🗂 Step 1: Loading GeoParquet Files

### What you'll learn:
- How to load GeoParquet files into a DataFrame.
- How to inspect the schema of the loaded data.

In [31]:
# Import necessary libraries
from sedona.sql.st_predicates import ST_Intersects
from sedona.sql.st_constructors import ST_GeomFromText
from sedona.spark import SedonaContext
from pyspark.sql import SparkSession

In [2]:
# Create (or reuse) a Spark session, then wrap it in a Sedona context
spark = SparkSession.builder \
    .appName("Dataset Loader") \
    .getOrCreate()
sedona = SedonaContext.create(spark)


In [3]:
# Load GeoParquet data
gdf = sedona.read.format("geoparquet").load("s3://wherobots-examples/data/mini/es_cn.parquet")


In [4]:
gdf.printSchema()

root
 |-- id: string (nullable = true)
 |-- geometry: geometry (nullable = true)
 |-- determination_datetime: timestamp (nullable = true)
 |-- admin_island: string (nullable = true)
 |-- crop:code: string (nullable = true)
 |-- crop:name: string (nullable = true)
 |-- area: float (nullable = true)
 |-- admin:country_code: string (nullable = true)
 |-- admin:subdivision_code: string (nullable = true)
 |-- crop:code_list: string (nullable = true)
 |-- bbox: struct (nullable = true)
 |    |-- xmin: double (nullable = true)
 |    |-- ymin: double (nullable = true)
 |    |-- xmax: double (nullable = true)
 |    |-- ymax: double (nullable = true)



📄 **Documentation Reference**: [Loading GeoParquet](https://docs.wherobots.com/#geoparquet-loading)  

## 🌍 Step 2: Loading GeoJSON and Shapefiles

### What you'll learn:
- How to ingest GeoJSON and Shapefiles.

In [5]:
# Load GeoJSON file
geojson_df = sedona.read.format("geojson").load("s3://wherobots-examples/data/mini/2015_Tree_Census.geojson")


In [6]:
geojson_df.printSchema()

root
 |-- _corrupt_record: string (nullable = true)
 |-- geometry: geometry (nullable = true)
 |-- properties: struct (nullable = true)
 |    |-- address: string (nullable = true)
 |    |-- block_id: string (nullable = true)
 |    |-- boro_ct: string (nullable = true)
 |    |-- borocode: string (nullable = true)
 |    |-- boroname: string (nullable = true)
 |    |-- brnch_ligh: string (nullable = true)
 |    |-- brnch_othe: string (nullable = true)
 |    |-- brnch_shoe: string (nullable = true)
 |    |-- cb_num: string (nullable = true)
 |    |-- cncldist: string (nullable = true)
 |    |-- created_at: string (nullable = true)
 |    |-- curb_loc: string (nullable = true)
 |    |-- guards: string (nullable = true)
 |    |-- health: string (nullable = true)
 |    |-- latitude: string (nullable = true)
 |    |-- longitude: string (nullable = true)
 |    |-- nta: string (nullable = true)
 |    |-- nta_name: string (nullable = true)
 |    |-- problems: string (nullable = true)
 |    |-- roo

In [7]:
import pyspark.sql.functions as f

# Reload the GeoJSON and promote selected properties to top-level columns,
# then drop the nested struct and the GeoJSON "type" field
df = sedona.read.format("geojson").load("s3://wherobots-examples/data/mini/2015_Tree_Census.geojson") \
    .withColumn("address", f.expr("properties['address']")) \
    .withColumn("spc_common", f.expr("properties['spc_common']")) \
    .drop("properties").drop("type")

df.printSchema()



root
 |-- _corrupt_record: string (nullable = true)
 |-- geometry: geometry (nullable = true)
 |-- address: string (nullable = true)
 |-- spc_common: string (nullable = true)
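
When more than a couple of properties need to be promoted, chaining `withColumn` calls gets repetitive. The sketch below builds the same kind of `properties['...']` expressions from a list of names. The helper `property_exprs` is hypothetical (it is not part of Sedona or Wherobots), and it assumes the property names are valid column identifiers.

```python
def property_exprs(names, struct_col="properties"):
    """Build one SQL expression per property, aliased to the property name.

    Hypothetical helper: assumes each name is a valid column identifier.
    """
    return [f"{struct_col}['{name}'] AS {name}" for name in names]


# Property names taken from the schema printed above
exprs = property_exprs(["address", "spc_common", "health"])
print(exprs)

# With the GeoJSON DataFrame loaded earlier, this could be applied as:
# flat_df = geojson_df.selectExpr("geometry", *exprs)
```

Using `selectExpr` with a generated list keeps the set of promoted columns in one place, which is easier to adjust than a chain of `withColumn` calls.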




In [8]:
# Load Shapefile
shapefile_df = sedona.read.format("shapefile").load("s3://wherobots-examples/data/mini/HurricaneSandy/geo_export_2ca210ed-d8b2-4fe6-81eb-53cc96311073.shp")

In [9]:
# Inspect the schema
shapefile_df.printSchema()

root
 |-- geometry: geometry (nullable = true)
 |-- comments: string (nullable = true)
 |-- state: string (nullable = true)
 |-- demsource: string (nullable = true)
 |-- id: decimal(33,31) (nullable = true)
 |-- status: string (nullable = true)
 |-- sourcedata: string (nullable = true)
 |-- verified: string (nullable = true)



📄 **Documentation Reference**: [Ingesting GeoJSON](https://docs.wherobots.com/#geojson-loading)  

## 🖼️ Step 3: Loading Raster Data (GeoTIFF)

### What you'll learn:
- How to load raster datasets and inspect metadata.


In [10]:
# Read the GeoTIFF as raw bytes (one row per file; the bytes land in the `content` column)
raster_df = sedona.read.format("binaryFile").load("s3://wherobots-examples/data/mini/NYC_3ft_Landcover.tif")

In [11]:
# Convert binary content to a raster object
raster_df = raster_df.selectExpr("RS_FromGeoTiff(content) as raster")
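
To inspect the raster's metadata, one option is to add metadata columns next to the raster column. The sketch below assumes the Sedona raster SQL functions `RS_MetaData` and `RS_NumBands` are available in the session. The wrapper `with_raster_metadata` is a hypothetical helper, and the exact shape of the `RS_MetaData` result may differ between Sedona versions.

```python
def with_raster_metadata(raster_df, raster_col="raster"):
    """Return raster_df with metadata columns added (hypothetical helper).

    Assumes raster_df has a raster column produced by RS_FromGeoTiff and that
    the RS_MetaData and RS_NumBands SQL functions are registered.
    """
    return raster_df.selectExpr(
        raster_col,
        f"RS_MetaData({raster_col}) AS metadata",
        f"RS_NumBands({raster_col}) AS num_bands",
    )


# With the raster DataFrame from the cell above:
# with_raster_metadata(raster_df).select("metadata", "num_bands").show(truncate=False)
```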

📄 **Documentation Reference**: [Loading Raster Data](https://docs.wherobots.com/#raster-loading)  

## 🗺️ Step 4: Loading Overture Maps Data

### What you'll learn:
- How to load and query datasets provided by Overture Maps.


In [15]:
# Load Overture Maps building dataset
buildings_df = sedona.read.format("iceberg").load("wherobots_open_data.overture.buildings_building")

In [34]:
# Keep only buildings that intersect a bounding box (WKT polygon, longitude/latitude order)
bbox_wkt = "POLYGON((-122.5 37.0, -122.5 37.5, -121.5 37.5, -121.5 37.0, -122.5 37.0))"
buildings_filtered = buildings_df.where(ST_Intersects(f.col("geometry"), ST_GeomFromText(f.lit(bbox_wkt))))
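
Hand-writing the WKT ring is error-prone, because the first and last vertex must match. As a minimal sketch, a small pure-Python helper can build the polygon from its bounds. The function `bbox_to_wkt` is hypothetical and not part of Sedona; it only reproduces the string used in the filter.

```python
def bbox_to_wkt(xmin, ymin, xmax, ymax):
    """Return a closed WKT polygon for the given bounds (hypothetical helper)."""
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("expected xmin < xmax and ymin < ymax")
    # Walk the four corners and repeat the first one to close the ring
    ring = [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin), (xmin, ymin)]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


# Same bounds as the filter: x is longitude, y is latitude
bbox_wkt = bbox_to_wkt(-122.5, 37.0, -121.5, 37.5)
print(bbox_wkt)
```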

In [35]:
# Show results
buildings_filtered.show()

+--------------------+--------------------+-------+-----+-----+------+---------+-----------+--------------------+--------------------+--------------------+-------+
|                  id|          updatetime|version|names|level|height|numfloors|      class|             sources|                bbox|            geometry|geohash|
+--------------------+--------------------+-------+-----+-----+------+---------+-----------+--------------------+--------------------+--------------------+-------+
|tmp_7733393231353...|2016-01-18T20:36:...|      0|   {}| NULL|   2.4|     NULL|       NULL|[{dataset -> USGS...|{-122.1369756, -1...|POLYGON ((-122.13...|     9q|
|tmp_7731303530363...|2022-04-10T04:52:...|      0|   {}| NULL|  NULL|     NULL|residential|[{dataset -> Open...|{-122.1396009, -1...|POLYGON ((-122.13...|     9q|
|tmp_7733373936373...|2021-10-02T19:44:...|      0|   {}| NULL|   3.6|     NULL|       NULL|[{dataset -> Open...|{-121.8066009, -1...|POLYGON ((-121.80...|     9q|
|tmp_77333832383


## 🔮 Next Steps

In this notebook, we demonstrated how to:

1. Load GeoParquet, GeoJSON, Shapefiles, and raster data into Wherobots.
2. Query spatial data using basic spatial operations.
3. Integrate datasets directly from S3 and Overture Maps.
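
As a recap, the reader formats used for the file-based examples can be collected in a small lookup. This is only a sketch: the mapping reflects the files loaded in this notebook (a `.parquet` file is not necessarily GeoParquet in general), and `reader_format` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# File extension -> reader format, as used in this notebook's examples
READER_FORMATS = {
    ".parquet": "geoparquet",
    ".geojson": "geojson",
    ".shp": "shapefile",
    ".tif": "binaryFile",  # raw bytes, later converted with RS_FromGeoTiff
}


def reader_format(path):
    """Pick the reader format for a path by its extension (hypothetical helper)."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in READER_FORMATS:
        raise ValueError(f"no reader format known for {suffix!r}")
    return READER_FORMATS[suffix]


print(reader_format("s3://wherobots-examples/data/mini/es_cn.parquet"))

# With a Sedona context available, the result could be passed to the reader:
# df = sedona.read.format(reader_format(path)).load(path)
```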

### What’s next?
- Explore **spatial transformations** like buffering or intersecting geometries.
- Perform **spatial joins** for more advanced analytics.
- Visualize query results with **SedonaKepler** or **SedonaPyDeck**.

For further details, check out the [Wherobots Documentation](https://docs.wherobots.com).
