![](https://wherobots.com/wp-content/uploads/2023/12/Inline-Blue_Black_onWhite@3x.png)

# WherobotsDB Example Notebook - Scala

This notebook demonstrates loading Shapefile data, performing a spatial join, and writing the results as GeoParquet.

First, we start the Spark session, import the Scala dependencies, and create a Sedona context for reading from the public `wherobots-examples` AWS S3 bucket. You can read more about configuring file access in the [documentation](https://docs.wherobots.com/latest/references/havasu/configuration/cross-account/?h=s3).
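On a cluster where access to the bucket is not already set up, the per-bucket configuration could look roughly like the sketch below. It assumes the S3A connector is in use and that the bucket allows anonymous reads. The application name is a placeholder, and this cell would take the place of the `%%init_spark` and `SedonaContext.create(spark)` cells rather than run alongside them.

```scala
import org.apache.sedona.spark.SedonaContext

// Assumption: the bucket is public, so anonymous credentials are sufficient.
val config = SedonaContext.builder()
  .appName("sedona-example-scala") // hypothetical application name
  .config(
    "spark.hadoop.fs.s3a.bucket.wherobots-examples.aws.credentials.provider",
    "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider")
  .getOrCreate()

// Register Sedona's spatial types and functions on the session.
val sedona = SedonaContext.create(config)
```

Scoping the credentials provider to a single bucket leaves the default credentials in place for every other bucket, including the one used for output.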

In [None]:
%%init_spark

In [None]:
import org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader
import org.apache.sedona.spark.SedonaContext
import org.apache.sedona.sql.utils.Adapter
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.desc

In [None]:
val sedona = SedonaContext.create(spark)
val sc = sedona.sparkContext

In [None]:
// Read the countries shapefiles from S3
val s3BucketName = "wherobots-examples"
val countries = ShapefileReader.readToGeometryRDD(sc, s"s3://$s3BucketName/data/ne_50m_admin_0_countries_lakes/")
// Convert the Spatial RDD to a Spatial DataFrame using the Adapter
val countries_df = Adapter.toDf(countries, sedona)
countries_df.createOrReplaceTempView("country")
countries_df.printSchema()

// Optional: uncomment to also save the DataFrame as a Havasu (Iceberg) table
// countries_df.write.format("havasu.iceberg").saveAsTable("my_catalog.test_db.country")

In [None]:
// Read the airports shapefiles from S3
val airports = ShapefileReader.readToGeometryRDD(sc, s"s3://$s3BucketName/data/ne_50m_airports/")
// Convert the Spatial RDD to a Spatial DataFrame using the Adapter
val airports_df = Adapter.toDf(airports, sedona)
airports_df.createOrReplaceTempView("airport")
airports_df.printSchema()

In [None]:
// Run a spatial join query to find airports in each country
val result = sedona.sql(
  """SELECT c.geometry AS country_geom, c.NAME_EN, a.geometry AS airport_geom, a.name
    |FROM country c, airport a
    |WHERE ST_Contains(c.geometry, a.geometry)""".stripMargin)
// Aggregate the results to find the number of airports in each country
val aggregateResult = result.groupBy("NAME_EN", "country_geom").count()
aggregateResult.orderBy(desc("count")).show()
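`ST_Contains` follows the usual OGC definition of containment, which has one consequence worth knowing for this join: a point lying exactly on a polygon's boundary is not contained by it. The sketch below illustrates this with JTS, the geometry library Sedona builds on. It uses a made-up square and two made-up points rather than the notebook's data, and assumes JTS is on the classpath, as it is wherever Sedona is installed.

```scala
import org.locationtech.jts.io.WKTReader

val reader = new WKTReader()

// Hypothetical geometries for illustration only
val square = reader.read("POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))")
val interiorPoint = reader.read("POINT (5 5)")
val boundaryPoint = reader.read("POINT (10 5)")

val containsInterior = square.contains(interiorPoint) // true: strictly inside
val containsBoundary = square.contains(boundaryPoint) // false: lies on the edge
val coversBoundary = square.covers(boundaryPoint)     // true: covers includes the boundary

println(s"interior: $containsInterior, boundary: $containsBoundary, covers: $coversBoundary")
```

If airports sitting exactly on a border should be counted, `ST_Covers` is the boundary-inclusive predicate to consider in place of `ST_Contains`.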

In [None]:
// Write the results to a GeoParquet file under USER_S3_PATH
// (the environment variable must be set; the path is assumed to end with "/")
aggregateResult.write.format("geoparquet").mode(SaveMode.Overwrite).save(sys.env("USER_S3_PATH") + "airport_country.parquet")
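
To check the output, the file can be read back with the same `geoparquet` format. This is a minimal sketch that assumes the `sedona` session from the earlier cells and the same `USER_S3_PATH` environment variable. The names `outputPath` and `written` are introduced here for illustration.

```scala
// Assumes `sedona` was created earlier in this notebook.
val outputPath = sys.env("USER_S3_PATH") + "airport_country.parquet"

// Load the GeoParquet output; the geometry column should come back as a geometry type.
val written = sedona.read.format("geoparquet").load(outputPath)
written.printSchema()
written.show(5)
```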