![](https://wherobots.com/wp-content/uploads/2023/12/Inline-Blue_Black_onWhite@3x.png)

# WherobotsDB Basic Examples

This notebook walks through basic WherobotsDB functionality for getting started, including:

* configuring WherobotsDB to access S3 buckets
* loading Shapefile data into Spatial DataFrames
* performing a spatial join using SQL
* visualizing geospatial data
* writing results as GeoParquet

## Configuring WherobotsDB

First, we import the Python dependencies and then configure WherobotsDB to access the public `wherobots-examples` AWS S3 bucket using anonymous credentials. You can read more about configuring file access in the [documentation](https://docs.wherobots.com/latest/references/havasu/configuration/cross-account/?h=s3).

In [None]:
from sedona.spark import *
from pyspark.sql.functions import desc
import os

In [None]:
config = SedonaContext.builder().appName('sedona-example-python')\
    .config('spark.hadoop.fs.s3a.bucket.wherobots-examples.aws.credentials.provider','org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider')\
    .getOrCreate()
sedona = SedonaContext.create(config)
sc = sedona.sparkContext
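
The configuration key above follows a per-bucket pattern: the bucket name is embedded in the key, so the anonymous credentials provider applies only to that bucket. As a minimal sketch, the helper below (`anonymous_s3a_config` is a hypothetical name, not part of WherobotsDB or Sedona) builds the same key/value pair for an arbitrary bucket name, assuming the same key layout used in the cell above.

```python
from typing import Tuple


def anonymous_s3a_config(bucket: str) -> Tuple[str, str]:
    """Build the per-bucket Spark setting for anonymous S3 access.

    Hypothetical helper: it only reproduces the key pattern used in the
    configuration cell above, with the bucket name substituted in.
    """
    key = f"spark.hadoop.fs.s3a.bucket.{bucket}.aws.credentials.provider"
    value = "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
    return key, value


key, value = anonymous_s3a_config("wherobots-examples")
print(key)
print(value)
```

The returned pair could then be passed to `.config(key, value)` on the builder, which keeps the long key string out of the configuration cell.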

## Loading Shapefiles

We'll load two Shapefiles into two Spatial DataFrames, then perform a spatial join. WherobotsDB can read a wide variety of spatial data sources, including CSV, Shapefile, GeoParquet, GeoJSON, and PostGIS. See [the documentation](https://docs.wherobots.com/latest/tutorials/sedonadb/vector-data/vector-load/) for more examples of loading data from different formats.

In [None]:
# Read the countries shapefiles from S3
s3BucketName = 'wherobots-examples'
countries = ShapefileReader.readToGeometryRDD(sc, 's3://%s/data/ne_50m_admin_0_countries_lakes/' % s3BucketName)
# Convert the Spatial RDD to a Spatial DataFrame using the Adapter
countries_df = Adapter.toDf(countries, sedona)
countries_df.createOrReplaceTempView("country")
countries_df.printSchema()

In [None]:
# Read the airports shapefiles from S3
airports = ShapefileReader.readToGeometryRDD(sc, 's3://%s/data/ne_50m_airports/' % s3BucketName)
# Convert the Spatial RDD to a Spatial DataFrame using the Adapter
airports_df = Adapter.toDf(airports, sedona)
airports_df.createOrReplaceTempView("airport")
airports_df.printSchema()

## Spatial Join Query

Now that we've loaded the data, let's perform a spatial join using the [`ST_Contains` spatial predicate function](https://docs.wherobots.com/latest/references/sedonadb/vector-data/Predicate/?h=st_contains#st_contains). The join matches each airport to the country whose geometry contains it.

In [None]:
# Run a spatial join query to find airports in each country
result = sedona.sql('''
    SELECT c.geometry AS country_geom, c.NAME_EN, a.geometry AS airport_geom, a.name
    FROM country c, airport a
    WHERE ST_Contains(c.geometry, a.geometry)
''')
# Aggregate the results to find the number of airports in each country
aggregateResult = result.groupBy('NAME_EN', 'country_geom').count()
aggregateResult.orderBy(desc('count')).show()
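
To build intuition for what the join and aggregation compute, here is a toy pure-Python sketch: a ray-casting point-in-polygon test stands in for `ST_Contains`, and a nested loop plus a `Counter` stands in for the join and `groupBy(...).count()`. The country and airport data are made up for illustration, and this is not how WherobotsDB executes the query; the engine is free to use far more efficient strategies, and the sketch ignores edge cases such as points lying exactly on a boundary.

```python
from collections import Counter


def point_in_polygon(point, polygon):
    """Ray-casting test: True if (x, y) lies strictly inside the polygon.

    `polygon` is a list of (x, y) vertices; the last vertex connects
    back to the first.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Made-up toy data, not taken from the Natural Earth files used above
countries = {
    "Squareland": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "Triangula": [(5, 0), (9, 0), (7, 4)],
}
airports = {"AAA": (1, 1), "BBB": (3, 2), "CCC": (7, 1), "DDD": (10, 10)}

# Nested-loop "join" on the containment predicate, then count per country
counts = Counter(
    country
    for country, polygon in countries.items()
    for location in airports.values()
    if point_in_polygon(location, polygon)
)
print(counts.most_common())
```

Airport `DDD` falls outside both polygons, so it contributes to no count, just as an airport outside every country geometry would be dropped by the inner join above.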

## Visualize Results

Next, we will visualize the result of our spatial join using [SedonaKepler](https://docs.wherobots.com/latest/tutorials/sedonadb/vector-data/vector-visualize/?h=sedonakepler#sedonakepler).

In [None]:
# Visualize results using SedonaKepler
result_map = SedonaKepler.create_map(df=aggregateResult, name='Airport_Count')
result_map

## Write Results To GeoParquet

WherobotsDB supports writing data to a number of spatial formats. Here we write the results of our analysis in the GeoParquet format. See [the documentation](https://docs.wherobots.com/latest/tutorials/sedonadb/vector-data/vector-save/) for more examples of saving vector data.

In [None]:
# Write the results to a GeoParquet file
aggregateResult.write.format('geoparquet').mode('overwrite').save(os.getenv("USER_S3_PATH") + 'airport_country.parquet')
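
The write cell concatenates `USER_S3_PATH` with the file name directly, which fails with a `TypeError` if the variable is unset and produces a malformed path if the value lacks a trailing slash. A minimal sketch of a more defensive path builder follows; `build_output_path` is a hypothetical helper, and the fallback `s3://example-bucket/tmp/` is a placeholder used only so the example runs outside the notebook environment.

```python
import os


def build_output_path(base, filename):
    """Join a base S3 prefix and a file name with exactly one slash.

    Hypothetical helper: raises a clear error when the base is missing
    instead of failing later with a TypeError on string concatenation.
    """
    if not base:
        raise ValueError("USER_S3_PATH is not set")
    return base.rstrip("/") + "/" + filename


# Placeholder fallback so the sketch runs anywhere; not a real bucket
base = os.getenv("USER_S3_PATH", "s3://example-bucket/tmp/")
output_path = build_output_path(base, "airport_country.parquet")
print(output_path)
```

The resulting `output_path` could be passed to `.save(...)` in place of the concatenated string.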