## Electric Vehicle Charging Station Site Selection Analysis

This notebook demonstrates a workflow for identifying potential areas for new electric vehicle (EV) charging station development using WherobotsDB and WherobotsAI raster inference functionality. The workflow is based on:

* Existing EV charging station infrastructure
* Proximity to retail stores as a proxy for demand
* Proximity to solar farms

Existing charging station infrastructure and retail store point-of-interest data come from public data sources, while existing solar farm infrastructure is identified using WherobotsAI raster inference. By using a machine learning model trained on satellite imagery, we can identify solar farms as an input to the analysis.

In [1]:
from sedona.spark import *
import os
import warnings
warnings.filterwarnings('ignore')

# Configure an Iceberg catalog named "benchmark" (the Havasu catalog).
# The role ARN is read from the AWS_ROLE_ARN environment variable,
# which must be set before this cell runs.
role_arn = os.environ["AWS_ROLE_ARN"]
config = SedonaContext.builder() \
           .config("spark.hadoop.fs.s3a.bucket.wherobots-examples.aws.credentials.provider", "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider") \
           .config("spark.driver.maxResultSize", "10g") \
           .config("sedona.join.autoBroadcastJoinThreshold", "-1") \
           .config("spark.sql.catalog.benchmark.type", "hadoop") \
           .config("spark.sql.catalog.benchmark", "org.apache.iceberg.spark.SparkCatalog") \
           .config("spark.sql.catalog.benchmark.warehouse", "s3://wherobots-inference-staging/benchmark/") \
           .config("spark.sql.catalog.benchmark.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
           .config("spark.sql.catalog.benchmark.client.arn", role_arn) \
           .config("spark.sql.catalog.benchmark.client.region",  "us-west-2") \
           .config("spark.hadoop.fs.s3a.bucket.benchmark.arn", role_arn) \
           .getOrCreate()
sedona = SedonaContext.create(config)

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
                                                                                

## Identify Area Of Interest

We will use US Census ZIP Code Tabulation Areas (ZCTAs) to identify regions for potential EV charging station development. We will confine our analysis to the state of Arizona.

Note that we use the `ST_Intersects` spatial predicate function rather than `ST_Contains` to find ZCTAs that overlap Arizona. Some ZCTAs extend beyond the border of Arizona and lie within multiple states, so `ST_Contains` would exclude them. As a result, our analysis extends slightly beyond the borders of Arizona.
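
To make the difference between the two predicates concrete, here is a minimal pure-Python sketch. It assumes axis-aligned rectangles in place of real polygons, and the `Box` type, the coordinates, and the zone names are all hypothetical. It only illustrates the semantics and is not how Sedona evaluates `ST_Intersects` or `ST_Contains`.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle standing in for a polygon (illustration only)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: Box, b: Box) -> bool:
    # True when the two boxes share at least one point (touching edges count).
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def contains(outer: Box, inner: Box) -> bool:
    # True only when `inner` lies entirely inside `outer`.
    return (outer.min_x <= inner.min_x and inner.max_x <= outer.max_x
            and outer.min_y <= inner.min_y and inner.max_y <= outer.max_y)


# Hypothetical coordinates: one "state" and three zones, one crossing the border.
state = Box(0, 0, 10, 10)
zones = {
    "inside": Box(2, 2, 4, 4),
    "straddling": Box(9, 4, 12, 6),
    "outside": Box(20, 20, 22, 22),
}

kept_by_intersects = [name for name, box in zones.items() if intersects(state, box)]
kept_by_contains = [name for name, box in zones.items() if contains(state, box)]

print(kept_by_intersects)  # the straddling zone is kept
print(kept_by_contains)    # the straddling zone is dropped
```

In this toy setup the zone that crosses the border survives the intersects filter but not the contains filter, which mirrors why the query keeps ZCTAs that span state lines.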

In [2]:
az_zips_df = sedona.sql("""
WITH arizona AS ( 
    SELECT localityArea.geometry AS geometry
    FROM wherobots_open_data.overture_2024_02_15.admins_locality locality 
    JOIN wherobots_open_data.overture_2024_02_15.admins_localityArea localityArea 
    ON locality.id = localityArea.localityId
    WHERE locality.names.primary = "Arizona" AND locality.localityType = "state" 
)

SELECT zta5.geometry AS geometry, ZCTA5CE10 
FROM wherobots_pro_data.us_census.zipcode zta5, arizona
WHERE ST_Intersects(arizona.geometry, zta5.geometry)
""")

In [3]:
az_zips_df.createOrReplaceTempView("az_zta5")

In [4]:
az_zips_df.printSchema()

root
 |-- geometry: geometry (nullable = true)
 |-- ZCTA5CE10: string (nullable = true)



In [12]:
from wherobots import vtiles
from pyspark.sql.functions import lit
zip_tiles_path = os.getenv("USER_S3_PATH") + "us_census_zipcodes.pmtiles"
az_zips_df = az_zips_df.withColumn('layer', lit('Layer 1'))
zip_tiles_df = vtiles.generate(az_zips_df)

                                                                                

In [None]:

vtiles.write_pmtiles(zip_tiles_df, zip_tiles_path, features_df=az_zips_df)

In [None]:
vtiles.show_pmtiles(zip_tiles_path)

Next, we will identify existing EV charging infrastructure within each ZCTA as an input to our analysis.

## Existing EV Charging Infrastructure

Using data from [Open Charge Map](https://openchargemap.org/site) we calculate the number of EV charging stations in each ZCTA to give us a sense of existing EV charging infrastructure.


In [None]:
stations_df = sedona.read.format("geoparquet").load("s3://wherobots-examples/data/examples/openchargemap/world.parquet")

In [None]:
stations_df.createOrReplaceTempView("stations")

In [None]:
SedonaPyDeck.create_scatterplot_map(stations_df.sample(0.01), map_provider='mapbox', map_style='dark')

Count of existing EV charging stations per ZCTA.

In [None]:
az_stations_df = sedona.sql("""
SELECT COUNT(*) AS num, any_value(az_zta5.geometry) AS geometry, ZCTA5CE10
FROM stations JOIN az_zta5
ON ST_Intersects(az_zta5.geometry, stations.geometry)
GROUP BY ZCTA5CE10 
ORDER BY num DESC
""")

In [None]:
az_stations_df.createOrReplaceTempView("az_stations")
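
Conceptually, the counting query above pairs each station with the ZCTA polygon it falls in and then groups by ZCTA. The following is a minimal pure-Python sketch of that idea, using a ray-casting point-in-polygon test. The zone names and coordinates are hypothetical, and the brute-force loop is only an illustration: Sedona performs the join with its own optimized spatial join, and `ST_Intersects` treats points on a boundary differently from this simple test.

```python
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the first vertex is not repeated


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: count edges crossed by a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_zone(points: List[Point], zones: Dict[str, Polygon]) -> Counter:
    """Brute-force equivalent of a spatial join followed by GROUP BY zone."""
    counts: Counter = Counter()
    for zone_id, polygon in zones.items():
        for p in points:
            if point_in_polygon(p, polygon):
                counts[zone_id] += 1
    return counts


# Hypothetical zones (two adjacent squares) and station locations.
zones = {
    "zone_a": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "zone_b": [(4, 0), (8, 0), (8, 4), (4, 4)],
    "zone_c": [(8, 0), (12, 0), (12, 4), (8, 4)],
}
stations = [(1, 1), (2, 3), (5, 2), (20, 20)]

counts = count_points_per_zone(stations, zones)
ranked = counts.most_common()  # highest count first, like ORDER BY num DESC
print(ranked)
```

Note that `zone_c` has no stations and therefore does not appear in `counts` at all. The SQL inner join behaves the same way: ZCTAs without any charging station are absent from `az_stations`.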

In [None]:
az_stations_df.count()

In [None]:
az_stations_df.show()

In [None]:
az_stations_df.printSchema()

In [None]:
SedonaPyDeck.create_choropleth_map(az_stations_df, plot_col="num", map_provider='mapbox', map_style='dark')

## Arizona Retail Stores

Next, we'll use the number of retail stores per ZCTA as a proxy for demand, using the Overture Maps Foundation public point-of-interest data set.

In [None]:
sedona.table("wherobots_open_data.overture_2024_02_15.places_place").count()

In [None]:
az_retail_df = sedona.sql("""
SELECT COUNT(*) AS num, any_value(az_zta5.geometry) AS geometry, ZCTA5CE10
FROM wherobots_open_data.overture_2024_02_15.places_place places 
JOIN az_zta5
ON ST_Intersects(az_zta5.geometry, places.geometry)
WHERE places.categories.main = "retail"
GROUP BY ZCTA5CE10 
ORDER BY num DESC
""")

In [None]:
az_retail_df.createOrReplaceTempView("az_retail")

In [None]:
az_retail_df.cache().show(5)

In [None]:
SedonaKepler.create_map(az_retail_df)

## Combining Retail Stores & Existing EV Chargers

Before we apply WherobotsAI raster inference to identify solar farms in the area, we'll use existing EV chargers and retail stores to identify ZCTAs with high demand and little existing EV charging infrastructure by computing the ratio of EV chargers to retail stores in each ZCTA.
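
As a rough illustration of that ratio, the sketch below reproduces the logic in plain Python with dictionaries in place of tables. The ZCTA keys and counts are made-up sample values, and the function name `charger_to_retail_ratio` is hypothetical. A ZCTA missing from the charger counts is treated as having 0 chargers, and one missing from the retail counts gets a denominator of 1 to avoid dividing by nothing.

```python
from typing import Dict, List, Tuple


def charger_to_retail_ratio(
    stations: Dict[str, int], retail: Dict[str, int]
) -> List[Tuple[str, float]]:
    """Full outer join on ZCTA, then chargers / retail stores, sorted ascending."""
    rows = []
    for zcta in set(stations) | set(retail):   # keys from either side, like FULL OUTER JOIN
        num_stations = stations.get(zcta, 0)   # missing charger count defaults to 0
        num_retail = retail.get(zcta, 1)       # missing retail count defaults to 1
        rows.append((zcta, num_stations / num_retail))
    # Sort by ratio; ties are broken by key here only to make the output deterministic.
    return sorted(rows, key=lambda row: (row[1], row[0]))


# Made-up counts keyed by placeholder ZCTA labels.
station_counts = {"A": 10, "B": 1, "D": 3}
retail_counts = {"A": 20, "B": 50, "C": 5}

ranked = charger_to_retail_ratio(station_counts, retail_counts)
for zcta, ratio in ranked:
    print(zcta, ratio)
```

Here `C` has retail stores but no chargers, so it ranks first with a ratio of 0, while `D` has chargers but no recorded retail stores and ranks last. One consequence of the default denominator is that a ZCTA with no retail stores is scored as if it had exactly one.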


In [None]:
az_ratio = sedona.sql("""
SELECT 
    coalesce(az_stations.num, 0) / coalesce(az_retail.num, 1) AS ratio, 
    coalesce(az_stations.geometry, az_retail.geometry) AS geometry, 
    coalesce(az_stations.ZCTA5CE10, az_retail.ZCTA5CE10) AS ZCTA5CE10
FROM az_retail FULL OUTER JOIN az_stations
ON az_retail.ZCTA5CE10 = az_stations.ZCTA5CE10
ORDER BY ratio ASC
""")

In [None]:
az_ratio.show()

In [None]:
SedonaKepler.create_map(az_ratio)

ZCTAs with a low "ratio" are potential candidates for additional EV charging stations. The final input to our analysis is proximity to solar farms, which we will identify using WherobotsAI raster inference.

## WherobotsAI Raster Inference

TODO: identify solar farms within ZCTAs, prioritize low "ratio" ZCTAs


In [None]:
solar_model_inputs_df = sedona.table("benchmark.db.solar_satlas_sentinel2_db")

In [None]:
solar_model_inputs_df.show(truncate=False)