<img src="https://wherobots.com/wp-content/uploads/2023/12/Inline-Blue_Black_onWhite@3x.png" alt="Wherobots Logo" width="600">

## Electric Vehicle Charging Station Site Selection Analysis

This notebook demonstrates a workflow for identifying potential areas for new electric vehicle (EV) charging station development using WherobotsDB and WherobotsAI raster inference functionality. The workflow is based on:

* Existing EV charging station infrastructure
* Proximity to retail stores, as a proxy for demand
* Proximity to solar farms
    

Existing charging station infrastructure and retail store point-of-interest data come from public data sources, while existing solar farm infrastructure is identified using WherobotsAI raster inference. A machine learning model trained on satellite imagery detects solar farms, which then serve as an input to the analysis.

In [None]:
from sedona.spark import *
import os
import warnings
warnings.filterwarnings('ignore')

# Configures a catalog called "benchmark", the Havasu catalog
# (need to get from terminal)
config = SedonaContext.builder() \
    .config("spark.hadoop.fs.s3a.bucket.wherobots-examples.aws.credentials.provider", "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider") \
    .config("spark.driver.maxResultSize", "10g") \
    .config("spark.sql.catalog.benchmark.type", "hadoop") \
    .config("spark.sql.catalog.benchmark", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.benchmark.warehouse", "s3://wherobots-benchmark-prod/data/ml/") \
    .config("spark.sql.catalog.benchmark.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
    .getOrCreate()
sedona = SedonaContext.create(config)

## Identify Area Of Interest

We will use US Census ZIP Code Tabulation Areas (ZCTAs) to identify regions for potential EV charging station development. We will confine our analysis to the state of Arizona.

Note that we select ZCTAs with the `ST_Intersects` spatial predicate rather than `ST_Contains`. Some ZCTAs extend beyond the border of Arizona and lie within multiple states, so `ST_Contains` would exclude them. The query then applies `ST_Intersection` to clip each selected ZCTA to the Arizona boundary, so only the portion of a border ZCTA that lies inside the state is kept.
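To make the difference between these predicates concrete, here is a minimal pure-Python sketch that uses axis-aligned boxes in place of real polygons. The `Box` type, the helper functions, and the coordinates are all hypothetical and only mimic the behavior of `ST_Intersects`, `ST_Contains`, and `ST_Intersection` for this simplified case.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned rectangle standing in for a polygon (illustration only)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Box, b: Box) -> bool:
    """True if the boxes share at least one point (analogous to ST_Intersects)."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def contains(outer: Box, inner: Box) -> bool:
    """True if inner lies entirely within outer (analogous to ST_Contains)."""
    return (outer.xmin <= inner.xmin and inner.xmax <= outer.xmax
            and outer.ymin <= inner.ymin and inner.ymax <= outer.ymax)


def intersection(a: Box, b: Box) -> Optional[Box]:
    """Overlapping part of two boxes, or None (analogous to ST_Intersection)."""
    if not intersects(a, b):
        return None
    return Box(max(a.xmin, b.xmin), max(a.ymin, b.ymin),
               min(a.xmax, b.xmax), min(a.ymax, b.ymax))


# Made-up geometries: a "state" and two "ZCTAs".
state = Box(0, 0, 10, 10)
inner_zcta = Box(2, 2, 4, 4)      # entirely inside the state
border_zcta = Box(8, 3, 12, 6)    # straddles the eastern border

for name, zcta in [("inner", inner_zcta), ("border", border_zcta)]:
    print(name, intersects(state, zcta), contains(state, zcta),
          intersection(state, zcta))
```

The border box passes the intersects test but fails the contains test, and its intersection with the state is trimmed at the state's edge, which is the same effect the query relies on.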

In [None]:
az_zips_df = sedona.sql("""
WITH arizona AS ( 
    SELECT localityArea.geometry AS geometry
    FROM wherobots_open_data.overture_2024_02_15.admins_locality locality 
    JOIN wherobots_open_data.overture_2024_02_15.admins_localityArea localityArea 
    ON locality.id = localityArea.localityId
    WHERE locality.names.primary = "Arizona" AND locality.localityType = "state" 
)

SELECT ST_Intersection(arizona.geometry, zta5.geometry) AS geometry, ZCTA5CE10 
FROM wherobots_pro_data.us_census.zipcode zta5, arizona
WHERE ST_Intersects(arizona.geometry, zta5.geometry)
""")

In [None]:
az_zips_df.createOrReplaceTempView("az_zta5")

In [None]:
az_zips_df.printSchema()

In [None]:
SedonaKepler.create_map(az_zips_df, name="Arizona ZCTAs")

Next, we will identify existing EV charging infrastructure within each ZCTA as an input to our analysis.

## Existing EV Charging Infrastructure

Using data from [Open Charge Map](https://openchargemap.org/site) we calculate the number of EV charging stations in each ZCTA to give us a sense of existing EV charging infrastructure.


In [None]:
stations_df = sedona.read.format("geoparquet").load("s3://wherobots-examples/data/examples/openchargemap/world.parquet")

In [None]:
stations_df.createOrReplaceTempView("stations")

In [None]:
SedonaKepler.create_map(stations_df.sample(0.01), name="EV Charging Stations")

Count of existing EV charging stations per ZCTA.

In [None]:
az_stations_df = sedona.sql("""
SELECT COUNT(*) AS num, any_value(az_zta5.geometry) AS geometry, ZCTA5CE10
FROM stations JOIN az_zta5
WHERE ST_Intersects(az_zta5.geometry, stations.geometry)
GROUP BY ZCTA5CE10 
ORDER BY num DESC
""")
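The spatial join and `GROUP BY` above amount to counting points per polygon. As a rough illustration of that idea, the sketch below does the same with a ray-casting point-in-polygon test in plain Python. The zone names and coordinates are made up, and the nested loop is only for clarity: a spatial engine would typically use spatial partitioning and indexing instead of comparing every point with every polygon.

```python
from collections import Counter


def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside polygon.

    polygon is a list of (x, y) vertices; the ring is closed implicitly.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_zone(points, zones):
    """Count how many points fall inside each zone polygon."""
    counts = Counter()
    for point in points:
        for zone_id, polygon in zones.items():
            if point_in_polygon(point, polygon):
                counts[zone_id] += 1
    return counts


# Hypothetical zones (two adjacent squares) and charger locations.
zones = {
    "zone_a": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "zone_b": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
chargers = [(1, 1), (2, 3), (5, 2), (9, 9)]

counts = count_points_per_zone(chargers, zones)
print(dict(counts))
```

The point at (9, 9) falls outside both zones and is therefore not counted, just as stations outside every ZCTA drop out of the join.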

In [None]:
az_stations_df.createOrReplaceTempView("az_stations")

In [None]:
az_stations_df.count()

In [None]:
az_stations_df.cache().show()

In [None]:
az_stations_df.printSchema()

In [None]:
SedonaKepler.create_map(az_stations_df)

## Arizona Retail Stores

Next, we'll use the number of retail stores per ZCTA as a proxy for demand, based on the Overture Maps Foundation public point-of-interest data set.

In [None]:
sedona.table("wherobots_open_data.overture_2024_02_15.places_place").count()

In [None]:
az_retail_df = sedona.sql("""
SELECT COUNT(*) AS num, any_value(az_zta5.geometry) AS geometry, ZCTA5CE10
FROM wherobots_open_data.overture_2024_02_15.places_place places 
JOIN az_zta5
WHERE ST_Intersects(az_zta5.geometry, places.geometry)
AND places.categories.main = "retail"
GROUP BY ZCTA5CE10 
ORDER BY num DESC
""")

In [None]:
az_retail_df.createOrReplaceTempView("az_retail")

In [None]:
az_retail_df.cache().show(5)

In [None]:
SedonaKepler.create_map(az_retail_df)

## Combining Retail Stores & Existing EV Chargers

Before we apply WherobotsAI raster inference to identify solar farms in the area, we'll use existing EV chargers and retail stores to identify ZCTAs with high demand and little existing EV charging infrastructure. We do this by computing the ratio of EV chargers to retail stores in each ZCTA, so a low ratio indicates many stores served by few chargers.


In [None]:
az_ratio = sedona.sql("""
SELECT 
    coalesce(az_stations.num, 0) / coalesce(az_retail.num, 1) AS ratio, 
    coalesce(az_stations.geometry, az_retail.geometry) AS geometry, 
    coalesce(az_stations.ZCTA5CE10, az_retail.ZCTA5CE10) AS ZCTA5CE10
FROM az_retail FULL OUTER JOIN az_stations
ON az_retail.ZCTA5CE10 = az_stations.ZCTA5CE10
WHERE az_retail.num > 1
ORDER BY ratio DESC
""")
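The ratio logic in the query above can be sketched in plain Python as follows. The ZCTA identifiers and counts are invented for illustration, and the function name is hypothetical. ZCTAs with no chargers get a ratio of 0 (the `coalesce(az_stations.num, 0)` case), and ZCTAs with one retail store or fewer are dropped (the `az_retail.num > 1` filter).

```python
def charger_to_retail_ratio(retail_counts, station_counts):
    """Return {zcta: chargers / retail stores} for ZCTAs with more than one store."""
    ratios = {}
    for zcta, stores in retail_counts.items():
        if stores <= 1:
            continue  # mirrors the `az_retail.num > 1` filter
        chargers = station_counts.get(zcta, 0)  # mirrors coalesce(az_stations.num, 0)
        ratios[zcta] = chargers / stores
    return ratios


# Made-up counts per ZCTA.
retail = {"zcta_a": 40, "zcta_b": 10, "zcta_c": 1}
stations = {"zcta_a": 20, "zcta_d": 3}

ratios = charger_to_retail_ratio(retail, stations)

# Lowest ratio first: many stores, few chargers.
for zcta, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(zcta, ratio)
```

Because of the filter on retail counts, ZCTAs that appear only in the stations table (here `zcta_d`) never reach the output, even though the query uses a full outer join.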

In [None]:
az_ratio.createOrReplaceTempView("az_ratio")

In [None]:
az_ratio.cache().show()

In [None]:
SedonaKepler.create_map(az_ratio)

ZCTAs with a low "ratio" are potential candidates for additional EV charging stations. The final input to our analysis is proximity to solar farms, which we will identify using WherobotsAI raster inference.

## WherobotsAI Raster Inference

The [outdb raster table](https://docs.wherobots.com/1.2.2/references/havasu/raster/out-db-rasters/) references Sentinel-2 images of Arizona with low cloud cover from 2023. We've prepared this table using WherobotsDB's raster processing capabilities.

Next, we identify solar farms within ZCTAs, prioritizing ZCTAs with a low "ratio".


In [None]:
columns_to_drop = ["x", "y", "product_type", "length"]
num_partitions = 32
solar_model_inputs_df = sedona.table("benchmark.db.solar_satlas_sentinel2_db").drop(*columns_to_drop).repartition(num_partitions)

In [None]:
solar_model_inputs_df.cache().show()

In [None]:
az_high_demand_with_scene_geom = sedona.sql(""" 
    WITH base as (
        SELECT s.filename, s.geometry as scene_geometry, s.outdb_raster as
        outdb_raster, z.ZCTA5CE10 as zip_code_name, z.geometry as zip_geometry, z.ratio as ratio
        FROM benchmark.db.solar_satlas_sentinel2_db s, az_ratio z
        WHERE ST_Intersects(s.geometry, z.geometry)
        AND z.ratio < 1 AND start_datetime > 20231001
    )
    SELECT DISTINCT filename, outdb_raster from base""").repartition(num_partitions)

In [None]:
%%time
print(az_high_demand_with_scene_geom.cache().count())

In [None]:
az_high_demand_with_scene_geom.createOrReplaceTempView("az_high_demand_with_scene")

In [None]:
model_id = 'solar-satlas-sentinel2'

sedona.sql(f"""
CREATE OR REPLACE TEMP VIEW segment_fields AS (
    SELECT
        outdb_raster, 
        RS_SEGMENT('{model_id}', outdb_raster) AS segment_result
    FROM
    az_high_demand_with_scene
)
""")

In [None]:
predictions_df = sedona.sql(f"""
SELECT
  outdb_raster, segment_result.*
FROM segment_fields
""")

In [None]:
%%time
print(predictions_df.cache().count())

In [None]:
predictions_df.show()
predictions_df.createOrReplaceTempView("predictions_df")

In [None]:
predictions_polys_df = sedona.sql("""
    WITH t AS (
        SELECT RS_SEGMENT_TO_GEOMS(outdb_raster, confidence_array, array(1), class_map, 0.65) result
        FROM predictions_df
    )
    SELECT result.* FROM t
""")
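The query above passes `0.65` as the final argument to `RS_SEGMENT_TO_GEOMS`, which we read as a confidence threshold. As a toy illustration of thresholding only, the sketch below turns a small grid of per-pixel confidences into a binary mask and averages the confidences of the selected pixels. The grid values and function names are made up, the inclusive comparison is an assumption, and the real function also converts the result into geometries, which this sketch does not attempt.

```python
def threshold_mask(confidence, threshold):
    """Return a 0/1 mask marking pixels whose confidence meets the threshold."""
    return [[1 if value >= threshold else 0 for value in row]
            for row in confidence]


def mean_confidence(confidence, mask):
    """Average confidence over masked pixels, or None if no pixel is selected."""
    selected = [
        value
        for conf_row, mask_row in zip(confidence, mask)
        for value, keep in zip(conf_row, mask_row)
        if keep
    ]
    return sum(selected) / len(selected) if selected else None


# Made-up 3x3 confidence grid for a single class.
confidence = [
    [0.10, 0.20, 0.70],
    [0.05, 0.80, 0.90],
    [0.00, 0.30, 0.66],
]

mask = threshold_mask(confidence, 0.65)
for row in mask:
    print(row)
print(mean_confidence(confidence, mask))
```

Raising the threshold shrinks the detected area and tends to raise the average confidence of what remains, which is the trade-off to keep in mind when tuning this value.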

In [None]:
# predictions_polys_df.show()

In [None]:
predictions_polys_df.createOrReplaceTempView("predictions_polys")

predictions_polys_df = sedona.sql("""
    SELECT
        class_name[0] AS class, average_pixel_confidence_score[0] AS avg_confidence_score, ST_SetSRID(ST_Collect(geometry), 4326) AS geometry
    FROM
        predictions_polys
""").filter("ST_IsEmpty(geometry) = False")

In [None]:
predictions_polys_df.cache().count()

In [None]:
predictions_polys_df.createOrReplaceTempView("predictions_polys")

In [None]:
predictions_polys_df.show()

In [None]:
SedonaKepler.create_map(predictions_polys_df, name="Detected Solar Farms")

## Compute Final Suitability Score

In [None]:
az_solar_zip_codes = sedona.sql("""
SELECT ST_AreaSpheroid(ST_Union_Aggr(ST_SetSRID(predictions_polys.geometry, 4326))) / 1000000 * 247.10559991919519 AS solar_area, any_value(az_zta5.geometry) AS geometry, ZCTA5CE10
FROM predictions_polys JOIN az_zta5
WHERE ST_Intersects(az_zta5.geometry, predictions_polys.geometry)
GROUP BY ZCTA5CE10 
ORDER BY solar_area DESC
""")

In [None]:
az_solar_zip_codes.show()

In [None]:
az_solar_zip_codes.count()

In [None]:
az_solar_zip_codes.createOrReplaceTempView("az_solar_zip_codes")

In [None]:
SedonaKepler.create_map(az_solar_zip_codes)

In [None]:
sedona.sql("""

SELECT MIN(ratio), MAX(ratio) 
FROM az_ratio
""").show()

# Min-max normalization of the ratio: (ratio - min) / (max - min)

In [None]:
# Join az_ratio with az_solar_zip_codes and compute the suitability score.
# This version keeps the intermediate columns (temp_score, ratio) for inspection.

final_az_scores = sedona.sql("""
WITH min_max AS (
  SELECT MIN(ratio) AS ratio_min, MAX(ratio) AS ratio_max
  FROM az_ratio
)

SELECT 
    -- fraction of the ZCTA covered by solar farms + inverted, min-max normalized ratio
    (solar_area / (ST_AreaSpheroid(az_solar_zip_codes.geometry) / 1000000 * 247.10559991919519))
        + (1 - ((ratio - ratio_min) / (ratio_max - ratio_min))) AS score, 
    (1 - ((ratio - ratio_min) / (ratio_max - ratio_min))) AS temp_score,
    ratio,
    az_solar_zip_codes.ZCTA5CE10 AS ZCTA5CE10,
    az_solar_zip_codes.geometry AS geometry
FROM az_ratio
JOIN az_solar_zip_codes
  ON az_ratio.ZCTA5CE10 = az_solar_zip_codes.ZCTA5CE10
CROSS JOIN min_max
ORDER BY score DESC
""")

In [None]:
final_az_scores.cache().show()
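The score computed above adds two terms: the fraction of a ZCTA's area covered by detected solar farms, and one minus the min-max normalized charger-to-retail ratio. The sketch below reproduces that arithmetic in plain Python under stated assumptions: the function names are hypothetical, the input areas and ratio are invented, and the bounds 0 and 14 in the usage lines are illustrative values. The acre conversion constant is copied from the queries.

```python
SQ_M_PER_SQ_KM = 1_000_000
ACRES_PER_SQ_KM = 247.10559991919519  # constant copied from the queries


def sq_m_to_acres(area_sq_m):
    """Convert square meters to acres via square kilometers."""
    return area_sq_m / SQ_M_PER_SQ_KM * ACRES_PER_SQ_KM


def suitability_score(solar_area_acres, zcta_area_sq_m, ratio, ratio_min, ratio_max):
    """Solar coverage fraction plus inverted, min-max normalized ratio."""
    solar_fraction = solar_area_acres / sq_m_to_acres(zcta_area_sq_m)
    normalized_ratio = (ratio - ratio_min) / (ratio_max - ratio_min)
    return solar_fraction + (1 - normalized_ratio)


# Made-up ZCTA: 50 sq km in area, 5 sq km of solar farms, ratio 3.5.
solar_acres = sq_m_to_acres(5_000_000)
score = suitability_score(solar_acres, 50_000_000, ratio=3.5,
                          ratio_min=0.0, ratio_max=14.0)
print(round(score, 3))
```

In this example the solar fraction is 0.1 and the inverted ratio term is 0.75, so the score is about 0.85. Since both areas are converted with the same constant, the unit conversion cancels in the fraction. The two terms are weighted equally, and because solar farms usually cover a small share of a ZCTA, the ratio term tends to dominate the ranking.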

In [None]:
# Join az_ratio with az_solar_zip_codes and compute the suitability score.
# Final version: only the score, ZCTA code, and geometry are kept.

final_az_scores = sedona.sql("""
WITH min_max AS (
  SELECT MIN(ratio) AS ratio_min, MAX(ratio) AS ratio_max
  FROM az_ratio
)

SELECT 
    -- fraction of the ZCTA covered by solar farms + inverted, min-max normalized ratio
    (solar_area / (ST_AreaSpheroid(az_solar_zip_codes.geometry) / 1000000 * 247.10559991919519))
        + (1 - ((ratio - ratio_min) / (ratio_max - ratio_min))) AS score, 
    az_solar_zip_codes.ZCTA5CE10 AS ZCTA5CE10,
    az_solar_zip_codes.geometry AS geometry
FROM az_ratio
JOIN az_solar_zip_codes
  ON az_ratio.ZCTA5CE10 = az_solar_zip_codes.ZCTA5CE10
CROSS JOIN min_max
ORDER BY score DESC
""")

In [None]:
final_az_scores.createOrReplaceTempView("final_scores")

## Generate Final Inputs

In [None]:
# Find ev chargers in priority areas

final_az_chargers = sedona.sql("""
SELECT stations.geometry, stations.id AS station_id
FROM stations, final_scores
WHERE ST_Contains(final_scores.geometry, stations.geometry)
""")

final_az_chargers.cache().count()

In [None]:
# Find retail stores in priority areas

final_retail = sedona.sql("""
SELECT places.geometry 
FROM wherobots_open_data.overture_2024_02_15.places_place places
JOIN final_scores
WHERE ST_Contains(final_scores.geometry, places.geometry) AND places.categories.main = "retail"
""")

final_retail.cache().count()


In [None]:
# Find solar farms in priority areas

final_solar = sedona.sql("""
SELECT predictions_polys.class, predictions_polys.avg_confidence_score, predictions_polys.geometry
FROM predictions_polys
JOIN final_scores
WHERE ST_Intersects(final_scores.geometry, predictions_polys.geometry)
""")

final_solar.cache().count()


## Final Suitability Analysis

In [None]:
final_map = SedonaKepler.create_map(final_az_scores, name="Suitability Results")
SedonaKepler.add_df(final_map, final_solar, name="Solar Farms")
SedonaKepler.add_df(final_map, final_retail, name="Retail Stores")
SedonaKepler.add_df(final_map, final_az_chargers, name="Existing EV Chargers")
final_map

## Write Analysis Results As PMTiles



In [None]:
from wherobots import vtiles
from pyspark.sql.functions import lit


# Define paths to save PMTiles files in S3
zip_tiles_path = os.getenv("USER_S3_PATH") + "final_az_suitability.pmtiles"
ev_chargers_path = os.getenv("USER_S3_PATH") + "final_az_chargers.pmtiles"
retail_path = os.getenv("USER_S3_PATH") + "final_retail.pmtiles"
solar_path = os.getenv("USER_S3_PATH") + "final_solar.pmtiles"

# Add "layer" column
final_az_scores_layers = final_az_scores.withColumn("layer", lit('Suitability Results'))
final_az_chargers_layers = final_az_chargers.withColumn("layer", lit('Existing Chargers'))
final_az_retail_layers = final_retail.withColumn("layer", lit("Retail stores"))
final_az_solar_layers = final_solar.withColumn("layer", lit("Solar Farms"))

# Generate and write PMTiles
tiles_df = vtiles.generate(final_az_scores_layers)
vtiles.write_pmtiles(tiles_df, zip_tiles_path, features_df=final_az_scores_layers)

charger_tiles_df = vtiles.generate(final_az_chargers_layers)
vtiles.write_pmtiles(charger_tiles_df, ev_chargers_path, features_df=final_az_chargers_layers)

retail_tiles_df = vtiles.generate(final_az_retail_layers)
vtiles.write_pmtiles(retail_tiles_df, retail_path, features_df=final_az_retail_layers)

solar_tiles_df = vtiles.generate(final_az_solar_layers)
vtiles.write_pmtiles(solar_tiles_df, solar_path, features_df=final_az_solar_layers)


## Visualize PMTiles

In [None]:
tiles_config = [
    {
        "s3_uri": zip_tiles_path,
        "name": "Suitability",
        "style": {'version': 8,
         'sources': {'source': {'type': 'vector',
           'url': 'pmtiles://' + get_signed_url(zip_tiles_path),
           'attribution': 'PMTiles'}},
         'layers': [
          {'id': 'Suitability Results_fill',
           'source': 'source',
           'source-layer': 'Suitability Results',
           'type': 'fill',
           'paint': {'fill-color': 'lightblue', 'fill-opacity': 0.5},
           'filter': ['==', ['geometry-type'], 'Polygon']}]}
            },
    {
        "s3_uri": ev_chargers_path,
        "style": {'version': 8,
             'sources': {'source': {'type': 'vector',
               'url': 'pmtiles://' + get_signed_url(ev_chargers_path),
               'attribution': 'PMTiles'}},
             'layers': [{'id': 'Existing Chargers_point',
               'source': 'source',
               'source-layer': 'Existing Chargers',
               'type': 'circle',
               'paint': {'circle-color': 'blue', 'circle-radius': 5},
               'filter': ['==', ['geometry-type'], 'Point']}]}
                },
    {
        "s3_uri": retail_path
    },
    {
        "s3_uri": solar_path
    }
]

vtiles.show_pmtiles(tiles_config)

## Writing to Iceberg Tables

We can write the results of our analysis to Iceberg tables using SQL, which can then be accessed by other users or analytics applications, including via the Spatial SQL API using one of the WherobotsDB language drivers (Python or JDBC).

In [None]:
sedona.sql("CREATE NAMESPACE IF NOT EXISTS wherobots.suitability")
sedona.sql("DROP TABLE IF EXISTS wherobots.suitability.ev_chargers")
final_az_scores.writeTo("wherobots.suitability.ev_chargers").create()

In [None]:
sedona.table("wherobots.suitability.ev_chargers").show()