## Reverse Geocoding with SageMaker

**Reverse geocoding** is the process of converting a location described by geographic coordinates (**latitude**, **longitude**) into a human-readable address or place name. It is the opposite of **forward geocoding** (often referred to as **address geocoding** or simply "geocoding"), hence the term *reverse*. Reverse geocoding makes it possible to identify nearby street addresses, places, and/or areal subdivisions such as neighbourhoods, counties, states, or countries.
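To make the idea concrete, here is a toy sketch of reverse geocoding as a nearest-neighbour lookup: given a coordinate pair, return the name of the closest known place using the haversine (great-circle) distance. The three-entry `GAZETTEER` uses hypothetical, approximate city-centre coordinates for illustration only, and `reverse_geocode` is a made-up helper. This is not how the managed SageMaker job works internally; it simply shows the coordinates-to-name direction of the mapping.

```python
import math

# Hypothetical gazetteer: approximate city-centre coordinates (lat, lon), illustration only
GAZETTEER = {
    "Berkeley, CA": (37.8715, -122.2730),
    "Oakland, CA": (37.8044, -122.2712),
    "San Francisco, CA": (37.7749, -122.4194),
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def reverse_geocode(lat: float, lon: float) -> str:
    """Return the name of the gazetteer entry closest to (lat, lon)."""
    return min(GAZETTEER, key=lambda name: haversine_km(lat, lon, *GAZETTEER[name]))


# Coordinates of the first row of the housing data used in this notebook
print(reverse_geocode(37.88, -122.23))  # → Berkeley, CA
```

A production service resolves coordinates against street-level data rather than a handful of points, which is why the enrichment job below can return house numbers, neighbourhoods, and postal codes.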

In [19]:
import time
from datetime import datetime

import boto3
import pandas as pd
import sagemaker
import sagemaker_geospatial_map

today = datetime.now().strftime("%Y-%m-%d-%H:%M:%S")
today

'2023-05-07-11:22:38'

## S3

In [36]:
s3_bucket = "yang-ml-sagemaker"
s3_key = "reverse-geocoding"
input_object_key = f"s3://{s3_bucket}/{s3_key}/housing.csv"
output_object_key = f"s3://{s3_bucket}/{s3_key}/output/"

s3 = boto3.client("s3")
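Note that `input_object_key` and `output_object_key` hold full `s3://` URIs, which is what the geospatial API expects, whereas the boto3 S3 client methods take the bucket and key as separate arguments. A small helper can convert between the two; `split_s3_uri` below is a hypothetical name, sketched with the standard library only.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # The path keeps its leading slash after parsing; S3 keys do not start with one
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_uri("s3://yang-ml-sagemaker/reverse-geocoding/housing.csv"))
# ('yang-ml-sagemaker', 'reverse-geocoding/housing.csv')
```

For example, assuming a local `housing.csv` exists, it could be uploaded with `s3.upload_file("housing.csv", *split_s3_uri(input_object_key))`.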

## SageMaker

In [None]:
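# boto3 session used to create the SageMaker geospatial client
boto3_session = boto3.Session()
# IAM execution role of the current SageMaker environment; the job uses it to access S3
role = sagemaker.get_execution_role()
geospatial_client = boto3_session.client(service_name="sagemaker-geospatial")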

## Vector Enrichment Job

The job requires the CSV file to be uploaded to S3 beforehand. The `longitude` and `latitude` columns of the CSV file are used as inputs for reverse geocoding. Further documentation can be found [here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-geospatial/client/start_vector_enrichment_job.html#).

At the time of writing, reverse geocoding jobs support a maximum of 15,000 entries in the input CSV.
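Because the job reads the coordinate columns by name and caps the input size, it can help to check the file locally before uploading it. The sketch below is an assumption-based pre-flight check, not part of the SageMaker API: `validate_input` and `split_into_chunks` are hypothetical helpers, the 15,000 limit is the figure quoted above, and the coordinate-range test is our own sanity check.

```python
import pandas as pd

MAX_ROWS_PER_JOB = 15_000  # limit quoted in the text above; it may change over time


def validate_input(df: pd.DataFrame, x_col: str = "longitude", y_col: str = "latitude") -> None:
    """Raise if the coordinate columns are missing, out of range, or the frame is too large."""
    missing = {x_col, y_col} - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    # between() is inclusive and returns False for NaN, so missing coordinates also fail
    if not df[x_col].between(-180, 180).all() or not df[y_col].between(-90, 90).all():
        raise ValueError("Coordinates out of range (lon in [-180, 180], lat in [-90, 90])")
    if len(df) > MAX_ROWS_PER_JOB:
        raise ValueError(f"{len(df)} rows exceeds the {MAX_ROWS_PER_JOB}-row limit")


def split_into_chunks(df: pd.DataFrame, size: int = MAX_ROWS_PER_JOB) -> list[pd.DataFrame]:
    """Split a dataframe into consecutive pieces of at most `size` rows."""
    return [df.iloc[start:start + size] for start in range(0, len(df), size)]


# Synthetic example: 32,000 rows become chunks of 15,000, 15,000 and 2,000 rows
demo = pd.DataFrame({"longitude": [-122.25] * 32_000, "latitude": [37.85] * 32_000})
chunks = split_into_chunks(demo)
print([len(c) for c in chunks])  # [15000, 15000, 2000]
for chunk in chunks:
    validate_input(chunk)
```

Each chunk could then be serialized with `chunk.to_csv(index=False)`, uploaded to its own S3 key (for example with `s3.put_object`), and processed by a separate enrichment job.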

In [25]:
# Job configuration
job_config = {
    # Map the CSV columns to the x (longitude) and y (latitude) coordinates for reverse geocoding
    "ReverseGeocodingConfig": {
        "XAttributeName": "longitude",
        "YAttributeName": "latitude",
    },
}

# Input configuration information for the Vector Enrichment job
input_config = {
    "DataSourceConfig": {"S3Data": {"S3Uri": input_object_key}},
    "DocumentType": "CSV",
}

# Start the Vector Enrichment Job (VEJ)
response = geospatial_client.start_vector_enrichment_job(
    Name=f"reverse-geocoding-{today}",
    ExecutionRoleArn=role,
    InputConfig=input_config,
    JobConfig=job_config,
)

# Obtain the Amazon Resource Name (ARN) of the Vector Enrichment job
vej_arn = response["Arn"]
vej_arn

'arn:aws:sagemaker-geospatial:us-west-2:722696965592:vector-enrichment-job/2rqtgidf4azn'

Poll the status of the vector enrichment job until it finishes:

In [26]:
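# Poll every 30 seconds until the job reaches a terminal state
terminal_states = {"COMPLETED", "FAILED", "STOPPED"}
while True:
    response = geospatial_client.get_vector_enrichment_job(Arn=vej_arn)
    status = response["Status"]
    print(f"Job status: {status} (Last update: {datetime.now()})", end="\r")
    if status in terminal_states:
        break
    time.sleep(30)

# Stop here instead of polling forever if the job did not succeed
if status != "COMPLETED":
    raise RuntimeError(f"Vector enrichment job ended with status {status}")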

Job status: COMPLETED (Last update: 2023-05-07 11:43:22.612972)
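The polling loops in this notebook wait indefinitely while a job is still running. A reusable helper with a timeout is one way to bound that wait. The sketch below is hypothetical (`wait_for_status` is not part of boto3); it approximates elapsed time as the total time slept, ignoring API latency, and takes an injectable `sleep` so it can be exercised offline. The terminal states passed in are assumptions based on the job statuses used above.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    get_status: Callable[[], str],
    terminal_states: Iterable[str],
    interval: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal state or the timeout elapses."""
    terminal = set(terminal_states)
    waited = 0.0  # approximate elapsed time: total seconds slept so far
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout:
            raise TimeoutError(f"Still {status!r} after about {waited:.0f}s")
        sleep(interval)
        waited += interval


# With the notebook's client this would look like (not executed here):
# status = wait_for_status(
#     lambda: geospatial_client.get_vector_enrichment_job(Arn=vej_arn)["Status"],
#     terminal_states={"COMPLETED", "FAILED", "STOPPED"},
# )

# Offline demonstration with a scripted sequence of statuses and a no-op sleep
statuses = iter(["INITIALIZING", "IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
result = wait_for_status(lambda: next(statuses), {"COMPLETED", "FAILED", "STOPPED"}, sleep=lambda s: None)
print(result)  # COMPLETED
```

The same helper could poll the export step by reading `response["ExportStatus"]` and treating `{"SUCCEEDED", "FAILED"}` as terminal.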

## Export Vector Enrichment Job Output to S3

In addition to the input columns, the exported output contains the following enrichment columns:

* reverse_geo.address_number
* reverse_geo.country
* reverse_geo.label
* reverse_geo.municipality
* reverse_geo.neighborhood
* reverse_geo.postal_code
* reverse_geo.region
* reverse_geo.status
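Once the output is loaded into pandas, it can be convenient to keep only the coordinates and the `reverse_geo.*` columns, drop the prefix, and filter on the status column. The sketch below assumes that `"Valid Data"` (the value seen in the sample output further down) marks a successful lookup; other status values are not documented here, so the second demo row uses a hypothetical placeholder. `enriched_columns` is a hypothetical helper.

```python
import pandas as pd

PREFIX = "reverse_geo."


def enriched_columns(df: pd.DataFrame, valid_status: str = "Valid Data") -> pd.DataFrame:
    """Keep coordinates plus reverse_geo.* columns, strip the prefix, and keep valid rows."""
    geo_cols = [c for c in df.columns if c.startswith(PREFIX)]
    out = df[["longitude", "latitude", *geo_cols]].rename(columns=lambda c: c.removeprefix(PREFIX))
    return out[out["status"] == valid_status].reset_index(drop=True)


sample = pd.DataFrame(
    {
        "longitude": [-122.23, 0.0],
        "latitude": [37.88, 0.0],
        "reverse_geo.label": ["Grizzly Peak Blvd, Berkeley, CA, 94720, USA", None],
        # The second value is a hypothetical placeholder for a non-valid status
        "reverse_geo.status": ["Valid Data", "Hypothetical other status"],
    }
)
print(enriched_columns(sample))
```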

In [30]:
response = geospatial_client.export_vector_enrichment_job(
    Arn=vej_arn,
    ExecutionRoleArn=role,
    OutputConfig={"S3Data": {"S3Uri": output_object_key}},
)

In [29]:
# Poll every 15 seconds until the export either succeeds or fails
while response["ExportStatus"] not in {"SUCCEEDED", "FAILED"}:
    time.sleep(15)
    response = geospatial_client.get_vector_enrichment_job(Arn=vej_arn)
    print(
        "Export status: {} (Last update: {})".format(
            response["ExportStatus"], datetime.now()
        ),
        end="\r",
    )

if response["ExportStatus"] == "FAILED":
    raise RuntimeError("Export of the vector enrichment job output failed")

Export status: SUCCEEDED (Last update: 2023-05-07 11:46:24.090958)

## Visualize Enriched Data

In [61]:
# List the objects under the 'output' prefix ("Contents" is absent when there are none)
s3_bucket_objects = s3.list_objects_v2(
    Bucket=s3_bucket, Prefix=f"{s3_key}/output/"
).get("Contents", [])

# Read every CSV object under the prefix and combine them into a single dataframe
frames = []
for s3_object in s3_bucket_objects:
    if s3_object["Key"].endswith(".csv"):
        response = s3.get_object(Bucket=s3_bucket, Key=s3_object["Key"])
        frames.append(pd.read_csv(response["Body"]))

df = pd.concat(frames, ignore_index=True)

df.head(5)

| Unnamed: 0 | longitude | latitude | reverse_geo.address_number | reverse_geo.country | reverse_geo.label | reverse_geo.municipality | reverse_geo.neighborhood | reverse_geo.postal_code | reverse_geo.region | reverse_geo.status |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.23 | 37.88 |  | USA | Grizzly Peak Blvd, Berkeley, CA, 94720, USA | Berkeley |  | 94720 | California | Valid Data |
| 1 | -122.22 | 37.86 | 2046.0 | USA | 2000-2108 Tunnel Rd, Oakland, CA, 94611, USA | Oakland | Merriwood | 94611 | California | Valid Data |
| 2 | -122.24 | 37.85 |  | USA | Exit 4B/Broadway/W, CA-24 W, Oakland, CA, 9461... | Oakland | Upper Rockridge | 94618 | California | Valid Data |
| 3 | -122.25 | 37.85 | 6365.0 | USA | 6365 Florio St, Oakland, CA, 94618, USA | Oakland |  | 94618 1335 | California | Valid Data |
| 4 | -122.25 | 37.85 | 6365.0 | USA | 6365 Florio St, Oakland, CA, 94618, USA | Oakland |  | 94618 1335 | California | Valid Data |
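Two details in this output are worth tidying: `reverse_geo.address_number` is read as a float (`2046.0`) because some rows have no house number, and `reverse_geo.postal_code` sometimes carries a ZIP+4 suffix (`94618 1335`). The sketch below reproduces two rows from the table above (label column omitted) and shows one way to clean them, assuming US-style postal codes whose first five characters are the ZIP; the derived `zip5` column name is hypothetical.

```python
import io

import pandas as pd

# Two rows copied from the output above (label and other columns omitted for brevity)
csv_text = """longitude,latitude,reverse_geo.address_number,reverse_geo.postal_code
-122.23,37.88,,94720
-122.25,37.85,6365.0,94618 1335
"""

df_demo = pd.read_csv(io.StringIO(csv_text))

# Missing house numbers force a float column; the nullable Int64 dtype keeps integers and <NA>
df_demo["reverse_geo.address_number"] = df_demo["reverse_geo.address_number"].astype("Int64")

# Assumes US postal codes: keep the five-digit ZIP and drop any "+4" suffix after the space
df_demo["zip5"] = df_demo["reverse_geo.postal_code"].astype("string").str[:5]
print(df_demo)
```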


Render embedded map:

In [57]:
embedded_map = sagemaker_geospatial_map.create_map({"is_raster": True})
embedded_map.set_sagemaker_geospatial_client(geospatial_client)

In [None]:
embedded_map.render()

Move the `reverse_geo.label` column to the third position, then add the enriched output to the map as a dataset:

In [63]:
# Remove the 'reverse_geo.label' column from the frame in place and keep it as a Series
column_to_move = df.pop("reverse_geo.label")

# Insert the Series back at index 2, i.e. as the third column
df.insert(2, "reverse_geo.label", column_to_move)

dataset_links = embedded_map.add_dataset(
    {"data": df, "label": "vej_output"}, auto_create_layers=True
)
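The `pop`/`insert` pair used above is a general pattern for reordering a single column in place. A minimal sketch wrapping it in a hypothetical `move_column` helper, with positions counted from zero as in `df.insert`:

```python
import pandas as pd


def move_column(df: pd.DataFrame, column: str, position: int) -> pd.DataFrame:
    """Move `column` to zero-based `position` in place and return the same frame."""
    series = df.pop(column)  # removes the column from df and returns it as a Series
    df.insert(position, column, series)
    return df


demo = pd.DataFrame(
    {
        "longitude": [-122.23],
        "latitude": [37.88],
        "reverse_geo.country": ["USA"],
        "reverse_geo.label": ["Grizzly Peak Blvd, Berkeley, CA, 94720, USA"],
    }
)
print(list(move_column(demo, "reverse_geo.label", 2).columns))
# ['longitude', 'latitude', 'reverse_geo.label', 'reverse_geo.country']
```

Unlike `df[new_order]`, which builds a new frame, this approach modifies the existing dataframe, matching what the notebook cell does.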