# 10. Apply Model to Survey Area GSV Images

Given a locality and the OpenStreetMap XML extracts for it, find all intersections and create a CSV batch file listing every point we will sample.  After checking the scope (how many images will be downloaded?), download any Google Street View (GSV) images not already cached.  Then apply the model to detect bicycle lanes, and record the detections in a "detection_log" CSV file.  Finally, match the detections to the OpenStreetMap data, and create geojson files to compare the data and draw routes on a map.

We walk down each "way" in the OSM data and work out a heading at each point, taken as the average of the bearing from the previous point and the bearing to the next point.  This gives us some idea of which heading to use in a Google Street View request, so that the sampled images face (roughly) forward, backward, left, and right.
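The heading calculation described above can be sketched as follows. This is a minimal illustration, not the code in `osm_walker`: the function names are hypothetical, and a circular mean is used for the average so that bearings either side of north (e.g. 350 and 10 degrees) average to 0 rather than 180. The result is undefined if the two bearings point in exactly opposite directions.

```python
import math

def bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.0

def mean_heading(bearing_in, bearing_out):
    """Circular mean of the bearing from the previous point and the bearing to the next."""
    s = math.sin(math.radians(bearing_in)) + math.sin(math.radians(bearing_out))
    c = math.cos(math.radians(bearing_in)) + math.cos(math.radians(bearing_out))
    return math.degrees(math.atan2(s, c)) % 360.0

def view_headings(heading):
    """Headings for the four images requested at each sample point."""
    offsets = (('front', 0), ('right', 90), ('rear', 180), ('left', 270))
    return {name: (heading + offset) % 360.0 for name, offset in offsets}
```

For the first or last node of a way only one of the two bearings exists, so a real implementation would need to fall back to that single bearing.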

The items we are looking for might not be most visible from right in the middle of an intersection, so there is an option to specify a range around each intersection that we want to sample, along the heading.  E.g. if we specify 20m, then we will sample the point in the middle of the intersection, +/- 10m, and +/- 20m.  We use 10m intervals within this range because Google Street View typically gives a different image roughly every 10m.
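As a rough sketch of how the offsets and their coordinates could be derived, the snippet below lists the offsets for a given margin and moves a point along a heading using the standard spherical destination-point formula. The function names and the spherical-Earth radius are assumptions for illustration; the notebook's own `osm_walker` module may compute this differently.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation (assumption)

def sample_offsets(margin, step=10):
    """Offsets in metres, symmetric around the intersection, e.g. 20 -> [-20, -10, 0, 10, 20]."""
    n = margin // step
    return [k * step for k in range(-n, n + 1)]

def offset_point(lat, lon, heading_deg, distance_m):
    """Move distance_m along heading_deg from (lat, lon); a negative distance moves backward."""
    delta = distance_m / EARTH_RADIUS_M
    theta = math.radians(heading_deg)
    phi1, lam1 = math.radians(lat), math.radians(lon)
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    return math.degrees(phi2), math.degrees(lam2)
```

Calling `offset_point` once per value in `sample_offsets(20)` would give the five sample points for one intersection.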

In OSM, a long street may be divided up into multiple connecting "ways", each with the same name.  Rather than walking down ways in random order, we attempt to process ways in order of their name, and then within each name we attempt to identify a logical order:  Start with a way whose first node is NOT an intersection, then find the next way with the same name that intersects with the end of the first, and so on.  This isn't really necessary to generate a map of all the detections, but it is useful when visually inspecting the results to assess or measure their quality.  It is less disorientating for a human to see the images in a logical "walking" order.
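The ordering heuristic can be illustrated with a small sketch. The data structure here (a list of dicts with `id`, `name`, and `nodes`) and the function name are hypothetical, and the chaining rule is simplified to "the next way starts at the node where the current way ends"; the real implementation may match ways more loosely.

```python
from collections import defaultdict

def order_ways(ways, intersection_nodes):
    """Return way ids grouped by street name, chained end-to-start within each name."""
    by_name = defaultdict(list)
    for way in ways:
        by_name[way['name']].append(way)

    ordered = []
    for name in sorted(by_name):
        remaining = list(by_name[name])
        while remaining:
            # Prefer a starting way whose first node is NOT an intersection;
            # fall back to whatever is left if there is no such way.
            current = next(
                (w for w in remaining if w['nodes'][0] not in intersection_nodes),
                remaining[0],
            )
            while current is not None:
                remaining.remove(current)
                ordered.append(current['id'])
                tail = current['nodes'][-1]
                # Follow the chain: the next way begins where this one ended
                current = next((w for w in remaining if w['nodes'][0] == tail), None)
    return ordered
```

With three same-named ways stored out of order, the sketch returns them in the order a person would walk along the street.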

Once we have a list of points to sample, we output a batch "csv" containing the way id, the node id of the sample point, the offset in metres (e.g. "-20"), the latitude, longitude, and bearing.  This can then be used to download and cache Google Street View images, and process them with our detection model.
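A batch file with these fields could be written and read back with the standard `csv` module, as sketched below. The column names are assumptions chosen for illustration; the actual file produced by `gsv_loader.write_batch_file` may use different headers or ordering.

```python
import csv

# Hypothetical column names, following the fields listed above
FIELDS = ['way_id', 'node_id', 'offset_m', 'lat', 'lon', 'bearing']

def write_batch(stream, sample_points):
    """Write one row per sample point to an open text stream; return the row count."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, lineterminator='\n')
    writer.writeheader()
    writer.writerows(sample_points)
    return len(sample_points)

def read_batch(stream):
    """Read the batch back as a list of dicts (all values are strings)."""
    return list(csv.DictReader(stream))
```

Keeping the batch as a plain CSV means the same file can drive both the download step and the detection step, and can be inspected by hand.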

## Configuration

Any configuration that is required to run this notebook can be customized in the next cell.

In [1]:
# Which "locality" do we wish to process?
# Assumes that we can find a pair of OSM files with corresponding names,
# extracted with the "osmium" tool.  One file follows the official shape of
# the locality, while a second file follows a bounding box around the locality
# with a 200m margin, so that when we are looking for intersections, we don't
# miss any due to the intersecting road being just outside the boundary of the
# locality (apart from the intersection).
#locality = 'Mount Eliza'
locality = 'Mount Eliza Sample'
#locality = 'Heidelberg Sample'

# The locality 'Mount Eliza' was extracted from OpenStreetMap according to the
# official geographic boundary.  However, sampling the entire suburb within
# 20 metres of every intersection would yield 7,049 sample locations.  Each
# sample location requires 4 images (front, left, right, rear) giving a total
# of 28,196 images required from the Google Street View API, at an approximate cost
# of $197 USD.  Therefore, a smaller region "Mount Eliza Sample" was extracted
# from OpenStreetMap using the "osmium extract --bbox" option as follows:

# osmium extract --bbox=145.094,-38.176,145.110,-38.162 australia-latest.osm.pbf -o Locality_Mount_Eliza_Sample.osm
# osmium extract --bbox=145.092,-38.178,145.112,-38.160 australia-latest.osm.pbf -o Locality_Mount_Eliza_Sample_margin.osm

# This sample region contains a mix of roads with and without bicycle lanes,
# a regional highway, and small no-through roads, and is intended to be
# representative of the region, without a high API cost.  It yields 1,113
# sample locations, requiring 4,452 images from the Google Street View API, at
# an approximate cost of $31 USD.

# We will sample the middle of each intersection, but we can also sample a
# "margin" around the intersection, at 10m intervals.
# E.g. if we set this to "20" then we will sample points at:
#    -20m, -10m, 0m, 10m, and 20m
# from the centre of the intersection, along the assumed bearing of the road
margin = 20

# Trained detection model name
trained_model_name   = 'faster_rcnn_V1_2000'

# Dataset version, included as a suffix in the names of the label map file and the tfrecord train and test files
dataset_version = 'V1'

# Confidence threshold to apply with the model.  The model must be at least
# this confident of a detection for it to count
confidence_threshold = 0.55
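To make the effect of the confidence threshold concrete, here is a minimal sketch of the filtering it implies. The detection records (dicts with a `score` key) and the function name are hypothetical; the model wrapper receives the threshold as `min_score` and may apply it differently internally.

```python
def filter_detections(detections, min_score):
    """Keep only detections whose score is at least min_score."""
    return [d for d in detections if d['score'] >= min_score]

# Illustrative, made-up scores: only the first record survives a 0.55 threshold
example = [
    {'label': 'bicycle_lane', 'score': 0.91},
    {'label': 'bicycle_lane', 'score': 0.40},
]
kept = filter_detections(example, 0.55)
```

Raising the threshold trades missed detections for fewer false positives.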

## Code

In [2]:
# General imports
import os
import sys

from pathlib import Path

# Make sure local modules can be imported
module_path_root = os.path.abspath(os.pardir)
if module_path_root not in sys.path:
    sys.path.append(module_path_root)
    
# Import local modules
import osm_gsv_utils.osm_walker as osm_walker
import osm_gsv_utils.gsv_loader as gsv_loader
import tf2_utils.tf2_model_wrapper as tf2_model_wrapper

## Identify sample points

Load the OSM data, and then generate lists of sample points at margins of 0, +/- 10m, and +/- 20m from each intersection,
and report on how many samples are found for each sample setting, to get an idea of the impact of increasing/decreasing
the margin.

In [3]:
# Derive paths for configuration

# A version of the locality name with spaces replaced by underscores
locality_clean   = locality.replace(' ', '_')

# Name of the locality with the margin around intersections included
locality_margin  = '{0:s}_{1:d}m'.format(locality_clean, margin)

# Paths to both the main OpenStreetMap XML extract and a second extract that allows a wider margin
filename_main    = os.path.join(os.path.abspath(os.pardir), 'data_sources', 'Locality_' + locality_clean + '.osm')
filename_margin  = os.path.join(os.path.abspath(os.pardir), 'data_sources', 'Locality_' + locality_clean + '_margin.osm')

# Batch file where the list of points to sample from GSV is written
batch_filename   = os.path.join(module_path_root, 'batches', locality_margin + '.csv')

# Derived GSV download/cache directory
gsv_download_dir = os.path.join(os.path.abspath(os.pardir), 'data_sources', 'gsv')

# Filename containing API key for connecting to Google Street View
apikey_filename    = os.path.join(module_path_root, 'apikey.txt')

# Output directory where the detection log and any images with detection overlays will be written
output_directory = os.path.join(module_path_root, 'detections', locality_margin)

# Output CSV file with a log of all detections
detection_log_path = os.path.join(output_directory, 'detection_log.csv')

# Change directory to make sure the detection model dependencies are found
os.chdir(Path(module_path_root).parent.absolute())

In [4]:
# Load OSM data into memory
# The "main" file is the exact area we want to cover
# The "margin" file is a slightly larger extract to capture any intersections
# at the margin of the main file.  There may be roads JUST outside the area
# being surveyed that only just touch the survey area at an intersection.
# If these roads are clipped from the main file, then they wouldn't show up
# as roads that share a common node, and therefore the intersections would
# not be detected.
walker = osm_walker(filename_main, filename_margin, verbose=False)

# Generate sample lists with different margin settings, and report sample point count for each
sample_points_20 = walker.sample_all_way_intersections(-20, +20, 10, ordered=True, verbose=False)
sample_points_10 = walker.sample_all_way_intersections(-10, +10, 10, ordered=True, verbose=False)
sample_points_00 = walker.sample_all_way_intersections(  0,   0, 10, ordered=True, verbose=False)

# How many GSV locations would we be downloading for each of these options?
print('+/- 20m: ' + str(len(sample_points_20)))
print('+/- 10m: ' + str(len(sample_points_10)))
print('+/- 00m: ' + str(len(sample_points_00)))

  0%|          | 0/203 [00:00<?, ?it/s]

  0%|          | 0/379 [00:00<?, ?it/s]

+/- 20m: 1113
+/- 10m: 705
+/- 00m: 297
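The counts above translate directly into image counts and an approximate API cost. The sketch below assumes four images per sample location (as stated in the configuration comments) and a rate of 7 USD per 1,000 images, which is inferred from the cost figures quoted earlier rather than taken from a published price list; check current pricing before relying on it.

```python
IMAGES_PER_LOCATION = 4      # front, left, right, rear
USD_PER_1000_IMAGES = 7.00   # assumption inferred from the figures quoted in this notebook

def estimate_scope(sample_count):
    """Return (image_count, approximate_cost_usd) for a number of sample locations."""
    images = sample_count * IMAGES_PER_LOCATION
    cost = images * USD_PER_1000_IMAGES / 1000.0
    return images, cost

for label, count in (('+/- 20m', 1113), ('+/- 10m', 705), ('+/- 00m', 297)):
    images, cost = estimate_scope(count)
    print('{0:s}: {1:d} images, approx ${2:.0f} USD'.format(label, images, cost))
```

For the 20m margin this gives 4,452 images at roughly $31 USD, matching the figures in the configuration comments. Images already in the cache do not need to be requested again, so this is an upper bound for repeat runs.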


## Download/Cache GSV images

Download the GSV images for every sample point, skipping any that are already cached.

In [5]:
# Initialise interface to Google Street View
gsv = gsv_loader(apikey_filename, gsv_download_dir)

# Create a batch file (CSV) with the list of locations to download from Google
# limit=0 means unlimited, set to a small integer to test a few downloads
gsv.write_batch_file(batch_filename, sample_points_20, limit=0)

# Process the batch file (with progress bar) and report how many images were fetched vs. skipped
# Working from a batch file means we have a permanent record of what was loaded (in case we need to resume later)
# and helps us implement a progress bar via tqdm
gsv.process_batch_file(batch_filename, progress=True, verbose=False)

Backup E:\Release\minor_thesis\batches\Mount_Eliza_Sample_20m.csv to E:\Release\minor_thesis\batches\Mount_Eliza_Sample_20m20211012_013239.csv
Write E:\Release\minor_thesis\batches\Mount_Eliza_Sample_20m.csv


  0%|          | 0/1113 [00:00<?, ?it/s]

GSV Cache Hits:       4452 Misses:          0


## Apply Model to Batch File

In [6]:
# Initialise model
model_wrapper = tf2_model_wrapper(
    locality, 
    margin, 
    gsv_download_dir, 
    output_directory, 
    trained_model_name, 
    version_suffix = dataset_version
)

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Output directory for detections: E:\Release\minor_thesis\detections\Mount_Eliza_Sample_20m
Label Map Path: [TensorFlow\workspace\annotations\label_map_V1.pbtxt]
Latest Checkpoint: ckpt-0


In [7]:
# Run detections for entire batch
model_wrapper.process_batch_file(batch_filename, min_score=confidence_threshold, progress=True, verbose=False)

  0%|          | 0/1113 [00:00<?, ?it/s]

INFO:tensorflow:depth of additional conv before box predictor: 0
Instructions for updating:
Use ref() instead.


'E:\\Release\\minor_thesis\\detections\\Mount_Eliza_Sample_20m\\detection_log.csv'

## Load Detections

Load the detection log that was just created, correlate it to the OpenStreetMap data, and write geojson files to compare the two and draw the routes.

In [8]:
# Correlate detections to OpenStreetMap data
walker.load_detection_log(detection_log_path)

In [9]:
walker.write_geojsons(locality_margin, output_directory, intersection_skip_limit=1, verbose=False)

Writing hit, feature count: 21
Writing to: E:\Release\minor_thesis\detections\Mount_Eliza_Sample_20m\hit.geojson
Writing tag, feature count: 2
Writing to: E:\Release\minor_thesis\detections\Mount_Eliza_Sample_20m\tag.geojson
Writing both, feature count: 2
Writing to: E:\Release\minor_thesis\detections\Mount_Eliza_Sample_20m\both.geojson
Writing either, feature count: 21
Writing to: E:\Release\minor_thesis\detections\Mount_Eliza_Sample_20m\either.geojson
Writing hit_only, feature count: 20
Writing to: E:\Release\minor_thesis\detections\Mount_Eliza_Sample_20m\hit_only.geojson
Writing tag_only, feature count: 0
Writing to: E:\Release\minor_thesis\detections\Mount_Eliza_Sample_20m\tag_only.geojson
hit     : Total distance   11817.05m
tag     : Total distance    2344.05m
both    : Total distance    2344.05m
either  : Total distance   11817.05m
hit_only: Total distance    9387.53m
tag_only: Total distance       0.00m
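The six output names suggest a set-style comparison between model detections ("hit") and existing OpenStreetMap tags ("tag"). The sketch below shows that reading using plain Python sets of hypothetical segment ids. It is an interpretation of the file names, not the logic in `write_geojsons`: the distances reported above do not reduce to simple set arithmetic (hit minus both is not equal to hit_only), so the actual route-building is presumably more involved.

```python
def compare_sets(hit, tag):
    """Classify segment ids by whether the model detected them, OSM tagged them, or both."""
    return {
        'hit': set(hit),
        'tag': set(tag),
        'both': hit & tag,       # detected by the model AND tagged in OSM
        'either': hit | tag,     # detected OR tagged
        'hit_only': hit - tag,   # detected but not tagged: candidates for new OSM tags
        'tag_only': tag - hit,   # tagged but not detected: possible model misses
    }

# Made-up segment ids for illustration
result = compare_sets({'a', 'b', 'c'}, {'b'})
```

Under this reading, a non-empty "hit_only" layer is the interesting one for mapping, while "tag_only" highlights places where the model may have missed a tagged lane.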
