# Distance Calculations

## Introduction

Proximity analysis is a critical aspect of spatial analysis, enabling researchers to evaluate distances between spatial features, assess accessibility, and identify relationships between locations.  Proximity analysis is widely used in accessibility studies, urban planning, environmental research, and many other fields.

This notebook demonstrates how to perform proximity analysis using R, covering common scenarios and applications.

## 1. Setup

Before running this script, you will need to install and load the following packages into your R environment:

In [2]:
#library(geosphere)
library(leaflet)
library(rnaturalearth)
#library(nngeo)
library(raster)
library(sf)
library(sp)
#library(tidyverse)

If you are working in the I-GUIDE environment, these packages should be already be installed.  However you will still need to load the packages into your workspace using *library* base R function.

## 2. Load and Explore the Data

For this notebook, we’ll use the [*rnaturalearth*](https://cran.r-project.org/web/packages/rnaturalearth/index.html) package to access and load datasets from [*Natural Earth*](https://www.naturalearthdata.com). These packages provide direct access to Natural Earth’s geographic data without requiring an API key, making it simple to bring boundaries and point data directly into R.

We’ll import the following data files:

* Populated Places (points): locations of major cities and towns
* Coastline (polygons): ocean coastline

Each dataset will be returned as an sf object, allowing us to work easily with the files in R using the [*sf*](https://cran.r-project.org/web/packages/sf/index.html) package.

In [16]:
# populated places point locations
cities <- ne_download(scale = "medium", type = "populated_places", category = "cultural", returnclass = "sf")

# coastline
coastline <- ne_download(scale = "medium", type = "coastline", category = "physical", returnclass = "sf")

Reading layer `ne_50m_populated_places' from data source 
  `/tmp/RtmpuGQa9N/ne_50m_populated_places.shp' using driver `ESRI Shapefile'
Simple feature collection with 1251 features and 137 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -175.2206 ymin: -90 xmax: 179.2166 ymax: 78.22097
Geodetic CRS:  WGS 84
Reading layer `ne_50m_coastline' from data source 
  `/tmp/RtmpuGQa9N/ne_50m_coastline.shp' using driver `ESRI Shapefile'
Simple feature collection with 1428 features and 3 fields
Geometry type: MULTILINESTRING
Dimension:     XY
Bounding box:  xmin: -180 ymin: -85.19219 xmax: 180 ymax: 83.59961
Geodetic CRS:  WGS 84


### Visualize the Data

To better understand our datasets, let's visualize them using an interactive map.

In [17]:
leaflet() %>%
    addTiles() %>%
    addCircleMarkers(data = cities, color = "blue", radius = 3, label = ~NAME, group = "Cities") %>%
    addPolylines(data = coastline, color = "green", group = "Coastline") %>%
    addLayersControl(overlayGroups = c("Cities", "Coastline"))

For this notebook, we will analyze only the subset of populated places which are located in the United States - i.e. the list of major cities in the United States.

In [71]:
# Ensure both layers have the same CRS
cities <- st_transform(cities, crs = st_crs(united_states))

# Spatial join to select cities within the United States
cities_within_us <- cities[cities$SOV0NAME == "United States" & cities$POP_MAX >= 1000000,]

# Display the selected cities
cities_within_us[order(cities_within_us$NAME), c("NAME", "POP_MAX")]

# Visualize the result
leaflet() %>%
  addTiles() %>%
  addCircleMarkers(data = cities_within_us, color = "blue", radius = 3, label = ~NAME, group = "Cities in US")

Unnamed: 0_level_0,NAME,POP_MAX,geometry
Unnamed: 0_level_1,<chr>,<dbl>,<POINT [°]>
1188,Atlanta,4506000,POINT (-84.36764 33.73946)
190,Austin,1161000,POINT (-97.74472 30.2689)
207,Baltimore,2255000,POINT (-76.61458 39.28152)
1077,Boston,4467000,POINT (-71.07196 42.33191)
125,Bridgeport,1018000,POINT (-73.20191 41.18192)
759,Buffalo,1016000,POINT (-78.88195 42.88192)
1189,Chicago,8990000,POINT (-87.63524 41.84796)
754,Cincinnati,1636000,POINT (-84.45887 39.16383)
753,Cleveland,1890000,POINT (-81.69694 41.47193)
1075,Dallas,4798000,POINT (-96.79469 32.77196)


## 3. Basic Distance Calculations

### 3a. Euclidean Distance

Euclidean distance is the "straight-line" distance between two points in Cartesian space. It is calculated based on projected coordinates and is often used for smaller areas.

In [73]:
# Transform to projected CRS for accurate Euclidean distance
cities_projected <- st_transform(cities_within_us, crs = 26915)  # UTM Zone 15N

# Select a sample of cities for analysis
cities_projected <- cities_projected[1:10, ]

# Distance matrix (Euclidean)
dist_matrix_euclidean <- st_distance(cities_projected)

# Display the distance matrix (in meters)
dist_matrix_euclidean

Units: [m]
           1         2         3         4         5       6         7
1        0.0 3990880.7 2143965.8 1906222.1 1938217.2 3509676 2873899.7
2  3990880.7       0.0 1852034.0 2355362.5 2538266.8 1629116 1128469.7
3  2143965.8 1852034.0       0.0  748866.4 1022606.2 1701134  729934.6
4  1906222.1 2355362.5  748866.4       0.0  277198.2 1604189 1269217.5
5  1938217.2 2538266.8 1022606.2  277198.2       0.0 1604661 1489453.8
6  3509675.6 1629115.9 1701133.6 1604189.3 1604660.9       0 1334286.8
7  2873899.7 1128469.7  729934.6 1269217.5 1489453.8 1334287       0.0
8  3706227.0  369366.3 1562771.5 2010938.3 2182826.9 1298245  833541.7
9   586505.8 4254259.1 2455646.3 2362961.1 2437791.9 3958769 3172545.7
10  663526.0 4164882.0 2386132.4 2339447.9 2431273.2 3925103 3095378.1
           8         9        10
1  3706227.0  586505.8  663526.0
2   369366.3 4254259.1 4164882.0
3  1562771.5 2455646.3 2386132.4
4  2010938.3 2362961.1 2339447.9
5  2182826.9 2437791.9 2431273.2
6  1298245

### 3b. Geodesic Distance

Geodesic distance is the shortest path between two points on the Earth's surface. It is more accurate for larger areas or when working with lat/lon coordinates.

In [20]:
### requires the geosphere package ###

# Calculate geodesic distances using geosphere package
#coords <- st_coordinates(cities)
#dist_matrix_geodesic <- distm(
#  x = coords[1:10, ],
#  fun = distHaversine  # Haversine formula for geodesic distance
#)

# Display the distance matrix (in meters)
#dist_matrix_geodesic

### **3c. Distance from Cities to Coastline**
To calculate the distance from each city to the nearest point on the coastline.

In [81]:
# Calculate distance from each city to the nearest coastline point
city_to_coast_distances <- st_distance(cities_within_us, coastline)

# Find the minimum distance for each city
min_distances <- apply(city_to_coast_distances, 1, min)

# Convert distances to kilometers
min_distances_km <- round(min_distances / 1000, 1)

# Create a data frame with city names and nearest coastline distances
city_distance_to_coastline <- data.frame(
  city_name = cities_within_us$NAME,
  state = cities_within_us$ADM1NAME,
  distance_to_nearest_coastline_km = min_distances_km
)

# Display the results
city_distance_to_coastline[order(city_distance_to_coastline$distance_to_nearest_coastline_km),]

Unnamed: 0_level_0,city_name,state,distance_to_nearest_coastline_km
Unnamed: 0_level_1,<chr>,<chr>,<dbl>
22,San Juan,,0.5
33,San Francisco,California,0.7
2,Bridgeport,Connecticut,1.3
41,New York,New York,1.9
29,Boston,Massachusetts,2.1
40,"Washington, D.C.",District of Columbia,2.1
36,Miami,Florida,2.6
8,Baltimore,Maryland,4.0
24,Seattle,Washington,4.0
31,Philadelphia,Pennsylvania,4.1


Based this analysis, San Juan, Puerto Rico is the closest major city to the coast while Denver, Colorado is the furthest major city from the coast.

## 4. Nearest Neighbor Analysis

Nearest neighbor analysis identifies the closest spatial feature for each point in a dataset. This is useful for applications like finding the nearest service center or analyzing spatial clustering.

In [25]:
# Find the index of the nearest coastline feature for each city
nearest_indices <- st_nearest_feature(cities, coastline)

# Extract the corresponding distances
nearest_distances <- st_distance(cities, coastline[nearest_indices, ])

# Convert distances to kilometers
nearest_distances_km <- as.numeric(nearest_distances) / 1000

# Combine results into a data frame
nearest_results <- data.frame(
  city_name = cities$NAME,
  nearest_coastline_index = nearest_indices,
  distance_to_coastline_km = nearest_distances_km
)

# Display nearest neighbor results
nearest_results

city_name,nearest_coastline_index,distance_to_coastline_km
<chr>,<int>,<dbl>
Bombo,1388,928.935212
Fort Portal,1388,1158.145437
Potenza,1388,67.972990
Campobasso,1388,56.306284
Aosta,1388,182.652477
Mariehamn,950,59.005293
Ramallah,1388,46.674693
Vatican City,1388,21.892290
Poitier,1388,117.319061
Clermont-Ferrand,1388,254.998028


## 5. Buffer Analysis

Buffer analysis creates zones of influence around spatial features, which can be used to analyze proximity impacts, such as areas within a certain distance of cities.

# Create buffer zones around cities
buffers <- st_buffer(cities_projected[1:10, ], dist = 50000)  # 50 km buffer

# Check which coastlines intersect with buffers
intersections <- st_intersects(buffers, coastline)

# Summarize results
buffer_results <- data.frame(
  city_name = cities$NAME[1:10],
  num_coastline_intersections = sapply(intersections, length)
)

# Display buffer analysis results
buffer_results

# Visualize buffers on a map
leaflet() %>%
  addTiles() %>%
  addPolygons(data = st_transform(buffers, crs = 4326), color = "green", fillOpacity = 0.2, group = "Buffers") %>%
  addCircleMarkers(data = cities, color = "blue", radius = 5, group = "Cities") %>%
  addPolylines(data = coastline, color = "red", group = "Coastline") %>%
  addLayersControl(overlayGroups = c("Buffers", "Cities", "Coastline"))