# Preparing Vector Data for Analysis
### by [Kate Vavra-Musser](https://vavramusser.github.io) for the [R Spatial Notebook Series](https://vavramusser.github.io/r-spatial)

## Introduction
Spatial data plays a critical role in geographic information systems (GIS) and spatial analysis. However, real-world datasets often come with geometric issues such as invalid geometries, misaligned coordinates, or topology errors, which can cause errors in analysis and visualization workflows. Ensuring the validity of geometries and resolving alignment issues is a key step in preparing spatial data for analysis.

### Notebook Goals
In this notebook, we will focus on the [*st_is_valid*](https://rdrr.io/cran/sf/man/valid.html), [*st_make_valid*](https://rdrr.io/cran/sf/man/valid.html), and [*st_snap*](https://rdrr.io/cran/sf/man/geos_binary_ops.html) functions from the [**sf**](https://cran.r-project.org/web/packages/sf/index.html) package. These functions help validate and repair geometries as well as align spatial features for accurate processing. By the end of this notebook, you will have a practical understanding of how to identify and fix common spatial data issues so that your spatial datasets can be used reliably in analysis.

### ✨ Prerequisites ✨
* Complete [Introduction to sf: Reading, Writing, and Inspecting Vector Data](https://platform.i-guide.io/notebooks/9968babe-22e4-4c3d-98e2-d8b45e9672cd)

### 💽 Data Used in this Notebook 💽
* United States State Boundaries Shapefile (*ipums_nhgis_states.shp*)
  * If you worked through [IPUMS NHGIS Data Extraction Using ipumsr: Supplemental Exercise 2](https://platform.i-guide.io/notebooks/bc79eda6-8353-42ea-8cb7-5db70aa6febf) you should have created and saved a copy of *ipums_nhgis_states.shp* in the final section of the notebook.
  * You can also download a copy of the *ipums_nhgis_states.zip* file from [the I-GUIDE platform](https://platform.i-guide.io/datasets/1a5acd50-4741-447a-bf36-2331b39559af) or directly from [Kate's GitHub](https://github.com/vavramusser/r-spatial/blob/main/ipums_nhgis_states.zip). You will need to unzip *ipums_nhgis_states.zip* and extract *ipums_nhgis_states.shp*, together with its companion files (such as the .dbf and .shx files), to your workspace.

### Notebook Overview
1. Setup
2. Validating and Repairing Geometries
3. Aligning Geometries

---

## 1. Setup
This notebook requires the following R packages and functions.

[**sf**](https://cran.r-project.org/web/packages/sf/index.html) · Support for [simple features](https://r-spatial.github.io/sf/articles/sf1.html), a standardized way to encode spatial vector data - Binds to [*GDAL*](https://gdal.org/en/stable) for reading and writing data, to [*GEOS*](https://libgeos.org) for geometrical operations, and to [*PROJ*](https://proj.org/en/stable) for projection conversions and datum transformations - Uses by default the [*s2*](https://cran.r-project.org/web/packages/s2/index.html) package for spherical geometry operations on ellipsoidal (long/lat) coordinates · This notebook uses the following functions from *sf*.

* [*geos_binary_ops*](https://rdrr.io/cran/sf/man/geos_binary_ops.html) · geometric operations on pairs of simple feature geometry sets
  * *st_snap* · snaps the vertices and segments of a geometry to another geometry's vertices
* [*valid*](https://rdrr.io/cran/sf/man/valid.html) · check validity or make an invalid geometry valid
  * *st_make_valid* · make an invalid geometry valid
  * *st_is_valid* · check validity

### 1a. Install and Load Required Packages
If you have not already installed the required packages, uncomment and run the code below:

In [None]:
# install.packages("sf")

Load the packages into your workspace.

In [None]:
library(sf)

In [None]:
# read the shapefile into an sf object
states <- st_read("ipums_nhgis_states.shp")

## 2. Validating and Repairing Geometries

The first step in working with spatial data is to ensure that geometries are valid. Invalid geometries can result from self-intersecting polygon rings, improper ring structures (such as unclosed rings), or overlapping parts within a multipolygon. The [*st_is_valid*](https://rdrr.io/cran/sf/man/valid.html) function checks for geometry validity, while [*st_make_valid*](https://rdrr.io/cran/sf/man/valid.html) attempts to fix invalid geometries.

### 2a. Checking for Invalid Geometries with [*st_is_valid*](https://rdrr.io/cran/sf/man/valid.html)

[*st_is_valid*](https://rdrr.io/cran/sf/man/valid.html) returns a logical vector with one value per feature: TRUE for valid geometries and FALSE for invalid ones.

In [None]:
# check validity and count valid (TRUE) and invalid (FALSE) geometries
table(st_is_valid(states))
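
As a small self-contained illustration of what an invalid geometry looks like, the sketch below builds a hypothetical "bowtie" polygon whose ring crosses itself. The object name `bowtie` and its coordinates are made up for this example and are not part of the states data. Passing `reason = TRUE` to *st_is_valid* asks for a text description of the problem instead of a logical value.

```r
library(sf)

# hypothetical self-intersecting "bowtie" polygon (illustrative coordinates)
bowtie <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0)
))))

# logical check: the ring crosses itself, so the polygon is not valid
bowtie_ok <- st_is_valid(bowtie)
print(bowtie_ok)

# reason = TRUE returns a character description of the validity problem
bowtie_reason <- st_is_valid(bowtie, reason = TRUE)
print(bowtie_reason)
```

On a real dataset, a similar pattern such as `which(!st_is_valid(states))` could be used to find the row numbers of any invalid features.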

### 2b. Repairing Invalid Geometries with [*st_make_valid*](https://rdrr.io/cran/sf/man/valid.html)

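[*st_make_valid*](https://rdrr.io/cran/sf/man/valid.html) attempts to repair invalid geometries so that they can be used safely in spatial operations. Geometries that are already valid are returned unchanged, and a repaired geometry may have a different geometry type than the original (for example, a self-intersecting polygon may be returned as a multipolygon).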

In [None]:
# fixing invalid geometries
states_validated <- st_make_valid(states)

In [None]:
# re-check validity after the repair
table(st_is_valid(states_validated))
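
The repair step can also be seen on a toy geometry. The sketch below is an illustration under assumptions rather than part of the states workflow: it builds a hypothetical self-intersecting "bowtie" polygon (made-up coordinates), repairs it with *st_make_valid*, and confirms that the result passes *st_is_valid*. Because the repaired geometry type can differ from the input, the type is printed rather than assumed.

```r
library(sf)

# hypothetical self-intersecting "bowtie" polygon (illustrative coordinates)
bowtie <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0)
))))

# attempt the repair
bowtie_fixed <- st_make_valid(bowtie)

# compare validity before and after the repair
print(st_is_valid(bowtie))
print(st_is_valid(bowtie_fixed))

# inspect the geometry type of the repaired object
print(st_geometry_type(bowtie_fixed))
```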

## 3. Aligning Geometries

The [*st_snap*](https://rdrr.io/cran/sf/man/geos_binary_ops.html) function adjusts a geometry by snapping its vertices and segments to the vertices of a target geometry when they lie within a specified tolerance distance. This is particularly useful for cleaning up small inconsistencies, such as slightly misaligned features or small offsets between datasets that need to be merged.

In [None]:
# example: Aligning geometries using st_snap
target_line <- st_sfc(st_linestring(matrix(c(0, 0, 5, 5), ncol = 2, byrow = TRUE)))
misaligned_line <- st_sfc(st_linestring(matrix(c(0.2, 0, 5.2, 5.1), ncol = 2, byrow = TRUE)))

In [None]:
# visualizing original geometries
plot(target_line, col = "blue", lwd = 2, main = "Original and Misaligned Geometries")
plot(misaligned_line, col = "red", add = TRUE)

In [None]:
# snapping geometries
snapped_line <- st_snap(misaligned_line, target_line, tolerance = 0.3)

In [None]:
# visualizing snapped geometries
plot(target_line, col = "blue", lwd = 2, main = "Snapped Geometry")
plot(snapped_line, col = "green", add = TRUE)
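
To check the result numerically rather than visually, the vertex coordinates can be compared after snapping with different tolerances. The sketch below recreates the two example lines so that it runs on its own. The object names `snapped_loose` and `snapped_tight` and the tolerance of 0.05 are chosen here for illustration only. The tolerance is expressed in the same units as the coordinates.

```r
library(sf)

# recreate the example lines so this block is self-contained
target_line <- st_sfc(st_linestring(matrix(c(0, 0, 5, 5), ncol = 2, byrow = TRUE)))
misaligned_line <- st_sfc(st_linestring(matrix(c(0.2, 0, 5.2, 5.1), ncol = 2, byrow = TRUE)))

# both misaligned endpoints lie within 0.3 of a target vertex, so they snap
snapped_loose <- st_snap(misaligned_line, target_line, tolerance = 0.3)

# a tolerance of 0.05 is too small to reach the target line here
snapped_tight <- st_snap(misaligned_line, target_line, tolerance = 0.05)

# compare the vertex coordinates of each result
coords_loose <- st_coordinates(snapped_loose)[, c("X", "Y")]
coords_tight <- st_coordinates(snapped_tight)[, c("X", "Y")]
print(coords_loose)
print(coords_tight)
```

Choosing the tolerance is a trade-off: a value that is too small leaves the offsets in place, while a value that is too large can pull together vertices that were meant to stay distinct.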

---

## Next Steps

* Move on to Chapter 3: IPUMS Data Acquisition and Extraction
  * [**Chapter 3.01 IPUMS USA Data Extraction using ipumsr**](https://platform.i-guide.io/notebooks/ab5cad39-6d00-43d2-bc51-17fd4e6b98f2)
  * [**Chapter 3.02 IPUMS NHGIS Data Extraction using ipumsr**](https://platform.i-guide.io/notebooks/be08e56e-1c08-458e-a230-263c64d386bc)
* Move on to Chapter 4: Open-Source GIS Data Acquisition and Extraction
  * [**Chapter 4.01 Importing Data from Natural Earth with rnaturalearth and rnaturalearthdata**]()
* Return to the [**R Spatial Notebooks Project Chapter List**](https://vavramusser.github.io/r-spatial/#:~:text=Chapter%201%3A%20Data%20Sources%20and%20APIs) to view a list of all available notebooks organized in the R Spatial Notebooks chapter structure.
* Visit the [**R Spatial Notebooks Project Homepage**](https://vavramusser.github.io/r-spatial) to learn more about the project, view the list of all notebooks, and explore additional resources.
* Join the project [**Mailing List**](https://mailchi.mp/ab01e8fc8397/r-spatial-email-signup) to hear about future notebook releases and other updates.
* If you have an idea for a new notebook please submit your idea via the [**Suggestion Box**](https://us19.list-manage.com/survey?u=746bf8d366d6fbc99c699e714&id=54590a28ea&attribution=false).

---

## ★ Thank You ★

Thank you so much for engaging with this notebook and supporting the project! The R Spatial Notebooks Project is a labor of love, so if you enjoy or benefit from these notebooks, please consider [**Donating to the Project**](https://buymeacoffee.com/vavramusser). Your support allows me to continue producing notebooks and supporting the R Spatial Notebooks community.