# Importing Data from [Natural Earth](https://www.naturalearthdata.com) with [rnaturalearth](https://cran.r-project.org/web/packages/rnaturalearth/index.html) and [rnaturalearthdata](https://cran.r-project.org/web/packages/rnaturalearthdata/index.html)
### by [Kate Vavra-Musser](https://vavramusser.github.io) for the [R Spatial Notebook Series](https://vavramusser.github.io/r-spatial)

## Introduction
[Natural Earth](https://www.naturalearthdata.com) is a public domain map dataset available at multiple scales, featuring a wide variety of cultural and physical geographic data. The [rnaturalearth](https://cran.r-project.org/web/packages/rnaturalearth/index.html) and [rnaturalearthdata](https://cran.r-project.org/web/packages/rnaturalearthdata/index.html) R packages simplify the process of accessing, downloading, and using these datasets directly within R. These datasets are particularly valuable for social sciences research involving spatial data analysis due to their compatibility with R's spatial data tools.

### Notebook Goals
In this notebook, we will explore how to use [rnaturalearth](https://cran.r-project.org/web/packages/rnaturalearth/index.html) and [rnaturalearthdata](https://cran.r-project.org/web/packages/rnaturalearthdata/index.html) to import data from Natural Earth. We will focus on reviewing available datasets, downloading them, and saving the data for further analysis. By the end of this notebook, you will have a workflow to efficiently access and visualize Natural Earth data, including cultural and physical features such as country and boundary polygons.

### ✨ Prerequisites ✨
* Complete [Introduction to Natural Earth](https://platform.i-guide.io/notebooks/924c7ca6-3d12-4a80-ab4d-814cc80f7f79)
* Complete [Introduction to sf: Reading, Writing, and Inspecting Vector Data](https://platform.i-guide.io/notebooks/9968babe-22e4-4c3d-98e2-d8b45e9672cd)

### Notebook Overview
1. Setup
2. Explore Available Data from Natural Earth
3. Download and Save Data from Natural Earth

## 1. Setup
This notebook requires the following R packages and functions.

#### Required Packages

[**rnaturalearth**](https://cran.r-project.org/web/packages/rnaturalearth/index.html) · World Map Data from [Natural Earth](https://www.naturalearthdata.com) · Facilitates mapping by making natural earth map data from [Natural Earth](https://www.naturalearthdata.com) more easily available to R users · This notebook uses the following functions from *rnaturalearth*.

* [*ne_countries*](https://rdrr.io/cran/rnaturalearth/man/ne_countries.html) · get natural earth world country polygons
* [*ne_download*](https://rdrr.io/cran/rnaturalearth/man/ne_download.html) · download data from Natural Earth and (optionally) read into R
* [*ne_find_vector_data*](https://rdrr.io/cran/rnaturalearth/man/ne_find_vector_data.html) · return a dataframe of available vector layers on Natural Earth

[**rnaturalearthdata**](https://cran.r-project.org/web/packages/rnaturalearthdata/index.html) · World Vector Map Data from [Natural Earth](https://www.naturalearthdata.com) Used in [rnaturalearth](https://cran.r-project.org/web/packages/rnaturalearth/index.html) · Vector map data from [Natural Earth](https://www.naturalearthdata.com) - Access functions are provided in the accompanying package [rnaturalearth](https://cran.r-project.org/web/packages/rnaturalearth/index.html)

[**sf**](https://cran.r-project.org/web/packages/sf/index.html) · Support for [simple features](https://r-spatial.github.io/sf/articles/sf1.html), a standardized way to encode spatial vector data - Binds to [*GDAL*](https://gdal.org/en/stable) for reading and writing data, to [*GEOS*](https://libgeos.org) for geometrical operations, and to [*PROJ*](https://proj.org/en/stable) for projection conversions and datum transformations - Uses by default the [*s2*](https://cran.r-project.org/web/packages/s2/index.html) package for spherical geometry operations on ellipsoidal (long/lat) coordinates · This notebook uses the following functions from *sf*.

* [*st_geometry*](https://rdrr.io/cran/sf/man/st_geometry.html) · get, set, replace or rename geometry from an sf object
* [*st_write*](https://rdrr.io/cran/sf/man/st_write.html) · write simple features object to file or database

### 1a. Install and Load Required Packages
If you have not already installed the required packages, uncomment and run the code below:

In [None]:
# install.packages(c("rnaturalearth", "rnaturalearthdata", "sf"))
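
If you would rather install only what is missing, the check can be sketched as below. The helper name `find_missing_packages` is hypothetical (it is not part of any package), and the install call is left commented out so the cell is safe to run as-is.

```r
# Return the subset of `pkgs` that cannot currently be loaded.
# Note: requireNamespace() loads a package's namespace when it is available.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required <- c("rnaturalearth", "rnaturalearthdata", "sf")
missing_pkgs <- find_missing_packages(required)
missing_pkgs

# Uncomment to install whatever is missing.
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```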

Load the packages into your workspace.

In [None]:
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)

## 2. Explore Available Data from Natural Earth

Natural Earth provides datasets across three scales:

* Small scale (1:110m): Best for global or continental-scale analysis.
* Medium scale (1:50m): Useful for regional or country-level analysis.
* Large scale (1:10m): Provides the most detailed data, suitable for local-level analysis.
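
To make the scale notation concrete: the "m" stands for million, so 1:110m means that one unit on the map represents 110 million units on the ground. The short calculation below converts each scale into the ground distance covered by one centimeter of map. The variable names are only for illustration.

```r
# Natural Earth scale denominators, in millions (1:110m means 1:110,000,000)
scale_millions <- c(small = 110, medium = 50, large = 10)

# 1 cm on the map covers (denominator) cm on the ground;
# there are 100,000 cm in a kilometer
ground_km_per_map_cm <- scale_millions * 1e6 / 1e5

ground_km_per_map_cm
# small: 1100 km, medium: 500 km, large: 100 km
```

The smaller the denominator, the less ground each centimeter covers, which is why the 1:10m data carries the most detail.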

The [*ne_find_vector_data*](https://rdrr.io/cran/rnaturalearth/man/ne_find_vector_data.html) function helps you explore available data from [Natural Earth](https://www.naturalearthdata.com) based on scale and category.  The function *checks the Natural Earth Github repository for current vector layers and provides the file name required in the type argument of ne_download*. 

### 2a. Review the List of Cultural Data

Below, we review the available cultural data, which includes datasets such as administrative boundaries, populated places, and other human-related geographic features.   For this exercise, we will limit our search to only "cultural" data.

The purpose of this initial process is to explore what data is available and also identify the keywords for the layers we are interested in.  We will use the keywords in later functions to tell Natural Earth exactly what data we want to export.

In [None]:
# list of small scale cultural data (1:110 million)
ne_find_vector_data(category = "cultural")

The function returns a data frame listing 14 available cultural vector datasets.  Note that all the returned data is at the 1:110m (small) scale, which is the default scale for the function, but we can change the scale to 1:50m (medium) or 1:10m (large).  Below we slightly change our function call to search for only cultural vector data at the 1:50m (medium) scale.

In [None]:
# list of medium scale cultural data (1:50 million)
ne_find_vector_data(category = "cultural", scale = 50)

Finally, let's also check out what cultural data is available at the 1:10m (large) scale.

In [None]:
# list of large scale cultural data (1:10 million)
ne_find_vector_data(category = "cultural", scale = 10)

These commands display lists of available vector datasets, allowing you to browse the options and select datasets relevant to your analysis.  Once you've identified the datasets of interest, you can download them using functions in the next section.
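
Because the function returns a data frame, the listing can also be filtered for a keyword rather than read by eye. The sketch below is a minimal illustration: `find_layers` is a hypothetical helper, and `example_layers` is a made-up stand-in for the real output (its column names and rows are assumptions), so the code runs without a network connection.

```r
# Keep the rows of `layers_df` in which any column matches `pattern`.
find_layers <- function(layers_df, pattern) {
  per_column <- lapply(layers_df, function(col) {
    grepl(pattern, as.character(col), ignore.case = TRUE)
  })
  hits <- Reduce(`|`, per_column)
  layers_df[hits, , drop = FALSE]
}

# Made-up stand-in for the listing (assumed structure, not real output)
example_layers <- data.frame(
  layer = c("airports", "ports", "time_zones"),
  scale = c(10, 10, 10)
)

find_layers(example_layers, "airport")

# With the real listing (requires internet access):
# find_layers(ne_find_vector_data(category = "cultural", scale = 10), "airport")
```

Searching every column, rather than one named column, keeps the helper independent of how the listing labels its fields.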

## 3. Download and Save Data from Natural Earth

After identifying relevant datasets, you can download and visualize them using the [*ne_countries*](https://rdrr.io/cran/rnaturalearth/man/ne_countries.html) function for global country boundaries, [*ne_coastline*](https://rdrr.io/cran/rnaturalearth/man/ne_coastline.html) for global coastlines, and [*ne_download*](https://rdrr.io/cran/rnaturalearth/man/ne_download.html) for all other datasets.

### 3a. United States Boundary Polygons

As an example, we will download and visualize boundary polygons for the United States.  To do this, we will use the *ne_countries* function and specify that we only want the border of the United States of America.

In [None]:
# United States national boundary (polygon)
usa_boundary <- ne_countries(country = "United States of America")
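
The same function accepts other filters. As a small sketch, the call below requests every country on one continent through the `continent` argument and sets `returnclass = "sf"` explicitly. The object name `north_america` and the choice of continent are only for illustration.

```r
library(rnaturalearth)
library(sf)

# All countries in North America at the default small scale (1:110m)
north_america <- ne_countries(
  scale = 110,
  continent = "North America",
  returnclass = "sf"
)

# One row per country
nrow(north_america)

# Quick look at the outlines
plot(st_geometry(north_america))
```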

Let's take a look at the structure of the *usa_boundary* object.

In [None]:
usa_boundary

The *usa_boundary* object is an sf object with one row (representing the single country in the data - the United States) and 169 feature columns.  Recall that we can work with sf objects similar to the way we work with dataframes in R but sf objects also have special properties specific to spatial data.
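
A few base R and sf calls make that structure explicit. This is a minimal sketch that rebuilds `usa_boundary` so it runs on its own; it reports the object's classes, dimensions, first column names, geometry type, and coordinate reference system.

```r
library(rnaturalearth)
library(sf)

usa_boundary <- ne_countries(
  country = "United States of America",
  returnclass = "sf"
)

class(usa_boundary)                      # sf objects are also data frames
dim(usa_boundary)                        # number of rows and columns
head(names(usa_boundary), 10)            # first ten column names
unique(st_geometry_type(usa_boundary))   # geometry type of the feature(s)
st_crs(usa_boundary)                     # coordinate reference system
```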

Let's quickly map the data using R's native *plot* function.  We will pass the geometry of the *usa_boundary* object to the *plot* function using the *st_geometry* function from the ***sf*** package.

In [None]:
# plot the boundaries
plot(st_geometry(usa_boundary))

We can see that our downloaded data is a simple border of the entire United States of America including Alaska and Hawaii.

Finally, let's save our data locally as a shapefile so we can load it into other workflows and notebooks.  Before saving, we need to remove two columns: the population estimate (*pop_est*) and the Natural Earth identification code (*ne_id*).  These columns were identified during the development of this notebook as roadblocks to saving the *usa_boundary* file in the next step.  If you are working with your own data, or a different extraction from Natural Earth, your experience and process may be different.

In [None]:
# simplify the dataset by removing the columns that block the shapefile export
usa_boundary <- usa_boundary[, !names(usa_boundary) %in% c("pop_est", "ne_id")]

We'll save the *usa_boundary* object as a shapefile using the *st_write* function from the ***sf*** package.

In [None]:
# write the usa_boundary shapefile
st_write(usa_boundary, "usa_boundary.shp", driver = "ESRI Shapefile", delete_dsn = TRUE)

This process retrieves the boundary polygon for the United States, visualizes it, removes the columns that interfere with the export, and saves the result as a shapefile. This workflow can be extended to download and work with other datasets from Natural Earth.
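
If dropping columns is undesirable, one alternative (not used in this notebook, offered only as a sketch) is to write to a different format such as GeoPackage, which sf selects from the `.gpkg` file extension. The example uses a small made-up sf object, so its column names and values are illustrative assumptions. Whether this avoids the specific problem encountered with *pop_est* and *ne_id* has not been checked here.

```r
library(sf)

# Tiny made-up point dataset (hypothetical names and values)
pts <- st_sf(
  airport_long_name = c("Example A", "Example B"),
  ne_id = c(1159113785, 1159113803),
  geometry = st_sfc(
    st_point(c(-93.2, 44.9)),
    st_point(c(2.5, 49.0)),
    crs = 4326
  )
)

# Write to a GeoPackage in the session's temporary directory
gpkg_path <- file.path(tempdir(), "example_points.gpkg")
st_write(pts, gpkg_path, delete_dsn = TRUE, quiet = TRUE)

# Read it back and compare the attribute columns with the original
roundtrip <- st_read(gpkg_path, quiet = TRUE)
names(roundtrip)
roundtrip$ne_id
```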

### 3b. Global Airport Locations

As a second example, we'll download the airports dataset.  This data provides point locations and additional data for all airports worldwide.

In the previous example we used the *ne_countries* function, which is specifically used for downloading country borders.  Similarly, the *ne_coastline* function is used for downloading global coastlines.  For all other types of vector data from Natural Earth, we need to use the flexible *ne_download* function and specify what type of data we are interested in.

For this exercise, we want to export the point locations of all airports globally.  We will need to pass the proper layer key to the *ne_download* function.  Referring back to the lists of cultural data we explored in part 2, we can see that airport data can be referenced using the "airports" key and that the airports data comes at either the 1:50m (medium) or the 1:10m (large) scale.

Setting up the function, we will pass "airports" as the type while also specifying the category ("cultural") and scale (10).  Since we want to make sure the function returns an *sf* object, we also set the return class to "sf".

In [None]:
airports <- ne_download(scale = 10, type = "airports", category = "cultural", returnclass = "sf")
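
By default the download goes to a temporary directory, so it is repeated in each new session. One way to keep a local copy is sketched below using the `destdir` argument of *ne_download* together with *ne_load*. The helper names `ne_cached_path` and `get_ne_layer`, the `ne_data` folder, and the assumed file naming pattern (`ne_10m_airports.shp`) are all illustrative assumptions rather than part of the original workflow.

```r
# Expected path of a downloaded vector layer inside `destdir`.
# Assumption: files are named like "ne_10m_airports.shp".
ne_cached_path <- function(type, scale, destdir) {
  file.path(destdir, sprintf("ne_%dm_%s.shp", as.integer(scale), type))
}

# Read a layer from `destdir` if it is already there, otherwise download it.
get_ne_layer <- function(type, scale = 10, category = "cultural",
                         destdir = "ne_data") {
  dir.create(destdir, showWarnings = FALSE, recursive = TRUE)
  if (file.exists(ne_cached_path(type, scale, destdir))) {
    # Already on disk: read the local copy
    rnaturalearth::ne_load(
      scale = scale, type = type, category = category,
      destdir = destdir, returnclass = "sf"
    )
  } else {
    # Not on disk yet: download into destdir and read it
    rnaturalearth::ne_download(
      scale = scale, type = type, category = category,
      destdir = destdir, load = TRUE, returnclass = "sf"
    )
  }
}

# Usage (requires internet access on the first call):
# airports <- get_ne_layer("airports")
```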

The metadata printout shows us that the *airports* object is a simple feature (*sf*) collection of 893 point features, corresponding to 893 global airports, and 40 attribute fields.

Let's take a look at the first six records in the *airports* object using the *head* function from base R (*head* returns six rows by default).

In [None]:
head(airports)

We can see that the file contains lots of useful information on each airport such as the airport's name (in multiple languages), [IATA airport code](https://www.iata.org/en/publications/directories/code-search), location type (e.g. terminal, ramp, runway, and others), and a link to the airport's Wikipedia page.

Like we did with the United States boundary file, let's quickly plot the *airports* object by passing it to the base R *plot* function using the *st_geometry* function from the ***sf*** package.

In [None]:
plot(st_geometry(airports))

The simple plot gives us a high-level view of our data.  The airports are represented globally using point markers.

Finally, let's export the *airports* object to our local machine the same way we did with the *usa_boundary* object.  Again, before we do this step, we will remove a feature column which would impede the export process.

In [None]:
# simplify the dataset by removing the column that blocks the shapefile export
airports <- airports[, !names(airports) %in% c("ne_id")]

# write the airports shapefile
st_write(airports, "airports.shp", driver = "ESRI Shapefile", delete_dsn = TRUE)

### 3c. Global Time Zones

For a final exercise, we will extract and download the global time zone boundaries from the Natural Earth repository.

Taking a look at the lists of available data we identified in part 2, we can see that *time_zones* data is only available at the 1:10m (large) scale.

In [None]:
time_zones <- ne_download(scale = 10, type = "time_zones", category = "cultural", returnclass = "sf")

head(time_zones)

Extracting the *time_zones* data gives us an *sf* object with 120 polygon features, representing 120 time zones, and 15 attribute columns.  Attributes include information such as the time zone name, the [Coordinated Universal Time (UTC)](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) offset format for the time zone, and a list of places in the time zone.

Let's take a look at a quick map of the *time_zones* data.

In [None]:
plot(st_geometry(time_zones))

Let's write the *time_zones* object as a shapefile to our local machine.

In [None]:
st_write(time_zones, "time_zones.shp", driver = "ESRI Shapefile", delete_dsn = TRUE)
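
The drop-then-write pattern used across these examples can be wrapped in a small function. The sketch below is one possible way to do that: `write_shapefile_clean` is a hypothetical helper, its default `drop` list simply reuses the two column names from section 3a, and the `demo` object is made up so the example runs offline.

```r
library(sf)

# Drop the named columns (if present), then write a shapefile.
write_shapefile_clean <- function(x, path, drop = c("pop_est", "ne_id")) {
  x <- x[, !names(x) %in% drop]
  st_write(x, path, driver = "ESRI Shapefile", delete_dsn = TRUE, quiet = TRUE)
  invisible(path)
}

# Tiny made-up sf object (hypothetical names and values)
demo <- st_sf(
  name = c("Example A", "Example B"),
  ne_id = c(1, 2),
  geometry = st_sfc(
    st_point(c(-93.2, 44.9)),
    st_point(c(2.5, 49.0)),
    crs = 4326
  )
)

shp_path <- file.path(tempdir(), "demo_points.shp")
write_shapefile_clean(demo, shp_path)

# Read the file back to confirm what was written
demo_check <- st_read(shp_path, quiet = TRUE)
names(demo_check)
```

Reading the file back with *st_read* is a quick way to confirm that the expected features and columns reached the disk.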

In this notebook we have explored the available data in the Natural Earth repository and extracted three spatial datasets representing the boundary of the United States, global airport point locations, and global time zone boundaries.

---

## Next Steps

* Move on to Chapter 6: Mapping and Visualization
  * [**Chapter 6.1 Visualization and Quick Plots**](https://platform.i-guide.io/notebooks/dfe8fd72-f896-4dd2-9d61-6d9982394f1f)
  * [**Chapter 6.2 Mapping Point and Polygon Data**](https://platform.i-guide.io/notebooks/2b9f579c-32b0-4078-af39-994bb31d50ec)
  * [**Chapter 6.3 Choropleth Mapping**](https://platform.i-guide.io/notebooks/f2f973df-2412-49f0-ad39-d80051f20d4d)
* Return to the [**R Spatial Notebooks Project Chapter List**](https://vavramusser.github.io/r-spatial/#:~:text=R%20Spatial%20Notebooks%20Chapter%20List) to view a list of all available notebooks organized in the R Spatial Notebooks chapter structure.
* Visit the [**R Spatial Notebooks Project Homepage**](https://vavramusser.github.io/r-spatial) to learn more about the project, view the list of all notebooks, and explore additional resources.
* Join the project [**Mailing List**](https://mailchi.mp/ab01e8fc8397/r-spatial-email-signup) to hear about future notebook releases and other updates.
* If you have an idea for a new notebook please submit your idea via the [**Suggestion Box**](https://us19.list-manage.com/survey?u=746bf8d366d6fbc99c699e714&id=54590a28ea&attribution=false).

---

## ★ Thank You ★

Thank you so much for engaging with this notebook and supporting the project!  The R Spatial Notebooks Project is a labor of love so if you enjoy or benefit from these notebooks, please consider [**Donating to the Project**](https://buymeacoffee.com/vavramusser).  Your support allows me to continue producing notebooks and supporting the R Spatial Notebooks community.