<!-- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NLTGit/OpenNightLights-colab-mirror/blob/master/onl/tutorials/mod5_4_comparing_cities.ipynb) -->

# Intro to World Settlement Footprint data

Identifying areas of human settlement is a major focus in Earth observation and many other disciplines.

The World Settlement Footprint (WSF) 2015 dataset is a recent, powerful tool for mapping areas of human settlement.

It was derived from multitemporal Sentinel-1 radar and Landsat-8 optical imagery and presents a binary classification of each pixel as representing a settlement (value=255) or not (value=0). The resolution is 0.32 arc seconds, or approximately 10 meters.
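To see where the "approximately 10 meters" figure comes from, here is a small back-of-the-envelope conversion. It is only a sketch: it assumes a spherical Earth with the WGS84 equatorial radius, and the 28°N latitude used for Nepal is a rough illustrative value. Neither assumption comes from the WSF documentation.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS84 equatorial radius (assumption: spherical approximation)
PIXEL_ARCSEC = 0.32           # WSF 2015 pixel size, as quoted in the text


def pixel_size_m(arcsec, lat_deg=0.0):
    """Approximate (north-south, east-west) ground size in meters of a pixel `arcsec` wide."""
    meters_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    north_south = arcsec / 3600 * meters_per_degree
    # lines of longitude converge toward the poles, so east-west size shrinks with latitude
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


ns_equator, ew_equator = pixel_size_m(PIXEL_ARCSEC)
ns_nepal, ew_nepal = pixel_size_m(PIXEL_ARCSEC, lat_deg=28.0)  # 28 N: rough latitude for Nepal (assumption)

print(f"equator: {ns_equator:.1f} m x {ew_equator:.1f} m")
print(f"28 N:    {ns_nepal:.1f} m x {ew_nepal:.1f} m")
```

Because the grid is defined in degrees rather than meters, pixels get narrower in the east-west direction away from the equator, so "10 meters" is best read as a nominal figure.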

More information, along with the data itself, is <a href="https://figshare.com/articles/dataset/World_Settlement_Footprint_WSF_2015/10048412">available here</a> and in the original paper {cite}`marconcini2020outlining`.

As noted earlier, this settlement data represents conditions in 2015. We will therefore use this high-resolution dataset to train our classifier, but will need to rely on our other data sources to see change over time after 2015.

#### Note that for this exercise, you'll need to install a new Python package in your environment: xarray_leaflet
(You can do that by uncommenting the following line. You only need to do this once.)

In [1]:
# !pip install xarray_leaflet

## Download WSF data

As of the creation of this tutorial, there does not appear to be any API access for these data. The data files are organized by tiles covering the globe, but you have to download everything as a single zip file. It is a 2.5 GB file, so it will take a few minutes.

Download this file and remember where you save it on your computer.

Fortunately, within this data folder, the data (stored as GeoTIFFs) are organized by tiles that map to particular regions.

Our simplest approach will be to identify the files that overlap Nepal and import those into our workspace.
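If you would rather not unpack the whole archive, the sketch below pulls only selected tiles out of the zip. It assumes the download is a regular zip archive; `extract_tiles` is a hypothetical helper written for this tutorial, and the archive file name in the usage comment is a placeholder for wherever you saved the download.

```python
import shutil
import zipfile
from pathlib import Path


def extract_tiles(archive, wanted, dest):
    """Extract only the archive members whose file name is in `wanted`.

    Members are written directly into `dest`, ignoring any folder
    structure inside the archive. Returns the list of extracted paths.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for member in zf.namelist():
            name = Path(member).name
            if name in wanted:
                target = dest / name
                # stream to disk so a large tile is never held fully in memory
                with zf.open(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                extracted.append(target)
    return extracted


# Usage (archive name is a placeholder; the tile name is the one used later in this tutorial):
# extract_tiles(
#     "path/to/your_wsf_download.zip",
#     {"WSF2015_v1_EPSG4326_e080_n30_e090_n20.tif"},
#     "files",
# )
```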

#### We'll initialize Geemap and grab the bounding box for Nepal

In [6]:
import geemap, ee
from pathlib import Path

try:
    ee.Initialize()
except Exception:
    # no stored credentials yet: authenticate once, then initialize again
    ee.Authenticate()
    ee.Initialize()

# get our Nepal boundary
aoi = ee.FeatureCollection("FAO/GAUL/2015/level0").filter(ee.Filter.eq('ADM0_NAME','Nepal'))

print(f"the bounds of Nepal are: \n{aoi.geometry().bounds().getInfo()}")

Navigate to the zip file you just downloaded. Unzip (or extract) it, and within the extracted folder you should find this file:
- `WSF2015_v1_EPSG4326_e080_n30_e090_n20.tif` 

Because a small part of Nepal lies north of the 30th parallel, we'll need this one too:
- `WSF2015_v1_EPSG4326_e080_n40_e090_n30.tif`
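The two file names above suggest a naming pattern of 10-degree tiles labelled by their west, north, east and south edges. As a rough sketch, the helper below lists the tile names that overlap a bounding box. Note the assumptions: the pattern is inferred only from these two file names, `wsf_tile_names` is a hypothetical helper, it handles only eastern longitudes and northern latitudes (the prefix used for other hemispheres is not shown in this tutorial), and the Nepal bounds are approximate values for illustration.

```python
import math

TILE_DEG = 10  # tile width/height in degrees, inferred from the two file names above


def wsf_tile_names(west, south, east, north):
    """List WSF 2015 tile names overlapping a lon/lat bounding box.

    Assumes the pattern e{west}_n{north}_e{east}_n{south} seen in the
    tutorial's file names; only valid for east longitudes and north latitudes.
    """
    if west < 0 or south < 0:
        raise ValueError("only east longitudes / north latitudes are supported in this sketch")
    names = []
    x = math.floor(west / TILE_DEG) * TILE_DEG
    while x < east:
        y = math.floor(south / TILE_DEG) * TILE_DEG
        while y < north:
            names.append(
                f"WSF2015_v1_EPSG4326_e{x:03d}_n{y + TILE_DEG:02d}"
                f"_e{x + TILE_DEG:03d}_n{y:02d}.tif"
            )
            y += TILE_DEG
        x += TILE_DEG
    return names


# Approximate bounding box for Nepal in degrees (assumption, for illustration only)
nepal_tiles = wsf_tile_names(west=80.06, south=26.35, east=88.20, north=30.45)
for name in nepal_tiles:
    print(name)
```

With these approximate bounds, the helper returns the same two tiles listed above.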

In [4]:
wsf = Path("files","WSF2015_v1_EPSG4326_e080_n30_e090_n20.tif")
wsf


In [2]:
import rasterio

In [3]:
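with rasterio.open(wsf) as f:
    # rasterio datasets have no .data() method; read() returns a band as a NumPy array
    data = f.read(1, window=((0, 1024), (0, 1024)))  # top-left 1024x1024 block only; a full tile can be very large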

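Once a block of the raster is in memory, the 0/255 encoding described earlier can be turned into a boolean settlement mask. The sketch below uses a small synthetic array rather than real WSF data, so the numbers are purely illustrative.

```python
import numpy as np

SETTLEMENT = 255  # WSF value for settlement pixels; 0 means no settlement

# Synthetic 4x4 stand-in for a block read from a tile (not real WSF data)
block = np.array(
    [
        [0, 0, 255, 255],
        [0, 0, 255, 0],
        [0, 0, 0, 0],
        [255, 0, 0, 0],
    ],
    dtype=np.uint8,
)

settled = block == SETTLEMENT    # boolean mask: True where a settlement was detected
settled_share = settled.mean()   # fraction of pixels flagged as settlement

print(f"{settled.sum()} of {settled.size} pixels are settlement ({settled_share:.1%})")
```

A boolean mask like this is a convenient form for training labels, since each pixel is simply settlement or not.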


## References:
```{bibliography} ../references.bib
:filter: docname in docnames
```