# Assignment for week 03
Vector Intro, Geopandas, CRS, Projections

## Reading and Tutorials 
Inspired by, and closely following, an assignment from Dave Shean (https://github.com/UW-GDA/gda_course_2021/blob/master/modules/04_Vector1_Geopandas_CRS_Proj/)


## Overview
This week, we will cover several fundamental geospatial data concepts, including coordinate systems, projections/transformations, vector geometries (points, lines, polygons), and basic geometry operations (intersect, buffer, etc.). We will also begin using the GeoPandas package (https://geopandas.readthedocs.io/en/latest/).

>“GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by shapely. Geopandas further depends on fiona for file access and descartes and matplotlib for plotting.”

We will revisit vector data and cover more advanced processing, analysis and visualization in a few weeks, after the Raster Intro module. 

## Reading and Tutorials
Please read the following material before class on Thursday (especially important if you have limited GIS experience), and come with questions on topics that are unclear, so we can discuss them together. There is some overlap in content, but each resource offers a different presentation of the essential material, so hopefully one or more will work for you, and some repetition will help solidify the concepts.

* [Data Carpentry Introduction to Geospatial Concepts](https://datacarpentry.org/organization-geospatial/): 
    * Section 2: Introduction to Vector Data (~10 min)
    * Section 3: Coordinate Reference Systems (~15 min)
* [Vector Geohackweek tutorial](https://geohackweek.github.io/vector/): first 4 sections (~30-45 min)
    * Introduction
    * Geospatial Concepts
    * Encodings, Formats and Libraries
    * GeoPandas Introduction
    * *Note: If you prefer an instructor explaining, here is a recording of this tutorial by Emilio Mayorga:* https://www.youtube.com/watch?v=t3PMTnhl1eY&feature=youtu.be
* [Earth Lab Intermediate Earth Data Science Textbook](https://www.earthdatascience.org/courses/use-data-open-source-python/intro-vector-data-python/spatial-data-vector-shapefiles/intro-to-coordinate-reference-systems-python/)
    * Section 2, Chapter 2: all sections (~30-60 min)
        * Can review on website or download data for interactive exploration
        * Lesson 1. GIS in Python: Introduction to Vector Format Spatial Data - Points, Lines and Polygons
        * Lesson 2. GIS in Python: Intro to Coordinate Reference Systems in Python
        * Lesson 3. Geographic vs projected coordinate reference systems - GIS in Python
        * Lesson 4. Understand EPSG, WKT and Other CRS Definition Styles

## Other Resources

### Official documentation:
* GeoPandas: http://geopandas.org/index.html
* Shapely: https://shapely.readthedocs.io/en/stable/manual.html

### Other good resources:
* https://automating-gis-processes.github.io/site/lessons/L1/Intro-Python-GIS.html
* https://automating-gis-processes.github.io/site/lessons/L2/overview.html
* https://github.com/Automating-GIS-processes/CSC/blob/master/source/notebooks/L1/geometric-objects.ipynb
* http://darribas.org/gds15/content/labs/lab_03.html
* https://geohackweek.github.io/visualization/03-cartopy/
* https://www.w3.org/2015/spatial/wiki/Coordinate_Reference_Systems
* https://github.com/geopandas/geopandas/tree/master/examples





## Let's begin by looking at some glacier data from Tim's research

In [None]:
import numpy as np
import matplotlib.pyplot as plt

import geopandas as gpd


Let's use geopandas to look at some geophysical instruments that Tim installed for one of his projects at Turner Glacier, in the St. Elias Range of Alaska.

In [None]:
inst = gpd.read_file('../instruments/Instruments.shp')
inst

In [None]:
inst.plot()

In [None]:
# Map only the seismic sites using tan squares
inst[ inst['inst_type']=='Seismic' ].plot(marker='s', color='tan')

The CRS (coordinate reference system) defines how the coordinates in the geospatial data relate to locations on the Earth.  It is commonly identified by an EPSG code.

In [None]:
inst.crs

In [None]:
# Save the EPSG code of the instrument GeoDataFrame's CRS
instrument_crs = inst.crs.to_epsg()
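
To see what a CRS object holds without needing the instrument file, here is a minimal sketch on a toy GeoDataFrame. The point coordinates and the choice of EPSG:4326 are assumptions made for illustration only; they are not taken from the instrument dataset.

```python
import geopandas as gpd
from shapely.geometry import Point

# Toy point with made-up (longitude, latitude) coordinates, not a real instrument site
toy = gpd.GeoDataFrame(
    {"name": ["toy_site"]},
    geometry=[Point(-140.0, 60.0)],
    crs="EPSG:4326",  # WGS 84 geographic coordinates (assumed for this example)
)

print(toy.crs.name)                                  # human-readable CRS name
print(toy.crs.is_geographic, toy.crs.is_projected)   # degrees vs. projected units

# The integer EPSG code, the same kind of value stored in instrument_crs above
epsg_code = toy.crs.to_epsg()
print(epsg_code)
```

Running the same attribute checks on `inst.crs` should tell you whether the instrument coordinates are in degrees or in a projected unit such as meters.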

<div class="alert alert-block alert-warning">

### 1. Subsetting and rewriting data
It turns out that the "Landing For Camera" point is at the wrong location and is therefore not a useful part of this dataset.
Make a new version of the file, titled "Instruments_2109", that doesn't include this point, and save the file to the directory "Instruments."
    
Also, the shapefile is a legacy data format from the 1990s and is largely obsolete. It has many limitations: http://switchfromshapefile.org/

Better options these days are GeoPackage (GPKG) when a spatial index is required, and simple GeoJSON for other cases. Both are supported by any respectable GIS (including QGIS, ArcGIS, etc.).
    
For this exercise, let's use GeoPackage.  Save the new Instruments_2109 as a GeoPackage (Instruments_2109.gpkg). 
Read the GeoPandas documentation to learn about writing files.
    
</div>

## Working with polygons
The Randolph Glacier Inventory (RGI) is a global catalog of glacier outlines: https://www.glims.org/RGI/

Metadata for these outlines are here: https://www.glims.org/RGI/00_rgi60_TechnicalNote.pdf

Let's read in the RGI glacier outlines for southern Alaska.  This is a subset of the complete set of RGI Alaska glaciers 
(it excludes glaciers in northern Alaska and the Alaska Range). The subset was necessary because the complete archive was over 100 MB, which is too large for GitHub.

In [None]:
# Southern Alaska subset of 01_rgi60_Alaska.shp
glaciers = gpd.read_file('../01_rgi60_Alaska/rgi60_southern_alaska.shp')
glaciers

+ Within these data, Slope is the mean slope of DEM cells found within the glacier.
+ Aspect is the orientation of the glacier surface.
+ Lmax is the glacier length in meters.
+ TermType describes whether the glacier is land-, marine-, or lake-terminating.
+ Name is the name of the glacier.

In [None]:
glaciers.describe()

In [None]:
# Create new figure (fig) and axes (ax) objects to plot into, as we have done with matplotlib
fig, ax = plt.subplots()
glaciers.plot(ax=ax) # Note that we are now plotting within the axes object we just created

In [None]:
# Plot the distribution of glacier lengths
fig, ax = plt.subplots()
ax.hist(glaciers['Lmax'], bins=np.arange(0, 4000, 100) ) # Make a histogram with specified bin boundaries

<div class="alert alert-block alert-warning">

### 2. Glacier widths
What is the distribution of mean glacier widths?
Is there such a thing as a typical width?  Here, I use "typical" to indicate that some distribution has a central tendency, like a mean or median.
You'll have to define a mean glacier width, which if you consider a glacier to be broadly rectangular, could be its area divided by its length.

</div>
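
One thing to watch is units. The sketch below assumes areas in km² and lengths in meters, which is how I read the RGI technical note for the Area and Lmax fields (please verify against the metadata linked above). The numbers are made up and only show the unit conversion.

```python
import numpy as np

# Made-up values: areas in km^2 and lengths in m (assumed units, see the note above)
area_km2 = np.array([1.0, 12.5, 300.0])
length_m = np.array([2000.0, 10000.0, 60000.0])

# Convert area to m^2 before dividing so that the width comes out in meters
width_m = area_km2 * 1e6 / length_m

print(width_m)
print("mean:", np.mean(width_m), "median:", np.median(width_m))
```

When the mean and median differ a lot, as they do for these toy values, the distribution is skewed, and a histogram with logarithmic bins may be easier to read.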

<div class="alert alert-block alert-warning">

### 3. Comparing lengths of different glacier types

Create a plot that indicates whether glaciers that end in the ocean tend to be longer or larger than glaciers that end on land.
Is there evidence of a difference?
Use a boxplot (https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html) to make your comparison.
</div>
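
Here is a minimal sketch of a grouped boxplot on synthetic data. The column names `term` and `length_m`, the string labels, and the random values are hypothetical. In the RGI table, TermType is stored as a code, so check the technical note for what each value means before grouping.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic lengths for two made-up groups (not real glacier data)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "term": ["land"] * 50 + ["marine"] * 50,
    "length_m": np.concatenate([
        rng.lognormal(7.5, 0.5, 50),
        rng.lognormal(8.5, 0.5, 50),
    ]),
})

# Build one array of values per group, in a fixed order
labels = ["land", "marine"]
groups = [toy.loc[toy["term"] == lab, "length_m"].to_numpy() for lab in labels]

fig, ax = plt.subplots()
ax.boxplot(groups)             # one box per array in the list
ax.set_xticks([1, 2])          # boxplot places boxes at positions 1, 2, ...
ax.set_xticklabels(labels)
ax.set_yscale("log")           # lengths span orders of magnitude
ax.set_ylabel("Length (m)")
```

A logarithmic y-axis is a design choice here, not a requirement; it keeps a few very long glaciers from squashing the rest of the boxes.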

In [None]:
# Pull out just Turner Glacier from the entire "Glaciers of Alaska" dataset
selected = glaciers['Name'] == "Turner Glacier"
glaciers.loc[selected]

In [None]:
turner = glaciers.loc[selected] # Subset out just the Turner Glacier geodataframe
turner_utm = turner.to_crs(epsg=instrument_crs) # Create a new geodataframe for Turner Glacier, reprojected to the instruments' CRS (UTM)
turner_utm.plot()

<div class="alert alert-block alert-warning">

### 4. Map of instrumentation at Turner Glacier

Make a plot showing the different locations of instruments at Turner Glacier.  On the outline of Turner Glacier (projected in UTM),
plot the different types of instruments (e.g., Seismic, sIPR, GNSS) using different colors and symbols.
Use labels and a legend to identify each symbol style.  Label each instrument with its name.

For your "instruments," begin by reading in the new geopackage you created above, that doesn't have the "Landing for Camera."
    
When you plot, you'll have to be certain to specify which "axes object" to plot each geodataframe into (i.e., using ax= arguments within plot commands), so that all geodataframes are plotted on the same figure.
</div>
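
The layering pattern can be sketched with toy geometries as follows. The outline, site names, instrument types, coordinates, CRS, and marker styles are all made up for illustration; the point is that every `.plot()` call receives the same `ax`.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point, Polygon

# Toy outline and toy sites in an assumed projected CRS (all values hypothetical)
outline = gpd.GeoDataFrame(
    geometry=[Polygon([(0, 0), (10, 0), (10, 4), (0, 4)])], crs="EPSG:32607"
)
sites = gpd.GeoDataFrame(
    {"name": ["S1", "S2", "G1"], "inst_type": ["Seismic", "Seismic", "GNSS"]},
    geometry=[Point(2, 1), Point(5, 3), Point(8, 2)],
    crs="EPSG:32607",
)

# One marker/color style per instrument type
styles = {
    "Seismic": {"marker": "s", "color": "tan"},
    "GNSS": {"marker": "^", "color": "tab:blue"},
}

fig, ax = plt.subplots()
outline.plot(ax=ax, facecolor="none", edgecolor="black")  # base layer

# Plot each instrument type into the same axes with its own style and legend label
for inst_type, style in styles.items():
    subset = sites[sites["inst_type"] == inst_type]
    subset.plot(ax=ax, label=inst_type, **style)

# Label each point with its name, offset slightly from the marker
for _, row in sites.iterrows():
    ax.annotate(row["name"], xy=(row.geometry.x, row.geometry.y),
                xytext=(3, 3), textcoords="offset points")

ax.legend()
```

Both layers must share a CRS before plotting; otherwise the points and the outline will not line up.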

<div class="alert alert-block alert-warning">
With geopandas, you can return the area of each element, computed from its "geometry" field, using the attribute `.area` (for example, `glaciers.area`).
What is the area of the element in the geodataframe `turner`, and what is the area of the element in the geodataframe `turner_utm`?  Explain any difference between these two areas.
What are the units of each area?  How do these areas compare with the "Area" field of the RGI catalog entry for Turner Glacier?
</div>
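
To illustrate why the CRS matters for `.area`, here is a sketch with a toy 0.1° by 0.1° box. The box location and the choice of EPSG:32607 (WGS 84 / UTM zone 7N) are assumptions for this example only; use `instrument_crs` for the actual exercise.

```python
import warnings

import geopandas as gpd
from shapely.geometry import box

# Toy 0.1 x 0.1 degree box in geographic coordinates (made-up location)
toy = gpd.GeoDataFrame(geometry=[box(-141.0, 60.0, -140.9, 60.1)], crs="EPSG:4326")

# In a geographic CRS, .area is computed in squared degrees;
# GeoPandas emits a warning about this, which we silence for the demo
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    area_deg2 = toy.area.iloc[0]

# After reprojecting to a UTM zone, .area is in square meters
toy_utm = toy.to_crs(epsg=32607)
area_m2 = toy_utm.area.iloc[0]

print("square degrees:", area_deg2)
print("square meters:", area_m2, "-> km^2:", area_m2 / 1e6)
```

The two numbers describe the same shape, so any difference comes from the units of the coordinate system, not from the geometry.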

## Exploring Moscow
Last week, I got a geodatabase from one of the lead GIS analysts for the City of Moscow containing several important layers for the city and its planning.  Let's explore these layers.

In [None]:
import fiona
moscow = '../Moscow.gdb'
fiona.listlayers(moscow)

In [None]:
streets = gpd.read_file(moscow, layer='Centerlines')

In [None]:
streets.plot()

In [None]:
streets.crs

In [None]:
parcels = gpd.read_file(moscow, layer='Parcels')
parcels

In [None]:
parcels.columns

<div class="alert alert-block alert-warning">

### 5. Property sales
Within the parcels layer, the column PM_DEEDCDT represents the date when a property was last sold.
+ When was the most recent property sale, as of the date of this geodatabase?  
+ Create a plot that shows sales per year as a function of time.
+ In what month are most sales carried out?
</div>
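
The pandas date tools needed here can be sketched on a toy column. The column name `sale_date` and the dates are made up. In the real layer, first check whether PM_DEEDCDT is already a datetime type (`parcels.dtypes`), since the conversion step may not be needed.

```python
import pandas as pd

# Toy sale dates as strings (made-up values)
toy = pd.DataFrame({
    "sale_date": ["2019-06-15", "2020-06-01", "2020-07-20", "2021-01-05", "2020-06-30"]
})

# Convert to datetime so that the .dt accessor is available
toy["sale_date"] = pd.to_datetime(toy["sale_date"])

latest = toy["sale_date"].max()                                    # most recent date
per_year = toy["sale_date"].dt.year.value_counts().sort_index()    # count per year
busiest_month = toy["sale_date"].dt.month.value_counts().idxmax()  # month number, 1-12

print(latest)
print(per_year)
print(busiest_month)
```

`per_year` is a Series indexed by year, so it can be passed straight to a bar or line plot.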

<div class="alert alert-block alert-warning">

### 6. Public park area
Within the Parks layer, there is a field that identifies the size of each park, in units of acres.

On a map of the Moscow streets, bounded by the city limits, plot the location of each park, with the size of the park symbol scaled by the size of the park in acres.  Make the plot appealing.

</div>
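
Scaling symbol size by an attribute can be sketched as below, using toy streets and parks. The geometries, the `acres` column name, and the `scale` factor are hypothetical. If the real Parks layer holds polygons rather than points, one option is to plot their centroids (`parks.geometry.centroid`) as the symbols.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import LineString, Point

# Toy street centerlines and park points (all values made up)
streets = gpd.GeoDataFrame(
    geometry=[LineString([(0, 0), (10, 0)]), LineString([(5, -5), (5, 5)])]
)
parks = gpd.GeoDataFrame(
    {"acres": [1.0, 4.0, 25.0]},
    geometry=[Point(2, 1), Point(6, -2), Point(8, 3)],
)

scale = 20  # marker area (points^2) per acre; a hypothetical tuning value
sizes = parks["acres"] * scale

fig, ax = plt.subplots()
streets.plot(ax=ax, color="gray", linewidth=1)                 # base layer
parks.plot(ax=ax, markersize=sizes, color="green", alpha=0.6)  # size scales with acres
ax.set_aspect("equal")
```

Because `markersize` sets the marker area rather than its diameter, scaling it linearly with acres keeps the symbol area proportional to park area.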