### About Vector Data
Vector data are composed of discrete geometric locations (x, y values) known as vertices that define the “shape” of the spatial object. The organization of the vertices determines the type of vector that you are working with. There are three types of vector data:

* Points: Each individual point is defined by a single x, y coordinate. There can be many points in a vector point file. Examples of point data include: sampling locations, the location of individual trees or the location of plots.

* Lines: Lines are composed of at least 2 connected vertices, or points. For instance, a road or a stream may be represented by a line. The line is made up of a series of segments, and each “bend” in the road or stream is a vertex with a defined x, y location.

* Polygons: A polygon consists of 3 or more vertices that are connected and “closed”. The outlines of plot boundaries, lakes, oceans, and states or countries are often represented by polygons. Occasionally, a polygon can have a hole in the middle of it (like a doughnut). This is something to be aware of, but not an issue you will deal with in this tutorial.
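
The three geometry types can be sketched with `shapely`, the geometry library that geopandas builds on. This is a minimal illustration with made-up coordinates; the variable names are hypothetical and do not come from the tutorial data.

```python
from shapely.geometry import LineString, Point, Polygon

# A point: a single x, y coordinate (e.g. one sampling location).
plot_centre = Point(153.0, -27.5)

# A line: an ordered series of at least two connected vertices.
stream = LineString([(0, 0), (1, 1), (2, 1)])

# A polygon: three or more vertices forming a closed ring.
# shapely closes the ring itself if the first vertex is not repeated.
shoreline = [(0, 0), (4, 0), (4, 4), (0, 4)]
lake = Polygon(shoreline)

# A polygon with a hole: one exterior ring plus one interior ring.
island = [(1, 1), (3, 1), (3, 3), (1, 3)]
lake_with_island = Polygon(shoreline, holes=[island])

for geom in (plot_centre, stream, lake, lake_with_island):
    print(geom.geom_type, "area:", geom.area)
```

The hole is subtracted from the polygon's area, which is one practical reason to be aware of doughnut-shaped polygons.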

<img src="images/points-lines-polygons-vector-data-types.png" alt="drawing" width="600"/>

In [None]:
import geopandas as gpd
from geopandas.tools import sjoin
import matplotlib.pyplot as plt
import contextily as cx

In [None]:
# Load GEDI footprint points into Geopandas dataframe

GEDI_df = gpd.read_file("data/GEDI_Shots/GEDI.shp")

In [None]:
GEDI_df

In [None]:
# Create a new column with rh100 in metres
# (the 0.01 factor implies rh100 is stored in centimetres)
GEDI_df["rh100m"] = GEDI_df["rh100"] * 0.01

In [None]:
GEDI_df

In [None]:
# Check CRS
GEDI_df.crs

In [None]:
# Load regional ecosystem polygons into geopandas dataframe
RE_df = gpd.read_file("data/RE_Polygons/RE_Polygons.shp")

In [None]:
RE_df

In [None]:
# Check CRS
RE_df.crs
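
Checking both CRS values matters because a spatial join compares raw coordinates, so the two layers need to share a CRS. If they differ, one layer can be reprojected with `to_crs`. The sketch below uses small made-up layers (hypothetical stand-ins, not the tutorial files) and assumes the polygons happen to be in a projected CRS.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical stand-in for the GEDI points, in geographic lon/lat.
points = gpd.GeoDataFrame(
    {"shot": [1, 2]},
    geometry=[Point(153.00, -27.50), Point(153.01, -27.49)],
    crs="EPSG:4326",
)

# Hypothetical stand-in for the RE polygons, stored in a projected CRS.
polys = gpd.GeoDataFrame(
    {"RE": ["a"]},
    geometry=[box(152.9, -27.6, 153.1, -27.4)],
    crs="EPSG:4326",
).to_crs("EPSG:3857")

# Reproject one layer only when the two CRS definitions differ.
if points.crs != polys.crs:
    points = points.to_crs(polys.crs)

print(points.crs == polys.crs)
```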

### We can now map the geospatial data using matplotlib

In [None]:
# Initialise fig and ax
fig, ax = plt.subplots()

# plot the data using geopandas .plot() method
GEDI_df.plot(ax=ax)
plt.show()

In [None]:
# Initialise fig, ax and set figure size
fig, ax = plt.subplots(figsize=(15, 15))

# Plot the data, adding a colourmap and legend.
# figsize is set on plt.subplots above; geopandas ignores it when ax is supplied.
GEDI_df.plot(column='rh100m',
             legend=True,
             markersize=45,
             cmap="viridis_r",
             ax=ax);

In [None]:
# Initialise fig, ax and set figure size 
fig, ax = plt.subplots(figsize = (15,15))

# Plot the data, adding a colourmap and legend.
# figsize is set on plt.subplots above; geopandas ignores it when ax is supplied.
GEDI_df.plot(column='rh100m',
             legend=True,
             markersize=45,
             cmap="viridis_r",
             ax=ax);

# Add a basemap to the plot using contextily
cx.add_basemap(ax, crs=GEDI_df.crs, source = 'https://mt1.google.com/vt/lyrs=s&x={x}&y={y}&z={z}')

### Now we can apply similar code to plot our regional ecosystem polygons

In [None]:
# Initialise fig and ax, setting the figure size. 
fig, ax = plt.subplots(figsize = (15,15))

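# Plot the polygons, coloured by regional ecosystem class.
# markersize only affects point layers and figsize is ignored when ax is supplied,
# so neither is passed here.
RE_df.plot(column='RE',
           cmap="Set2",
           ax=ax);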

# Obtain the coordinates needed to label each polygon and add them to the dataframe.
# representative_point() returns a point guaranteed to lie within the polygon.
RE_df['coords'] = RE_df['geometry'].apply(lambda geom: geom.representative_point().coords[0])

# Loop through each polygon adding a label to the plot. 
for idx, row in RE_df.iterrows():
    ax.text(row.coords[0], 
            row.coords[1],
            s=row["RE"], 
            horizontalalignment='center', 
            bbox={'facecolor': 'white',
                  'alpha':0.8, 'pad': 2, 
                  'edgecolor':'none'})

# Add base map
cx.add_basemap(ax, crs=RE_df.crs, source = 'https://mt1.google.com/vt/lyrs=s&x={x}&y={y}&z={z}')
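
The labelling step above uses `representative_point()` rather than the polygon centroid. A likely reason is that the centroid of an irregular polygon can fall outside its outline, which would put the label in the wrong place. The sketch below shows this on a made-up U-shaped polygon (hypothetical data, not from the RE layer).

```python
from shapely.geometry import Polygon

# A U-shaped polygon: a 3 x 3 square with a notch cut into the top edge.
u_shape = Polygon(
    [(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)]
)

centroid = u_shape.centroid
label_point = u_shape.representative_point()

# The centroid lands in the notch, i.e. outside the polygon itself.
print("centroid inside polygon:", u_shape.contains(centroid))

# The representative point is chosen to lie within the polygon.
print("representative point inside polygon:", u_shape.covers(label_point))
```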

### Now that we have visualised our regions, we can calculate summary statistics for each polygon using the GEDI height data

In [None]:
# Spatially join each GEDI shot to the polygon it falls in.
# how="left" keeps every shot, including those outside all polygons.
points_polys = gpd.sjoin(GEDI_df, RE_df, how="left")

In [None]:
points_polys
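
To see what the left spatial join does, here is a minimal sketch on made-up data (the `shots` and `regions` frames are hypothetical). It relies on the default join predicate, which is "intersects", and compares `how="left"` with `how="inner"`.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Three made-up shots; the last one lies outside the only region.
shots = gpd.GeoDataFrame(
    {"rh100m": [12.0, 18.5, 7.2]},
    geometry=[Point(1, 1), Point(2, 2), Point(9, 9)],
    crs="EPSG:4326",
)
regions = gpd.GeoDataFrame(
    {"RE": ["forest"]},
    geometry=[box(0, 0, 5, 5)],
    crs="EPSG:4326",
)

# Left join: every shot is kept; unmatched shots get NaN in the polygon columns.
joined = gpd.sjoin(shots, regions, how="left")
print(joined[["rh100m", "RE"]])

# Inner join: only shots that intersect a polygon are kept.
inside_only = gpd.sjoin(shots, regions, how="inner")
print(len(joined), len(inside_only))
```

With a left join, shots outside every polygon stay in the table with a missing `RE` value, so they drop out of any later `groupby('RE')` by default.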

In [None]:
# Group each shot by regional ecosystem classification and calculate stats.
# rh100m (metres) is used so the results match the units plotted earlier.
stats_pt = points_polys.groupby('RE')['rh100m'].agg(['mean', 'std', 'max', 'min'])
stats_pt
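
When reading per-class statistics it can help to know how many shots each class contains, since a standard deviation from one or two shots says very little. The sketch below adds a `count` column to the same kind of aggregation, using a small made-up table (hypothetical values) in place of `points_polys`.

```python
import pandas as pd

# Made-up joined table: the last shot fell outside every polygon (RE is missing).
joined = pd.DataFrame(
    {
        "RE": ["a", "a", "a", "b", None],
        "rh100m": [10.0, 14.0, 12.0, 20.0, 5.0],
    }
)

# groupby drops rows with a missing key by default, so the unmatched shot is excluded.
stats = joined.groupby("RE")["rh100m"].agg(["count", "mean", "std", "max", "min"])
print(stats)
```

Class `b` has a single shot, so its standard deviation is undefined and appears as NaN.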

In [None]:
# Quickly plot the data
stats_pt.plot.bar(figsize = (20,5))