# Discussion 6: Arctic regions geospatial wrangling

## 1. Data loading and exploration
1. Download the arctic_communities.geojson data from Google Drive and add it to your data/ directory.

2. Read the data into a variable named df and examine it with your team.

In [102]:
# Import necessary libraries
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd 

In [103]:
# Read in data 
df = gpd.read_file("data/arctic_communities.geojson")
df.head()

|   | admin | country | n_communities | geometry |
|---|---|---|---|---|
| 0 | United States of America | US | 115 | MULTIPOLYGON (((-132.74687 56.52568, -132.7576... |
| 1 | United Kingdom | GB | 96 | MULTIPOLYGON (((-2.66768 51.62300, -2.74214 51... |
| 2 | Sweden | SE | 133 | MULTIPOLYGON (((19.07646 57.83594, 18.99375 57... |
| 3 | Russia | RU | 774 | MULTIPOLYGON (((145.88154 43.45952, 145.89561 ... |
| 4 | Norway | NO | 48 | MULTIPOLYGON (((20.62217 69.03687, 20.49199 69... |


**Individually, write down high-level steps on how you would explore and wrangle the data to produce the updated map. Do not code anything yet.** 
- Separate Alaska from the contiguous US and drop the contiguous US polygons
- Reproject
- Plot

**What are some potential challenges?** 
- CRS must match 
- Splitting MultiPolygons into single Polygons
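The plan above can be sketched end to end on toy data. This is a minimal sketch, assuming geopandas and shapely are installed; the `toy` GeoDataFrame, its two boxes, and the 40° `MIN_LAT` cut-off are hypothetical, and EPSG:3413 (a polar stereographic projection for the Arctic) is only one reasonable target CRS, not one the discussion prescribes.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, box

# Hypothetical toy data: one "country" made of a northern and a southern part
northern = box(-160, 60, -150, 70)   # stand-in for Alaska
southern = box(-120, 35, -110, 45)   # stand-in for the contiguous US
toy = gpd.GeoDataFrame(
    {"admin": ["Toyland"], "n_communities": [10]},
    geometry=[MultiPolygon([northern, southern])],
    crs="EPSG:4326",
)

# Step 1: split each MultiPolygon into one row per Polygon
parts = toy.explode(index_parts=False).reset_index(drop=True)

# Step 2: keep polygons whose southern-most bound lies north of a hypothetical cut-off
MIN_LAT = 40  # hypothetical threshold, in degrees latitude
arctic = parts[parts.geometry.bounds["miny"] >= MIN_LAT]

# Step 3: reproject before plotting (EPSG:3413 is one possible Arctic projection)
arctic_polar = arctic.to_crs("EPSG:3413")
print(len(parts), len(arctic))  # 2 1
```

Calling `arctic_polar.plot()` would then draw the filtered, reprojected polygons.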


## 2. Check geometry types
1. Run df.geom_type. Write a brief explanation about the output in a markdown cell.

2. Create an if-else statement that:

- prints “Multiple feature types:” followed by the unique geometry types (no repetition) in the geodataframe if not all the features are polygons, and

- prints “All features are:” followed by the unique geometry type if all the features in the geodataframe have the same geometry type.

3. Wrap up your code into a function named check_polygons that receives a single geodataframe as its parameter and prints out a message about the geometry types in the geodataframe.

In [104]:
df = df.set_index("admin")
df.geom_type

admin
United States of America    MultiPolygon
United Kingdom              MultiPolygon
Sweden                      MultiPolygon
Russia                      MultiPolygon
Norway                      MultiPolygon
Lithuania                   MultiPolygon
Latvia                           Polygon
Iceland                          Polygon
Finland                     MultiPolygon
Estonia                     MultiPolygon
Greenland                   MultiPolygon
Faroe Islands               MultiPolygon
Denmark                     MultiPolygon
Canada                      MultiPolygon
Belarus                          Polygon
dtype: object

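Every row in `df` has a polygon-type geometry. Latvia, Iceland, and Belarus are single `Polygon`s; every other country is a `MultiPolygon`, a collection of polygons stored as one feature. A MultiPolygon represents a country made of several disjoint pieces, such as a mainland plus islands.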
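To make the Polygon/MultiPolygon distinction concrete, here is a small sketch using shapely directly (assuming shapely is installed; the squares are hypothetical): two disjoint squares can be stored as one MultiPolygon feature, and `.geoms` exposes the individual parts.

```python
from shapely.geometry import MultiPolygon, Polygon

# A single square: one exterior ring, one geometry
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

# Two disjoint squares stored as ONE feature, e.g. a mainland plus an island
island = Polygon([(3, 0), (4, 0), (4, 1), (3, 1)])
country = MultiPolygon([square, island])

print(square.geom_type)    # Polygon
print(country.geom_type)   # MultiPolygon
print(len(country.geoms))  # 2
```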

In [105]:
# Print a message describing the geometry types in a geodataframe
def check_polygons(df):
    # Unique geometry types, without repetition
    geom_types = df.geom_type.unique()
    if geom_types.size > 1:
        print(f"Multiple feature types: {geom_types}")
    else:
        print(f"All features are: {geom_types}")


In [106]:
check_polygons(df)

Multiple feature types: ['MultiPolygon' 'Polygon']


## 3. Explode polygons
- Overwrite the df geodataframe with the output of the explode method, with the index_parts parameter set to False. Read the documentation for the method and use a markdown cell to write a brief explanation of what is being done.

- Reset the index of df.

- Use your check_polygons function to verify that df only has features of type polygon.

Don’t forget to write informative commit messages in the imperative mood every time you finish a major step.

In [107]:
df = df.explode(index_parts=False).reset_index()

In [108]:
check_polygons(df)

All features are: ['Polygon']
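A sketch of what explode does, on a hypothetical two-row GeoDataFrame (the labels and boxes are made up; geopandas and shapely assumed installed): each MultiPolygon is split into one row per constituent Polygon, and the other columns are copied onto every new row. With index_parts=False each new row keeps its original index label; with index_parts=True GeoPandas adds a second index level numbering the parts.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, box

# Hypothetical GeoDataFrame indexed by country name, like df after set_index("admin")
gdf = gpd.GeoDataFrame(
    {"country": ["AA", "BB"]},
    geometry=[MultiPolygon([box(0, 0, 1, 1), box(2, 0, 3, 1)]), box(5, 5, 6, 6)],
    index=["Alpha", "Beta"],
)

# index_parts=False: each part keeps its original index label
exploded = gdf.explode(index_parts=False)
print(exploded.index.tolist())        # ['Alpha', 'Alpha', 'Beta']

# index_parts=True: an extra index level counts the parts within each original row
exploded_parts = gdf.explode(index_parts=True)
print(exploded_parts.index.tolist())  # [('Alpha', 0), ('Alpha', 1), ('Beta', 0)]
```

In the notebook the index is named admin after set_index, so reset_index() moves the repeated country names back into an admin column and restores a 0 to n-1 integer index.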


## 4. Compute minimum y-coordinate for polygons
At this point, every row in your df should be a single polygon.

- Select the first row of df using iloc. What kind of Python object is this?

- Select the geometry of the first row of df. What kind of Python object is this?

- Use the bounds attribute for shapely Polygons to select the southern-most bound of the first polygon in df.

- Create a function min_y that receives a single row of a geodataframe as its parameter and returns the minimum y-coordinate of its bounding box.

- Use the min_y function and the apply method for data frames to create a new column miny in df which has the minimum y coordinate.



In [177]:
# Check data type
type(df.iloc[0])
df.head()

|   | admin | country | n_communities | geometry |
|---|---|---|---|---|
| 0 | United States of America | US | 115 | POLYGON ((-132.74687 56.52568, -132.75762 56.5... |
| 1 | United States of America | US | 115 | POLYGON ((-132.77988 56.24727, -132.83096 56.2... |
| 2 | United States of America | US | 115 | POLYGON ((-134.31274 58.22891, -134.31987 58.2... |
| 3 | United States of America | US | 115 | POLYGON ((-145.11851 60.33711, -145.15049 60.3... |
| 4 | United States of America | US | 115 | POLYGON ((-144.56563 59.81841, -144.61357 59.8... |
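Only the last expression in a notebook cell is displayed, so the cell above shows df.head() but not type(df.iloc[0]). A small sketch on a hypothetical GeoDataFrame (geopandas and shapely assumed installed) illustrates the two object types the questions ask about: a row selected with iloc is a pandas Series indexed by column name, while a single geometry is a shapely Polygon.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon, box

# Hypothetical GeoDataFrame with one attribute column plus geometry
gdf = gpd.GeoDataFrame({"n_communities": [115, 96]},
                       geometry=[box(0, 0, 1, 1), box(2, 2, 3, 3)])

row = gdf.iloc[0]            # one row: a pandas Series indexed by column names
geom = gdf.geometry.iloc[0]  # one geometry: a shapely Polygon

print(type(row))
print(type(geom))
```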


In [110]:
type(df.geometry.iloc[0])

shapely.geometry.polygon.Polygon

In [111]:
# Southern-most bound of the first polygon: bounds is (minx, miny, maxx, maxy), so index 1 is miny
df.geometry.iloc[0].bounds[1]

56.511035156249996
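The index 1 works because shapely's bounds attribute is a tuple ordered (minx, miny, maxx, maxy). A minimal sketch with a hypothetical triangle (shapely assumed installed) shows the unpacking:

```python
from shapely.geometry import Polygon

# Hypothetical triangle with vertices at y = 56.5, 57.0 and 58.2
poly = Polygon([(-133.0, 56.5), (-132.0, 57.0), (-132.5, 58.2)])

# Bounding box as (minx, miny, maxx, maxy)
minx, miny, maxx, maxy = poly.bounds
print(miny)                     # 56.5
print(poly.bounds[1] == miny)   # True
```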

In [228]:
# Loop-based alternative: print the minimum y-coordinate of every polygon in df
def min_y2(df):
    for geom in df.geometry:
        print(geom.bounds[1])

In [224]:
# Return the minimum y-coordinate of a row's bounding box (bounds is (minx, miny, maxx, maxy))
def min_y(row):
    y = row.geometry.bounds[1]
    return y
min_y(df.iloc[0])

56.511035156249996

In [226]:
df["miny"] = df.apply(min_y, axis=1)

In [227]:
df["miny"]

0      56.511035
1      56.244141
2      58.204102
3      60.312646
4      59.812646
         ...    
476    67.878809
477    66.857812
478    67.987598
479    69.539307
480    51.265039
Name: miny, Length: 481, dtype: float64
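Here apply with axis=1 calls min_y once per row. As an alternative sketch, GeoPandas exposes GeoSeries.bounds, a DataFrame with minx, miny, maxx and maxy columns, so the same column can be computed without a Python-level loop. The toy GeoDataFrame below is hypothetical; on the notebook's df the equivalent line would be `df["miny"] = df.geometry.bounds["miny"]`.

```python
import geopandas as gpd
from shapely.geometry import box

def min_y(row):
    """Return the minimum y-coordinate of the row's geometry bounding box."""
    return row.geometry.bounds[1]

# Hypothetical data: three boxes with known southern bounds
gdf = gpd.GeoDataFrame(
    {"id": [1, 2, 3]},
    geometry=[box(0, 56.5, 1, 57), box(0, 60.25, 1, 61), box(0, 51.0, 1, 52)],
)

# Row-wise apply, as in the notebook
gdf["miny_apply"] = gdf.apply(min_y, axis=1)

# Vectorized: GeoSeries.bounds returns a DataFrame of minx, miny, maxx, maxy
gdf["miny"] = gdf.geometry.bounds["miny"]

print(gdf[["miny_apply", "miny"]])
```

Both columns hold the same values; the vectorized form is shorter and avoids calling a Python function per row.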