# Lab 4

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/giswqs/geog-312/blob/main/book/labs/lab_04.ipynb)

This lab will help you solidify your understanding of `NumPy`, `Pandas`, and `GeoPandas` for geospatial data analysis. In these exercises, you will manipulate data, run spatial analyses, and build visualizations by combining the three libraries.

## Exercise 1: NumPy Array Operations and Geospatial Coordinates

In this exercise, you will work with NumPy arrays representing geospatial coordinates (latitude and longitude) and perform basic array operations.

1. Create a 2D NumPy array containing the latitude and longitude of the following cities: Tokyo (35.6895, 139.6917), New York (40.7128, -74.0060), London (51.5074, -0.1278), and Paris (48.8566, 2.3522).
2. Convert the latitude and longitude values from degrees to radians using `np.radians()`.
3. Calculate the element-wise difference between Tokyo and the other cities' latitude and longitude in radians.

In [12]:
import numpy as np

# 1. Build a 4x2 array of (latitude, longitude) pairs in degrees
a2d = np.array(
    [[35.6895, 139.6917], [40.7128, -74.0060],
    [51.5074, -0.1278], [48.8566, 2.3522]]
)
print(f'Lat lon array: \n {a2d}')

# 2. Convert every value from degrees to radians
a2dRad = np.radians(a2d)
print(f'Radians array: \n {a2dRad}')

# 3. Subtract Tokyo's row (index 0) from every row via broadcasting; row 0 becomes all zeros
tokyoDiff = a2dRad - a2dRad[0]
print(f'Tokyo element diff: \n {tokyoDiff}')

Lat lon array: 
 [[ 3.568950e+01  1.396917e+02]
 [ 4.071280e+01 -7.400600e+01]
 [ 5.150740e+01 -1.278000e-01]
 [ 4.885660e+01  2.352200e+00]]
Radians array: 
 [[ 6.22899283e-01  2.43808010e+00]
 [ 7.10572408e-01 -1.29164837e+00]
 [ 8.98973719e-01 -2.23053078e-03]
 [ 8.52708531e-01  4.10536347e-02]]
Tokyo element diff: 
 [[ 0.          0.        ]
 [ 0.08767312 -3.72972847]
 [ 0.27607444 -2.44031063]
 [ 0.22980925 -2.39702647]]
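
The subtraction in step 3 relies on NumPy broadcasting: the Tokyo row of shape `(2,)` is stretched across every row of the `(4, 2)` array. The sketch below reuses the coordinates from step 1, drops Tokyo's own row, and wraps the longitude differences into [-π, π), since a raw difference such as -3.73 rad for New York is more than half a turn. The names `coords_deg`, `diff_to_others`, and `wrap_angle` are illustrative and not part of the lab.

```python
import numpy as np

# Latitude/longitude pairs in degrees: Tokyo, New York, London, Paris
coords_deg = np.array(
    [[35.6895, 139.6917], [40.7128, -74.0060],
     [51.5074, -0.1278], [48.8566, 2.3522]]
)
coords_rad = np.radians(coords_deg)

# Broadcasting: the (2,) Tokyo row is subtracted from each of the other three rows
diff_to_others = coords_rad[1:] - coords_rad[0]


def wrap_angle(angle):
    """Wrap angles in radians into the interval [-pi, pi)."""
    return (angle + np.pi) % (2 * np.pi) - np.pi


# Only the longitude column needs wrapping; latitude differences stay as they are
wrapped = diff_to_others.copy()
wrapped[:, 1] = wrap_angle(diff_to_others[:, 1])

print(diff_to_others.shape)
print(wrapped)
```

Whether to wrap depends on what the differences are used for: the unwrapped values match the lab output above, while the wrapped values describe the shorter way around the globe.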


## Exercise 2: Pandas DataFrame Operations with Geospatial Data

In this exercise, you'll use Pandas to load and manipulate a dataset containing city population data, and then calculate and visualize statistics.

1. Load the world cities dataset from this URL using Pandas: https://github.com/opengeos/datasets/releases/download/world/world_cities.csv
2. Display the first 5 rows and check for missing values.
3. Filter the dataset to only include cities with a population greater than 1 million.
4. Group the cities by their country and calculate the total population for each country.
5. Sort the cities by population in descending order and display the top 10 cities.

In [1]:
import pandas as pd
import geopandas as gpd

url = "https://github.com/opengeos/datasets/releases/download/world/world_cities.csv"
df = pd.read_csv(url)
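df.head(5)  # not shown: a notebook cell only displays its last expression

# Keep cities with more than 1 million inhabitants
df_filtered = df[df["population"] > 1000000]
# Total population per country, summed over the filtered cities only
df_grouped = df_filtered.groupby("country")["population"].sum()

print(f'Country and total population: \n {df_grouped}')

# Sort by population, largest first, and show the top 10 cities
df_sorted = df_filtered.sort_values("population", ascending=False)
df_sorted.head(10)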

Country and total population: 
 country
AFG     3277000
AGO     6272900
ARE     1379000
ARG    15450000
ARM     1102000
         ...   
VNM    11661000
YEM     2008000
ZAF    11738000
ZMB     1328000
ZWE     1572000
Name: population, Length: 110, dtype: int64


|      | id   | name         | country | latitude  | longitude | population |
|------|------|--------------|---------|-----------|-----------|------------|
| 1239 | 1240 | Tokyo        | JPN     | 35.68502  | 139.75141 | 35676000   |
| 1224 | 1225 | New York     | USA     | 40.74998  | -73.98002 | 19040000   |
| 1230 | 1231 | Mexico City  | MEX     | 19.44244  | -99.13099 | 19028000   |
| 1240 | 1241 | Mumbai       | IND     | 19.01699  | 72.85699  | 18978000   |
| 1245 | 1246 | Sao Paulo    | BRA     | -23.55868 | -46.62502 | 18845000   |
| 1148 | 1149 | Delhi        | IND     | 28.66999  | 77.23     | 15926000   |
| 1238 | 1239 | Shanghai     | CHN     | 31.21645  | 121.4365  | 14987000   |
| 1243 | 1244 | Kolkata      | IND     | 22.49497  | 88.32468  | 14787000   |
| 1175 | 1176 | Dhaka        | BGD     | 23.72306  | 90.40858  | 12797394   |
| 1217 | 1218 | Buenos Aires | ARG     | -34.6025  | -58.39753 | 12795000   |
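
Step 2 also asks for a missing-value check, which `isnull().sum()` provides. The sketch below runs steps 2 to 5 on a small in-memory table so that it works offline. The column names and the first four rows are taken from the output above; the two rows with country `XXX` are made up, and the `NaN` is planted deliberately to give the check something to find.

```python
import numpy as np
import pandas as pd

# Four rows copied from the top-10 output, plus two hypothetical rows (country "XXX")
sample = pd.DataFrame(
    {
        "name": ["Tokyo", "New York", "Mumbai", "Delhi", "Midville", "Smalltown"],
        "country": ["JPN", "USA", "IND", "IND", "XXX", "XXX"],
        "population": [35676000, 19040000, 18978000, 15926000, 500000, np.nan],
    }
)

# Step 2: count missing values per column
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# Step 3: a NaN population compares as False, so that row is dropped by the filter
large = sample[sample["population"] > 1_000_000]

# Step 4: total population per country, over the filtered cities only
totals = large.groupby("country")["population"].sum()
print(totals)

# Step 5: largest cities first
top = large.sort_values("population", ascending=False).head(10)
print(top[["name", "population"]])
```

Because the grouping runs on the filtered frame, each country total counts only its cities above 1 million; grouping the unfiltered frame instead would give totals over all listed cities.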


## Exercise 3: Creating and Manipulating GeoDataFrames with GeoPandas

This exercise focuses on creating and manipulating GeoDataFrames, performing spatial operations, and visualizing the data.

1. Load the New York City building dataset from the GeoJSON file using GeoPandas: https://github.com/opengeos/datasets/releases/download/places/nyc_buildings.geojson
2. Create a plot of the building footprints and color them based on the building height (use the `height_MS` column).
3. Create an interactive map of the building footprints and color them based on the building height (use the `height_MS` column).
4. Calculate the average building height (use the `height_MS` column).
5. Select buildings with a height greater than the average height.
6. Save the GeoDataFrame to a new GeoJSON file.

In [16]:
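# Load the NYC building footprints (requires network access)
gdf = gpd.read_file("https://github.com/opengeos/datasets/releases/download/places/nyc_buildings.geojson")
# Mean building height, then the subset of buildings taller than the mean
avgHeight = gdf["height_MS"].mean()
avgGeom = gdf[gdf["height_MS"] > avgHeight]
gdf.head(5)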

# Save avgGeom to a new GeoJSON file
output_file = "avg_heights.geojson"
avgGeom.to_file(output_file, driver="GeoJSON")
print(f'GeoDataFrame has been written to {output_file}')

# Interactive map of the building footprints, colored by height
m = gdf.explore(
    "height_MS",
    cmap = "Blues",
    legend = True,
    # tooltip = "height_MS",
    name = "All Buildings"
)

avgGeom.explore(
    "height_MS",
    cmap = "Reds",
    legend = True,
    # tooltip = "height_MS",
    name = "Above Average",
    m = m  # draw this layer on the existing map
)  # the above-average buildings are overlaid on top of the full dataset

GeoDataFrame has been written to avg_heights.geojson
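
Step 2 asks for a static plot, which `GeoDataFrame.plot()` produces when given a column name. The sketch below uses three hypothetical square footprints built with `shapely.geometry.box` rather than downloading the NYC file; only the `height_MS` column name comes from the lab. It assumes `matplotlib` is installed, and the output file name `building_heights.png` is arbitrary.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Hypothetical footprints and heights; the real lab data comes from nyc_buildings.geojson
buildings = gpd.GeoDataFrame(
    {"height_MS": [10.0, 25.0, 40.0]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1), box(4, 0, 5, 1)],
    crs="EPSG:4326",
)

# Static choropleth: each polygon is colored by its value in height_MS
fig, ax = plt.subplots(figsize=(6, 3))
buildings.plot(column="height_MS", cmap="Blues", legend=True, ax=ax)
ax.set_title("Building height (height_MS)")
fig.savefig("building_heights.png", dpi=100)

# Steps 4 and 5 on the same toy data
avg_height = buildings["height_MS"].mean()
taller = buildings[buildings["height_MS"] > avg_height]
print(avg_height, len(taller))
```

With the real data, the same `plot()` call on `gdf` would replace the toy `buildings` frame.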


## Exercise 4: Combining NumPy, Pandas, and GeoPandas

This exercise requires you to combine the power of NumPy, Pandas, and GeoPandas to analyze and visualize spatial data.

1. Use Pandas to load the world cities dataset from this URL: https://github.com/opengeos/datasets/releases/download/world/world_cities.csv
2. Filter the dataset to include only cities with latitude values between -40 and 60 (i.e., excluding cities south of 40°S or north of 60°N).
3. Create a GeoDataFrame from the filtered dataset by converting the latitude and longitude into geometries.
4. Reproject the GeoDataFrame to the Web Mercator projection (EPSG:3857).
5. Calculate the distance (in meters) between each city and the city of Paris.
6. Plot the cities on a world map, coloring the points by their distance from Paris.

In [36]:
url = "https://github.com/opengeos/datasets/releases/download/world/world_cities.csv"
df = pd.read_csv(url)

df_filtered = df.query("latitude > -40 & latitude < 60")
df_filtered.head(5)

# Create GeoDataFrame from filtered dataset
gdf = gpd.GeoDataFrame(
    df_filtered, 
    geometry=gpd.points_from_xy(df_filtered['longitude'], df_filtered['latitude']), 
    crs=4326
)

# Reproject the GeoDataFrame to Web Mercator (EPSG:3857)
gdf = gdf.to_crs(3857)

# Calculate the planar distance between each city and Paris in EPSG:3857 units
# (meters on the projected plane, which overstate ground distance away from the equator)
gdf = gdf.set_index("name")
paris_point = gdf.loc["Paris", "geometry"]
gdf["distance_to_paris"] = gdf["geometry"].distance(paris_point)
# gdf

# Plot the cities on a world map, coloring the points by their distance from Paris
gdf.explore(
    "distance_to_paris",
    legend = True,
    tooltip = "distance_to_paris",
    name = "Distance to Paris"
)
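
Distances measured in EPSG:3857 are planar distances on the Web Mercator plane, which stretches lengths by roughly 1/cos(latitude), so they overstate ground distance away from the equator. As a cross-check, the sketch below computes great-circle distances to Paris with the haversine formula in NumPy, using the city coordinates from Exercise 1. The function name `haversine_m` and the mean Earth radius of 6,371,000 m (a spherical approximation) are assumptions of this sketch, not part of the lab.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


# Coordinates from Exercise 1 (degrees)
names = ["Tokyo", "New York", "London"]
lats = np.array([35.6895, 40.7128, 51.5074])
lons = np.array([139.6917, -74.0060, -0.1278])
paris_lat, paris_lon = 48.8566, 2.3522

# Vectorized over the three cities; Paris is broadcast as a scalar pair
dist_m = haversine_m(lats, lons, paris_lat, paris_lon)
for name, d in zip(names, dist_m):
    print(f"{name}: {d / 1000:.0f} km")
```

Comparing these values with the `distance_to_paris` column gives a sense of how much the projection inflates each distance.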