In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import folium
from folium.plugins import MarkerCluster, GroupedLayerControl

from data_utilities import *

# List of Services

0: Boulevard Mowing, Parks and Urban Foresty  
~~1: Dog Complaint, Animal Services~~  
2: Frozen Catch Basin, Street Maintenance  
~~3: Graffiti, Parks and Urban Foresty~~  
~~4: Graffiti, Street Maintenance~~  
~~5: Litter Container Complaint, Street Maintenance~~  
~~6: Missed Garbage Collection, Garbage & Recycling~~  
~~7: Missed Recycling Collection, Garbage & Recycling~~  
~~8: Mosquito Complaint, Insect Control~~  
9: Neighbourhood Liveability Complaint, By Law Enforcement  
10: Potholes, Street Maintenance  
11: Sanding, Street Maintenance  
12: Sewer Backup, Sewer & Drainage  
13: Sidewalk Repairs, Street Maintenance  
14: Snow Removal - Roads, Street Maintenance  
15: Snow Removal - Sidewalks, Street Maintenance  
~~16: Tree Pest Caterpillar Complaint, Insect Control~~  
17: Water Main Leak, Water  

# Spatial

1. Generally, the number of events appears to be related to residential population density. Examples are William Whyte and University.
1. Not sure about boulevard mowing.
1. The census data is not recent (2006), so a full projection onto the map may not be possible (e.g. neighbourhood boundaries and names may have changed between 2006 and 2023).
    - More recent census data exists (2016 and 2021), but it is not on the Winnipeg data portal.
1. The pandemic widened the wealth gap between rich and poor neighbourhoods. The polarity remains unchanged, with rich neighbourhoods having more events.
    - Example: Bridgewater (new community) vs Fort Richmond (older community)
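
One way to check the first observation numerically is a rank correlation between events per neighbourhood and population density. The sketch below is only an illustration: `events_vs_density` is a hypothetical helper, the column names follow the ones used elsewhere in this notebook, and the toy frames are made-up placeholders rather than real counts.

```python
import pandas as pd

def events_vs_density(events: pd.DataFrame, density: pd.DataFrame) -> float:
    """Spearman rank correlation between event counts and population density.

    Neighbourhoods missing from either table are dropped by the inner join.
    """
    counts = events.groupby("Neighbourhood").size().rename("events")
    dens = density.set_index("Boundary Name")["Population Density 2006"]
    joined = pd.concat([counts, dens], axis=1, join="inner")
    # Pearson correlation on ranks is the Spearman correlation
    ranks = joined.rank()
    return ranks["events"].corr(ranks["Population Density 2006"])

# Placeholder data for illustration only (not real Winnipeg figures)
toy_events = pd.DataFrame({"Neighbourhood": ["A", "A", "A", "B", "B", "C"]})
toy_density = pd.DataFrame({
    "Boundary Name": ["A", "B", "C", "D"],
    "Population Density 2006": [3000.0, 2000.0, 1000.0, 500.0],
})
rho = events_vs_density(toy_events, toy_density)
```

With the real data this would be called as `events_vs_density(DF, density_df[density_df["Boundary Type"] == "Neighbourhood"])`, keeping in mind that neighbourhoods created after 2006 drop out of the join.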

- Hypothesis: High poverty areas are more likely to have more events.  
    Verdict: Not quite true.
    1. In the ward of Mynarski, the number of events is proportional to the level of poverty.
    2. The downtown area is complicated.
        - Broadway-Assiniboine and Central Park have some of the highest population densities, probably because they consist of apartment buildings rather than the detached houses found in Mynarski.
        - In these areas, the **population density is positively correlated with the level of poverty**.
        - In terms of total events, these areas do not have many. In other words, the number of events is proportional to neither the level of poverty nor the population density.
        - This could be because apartment buildings have private caretaking services, which are not included in the data. In contrast, most if not all detached houses in the city use the city's services.
    3. The north part of the city, for example Mynarski, despite having mainly detached houses, has a much higher population density along with a higher poverty level.
- TODO: Find which service requests are more likely to be reported in high poverty areas.
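
A possible starting point for this TODO is to compare each request type's share of events inside and outside high poverty neighbourhoods. This is a minimal sketch, not a finished analysis: `overrepresentation` is a hypothetical helper, the set of high poverty neighbourhoods has to be supplied by hand (the poverty map only provides polygons), and the records below are placeholders.

```python
from collections import Counter

def overrepresentation(records, high_poverty):
    """Ratio of each request type's share inside vs outside high poverty areas.

    records: iterable of (neighbourhood, request_type) pairs.
    high_poverty: set of neighbourhood names treated as high poverty (an assumption).
    A ratio above 1 means the type is relatively more common in high poverty areas.
    Types that never occur on one side are skipped to avoid dividing by zero.
    """
    inside, outside = Counter(), Counter()
    for hood, request in records:
        (inside if hood in high_poverty else outside)[request] += 1
    n_in, n_out = sum(inside.values()), sum(outside.values())
    ratios = {}
    for request in inside.keys() & outside.keys():
        ratios[request] = (inside[request] / n_in) / (outside[request] / n_out)
    return ratios

# Placeholder records, not real data
toy_records = [
    ("A", "Potholes"), ("A", "Sanding"), ("A", "Sanding"), ("A", "Sanding"),
    ("B", "Potholes"), ("B", "Potholes"), ("B", "Potholes"), ("B", "Sanding"),
]
toy_ratios = overrepresentation(toy_records, high_poverty={"A"})
```

With the notebook's data, the records could come from `zip(DF["Neighbourhood"], DF["Service Area and Request"])`.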

https://data.winnipeg.ca/Census/Map-of-Higher-Poverty-Areas/hty7-qszy  
The data is given as polygons rather than neighbourhood names.  
TODO: find a way to convert the polygons to neighbourhood names.
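
One rough approach is to take the centroid of each poverty polygon and find the neighbourhood boundary that contains it. The sketch below uses a plain ray-casting test in pure Python. All function names are hypothetical, rings are assumed to be lists of `[lon, lat]` pairs as in GeoJSON, and holes and MultiPolygons are ignored. A geometry library would likely be more robust for the real files.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the ring of [lon, lat] vertices?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        # Count edges that a horizontal ray from the point crosses
        if (yi > lat) != (yj > lat) and lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

def ring_centroid(ring):
    """Rough centroid: the mean of the vertices (adequate only for compact shapes)."""
    pts = ring[:-1] if ring[0] == ring[-1] else ring
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def name_for_polygon(poverty_ring, neighbourhoods):
    """Return the first neighbourhood whose ring contains the polygon's centroid, else None.

    neighbourhoods: dict mapping name to outer ring (list of [lon, lat] pairs).
    """
    lon, lat = ring_centroid(poverty_ring)
    for name, ring in neighbourhoods.items():
        if point_in_ring(lon, lat, ring):
            return name
    return None

# Toy squares for illustration only
toy_hoods = {
    "West": [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]],
    "East": [[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]],
}
toy_poverty = [[1.2, 0.2], [1.8, 0.2], [1.8, 0.8], [1.2, 0.8], [1.2, 0.2]]
matched = name_for_polygon(toy_poverty, toy_hoods)
```

A poverty polygon may span several neighbourhoods, in which case the centroid picks only one of them. Testing every vertex, or computing overlap areas, would capture the others.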


TODO: The poverty map assumes that high population density implies poverty. Need further investigation.

## Neighborhoods

### Overall distribution among service requests

In [None]:
DF.groupby("Service Area and Request").size().sort_values().plot(kind="barh")

### Overall distribution among wards and neighbourhoods

In [None]:
DF.groupby("Ward").size().sort_values().plot(kind="barh")

In [None]:
DF.groupby("Neighbourhood").size().sort_values().plot(kind="barh", figsize=(9, 50))

Maybe these downtown neighbourhoods are closer to a ~~waste station~~? Nope, currently there's only one waste landfill in Winnipeg.  
How about the garbage trucks' depot?

## Population density

In [None]:
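density_df = pd.read_csv("./assets/Census_2006_population_density.csv")


def plot_population_density():
    """
    Draw the 2006 census population density as two choropleth layers,
    one by ward and one by neighbourhood, with a tooltip showing the density.
    """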
    WINNIPEG = [49.88366050119829, -97.14581222292078]
    m = folium.Map(location=WINNIPEG, zoom_start=12)

    # Ward entries have " Ward" at the end of their Boundary Name; remove it beforehand
    wards = density_df[density_df["Boundary Type"] == "Ward"]
    neighbourhoods = density_df[density_df["Boundary Type"] == "Neighbourhood"]

    #TODO: link population density data
    # By ward
    ward_cp = folium.Choropleth(
        geo_data="assets/Electoral Ward.geojson",
        data=wards,
        columns=["Boundary Name", "Population Density 2006"],
        key_on="feature.properties.name",
        name="Ward"
    ).add_to(m)

    for feature in ward_cp.geojson.data['features']:
        # Look up this ward's density; an empty match means the census file has no row for it
        match = wards.loc[wards["Boundary Name"] == feature['properties']['name'], "Population Density 2006"]
        feature['properties']['density'] = str(float(match.iloc[0])) if not match.empty else "No data"
    folium.GeoJsonTooltip(['name', 'density']).add_to(ward_cp.geojson)

    # By neighbourhood
    hood_cp = folium.Choropleth(
        geo_data="assets/Neighbourhood.geojson",
        data=neighbourhoods,
        columns=["Boundary Name", "Population Density 2006"],
        key_on="feature.properties.name",
        name="Neighbourhood"
    ).add_to(m)

    for feature in hood_cp.geojson.data['features']:
        # Neighbourhoods created after 2006 (e.g. Prairie Pointe) have no census row and show "No data"
        match = neighbourhoods.loc[neighbourhoods["Boundary Name"] == feature['properties']['name'], "Population Density 2006"]
        feature['properties']['density'] = str(float(match.iloc[0])) if not match.empty else "No data"
    folium.GeoJsonTooltip(['name', 'density']).add_to(hood_cp.geojson)


    layer_control = folium.LayerControl().add_to(m)
    GroupedLayerControl( # example: https://github.com/chansooligans/folium/blob/81a04d3628b78b9538daadc3da81c9b1ee278692/examples/plugin-GroupedLayerControl.ipynb
        groups={"Division": [ward_cp, hood_cp]} # either ward or neighbourhood, radio button choose one
    ).add_to(m)

    return m

plot_population_density()

Unfortunately, the 2006 census data is far too outdated. For example, here are the newer neighbourhoods that are not included in the 2006 census data:

In [None]:
census_hoods = density_df.loc[density_df["Boundary Type"] == "Neighbourhood", "Boundary Name"]
# Neighbourhoods present in the 311 data but absent from the 2006 census, in alphabetical order
sorted(set(DF["Neighbourhood"].dropna()) - set(census_hoods))

Ward names also differ between the two datasets:

In [None]:
sorted(DF["Ward"].unique().tolist())

In [None]:
sorted(density_df[density_df["Boundary Type"] == "Ward"]["Boundary Name"].unique().tolist())
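
To see which ward names differ only in formatting, the names can be normalized before comparing. This is a hedged sketch: `normalize_ward` and `unmatched` are hypothetical helpers, and the suffix rule is based only on the `" Ward"` suffix mentioned in the code comment above. The sample lists are placeholders, not the actual contents of either dataset.

```python
def normalize_ward(name: str) -> str:
    """Lower-case, collapse whitespace, and drop a trailing " ward" suffix."""
    cleaned = " ".join(name.split()).casefold()
    if cleaned.endswith(" ward"):
        cleaned = cleaned[: -len(" ward")]
    return cleaned

def unmatched(names, reference):
    """Names with no counterpart in reference after normalization, sorted."""
    known = {normalize_ward(r) for r in reference}
    return sorted(n for n in names if normalize_ward(n) not in known)

# Placeholder lists for illustration only
toy_311 = ["Mynarski", "Point Douglas"]
toy_census = ["Mynarski Ward", "Point  Douglas Ward", "Old Kildonan Ward"]
only_in_census = unmatched(toy_census, toy_311)
only_in_311 = unmatched(toy_311, toy_census)
```

Whatever remains in either list after normalization is a real difference (a renamed or redrawn ward) rather than a formatting one.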

## Overall events map

In [None]:
def plot_events(services_filter=None):
    """
    Plot events on a map.

    Args:
        services_filter: a list containing service indices and/or service names.
            None or an empty list means all services.
    """
    WINNIPEG = [49.88366050119829, -97.14581222292078]
    m = folium.Map(location=WINNIPEG, zoom_start=12)

    if services_filter:
        # Resolve indices to service names without mutating the caller's list
        services_filter = [s if isinstance(s, str) else services[s] for s in services_filter]
        data = DF[DF["Service Area and Request"].isin(services_filter)]
    else:
        services_filter = services
        data = DF
        
    # By neighbourhood
    hood_data = data.groupby("Neighbourhood").size()
    hood_cp = folium.Choropleth(
        geo_data="assets/Neighbourhood.geojson",
        data=hood_data,
        key_on="feature.properties.name",
        name="Neighbourhood",
        nan_fill_color="white"
    ).add_to(m)

    for i in hood_cp.geojson.data['features']:
        # Neighbourhoods absent from the 311 dataset (e.g. Prairie Pointe) raise KeyError and are shown with 0 events
        try:
            i['properties']['events'] = str(hood_data[i['properties']['name']])
        except KeyError:
            i['properties']['events'] = "0"
    folium.GeoJsonTooltip(['name', 'events']).add_to(hood_cp.geojson)


    # By ward
    ward_data = data.groupby("Ward").size()
    ward_cp = folium.Choropleth(
        geo_data="assets/Electoral Ward.geojson",
        data=ward_data,
        key_on="feature.properties.name",
        name="Ward",
        nan_fill_color="white"
    ).add_to(m)

    for i in ward_cp.geojson.data['features']:
        try:
            i['properties']['events'] = str(ward_data[i['properties']['name']])
        except KeyError:
            i['properties']['events'] = "0"
    folium.GeoJsonTooltip(['name', 'events']).add_to(ward_cp.geojson)


    # By case
    def point_str_to_tuple(s: str) -> tuple:
        """Convert a string like "POINT (lon lat)" to a (lat, lon) tuple, the order folium expects."""
        split = s.split()
        return (float(split[2][:-1]), float(split[1][1:]))


    cases = dict()
    for i in services_filter:
        cases[i] = MarkerCluster(name=i)

    for i in data.index:
        folium.Marker(
            location=point_str_to_tuple(data['Point'][i]), # type: ignore
            # The popup takes 2 mins to load on 8-core RYZEN 5700X
            popup=f"<b>Date: </b>{data['Date'][i]}<p><b>Service Area: </b>{data['Service Area'][i]}<p><b>Service Request: </b>{data['Service Request'][i]}",
        ).add_to(cases[data['Service Area and Request'][i]])

    for i in services_filter:
        cases[i].add_to(m)


    folium.LayerControl().add_to(m)
    GroupedLayerControl(  # example: https://github.com/chansooligans/folium/blob/81a04d3628b78b9538daadc3da81c9b1ee278692/examples/plugin-GroupedLayerControl.ipynb
        groups={"Division": [hood_cp, ward_cp]},  # either ward or neighbourhood; radio buttons select one
    ).add_to(m)

    return m

In [None]:
plot_events([])
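
The `point_str_to_tuple` helper inside `plot_events` slices the `Point` string by position, so it silently depends on the exact `POINT (lon lat)` layout. A stricter parser could look like the sketch below. `parse_point` is a hypothetical name, and the accepted format is inferred only from how the existing helper slices the string.

```python
import re

# Matches e.g. "POINT (-97.14 49.88)"; the format is an assumption based on the existing helper
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)

def parse_point(wkt: str) -> tuple:
    """Parse "POINT (lon lat)" into (lat, lon); raise ValueError on anything else."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return (lat, lon)

example = parse_point("POINT (-97.14 49.88)")
```

Raising on malformed input makes bad rows visible instead of producing markers at wrong coordinates.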

## Scoped by service type

### Graffiti

In [None]:
plot_events([3, 4])

https://data.winnipeg.ca/Census/Map-of-Higher-Poverty-Areas/hty7-qszy

Graffiti reports appear to be related to population density, both residential and commercial.  

The downtown area has more graffiti reports despite having fewer events overall. People create graffiti to attract attention, so it makes more sense to do it in a place with more people.

Things to consider: so many people work here that a single piece of graffiti may get reported multiple times.
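
To gauge how much repeated reporting inflates the counts, reports at nearly the same location within a short period could be collapsed into one. This is a sketch under stated assumptions: `collapse_duplicates` is a hypothetical helper, and both the 7-day window and the 4-decimal rounding (roughly 10 m) are arbitrary choices. The sample reports are placeholders.

```python
from datetime import datetime, timedelta

def collapse_duplicates(reports, window=timedelta(days=7), precision=4):
    """Keep one report per location cell within a time window.

    reports: iterable of (timestamp, lat, lon) tuples.
    precision: decimal places used to round coordinates into cells (an assumption).
    A report is dropped if the last kept report in its cell is within the window.
    """
    last_kept = {}
    kept = []
    for ts, lat, lon in sorted(reports):
        cell = (round(lat, precision), round(lon, precision))
        previous = last_kept.get(cell)
        if previous is None or ts - previous > window:
            kept.append((ts, lat, lon))
            last_kept[cell] = ts
    return kept

# Placeholder reports for illustration only
toy_reports = [
    (datetime(2023, 1, 1), 49.8951, -97.1384),
    (datetime(2023, 1, 3), 49.89512, -97.13841),  # same spot, two days later: dropped
    (datetime(2023, 1, 2), 49.9000, -97.1500),    # different spot: kept
    (datetime(2023, 1, 20), 49.8951, -97.1384),   # same spot, outside the window: kept
]
toy_kept = collapse_duplicates(toy_reports)
```

Comparing the graffiti counts before and after collapsing would show whether downtown still stands out.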

### Garbage related

In [None]:
plot_events([5, 6, 7])

1. Industrial and non-residential areas have near-zero garbage-related events. They may have their own garbage collection services. Examples are University, Assiniboine Park, Buffalo, The Forks, etc.
1. Older communities have more garbage-related events, for example Mynarski compared with Bridgewater.
1. South Portage has an exceptionally high number of Litter Container complaints. This makes sense because people work there but do not live there, so they dispose of their garbage in the public litter containers.

### Insect Control

In [None]:
plot_events([8])

In [None]:
plot_events([16])

1. Mosquitoes need water to reproduce. Neighbourhoods with a relatively high number of mosquito complaints usually have a large area of park or pond.
1. There are counterexamples: Linden Woods and Bridgewater have quite large park areas, but their mosquito complaints are virtually zero.
1. This may also reinforce the idea that the age of a community is positively correlated with the number of events.
1. For tree pests, I can only speak for Fort Richmond: this community has a lot of trees.

### Dog

In [None]:
plot_events([1])

1. Are River Park South and Dakota Crossing old communities?
1. There seems to be a positive correlation between dog complaints and population density, as well as the age of the community.


### Water related


In [None]:
plot_events([2])

In [None]:
plot_events([12])

In [None]:
plot_events([17])

## Top N Overview

### Top Neighbourhood from each ward

In [None]:
wards = DF["Ward"].dropna().unique().tolist()

for ward in wards:
    print(f"For ward {ward}, the top neighbourhood is:")
    print(DF.query(column_contains("Ward", [ward])).groupby("Neighbourhood").size().sort_values(ascending=False)[0:1])
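
The same result can also be computed in a single pass instead of one query per ward. The sketch below is an alternative, not the notebook's method: `top_neighbourhood_per_ward` is a hypothetical helper that relies only on the `Ward` and `Neighbourhood` columns, and the toy frame is placeholder data.

```python
import pandas as pd

def top_neighbourhood_per_ward(df: pd.DataFrame) -> pd.DataFrame:
    """For each ward, return the neighbourhood with the most events and its count.

    Rows with a missing Ward or Neighbourhood are dropped by groupby.
    Ties are broken arbitrarily.
    """
    counts = (
        df.groupby(["Ward", "Neighbourhood"])
        .size()
        .rename("events")
        .reset_index()
    )
    # After sorting by count, the first row seen for each ward is its top neighbourhood
    return (
        counts.sort_values("events", ascending=False)
        .drop_duplicates("Ward")
        .set_index("Ward")
        .sort_index()
    )

# Placeholder data for illustration only
toy_df = pd.DataFrame({
    "Ward": ["W1", "W1", "W1", "W2"],
    "Neighbourhood": ["A", "A", "B", "C"],
})
toy_top = top_neighbourhood_per_ward(toy_df)
```

Calling `top_neighbourhood_per_ward(DF)` would give one row per ward, which is also easier to plot or join than the printed output.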

### Top request from each ward

In [None]:
wards = DF["Ward"].dropna().unique().tolist()
n = 5

for ward in wards:
    print(f"For ward {ward}, the top {n} requests are:")
    print(DF.query(column_contains("Ward", [ward])).groupby("Service Area and Request").size().sort_values(ascending=False)[0:n])

### Requests in the top 10 neighbourhoods

In [None]:
DF.query(column_contains("Neighbourhood", top_n(10, "Neighbourhood"))).groupby("Service Area and Request").size().sort_values().plot(kind="barh")