# Assignment 03

## 311 Cases in San Francisco

For this assignment, I chose to look at two request types related to homelessness in San Francisco's publicly available 311 dataset. There has recently been a lot of news coverage about an increase in human waste on the streets of San Francisco. The New York Times published an [article](https://www.nytimes.com/2018/10/08/us/san-francisco-dirtiest-street-london-breed.html) on the issue in October 2018.

***NOTE***: I was unable to use the full dataset because pandas loaded it only intermittently, and I am not sure why. I tried passing `chunksize` to `read_csv` to iterate over smaller dataframes that fit in memory, but I could not get it working.
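One way the chunked approach could work is to reduce each chunk to a small table of counts and combine those afterwards, so the full file never has to sit in memory. The sketch below is an illustration under assumptions, not the code used for the results in this notebook: the sample rows and the helper name `count_by_year_and_type` are hypothetical, and the date format of the real file may need explicit parsing options.

```python
import io
import pandas as pd

# Tiny stand-in for 311_Cases.csv (hypothetical rows, same two columns).
sample_csv = io.StringIO(
    "Opened,Request Type\n"
    "2017-01-05 08:00:00,Human Waste\n"
    "2017-03-09 12:30:00,Encampment Reports\n"
    "2018-02-11 09:15:00,Human Waste\n"
    "2018-06-20 17:45:00,Bulky Items\n"
    "2018-07-01 10:00:00,Human Waste\n"
)

def count_by_year_and_type(source, chunksize):
    """Count calls per (year, request type), reading one chunk at a time."""
    reader = pd.read_csv(source, usecols=['Opened', 'Request Type'],
                         parse_dates=['Opened'], chunksize=chunksize)
    parts = []
    for chunk in reader:
        # Reduce each chunk to a small Series of counts before reading the next one.
        parts.append(chunk.groupby([chunk['Opened'].dt.year, 'Request Type']).size())
    # The same (year, type) pair can appear in several chunks, so sum the partial counts.
    return pd.concat(parts).groupby(level=[0, 1]).sum()

yearly_counts = count_by_year_and_type(sample_csv, chunksize=2)
```

Here `chunksize=2` only forces several chunks on the tiny sample; for the real file a value such as `10 ** 6` would be more sensible.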

In [None]:
import matplotlib.cm as cm
import matplotlib.font_manager as fm
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


%matplotlib notebook 

In [None]:
# read our data set -- all 311 calls available for San Francisco
# this dataset is M A S S I V E (bigger than 1 GB)
# my computer is T I N Y and W E A K

chunksize = 10 ** 6
fields = ['Opened', 'Request Type']
# iterator for our chunks: passing chunksize makes read_csv return a reader
# that yields dataframes of up to chunksize rows instead of one big dataframe
tp = pd.read_csv('311_Cases.csv', usecols=fields, index_col="Opened", parse_dates=True,
                 dtype={'Request Type': str}, chunksize=chunksize)

In [None]:
chunk_list = []  # append each chunk df here 

# Each chunk is in df format
for df in tp:  
    # perform data filtering 
    chunk_filter = df[(df['Request Type'] == 'Human Waste')]
    
    # Once the data filtering is done, append the chunk to list
    chunk_list.append(chunk_filter)
    
# concat the list into dataframe 
# req_types = pd.concat(chunk_list)

In [None]:
chunk_list[0]

In [None]:
# find the most commonly occurring requests
# we're looking for ones incidental to homelessness
# note: df holds only the part of the dataset that was loaded (see the note above)
req_type = df['Request Type'].value_counts()


In [None]:
# generate easier to read labels
req_type_xlab = ['Bulky Items', 'General Cleaning', 'Encampment Reports', 
                 'Request for Service', 'Human Waste', 'Graffiti on Pole', 
                 'Graffiti on Commercial Building', 'Damaged Parking Meter', 
                 'Abandoned 4 Door Car', 'Pavement Defect']
# color the columns so that calls relating to homelessness are more noticeable 
req_type_colors = ['grey', 'grey', '#1fa898', 'grey', 'orange', 'grey', 'grey', 'grey', 'grey', 'grey']
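# one alpha per bar (10 entries, matching req_type_colors); currently unused by the plot below
req_type_alpha = [0.5, 0.5, 1, 0.5, 1, 0.5, 0.5, 0.5, 0.5, 0.5]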

# style the plot to make it look nicer
ax = req_type.head(10).plot(kind='bar', figsize=(8, 6), width=0.75, alpha=1, 
                 color=req_type_colors, edgecolor='white', zorder=2)

ax.yaxis.grid(True, ls=':')
ax.set_xticklabels(req_type_xlab, rotation=45, rotation_mode='anchor', ha='right')

ax.set_title('10 Most Common Request Types')
ax.set_ylabel('Number of calls')
ax.set_xlabel('Request Type')
plt.show()

## Most common call types

It is worth looking at the most common call types to see where the two request types we will examine (human waste and encampments) rank. This gives a sense of how many of these calls come in relative to the total number of calls.
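The rank and share described above can be computed directly from the per-type counts. The following is a minimal sketch with made-up numbers: the `counts` Series is a hypothetical stand-in for the output of `df['Request Type'].value_counts()` and does not reflect the real dataset.

```python
import pandas as pd

# Hypothetical counts per request type (illustrative values only).
counts = pd.Series({'Bulky Items': 50, 'General Cleaning': 30,
                    'Encampment Reports': 12, 'Human Waste': 8})

# Rank 1 is the most common request type.
rank = counts.rank(ascending=False, method='min').astype(int)
# Share of all calls, as a percentage.
share_pct = counts / counts.sum() * 100

summary = pd.DataFrame({'rank': rank, 'share_pct': share_pct})
# Keep only the two request types this notebook focuses on.
focus = summary.loc[['Human Waste', 'Encampment Reports']]
```

Displaying `focus` shows both the position of each request type in the ranking and its percentage of all calls, which is the same comparison the bar chart makes visually.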

In [None]:
human_waste = df[(df['Request Type'] == 'Human Waste')]

In [None]:
encampments = df[(df['Request Type'] == 'Encampment Reports')]

In [None]:
hw_yearly = human_waste.groupby(human_waste.index.year).size()
enc_yearly = encampments.groupby(encampments.index.year).size()
calls_yearly = df.groupby(df.index.year).size()
prop_hw_yearly = (hw_yearly / calls_yearly) * 100
prop_enc_yearly = (enc_yearly / calls_yearly) * 100

# create a fresh figure and draw both series on the same axes
fig, ax = plt.subplots(figsize=(10, 6))
prop_hw_yearly.plot(ax=ax, kind='line', lw=2, c='#1fa898',
                    marker='^', markerfacecolor='w', markeredgewidth=1.5,
                    label='Percentage regarding human waste', markersize=12)
prop_enc_yearly.plot(ax=ax, kind='line', lw=2, c='orange',
                     marker='o', markerfacecolor='w', markeredgewidth=1.5,
                     label='Percentage regarding encampments')

# the x axis holds the year, the y axis the percentage of all calls
ax.set_xlabel('Year')
ax.set_xlim(2007.9, 2018.1)
ax.set_ylabel('Percentage of total calls')
ax.set_title('Percentage of Calls about Human Waste and Encampments, by Year')

ax.grid(ls=':')
ax.legend()
plt.show()

In [None]:
share = pd.DataFrame([prop_hw_yearly, prop_enc_yearly], index=['human waste', 'encampments']).T

In [None]:
ax = share.plot(figsize=(10, 6), kind='bar', alpha=0.7, stacked=True, 
                       title='Share of calls related to homelessness, by year')
ax.set_xticklabels(share.index, rotation=45, rotation_mode='anchor', ha='right')
plt.show()