# Assignment 2

## Formalia:

Please read the [assignment overview page](https://github.com/suneman/socialdata2023/wiki/Assignments) carefully before proceeding. This page contains information about formatting (including file formats), group sizes, and many other aspects of handing in the assignment.

_If you fail to follow these simple instructions, it will negatively impact your grade!_

**Due date and time**: The assignment is due on Monday March 27th, 2023 at 23:55. Hand in your files via [http://peergrade.io](http://peergrade.io/). If you're not already a peergrade user, [you can use this link to sign up](https://app.peergrade.io/join/44E47G) - **PLEASE USE YOUR DTU EMAIL WHEN YOU SIGN UP**.

**Peergrading date and time**: \[OPTIONAL FOR ASSIGNMENT 2\] _Remember that after handing in you MAY evaluate a few assignments written by other members of the class_. (Should you choose to do this, the deadline is Tuesday April 11 at noon). 

## A2: A short data story

This assignment is to create a short data-story based on the work we've done in class so far. See **Exercises Week 8, Part 2** for full details on how the story should be constructed.

In [1]:
# Imports
import pandas as pd

In [2]:
# Load the data
data = pd.read_csv("Police_Department_Incident_Reports__Historical_2003_to_May_2018.csv")

# Create column 'Year'
data['Date'] = pd.to_datetime(data['Date'])
data['Year'] = data['Date'].dt.year

# Drop 2018, which the dataset only covers until May
data = data[data['Year'] < 2018].copy()

# Create column 'HourOfDay' from the first two characters of 'Time'
data['HourOfDay'] = data['Time'].str[:2]
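
The preprocessing above can be checked on a few rows before running it on the full file. The sketch below assumes that `Date` is an `MM/DD/YYYY` string and `Time` an `HH:MM` string; the sample values are invented for illustration and only the column names come from the notebook.

```python
import pandas as pd

# Invented sample rows; only the column names are taken from the notebook.
sample = pd.DataFrame({
    "Date": ["01/15/2017", "03/02/2018", "12/31/2005"],
    "Time": ["08:45", "23:10", "00:05"],
})

# Parse the dates (the format string is an assumption about the raw file).
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y")
sample["Year"] = sample["Date"].dt.year

# Keep only complete years, i.e. drop the 2018 row.
sample = sample[sample["Year"] < 2018].copy()

# The first two characters of an 'HH:MM' string are the hour.
sample["HourOfDay"] = sample["Time"].str[:2]

years = sample["Year"].tolist()
hours = sample["HourOfDay"].tolist()
```

Note that `HourOfDay` stays a zero-padded string, which is what the categorical x-axis of the hour plot further down expects.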

### Let's compare PROSTITUTION with DRUG/NARCOTIC

In [3]:
# Use pivot_table() to count incidents per category for each year, hour, and weekday
year_cat = data.pivot_table(index='Year', columns='Category', aggfunc='size', fill_value=0)
hour_cat = data.pivot_table(index='HourOfDay', columns='Category', aggfunc='size', fill_value=0)
week_cat = data.pivot_table(index='DayOfWeek', columns='Category', aggfunc='size', fill_value=0)

# Select only DRUG/NARCOTIC and PROSTITUTION (by name rather than by column position)
focuscrimes = ['DRUG/NARCOTIC', 'PROSTITUTION']
year_cat = year_cat[focuscrimes]
hour_cat = hour_cat[focuscrimes]
week_cat = week_cat[focuscrimes]

# Normalise each category by its total so the two crimes can be compared
normalized_year_cat = year_cat / year_cat.sum(axis=0)
normalized_hour_cat = hour_cat / hour_cat.sum(axis=0)
normalized_week_cat = week_cat / week_cat.sum(axis=0)

# Expose each index as a regular column so Bokeh can use it as the x value
normalized_year_cat['year'] = [str(i) for i in normalized_year_cat.index]
normalized_hour_cat['Hour'] = list(normalized_hour_cat.index)
normalized_week_cat['Day'] = list(normalized_week_cat.index)
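
To make the counting and normalisation step concrete, here is a minimal sketch on invented data. The six toy rows are an assumption for illustration; the `pivot_table` call and the division by the column sums mirror the cell above.

```python
import pandas as pd

# Invented incidents; only the column names and categories come from the notebook.
toy = pd.DataFrame({
    "Year": [2003, 2003, 2003, 2004, 2004, 2004],
    "Category": ["PROSTITUTION", "DRUG/NARCOTIC", "DRUG/NARCOTIC",
                 "PROSTITUTION", "PROSTITUTION", "PROSTITUTION"],
})

# One row per year, one column per category, each cell holding the incident count.
counts = toy.pivot_table(index="Year", columns="Category", aggfunc="size", fill_value=0)

# Divide each column by its total: every category now sums to 1 across the years.
shares = counts / counts.sum(axis=0)

prostitution_2004 = shares.loc[2004, "PROSTITUTION"]
```

Because each column is divided by its own total, the bars show how one crime is distributed over time, not how the two crimes compare in absolute volume.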

In [4]:
### Bokeh

# Uncomment output_notebook() to render the plots inline in the notebook
from bokeh.io import output_notebook
#output_notebook()

# Imports for the plots
from bokeh.palettes import Spectral3
from bokeh.transform import factor_cmap

from bokeh.plotting import figure, output_file, show
# Uncomment to write the plots to an HTML file instead
#output_file("assignment2-bokeh.html")

ModuleNotFoundError: No module named 'bokeh'

Figure 1 (Year)

In [0]:
# Wrap the DataFrame in a ColumnDataSource, the standard way to pass data to Bokeh
from bokeh.models import ColumnDataSource, FactorRange
source = ColumnDataSource(normalized_year_cat)

### Create the figure p1
p1 = figure(x_range=FactorRange(factors=list(normalized_year_cat.year)), x_axis_label='Year', y_axis_label='Frequency',
            title='Frequency of Crimes per Year', width=1000, height=750)

### Add the bars
bar = {}  # to store the vbar renderers
# A list (not a set) keeps the crime order, and therefore the colours, stable across runs
focuscrimes = ['DRUG/NARCOTIC', 'PROSTITUTION']
colors = ['#1f77b4', '#ff7f0e']

### Create one vbar per focus crime
for indx, i in enumerate(focuscrimes):
    bar[i] = p1.vbar(x='year', source=source, top=i, legend_label=i,
                     width=0.85,
                     fill_color=colors[indx],
                     line_color="black",
                     line_width=2,
                     alpha=0.75,
                     muted_alpha=0.03)

### Maybe, change the legend position

### Mute a series when its legend entry is clicked ('hide' would behave like muted_alpha=0)
p1.legend.click_policy = "mute"

show(p1)  # displays the plot



Figure 2 (Week)

In [None]:
# Wrap the DataFrame in a ColumnDataSource, the standard way to pass data to Bokeh
source = ColumnDataSource(normalized_week_cat)

### Create the figure p2
p2 = figure(x_range=FactorRange(factors=list(normalized_week_cat.index)), x_axis_label='Day', y_axis_label='Frequency',
            title='Frequency of Crimes per Day of Week', width=1000, height=750)

### Add the bars
bar = {}  # to store the vbar renderers
# A list (not a set) keeps the crime order, and therefore the colours, stable across runs
focuscrimes = ['DRUG/NARCOTIC', 'PROSTITUTION']
colors = ['#1f77b4', '#ff7f0e']

### Create one vbar per focus crime
for indx, i in enumerate(focuscrimes):
    bar[i] = p2.vbar(x='Day', source=source, top=i, legend_label=i,
                     width=0.85,
                     fill_color=colors[indx],
                     line_color="black",
                     line_width=2,
                     alpha=0.75,
                     muted_alpha=0.03)

### Maybe, change the legend position

### Mute a series when its legend entry is clicked ('hide' would behave like muted_alpha=0)
p2.legend.click_policy = "mute"

show(p2)  # displays the plot
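
Because the weekday index comes out of `pivot_table` sorted alphabetically, the x-axis of Figure 2 starts with Friday rather than Monday. One possible fix, sketched here on invented values, is to reindex the table in calendar order before building the data source. It assumes that `DayOfWeek` holds full English day names.

```python
import pandas as pd

# Calendar order for the x-axis (assumes full English day names in 'DayOfWeek').
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Invented values, indexed alphabetically the way pivot_table would return them.
toy_week = pd.DataFrame(
    {"PROSTITUTION": [1, 2, 3, 4, 5, 6, 7]},
    index=sorted(weekday_order),
)

# Reorder the rows; each value stays attached to its day label.
ordered = toy_week.reindex(weekday_order)

ordered_days = list(ordered.index)
monday_value = ordered.loc["Monday", "PROSTITUTION"]
```

Applied to the notebook, the reindexed table would replace `normalized_week_cat` before it is passed to `ColumnDataSource` and `FactorRange`.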

Figure 3 (Hour)

In [0]:
# Wrap the DataFrame in a ColumnDataSource, the standard way to pass data to Bokeh
source = ColumnDataSource(normalized_hour_cat)

### Create the figure p3
p3 = figure(x_range=FactorRange(factors=list(normalized_hour_cat.index)), x_axis_label='Hour', y_axis_label='Frequency',
            title='Frequency of Crimes per Hour of Day', width=1000, height=750)

### Add the bars
bar = {}  # to store the vbar renderers
# A list (not a set) keeps the crime order, and therefore the colours, stable across runs
focuscrimes = ['DRUG/NARCOTIC', 'PROSTITUTION']
colors = ['#1f77b4', '#ff7f0e']

### Create one vbar per focus crime
for indx, i in enumerate(focuscrimes):
    bar[i] = p3.vbar(x='Hour', source=source, top=i, legend_label=i,
                     width=0.85,
                     #fill_color=factor_cmap('Hour', palette=Spectral3, factors=list(focuscrimes), start=0, end=13),
                     fill_color=colors[indx],
                     line_color="black",
                     line_width=2,
                     alpha=0.75,
                     muted_alpha=0.03)

### Maybe, change the legend position

### Mute a series when its legend entry is clicked ('hide' would behave like muted_alpha=0)
p3.legend.click_policy = "mute"

show(p3)  # displays the plot

All the figures

In [0]:
# Stack all the plots vertically (vplot is no longer available in Bokeh; column() replaces it)
from bokeh.layouts import column
p = column(p1, p2, p3)

# Show the results
show(p)

### Map plot

> I don't know why there are only 3 dots for prostitution.

In [12]:
# Plot a map of San Francisco
import folium
m1 = folium.Map([37.77919, -122.41914], tiles="Stamen Toner", zoom_start=14)

# A list (not a set) keeps the crime order, and therefore the colours, stable across runs
focuscrimes = ['DRUG/NARCOTIC', 'PROSTITUTION']
colors = ['#1f77b4', '#ff7f0e']

def plotDot(point, color, label, radius):
    '''Add a CircleMarker to the map m1.
    point: [latitude, longitude]; label: text shown in the popup.'''
    folium.CircleMarker(location=point,
                        radius=radius,
                        weight=5,
                        color=color,
                        popup=f"<i>{label}</i>").add_to(m1)


for i, crime in enumerate(focuscrimes):

    # Count incidents per coordinate pair, most frequent first
    location = data[data['Category'] == crime].groupby(['X', 'Y']).size()
    location = location.sort_values(ascending=False).reset_index()
    color = colors[i]

    tooltip = crime

    # Plot and print the top 5 locations
    for idx in range(5):
        plotDot([location['Y'][idx], location['X'][idx]], color, crime, 2+i)
        print([location['Y'][idx], location['X'][idx]])

    #folium.Marker(
    #    [location['Y'][0],location['X'][0]] , popup=f"<i>{crime}</i>", tooltip=tooltip
    #).add_to(m1)


m1

[37.7642205603745, -122.41965834371]
[37.775420706711, -122.403404791479]
[37.7833862379382, -122.409853729941]
[37.7827931071006, -122.414056291891]
[37.7817511307229, -122.411071423064]
[37.7604330003754, -122.415929849548]
[37.7844496612562, -122.416075285059]
[37.7844496612791, -122.416075285051]
[37.7636337703031, -122.416230392551]
[37.7636337702985, -122.416230392543]


In [10]:
location

| | X | Y | 0 |
|---|---|---|---|
| 0 | -122.415930 | 37.760433 | 814 |
| 1 | -122.416075 | 37.784450 | 640 |
| 2 | -122.416075 | 37.784450 | 539 |
| 3 | -122.416230 | 37.763634 | 514 |
| 4 | -122.416230 | 37.763634 | 495 |
| ... | ... | ... | ... |
| 1443 | -122.418951 | 37.777923 | 1 |
| 1444 | -122.413330 | 37.764005 | 1 |
| 1445 | -122.478350 | 37.751334 | 1 |
| 1446 | -122.419000 | 37.792700 | 1 |
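
A likely explanation for seeing only three dots: in the second group of five printed coordinates (the one matching the `location` table), two pairs of points differ only from the tenth decimal place on, so their markers are drawn on top of each other. One way to merge such near-duplicates, sketched below as a suggestion and not as the notebook's method, is to round the coordinates before grouping. The five rows are the top rows printed above; the column name `count` and the choice of 5 decimals (roughly a metre) are assumptions.

```python
import pandas as pd

# Top five rows of `location`, using the full-precision coordinates printed above.
top = pd.DataFrame({
    "X": [-122.415929849548, -122.416075285059, -122.416075285051,
          -122.416230392551, -122.416230392543],
    "Y": [37.7604330003754, 37.7844496612562, 37.7844496612791,
          37.7636337703031, 37.7636337702985],
    "count": [814, 640, 539, 514, 495],  # hypothetical name for the column shown as `0`
})

# Round to 5 decimals so points that are practically identical share one key,
# then add up their counts.
merged = (
    top.assign(X=top["X"].round(5), Y=top["Y"].round(5))
       .groupby(["X", "Y"])["count"]
       .sum()
       .sort_values(ascending=False)
       .reset_index()
)

n_locations = len(merged)
merged_counts = merged["count"].tolist()
```

In the notebook itself, the same rounding could be applied to `data['X']` and `data['Y']` before the `groupby`, so that the top five entries are five distinct places.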

