# Data Analysis Task

In this document, we will look at calls and arrests from the police department at Berkeley. You will be performing data analysis and some basic modeling. Please take careful notes on your work and make sure that the notebook can be shared with and understood by others.

In [None]:
from midas import Midas
import numpy as np

# fill in here, e.g., "experiment_9"
pid = ...

m = Midas(pid, "eval_berkeley_police")

## Data Description

The data is from the [City of Berkeley](https://data.cityofberkeley.info/Public-Safety/Berkeley-PD-Calls-for-Service/k2nh-s5h5) and covers calls to the police during the month of July 2019.

column | description | data type
------ | ----------- | ---------
`CASENO`  | Case Number | Number
`OFFENSE` | Offense Type | Plain Text
`EVENTDT` | Date Event Occurred | Date Time
`EVENTTM` | Time Event Occurred | Plain Text
`CVLEGEND` | Description of Event | Plain Text
`CVDOW` | Day of Week Event Occurred: 0 = Sunday, 1 = Monday, etc., 6 = Saturday | Number
`InDbDate` | Date dataset was updated in the portal | Date Time
`Lat` | latitude of the location related to the call, contains null values | Number
`Block_Location` | the area the call is concerned with | Plain Text
`BLKADDR` | area address | Plain Text
`City` | Berkeley | Plain Text
`State` | CA | Plain Text
`Lon` | longitude of the location, contains null values | Number

In [None]:
calls_df = m.from_file("./data/berkeley_calls_july_first_half.csv")
calls_df.head(3)

### Task 1a: Common weekend offenses

**Please identify the top two types of offenses (`CVLEGEND`) on the weekends (`CVDOW`).** You can use dataframes to answer this question. If you are using Midas, you might find the API `.get_filtered_data()` helpful.
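As a rough illustration of the dataframe route, the sketch below filters to weekend rows and counts event descriptions. It uses plain pandas on made-up toy rows rather than the Midas dataframe, so the names `toy`, `WEEKEND_DAYS`, and `top_two` are hypothetical, and the equivalent Midas calls may differ.

```python
import pandas as pd

# Toy rows standing in for calls_df; the values are invented for illustration.
toy = pd.DataFrame({
    "CVLEGEND": [
        "LARCENY", "BURGLARY - VEHICLE", "LARCENY", "VANDALISM",
        "LARCENY", "BURGLARY - VEHICLE", "DISORDERLY CONDUCT",
    ],
    "CVDOW": [0, 6, 6, 1, 0, 0, 6],
})

# Per the column table: 0 = Sunday and 6 = Saturday.
WEEKEND_DAYS = [0, 6]

# Keep only weekend rows, then count each event description.
weekend = toy[toy["CVDOW"].isin(WEEKEND_DAYS)]
top_two = weekend["CVLEGEND"].value_counts().head(2)
print(top_two)
```

The same two steps (filter on `CVDOW`, then count `CVLEGEND`) apply to the real data; only the dataframe API changes.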

### Task 1b: verify the result holds on a different dataset

We now have a new dataset below. Please verify if your previous observations still hold. You might find it helpful to copy the code of your analysis result by clicking on "copy code to clipboard" from the distribution of `cvlegends`.

In [None]:
calls_fall_df = m.from_file("./data/berkeley_calls_july_second_half.csv")
calls_fall_df.head(3)

### Task 2a: find any location skews (`block_location`) by other factors

Plot the locations with the `Folium` library; stub code is provided below. You might find Midas reactive cells helpful for slicing the data into different subsets. **Please limit your time to 10 minutes**.

In [None]:
import folium
import folium.plugins

locs = calls_df.where('Lat', m.are.above(1)).select(['Lat', 'Lon']).to_numpy()
CENTER_COORD = (37.8715, -122.2730) # berkeley city center coordinate

us_map = folium.Map(location=CENTER_COORD, zoom_start=12)
heatmap = folium.plugins.HeatMap(locs.tolist(), radius = 10)
us_map.add_child(heatmap)
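The stub keeps only rows where `Lat` is above 1, which appears intended to drop calls with missing coordinates before they reach the heatmap. If you slice the data yourself, a similar guard can be written in plain NumPy. This is a minimal sketch on made-up coordinates, assuming missing values show up as `NaN`; the names `coords` and `valid` are hypothetical.

```python
import numpy as np

# Made-up (lat, lon) pairs; the middle row stands in for a call with no location.
coords = np.array([
    [37.8715, -122.2730],
    [np.nan, np.nan],
    [37.8600, -122.2600],
])

# Keep only rows where both latitude and longitude are present.
valid = coords[~np.isnan(coords).any(axis=1)]
print(valid.tolist())
```

The resulting list of pairs has the same shape as `locs.tolist()` in the stub above.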

### Task 2b: describe what you have analyzed and what you have not

Do this by looking at past selections in the cells or by using `m.all_selections`. This is important for documentation: it helps readers distinguish insights that you looked for and did not find from those you never looked for.

### Task 3: model incidents

The police department at Berkeley asks you for recommendations on how to station their officers. We understand that you may not have enough background on how public safety and policing work, but please try it out and state sources of bias and possible misinformation. Please do some free-form exploratory data analysis, and maybe some statistics and modeling as well.

We have provided an additional data source below, containing arrests from the Berkeley city open data platform, in case you find it useful.

**Please just use `calls_df` for this task, and wrap up by 30 min.**

In [None]:
from midas.util.utils import fetch_and_cache

In [None]:
arrest_url = "https://data.cityofberkeley.info/resource/xi7q-nji6.csv"
arrest_path = fetch_and_cache(arrest_url, "berkeley_arrests.csv", force = False)

In [None]:
# the Berkeley website provides no description for this data
arrest_df = m.from_file(arrest_path)
arrest_df.head(3)

In [None]:
# some helper functions
from datetime import datetime
fun = lambda e: datetime.strptime(e, "%Y-%m-%dT%H:%M:%S.%f")
arrest_df['parsed_dt'] = arrest_df.apply(fun, 'date_and_time')
arrest_df.head(2)
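To see what the helper's format string expects, here is a small standard-library sketch. The sample timestamp is invented, and whether the `date_and_time` column actually looks like this is an assumption based only on the format string above. The sketch also shows one way to map Python's weekday numbering (Monday = 0) onto the `CVDOW` convention (Sunday = 0) used in the calls data.

```python
from datetime import datetime

# Hypothetical timestamp in the shape the helper's format string expects.
sample = "2019-07-04T21:15:00.000"
parsed = datetime.strptime(sample, "%Y-%m-%dT%H:%M:%S.%f")

# Python counts Monday as 0; CVDOW counts Sunday as 0, so shift by one.
python_dow = parsed.weekday()
cvdow_style = (python_dow + 1) % 7

print(parsed.hour, python_dow, cvdow_style)
```

Aligning the two numbering schemes matters if you compare arrests and calls by day of week.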