# Data Analysis Task

In this document, we will look at calls and arrests from the police department at Berkeley. You will perform data analysis and some basic modeling. **Please record your findings in markdown and chart snapshots**. The notebook should be readable by others. During the analysis, you might find it helpful to check the other notebook for reference.

In [None]:
from midas import Midas
import numpy as np
from datetime import datetime 

# fill in here, e.g., "experiment_12"
pid = ...

m = Midas(pid, "eval_berkeley_police")

## Data Description

The data comes from the [City of Berkeley](https://data.cityofberkeley.info/Public-Safety/Berkeley-PD-Calls-for-Service/k2nh-s5h5) and covers calls to the police during July 2019. Each row represents one phone call to the police department, with the date, the time, a description of the event, and the location. The file loaded below contains the first half of July.

column | description | data type
------ | ----------- | ---------
`CASENO`  | Case Number | Number
`OFFENSE` | Offense Type | Plain Text
`EVENTDT` | Date Event Occurred | Date Time
`EVENTTM` | Time Event Occurred | Plain Text
`CVLEGEND` | Description of Event | Plain Text
`CVDOW` | Day of Week Event Occurred: 0 = Sunday, 1 = Monday, ..., 6 = Saturday | Number
`Block_Location` | The area the call is concerned with | Plain Text
`Lat` | Latitude of the location related to the call; contains null values; extracted by regex from `Block_Location` | Number
`Lon` | Longitude of the location related to the call; contains null values; extracted by regex from `Block_Location` | Number
`BLKADDR` | area address | Plain Text
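
The `CVDOW` encoding and the split between `EVENTDT` and `EVENTTM` are easy to misread, so here is a minimal sketch in plain Python (independent of Midas) of how they might be interpreted. The string formats are assumptions, not something stated above: `EVENTDT` is assumed to start with `MM/DD/YYYY` and `EVENTTM` is assumed to be 24-hour `HH:MM`. The sample row is hypothetical. Check both against `calls_df.head(3)` before relying on this.

```python
from datetime import datetime

# CVDOW encoding from the table above: 0 = Sunday, ..., 6 = Saturday.
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]
WEEKEND_CODES = {0, 6}


def is_weekend(cvdow):
    """Return True when a CVDOW code falls on Saturday or Sunday."""
    return int(cvdow) in WEEKEND_CODES


def event_datetime(eventdt, eventtm):
    """Combine the date in EVENTDT with the time in EVENTTM.

    ASSUMPTION: EVENTDT starts with MM/DD/YYYY and EVENTTM is 24-hour HH:MM.
    """
    date_part = eventdt.split()[0]  # drop any trailing time-of-day text
    return datetime.strptime(f"{date_part} {eventtm}", "%m/%d/%Y %H:%M")


def cvdow_from_datetime(when):
    """Convert Python's weekday() (Monday = 0) to the CVDOW scheme (Sunday = 0)."""
    return (when.weekday() + 1) % 7


# Hypothetical row, for illustration only.
row = {"EVENTDT": "07/06/2019 12:00:00 AM", "EVENTTM": "22:30", "CVDOW": 6}
when = event_datetime(row["EVENTDT"], row["EVENTTM"])
print(DAY_NAMES[row["CVDOW"]], is_weekend(row["CVDOW"]), when.isoformat())
```

Comparing `cvdow_from_datetime(when)` with the stored `CVDOW` value is a cheap sanity check that the date parsing assumption holds.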

In [None]:
calls_df = m.from_file("./data/berkeley_calls_july_first_half.csv")
calls_df.head(3)

### Task 1a: identify common weekend offenses (~10 min)

What are the top two types of offenses on the weekends?

<font color="gray">Hint: `.vis(selection_type="multiclick")` can change a selection from brush to multiple click.</font>

In [None]:
m.log_start_task("Q1a")

### Task 1b: verify the result holds on another dataset (~5 min)

We now have a new dataset from the second half of July. Please verify whether your previous observations still hold: are the offense categories you found still the most common?
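
One way to frame the check, shown here as a rough sketch in plain Python rather than with the Midas API, is to compute the top weekend categories for each half and compare them. The rows and the `TYPE_A`-style labels below are hypothetical placeholders, and `top_offenses` is an illustrative helper, not part of any library.

```python
from collections import Counter


def top_offenses(rows, k=2, weekend_codes=(0, 6)):
    """Return the k most frequent OFFENSE values among weekend rows."""
    counts = Counter(
        r["OFFENSE"] for r in rows if r["CVDOW"] in weekend_codes
    )
    return [offense for offense, _ in counts.most_common(k)]


# Hypothetical placeholder rows: (OFFENSE, CVDOW, number of such rows).
def make_rows(spec):
    return [
        {"OFFENSE": offense, "CVDOW": dow}
        for offense, dow, n in spec
        for _ in range(n)
    ]


first_half = make_rows([("TYPE_A", 6, 3), ("TYPE_B", 0, 2),
                        ("TYPE_C", 6, 1), ("TYPE_C", 2, 5)])
second_half = make_rows([("TYPE_B", 6, 3), ("TYPE_A", 0, 2),
                         ("TYPE_C", 0, 1)])

top_first = top_offenses(first_half)
top_second = top_offenses(second_half)

# Same categories overall, even if their ranking changed between halves?
same_categories = set(top_first) == set(top_second)
same_ranking = top_first == top_second
print(top_first, top_second, same_categories, same_ranking)
```

Separating "same categories" from "same ranking" makes it explicit which version of the claim you are verifying.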

<font color="gray">Hint: clicking on "copy code to clipboard" could give you the code the result was derived from.</font>

In [None]:
m.log_start_task("Q1b")

In [None]:
calls_fall_df = m.from_file("./data/berkeley_calls_july_second_half.csv")
calls_fall_df.head(3)

### Task 2a: do offense types differ by area? (~15 min)

Please note any interesting facts related to locations. Does downtown Berkeley have more crimes? What about on campus? Are there more crimes of a certain type? Can you relate types of crime to types of areas based on your knowledge of the city?

<font color="gray">Hint: `plot_heatmap` plots a heat map given `Lat` and `Lon`. Reactive cells with `%%reactive` are re-run after each chart selection.</font>

In [None]:
m.log_start_task("Q2a")

In [None]:
locs = calls_df.select(["Lat", "Lon"])
# Assumed completion, based on the hint above: plot a heat map of the call locations
locs.plot_heatmap()

### Task 2b: describe what you have analyzed and what you have not (~5 min)

A teammate is going to take over Task 2a from now on. Please let them know what you have looked at so they can cover the remaining analysis!

<font color="gray">Hint: You may find it helpful to look at past selections in the cells, or to use `m.all_selections`.</font>

In [None]:
m.log_start_task("Q2b")

### Task 3: create recommendations and basic models for the incidents  (~30 min)

Do you have observations and recommendations for the police department at Berkeley based on this data?

You can start with some basic observations, such as which areas have denser activity during the day and which during the evening, and whether holidays (e.g., July 4th) have an effect. You can then model the data and present quantitative estimates, such as X% of officers are expected to be downtown, with a Y% hike on weekend evenings. If you want to build a machine learning model, e.g., to predict the type of offense based on location and time, feel free!
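
As a rough illustration of what such quantitative numbers could look like, the sketch below computes the share of located calls inside a bounding box and a percentage hike between two call rates. It is plain Python, not Midas code. The bounding box, the rows, and the per-evening rates are all hypothetical, and call share is only a proxy for where officers are needed.

```python
def in_box(lat, lon, box):
    """Return True when (lat, lon) lies inside box = (south, west, north, east)."""
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east


def share_in_box(rows, box):
    """Percentage of rows with known coordinates that fall inside the box."""
    # Lat and Lon contain null values, so skip rows without coordinates.
    located = [r for r in rows if r["Lat"] is not None and r["Lon"] is not None]
    if not located:
        return 0.0
    inside = sum(in_box(r["Lat"], r["Lon"], box) for r in located)
    return 100.0 * inside / len(located)


def percent_hike(baseline_rate, new_rate):
    """Relative change from baseline_rate to new_rate, in percent."""
    return 100.0 * (new_rate - baseline_rate) / baseline_rate


# ASSUMPTION: a hand-drawn box standing in for "downtown"; adjust to taste.
DOWNTOWN_BOX = (37.865, -122.275, 37.875, -122.262)

# Hypothetical rows, for illustration only.
rows = [
    {"Lat": 37.870, "Lon": -122.268},
    {"Lat": 37.871, "Lon": -122.270},
    {"Lat": 37.880, "Lon": -122.300},
    {"Lat": 37.860, "Lon": -122.250},
    {"Lat": None, "Lon": None},
]

downtown_share = share_in_box(rows, DOWNTOWN_BOX)

# Hypothetical average calls per evening: weekdays versus weekends.
weekend_hike = percent_hike(baseline_rate=10, new_rate=12)
print(downtown_share, weekend_hike)
```

Rates are compared per evening rather than as raw totals because there are more weekdays than weekend days in any two-week window.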

<font color="gray">We understand that you may not have enough background on public safety, but you can still learn something from the data. When applicable, state your biases and any possible misinformation.</font>

In [None]:
m.log_start_task("Q3")

## Submit your results

Please run the cell below.

In [None]:
%%javascript

downloadMidasData()