# Data Analysis Task

In this document, we will look at calls and arrests from the police department at Berkeley. You will perform data analysis and some basic modeling. **Please record your findings in markdown, along with chart snapshots**. The notebook should be readable by others. During the analysis, you might find it helpful to check the other notebook for reference.

In [None]:
from midas import Midas
import numpy as np

# fill in here, e.g., "experiment_12"
pid = ...

m = Midas(pid, "eval_berkeley_police")

## Data Description

The data is from the [City of Berkeley](https://data.cityofberkeley.info/Public-Safety/Berkeley-PD-Calls-for-Service/k2nh-s5h5) and covers calls to the police during the month of July 2019.

column | description | data type
------ | ----------- | ---------
`CASENO`  | Case Number | Number
`OFFENSE` | Offense Type | Plain Text
`EVENTDT` | Date Event Occurred | Date Time
`EVENTTM` | Time Event Occurred | Plain Text
`CVLEGEND` | Description of Event | Plain Text
`CVDOW` | Day of Week Event Occurred: 0 = Sunday, 1 = Monday, ..., 6 = Saturday | Number
`Block_Location` | Area the call is concerned with | Plain Text
`Lat` | Latitude of the location related to the call; contains null values | Number
`Lon` | Longitude of the location related to the call; contains null values | Number
`BLKADDR` | area address | Plain Text

In [None]:
calls_df = m.from_file("./data/berkeley_calls_july_first_half.csv")
calls_df.head(3)

### Task 1a: identify common weekend offenses

**Please identify the top two types of offenses (`CVLEGEND`) on the weekends (`CVDOW`)**. You can use dataframes to answer this question. If you are using Midas, you might find the API `.vis(selection_type="multiclick")` helpful. Don't forget to record the charts that support your findings!
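If you want to sanity-check a chart-based answer, the counting logic can be sketched in plain Python. This is only an illustration: the rows below are made-up toy data standing in for `calls_df`, and `top_weekend_offenses` is a hypothetical helper, not part of Midas. The weekend codes follow the data description (0 = Sunday, 6 = Saturday).

```python
from collections import Counter

# Toy rows standing in for calls_df; the values are made up for illustration.
rows = [
    {"CVLEGEND": "LARCENY", "CVDOW": 0},
    {"CVLEGEND": "LARCENY", "CVDOW": 6},
    {"CVLEGEND": "BURGLARY - VEHICLE", "CVDOW": 6},
    {"CVLEGEND": "BURGLARY - VEHICLE", "CVDOW": 0},
    {"CVLEGEND": "LARCENY", "CVDOW": 6},
    {"CVLEGEND": "VANDALISM", "CVDOW": 0},
    {"CVLEGEND": "VANDALISM", "CVDOW": 3},
    {"CVLEGEND": "VANDALISM", "CVDOW": 2},
    {"CVLEGEND": "VANDALISM", "CVDOW": 4},
]

# Per the data description: 0 = Sunday, 6 = Saturday.
WEEKEND_DAYS = {0, 6}


def top_weekend_offenses(rows, n=2):
    """Count CVLEGEND values on weekend rows and return the n most common."""
    counts = Counter(
        row["CVLEGEND"] for row in rows if row["CVDOW"] in WEEKEND_DAYS
    )
    return counts.most_common(n)


top_two = top_weekend_offenses(rows)
print(top_two)
```

Note that in the toy data `VANDALISM` is the most frequent offense overall but not on weekends, which is why the weekend filter has to come before the count.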

In [None]:
m.log_entry("startQ1a")

### Task 1b: verify the result holds on another dataset

We now have a new dataset below. Please verify whether your previous observations still hold. You might find it helpful to copy the code for your analysis by clicking on "copy code to clipboard" from the distribution of `cvlegends`.

In [None]:
m.log_entry("startQ1b")

In [None]:
calls_fall_df = m.from_file("./data/berkeley_calls_july_second_half.csv")
calls_fall_df.head(3)

### Task 2a: identify factors with skewed locations

Here we plot the call locations with the `Folium` library, for which we provide the stub code. The plot uses the `Lat` and `Lon` columns (extracted from `Block_Location`).
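Because the data description notes that `Lat` and `Lon` contain null values, you may want to drop incomplete rows before comparing location plots across subsets. The following is a minimal plain-Python sketch of that filtering step. The toy rows and the helpers `is_missing` and `valid_points` are hypothetical, and how nulls are actually encoded in the loaded data (empty string, `None`, or NaN) is an assumption.

```python
import math


def is_missing(value):
    """Treat None, empty strings, and NaN as missing (assumed encodings)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def valid_points(rows):
    """Return (lat, lon) float pairs, skipping rows with a missing coordinate."""
    points = []
    for row in rows:
        lat, lon = row.get("Lat"), row.get("Lon")
        if is_missing(lat) or is_missing(lon):
            continue
        points.append((float(lat), float(lon)))
    return points


# Toy rows; the coordinates are made up for illustration.
rows = [
    {"Lat": "37.8716", "Lon": "-122.2727"},
    {"Lat": "", "Lon": ""},
    {"Lat": None, "Lon": "-122.2600"},
    {"Lat": float("nan"), "Lon": -122.2650},
    {"Lat": 37.8650, "Lon": -122.2580},
]

points = valid_points(rows)
print(f"kept {len(points)} of {len(rows)} rows")
```

When you compare subsets, it is worth noting how many rows each one loses to missing coordinates, since a subset that loses many rows can look less concentrated than it really is.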

You might find Midas *reactive cells* helpful for slicing the data into different subsets. You can also manually copy-paste the cells if you wish.

**Please limit your time to 10 minutes**.

In [None]:
m.log_entry("startQ2a")

In [None]:
locs = calls_df.select(["Lat", "Lon"])
locs.plot_heatmap()

### Task 2b: describe what you have analyzed and what you have not

Please try to recall the analyses you have performed and comment on those you have not looked at. You may find it helpful to look at past selections in the cells or to use `m.all_selections`. This is important for documentation: it helps distinguish patterns that you checked for and did not find from those that you never examined.

In [None]:
m.log_entry("startQ2b")

### Task 3: create recommendations and basic models for the incidents

The police department at Berkeley asks you for recommendations on how to dispatch their officers: how many, when, and where. You might need to perform exploratory data analysis, run statistics, and model relevant values (e.g. counts, frequency, kinds of calls, etc.). Please record your reasoning with markdown text and relevant charts. <font color="gray">We understand that you may not have enough background on public safety, but you can still learn something from the data, and when applicable, state your bias and possible misinformation.</font>


<font color="gray">Here are some concrete questions to get started:
* You have looked at location and day of the week in previous analyses; what about the hour of the day (`EVENTTM`)?
* Were there any outlier days (`EVENTDT`)?
* Could you group offense types by whether they can benefit from the presence of police (and maybe reuse your analysis for a different group)?
* You can model the number of calls per day based on the features, using `calls_df` for the modeling and `calls_fall_df` for verification. The verification could confirm that certain trends are the same, or you could even make an explicit prediction of a certain value and see how far off you are.</font>
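As one possible starting point for the hour-of-day and outlier-day questions, here is a small plain-Python sketch. It assumes `EVENTTM` is text of the form `HH:MM` and that `EVENTDT` can be reduced to one date string per day; neither format is confirmed by the data description. The toy data, the helper names, and the 1.5 standard deviation threshold are all arbitrary illustrative choices.

```python
from collections import Counter
from statistics import mean, stdev

# Toy data: date -> call times. All values are made up for illustration.
toy_calls = {
    "2019-07-01": ["08:15", "22:40"],
    "2019-07-02": ["09:05", "23:10"],
    "2019-07-03": ["10:30", "22:05"],
    "2019-07-04": ["00:20", "01:45", "20:10", "21:00",
                   "22:15", "22:50", "23:30", "23:55"],
    "2019-07-05": ["07:50", "22:30"],
}
rows = [
    {"EVENTDT": day, "EVENTTM": time}
    for day, times in toy_calls.items()
    for time in times
]


def calls_per_hour(rows):
    """Count calls by hour, assuming EVENTTM looks like 'HH:MM'."""
    return Counter(int(row["EVENTTM"].split(":")[0]) for row in rows)


def calls_per_day(rows):
    """Count calls by the EVENTDT value (assumed to be one string per day)."""
    return Counter(row["EVENTDT"] for row in rows)


def outlier_days(daily_counts, z=1.5):
    """Return days whose count is more than z sample std devs from the mean."""
    values = list(daily_counts.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return sorted(
        day for day, count in daily_counts.items()
        if abs(count - mu) > z * sigma
    )


hourly = calls_per_hour(rows)
daily = calls_per_day(rows)
print("busiest hour:", hourly.most_common(1))
print("outlier days:", outlier_days(daily))
```

With only about two weeks of data per file, a standard-deviation rule is rough at best, so treat any flagged day as a prompt to look at the chart rather than as a conclusion.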


In [None]:
m.log_entry("startQ3")

## Submit your results

Please run the cell below.

In [None]:
%%javascript

save_midas_data()