# Exploratory Data Analysis - Crime Data Los Angeles

This notebook explores the questions posed in the July 2020 Data Scientist Exercise:

- How many crimes were reported over the past 5 years?
- List the top five reported crimes for each year for the past 5 years.
- What are the most common MO codes? Have these changed over the past 5 years?
- How else has reported crime changed over time in the City?
- Based on your analysis, please share any changes to services, programs, or policies that the City should consider.

## Load libraries

In [23]:
# Import libraries
import os, sys, subprocess
import json
import pandas as pd
import numpy as np
from langdetect import detect
import pickle

# None shows full column contents (-1 is deprecated in pandas >= 1.0)
pd.set_option('display.max_colwidth', None)
#pd.set_option('display.max_columns', 500)

In [24]:
# load project config
terminal_call = ! git rev-parse --show-toplevel
repo_path=terminal_call[0]
project_config_path = os.path.join(repo_path,'project_config.json')

with open(project_config_path,'r') as fp: 
    project_config = json.load(fp)
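
The `!` shell escape in the cell above only works under IPython/Jupyter. If the same setup needs to run as a plain Python script, the repository root can be looked up with `subprocess`, which the notebook already imports. This is a minimal sketch; the helper name `get_repo_root` and its fallback to the current directory are assumptions for illustration, not part of the project.

```python
import os
import subprocess


def get_repo_root(default="."):
    """Return the git repository root, or `default` if git is unavailable.

    Hypothetical helper: the fallback behaviour is an assumption, chosen so
    the function never raises when run outside a repository.
    """
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", "--show-toplevel"],
            stderr=subprocess.DEVNULL,
        )
        return out.decode().strip()
    except (subprocess.CalledProcessError, OSError):
        # Not inside a git repository, or git is not installed
        return os.path.abspath(default)


repo_path = get_repo_root()
project_config_path = os.path.join(repo_path, "project_config.json")
```

The resulting `project_config_path` can then be opened with `json.load` exactly as in the cell above.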

In [25]:
# import custom module to look at trends
module_path = os.path.join(repo_path,project_config['project_module_relative_path'])
sys.path.append(module_path)

import trends
from trends import get_top_trends as gt
from trends import convert
from trends.convert import crime_dict, mocode_dict # load in crime name dictionary
from importlib import reload # for updating scripts

## Load in data

In [31]:
# load in data that was collected (the context manager closes the file handle)
with open("../data/crime_data.pkl", "rb") as fp:
    df = pickle.load(fp)

## How many crimes were reported over the past 5 years?

In [6]:
# Check how many crimes were reported over the past 5 years
# Calculate current date and offset by 5 years
# pd.Timestamp.now() replaces pd.datetime.now(), which was removed from pandas
(df["date_rptd"] > (pd.Timestamp.now() - pd.DateOffset(years=5))).sum()

1117864

### A: 1,117,864 crimes were reported in the past 5 years
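
Because the cutoff is computed from the current time, the count above depends on when the notebook is run. The following sketch shows the same boolean-mask counting on a tiny synthetic frame; only the column name `date_rptd` comes from the notebook, and the three dates are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for the real data (assumption: 'date_rptd' is a datetime column)
now = pd.Timestamp.now()
toy = pd.DataFrame({
    "date_rptd": [
        now - pd.DateOffset(years=6),  # outside the 5-year window
        now - pd.DateOffset(years=4),  # inside the window
        now - pd.DateOffset(days=30),  # inside the window
    ]
})

# Rolling 5-year cutoff, as in the cell above
cutoff = now - pd.DateOffset(years=5)

# A boolean mask sums to the number of True rows
n_recent = int((toy["date_rptd"] > cutoff).sum())
print(n_recent)  # prints 2
```

Replacing `now` with a fixed reference date (for example the date the data was collected) would make the reported total reproducible across runs.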

## List the top five reported crimes for each year for the past 5 years.

In [None]:
reload(trends)

### Subset data for the last 5 years

In [33]:
# Select the last 5 years as a dataframe
# pd.Timestamp.now() replaces pd.datetime.now(), which was removed from pandas
df_5 = df[df["date_rptd"] > (pd.Timestamp.now() - pd.DateOffset(years=5))]

In [34]:
# Find the top 5 reported crimes for each year
# Crime is listed in the data under 'crm_cd'
# Custom function 'top_trends' groups by column and selects the top 'n' 
top_crimes = gt.top_trends(df_5, column='year', variable='crm_cd', n=5)

In [35]:
# Add in the name of the crimes to dataframe
top_crimes["crm_name"] = convert.get_names(top_crimes, "crm_cd", crime_dict)

In [36]:
top_crimes

index,year,crm_cd,count,crm_name
0,2015,624,8853,Battery - misdemeanor
1,2015,510,8573,Stolen Vehicle
2,2015,440,7866,Theft - $950 & under
3,2015,330,7316,Burg from Vehicle
4,2015,354,7297,Theft of Identity
131,2016,510,18353,Stolen Vehicle
132,2016,624,17942,Battery - misdemeanor
133,2016,330,16779,Burg from Vehicle
134,2016,440,14814,Theft - $950 & under
135,2016,310,14558,Burglary
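
The source of `gt.top_trends` and `convert.get_names` is not shown in this notebook. As a rough illustration of the "group by a column, keep the top `n`, then attach names" pattern described in the comments above, here is a sketch on synthetic data. The functions `top_n_per_group` and `names_from_codes` are hypothetical stand-ins, not the project's implementation, and unlike the output above this version resets the row index.

```python
import pandas as pd


def top_n_per_group(df, column, variable, n):
    """Hypothetical stand-in for gt.top_trends: top-n `variable` values per `column`."""
    counts = (
        df.groupby([column, variable])
          .size()
          .reset_index(name="count")
          .sort_values([column, "count"], ascending=[True, False])
    )
    # head(n) on the sorted groups keeps the n largest counts per group
    return counts.groupby(column).head(n).reset_index(drop=True)


def names_from_codes(df, code_column, lookup):
    """Hypothetical stand-in for convert.get_names: map codes to names via a dict."""
    return df[code_column].map(lookup)


# Synthetic records; the code-to-name pairs are taken from the table above
toy = pd.DataFrame({
    "year":   [2015] * 6 + [2016] * 3,
    "crm_cd": [624, 624, 624, 510, 510, 440, 510, 510, 624],
})
toy_names = {
    624: "Battery - misdemeanor",
    510: "Stolen Vehicle",
    440: "Theft - $950 & under",
}

top = top_n_per_group(toy, column="year", variable="crm_cd", n=2)
top["crm_name"] = names_from_codes(top, "crm_cd", toy_names)
print(top)
```

Codes missing from the lookup dictionary come back as `NaN` with `Series.map`, which makes gaps in the dictionary easy to spot.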


## What are the most common MO codes? Have these changed over the past 5 years?

In [38]:
# Find the 10 most common MO code strings over the last 5 years (df_5)
# Note: each distinct 'mocodes' string is counted, so "0329 1300" is tallied separately from "0329"
df_5["mocodes"].value_counts(ascending = False).head(10)

0344         93842
0329         44407
1501         18472
0325         12737
0416         12317
0329 1300    8302 
1822         8234 
0344 1300    6090 
0344 1606    5898 
0329 1307    4808 
Name: mocodes, dtype: int64
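
The counts above treat every distinct string as its own category, so a report with several MO codes never contributes to the totals of its individual codes. If per-code totals are wanted, one option is to split each string and count the pieces. This sketch assumes the codes are separated by whitespace, as the output above suggests; the sample values are synthetic.

```python
import pandas as pd

# Synthetic sample shaped like the 'mocodes' column (assumption: space-separated codes)
toy = pd.Series(
    ["0344", "0329 1300", "0344 1300", "0329", None, "0344"],
    name="mocodes",
)

per_code = (
    toy.dropna()        # reports with no MO code recorded
       .str.split()     # "0329 1300" -> ["0329", "1300"]
       .explode()       # one row per individual code
       .value_counts()  # tally each code across all reports
)
print(per_code)
```

Keeping the codes as strings preserves leading zeros such as `0344`, which would be lost if the column were cast to integers.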

In [39]:
# Find the top 10 MO codes for each year
# The column used here is 'mocodes_1' (presumably the first MO code listed on each report), not the raw 'mocodes' string
# Custom function 'top_trends' groups by column and selects the top 'n'
top_mo = gt.top_trends(df_5, column='year', variable='mocodes_1', n=10)

In [40]:
# Add in the name of the MO codes to dataframe
top_mo["mo_name"] = convert.get_names(top_mo, "mocodes_1", mocode_dict)

In [41]:
top_mo

index,year,mocodes_1,count,mo_name
0,2015,344,24853,Removes vict property
1,2015,0,12010,
2,2015,329,9307,Vandalized
3,2015,2000,5622,Domestic violence
4,2015,416,5152,Hit-Hit w/ weapon
5,2015,325,2820,Took merchandise
6,2015,1822,2775,Stranger
7,2015,400,2727,Force used
8,2015,1501,2473,Other MO (see rpt)
9,2015,100,2095,Suspect Impersonate
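
To answer whether the most common MO codes have changed, the per-year rankings need to be compared side by side. One possible approach is to rank codes within each year and pivot the ranks into a code-by-year grid. The sketch below uses made-up counts for three codes and two years; only the column names `year`, `mocodes_1`, and `count` follow the table above.

```python
import pandas as pd

# Synthetic per-year counts (values are illustrative, not from the real data)
toy_top = pd.DataFrame({
    "year":      [2015, 2015, 2015, 2016, 2016, 2016],
    "mocodes_1": [344, 329, 416, 344, 416, 329],
    "count":     [30, 20, 10, 35, 25, 15],
})

# Rank codes within each year: 1 = most common
toy_top["rank"] = (
    toy_top.groupby("year")["count"]
           .rank(method="first", ascending=False)
           .astype(int)
)

# One row per code, one column per year
rank_by_year = toy_top.pivot(index="mocodes_1", columns="year", values="rank")

# Positive values mean the code moved down the ranking between the two years
rank_change = rank_by_year[2016] - rank_by_year[2015]
print(rank_by_year)
print(rank_change)
```

With real data, a code that appears in one year's top 10 but not another's would show up as `NaN` in the pivoted grid, which is itself a signal that the ranking changed.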
