# Exploratory Data Analysis - Crime Data Los Angeles

This notebook explores the questions posed in the July 2020 Data Scientist Exercise:

- How many crimes were reported over the past 5 years?
- List the top five reported crimes for each year for the past 5 years.
- What are the most common MO codes? Have these changed over the past 5 years?
- How else has reported crime changed over time in the City?
- Based on your analysis, please share any changes to services, programs, or policies that the City should consider.

## Load libraries

In [10]:
# Import libraries
import os, sys, subprocess
import json
import pandas as pd
import numpy as np
from langdetect import detect
import pickle

# Show full column contents; None replaces the deprecated -1 sentinel
pd.set_option('display.max_colwidth', None)

pd.set_option('display.max_rows', 1000)
#pd.set_option('display.max_columns', 500)

In [14]:
# load project config
terminal_call = ! git rev-parse --show-toplevel
repo_path=terminal_call[0]
project_config_path = os.path.join(repo_path,'project_config.json')

with open(project_config_path,'r') as fp: 
    project_config = json.load(fp)
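
The `!` shell escape above only works inside IPython/Jupyter. If the same setup has to run as a plain Python script, one option is to locate the repository root by walking up the directory tree. This is a minimal sketch, not part of the original notebook: `find_repo_root` is a hypothetical helper, and it assumes the repository root is the nearest ancestor that contains a `.git` entry.

```python
from pathlib import Path


def find_repo_root(start="."):
    """Return the nearest ancestor of `start` (inclusive) that contains `.git`.

    Hypothetical helper: a plain-Python stand-in for
    `git rev-parse --show-toplevel` when IPython's `!` is unavailable.
    """
    current = Path(start).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / ".git").exists():
            return candidate
    raise FileNotFoundError(f"No .git entry found at or above {current}")
```

With this helper, `repo_path = str(find_repo_root())` could stand in for the two `terminal_call` lines; the rest of the config loading stays unchanged.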

In [12]:
# import custom module to look at trends
module_path = os.path.join(repo_path,project_config['project_module_relative_path'])
sys.path.append(module_path)

import trends
from trends import get_top_trends

from importlib import reload

## Load in data

In [17]:
# Load the previously collected crime data; the context manager closes the file handle
with open("../data/crime_data.pkl", "rb") as fp:
    df = pickle.load(fp)

## How many crimes were reported over the past 5 years?

In [15]:
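# Count the crimes reported within the past 5 years.
# The cutoff is the current timestamp minus 5 years, so the result depends on the run date.
# pd.Timestamp.now() replaces pd.datetime.now(), which was removed from recent pandas versions.
(df["date_rptd"] > (pd.Timestamp.now() - pd.DateOffset(years=5))).sum()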

1117864

### A: 1,117,864 crimes were reported in the past 5 years
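
Because the cutoff is derived from the current time, rerunning the cell on another day can change the count. A small sketch of how the window could be pinned to a fixed reference date is shown here. The helper name `count_reported_since`, its `as_of` parameter, and the toy dates are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd


def count_reported_since(dates, years=5, as_of=None):
    """Count entries of `dates` that fall after `as_of` minus `years` years.

    `as_of` defaults to the current time; passing a fixed date makes
    the count reproducible across runs.
    """
    as_of = pd.Timestamp.now() if as_of is None else pd.Timestamp(as_of)
    cutoff = as_of - pd.DateOffset(years=years)
    return int((dates > cutoff).sum())


# Toy data standing in for df["date_rptd"]
toy_dates = pd.Series(
    pd.to_datetime(["2014-01-01", "2016-03-15", "2019-12-31", "2020-06-30"])
)
n_recent = count_reported_since(toy_dates, years=5, as_of="2020-07-01")
```

On the real data this would be called as `count_reported_since(df["date_rptd"], as_of=...)` with whichever reference date the exercise is meant to use.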

## List the top five reported crimes for each year for the past 5 years.

In [16]:
# Select the last 5 years as a separate dataframe.
# .copy() avoids SettingWithCopyWarning when columns are added later.
df_5 = df[df["date_rptd"] > (pd.Timestamp.now() - pd.DateOffset(years=5))].copy()

In [None]:
# Extract the reporting year (date_rptd is a datetime column)
df_5['year'] = df_5['date_rptd'].dt.year

In [None]:
# Count each primary crime code per year, most frequent first
top_5 = df_5.groupby("year")["crm_cd_1"].value_counts(ascending=False)

In [None]:
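# Keep the 5 largest counts per year; nlargest adds a second 'year' index level, which is dropped here
top_5.groupby('year').nlargest(5).reset_index(level=1, drop=True)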
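
The same top-N-per-year idea can also be expressed as a flat table, which is sometimes easier to read or plot. The following is a sketch on made-up data, assuming only the column names `year` and `crm_cd_1` from the cells above; `top_n_per_year` is a hypothetical helper.

```python
import pandas as pd

# Toy stand-in for df_5 (the crime codes and counts are invented)
toy = pd.DataFrame({
    "year": [2019, 2019, 2019, 2019, 2020, 2020, 2020],
    "crm_cd_1": [510, 510, 624, 330, 624, 624, 510],
})


def top_n_per_year(frame, column, n=5):
    """Return the n most frequent values of `column` for each year."""
    counts = (
        frame.groupby(["year", column])
        .size()
        .reset_index(name="count")
    )
    # Sort so the most frequent values come first within each year
    counts = counts.sort_values(["year", "count"], ascending=[True, False])
    return counts.groupby("year").head(n).reset_index(drop=True)


top2 = top_n_per_year(toy, "crm_cd_1", n=2)
```

The result has one row per (year, code) pair with an explicit `count` column, so it can be joined against a code description lookup if one is available.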

## What are the most common MO codes? Have these changed over the past 5 years?

In [None]:
df_5["mocodes"].value_counts(ascending = False).head(10)

In [None]:
df_5.dtypes

In [None]:
# Count each mocodes value per year, most frequent first
top_5_mo = df_5.groupby("year")["mocodes"].value_counts(ascending=False)

In [None]:
top_5_mo.groupby('year').nlargest(5).reset_index(level=1, drop=True)
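
If `mocodes` stores several codes per record as one whitespace-separated string (an assumption about the data layout that `df_5.dtypes` alone does not confirm), counting individual codes requires splitting the strings first. A minimal sketch on invented data, with `count_individual_mocodes` as a hypothetical helper:

```python
import pandas as pd

# Toy stand-in for df_5; the code values are invented
toy = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020],
    "mocodes": ["0344 1822", "0344", "1822 0416", None],
})


def count_individual_mocodes(frame):
    """Count single MO codes per year, assuming whitespace-separated strings."""
    codes = (
        frame.dropna(subset=["mocodes"])                      # skip records without MO codes
        .assign(mocode=lambda d: d["mocodes"].str.split())    # string -> list of codes
        .explode("mocode")                                    # one row per individual code
    )
    return codes.groupby("year")["mocode"].value_counts()


per_year = count_individual_mocodes(toy)
```

The per-year top five could then be taken from `per_year` in the same way as from `top_5_mo` above, which makes year-over-year changes in individual codes easier to compare.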

## Pickle for later use

In [None]:
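# Save the 5-year subset for later use.
# The path uses the same ../data directory the input was loaded from.
with open("../data/crime_data_5.pkl", "wb") as fp:
    pickle.dump(df_5, fp)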