In [1]:
import pandas as pd
import requests

# Analysis of Crime in Los Angeles

<img src="Images/crime.jpg" width="500"/>

## Research Question

Can we accurately predict the severity of a crime based on time (primarily), location, and victim age?

## Getting Data

- LA crime data, 2010 to present (downloaded as CSV from data.lacity.org)
- Sunrise and sunset data (JSON pulled from api.sunrise-sunset.org)


In [2]:
pd.read_csv("crime_data.csv").head(3)

Unnamed: 0,DR Number,Date Reported,Date Occurred,Time Occurred,Area ID,Area Name,Reporting District,Crime Code,Crime Code Description,MO Codes,...,Weapon Description,Status Code,Status Description,Crime Code 1,Crime Code 2,Crime Code 3,Crime Code 4,Address,Cross Street,Location
0,151521112,11/04/2015,11/03/2015,2230,,N Hollywood,1555.0,330.0,BURGLARY FROM VEHICLE,0344,...,,IC,Invest Cont,330.0,,,,11100 CAMARILLO ST,,"(34.1577, -118.3727)"
1,151521113,11/04/2015,10/30/2015,200,,N Hollywood,1548.0,330.0,BURGLARY FROM VEHICLE,0344 1609 1307,...,,IC,Invest Cont,330.0,,,,11100 CHANDLER BL,,"(34.1681, -118.3724)"
2,151521117,11/04/2015,11/04/2015,1400,,N Hollywood,1506.0,930.0,CRIMINAL THREATS - NO WEAPON DISPLAYED,0421,...,VERBAL THREAT,JA,Juv Arrest,930.0,,,,7300 BAKMAN AV,,"(34.203, -118.3779)"


In [3]:
requests.get(
    "https://api.sunrise-sunset.org/json?lat=34.073851&lng=-118.242147&date=2019-03-18&formatted=0"
).json()

{'results': {'sunrise': '2019-03-18T13:58:26+00:00',
  'sunset': '2019-03-19T02:03:16+00:00',
  'solar_noon': '2019-03-18T20:00:51+00:00',
  'day_length': 43490,
  'civil_twilight_begin': '2019-03-18T13:33:33+00:00',
  'civil_twilight_end': '2019-03-19T02:28:09+00:00',
  'nautical_twilight_begin': '2019-03-18T13:04:27+00:00',
  'nautical_twilight_end': '2019-03-19T02:57:16+00:00',
  'astronomical_twilight_begin': '2019-03-18T12:35:03+00:00',
  'astronomical_twilight_end': '2019-03-19T03:26:40+00:00'},
 'status': 'OK'}

## Cleaning Data

Filtering:
- Keep only data from 2016

Mapping:
- Convert sunrise and sunset times from ISO 8601 UTC to 24-hour PST
- Convert the day the crime occurred to a day of the week (Sunday - Saturday)
- Convert the day the crime occurred to a day of the year (0 - 365)
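The mapping steps above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not necessarily the code used for this project. The function names are hypothetical, and "PST" is taken literally as a fixed UTC-8 offset; a daylight-saving-aware conversion would need a real time zone database instead.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "PST" means a fixed UTC-8 offset with no daylight saving adjustment.
PST = timezone(timedelta(hours=-8), "PST")

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

DATE_FORMAT = "%m/%d/%Y"  # matches dates such as 11/03/2015 in the crime data


def iso_utc_to_hhmm(iso_timestamp):
    """Convert an ISO 8601 timestamp to a 24-hour integer such as 658 or 1654."""
    local = datetime.fromisoformat(iso_timestamp).astimezone(PST)
    return local.hour * 100 + local.minute


def day_of_week(date_text):
    """Return the weekday name for a MM/DD/YYYY date string."""
    return DAY_NAMES[datetime.strptime(date_text, DATE_FORMAT).weekday()]


def day_of_year(date_text):
    """Return the zero-based day of the year (January 1 is day 0)."""
    return datetime.strptime(date_text, DATE_FORMAT).timetuple().tm_yday - 1


# Hypothetical usage with values in the same format as the data shown above.
sunrise = iso_utc_to_hhmm("2019-03-18T13:58:26+00:00")
weekday = day_of_week("11/03/2015")
ordinal = day_of_year("11/03/2015")
```

Storing times as integers like 658 keeps them directly comparable with the `Time Occurred` column, which uses the same 24-hour integer form.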

Computed Values:
- Compute the number of days between the date the crime occurred and the date it was reported

Conversion:
- Split the location into separate latitude and longitude columns
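As a rough sketch of the computed value and the location conversion, the two helpers below show one way these steps might look. The function names are hypothetical, and the date and location formats are assumed to match the raw crime data shown earlier.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # assumed format of Date Occurred and Date Reported


def days_to_report(date_occurred, date_reported):
    """Return the whole number of days between occurrence and report."""
    occurred = datetime.strptime(date_occurred, DATE_FORMAT)
    reported = datetime.strptime(date_reported, DATE_FORMAT)
    return (reported - occurred).days


def split_location(location):
    """Split a string like "(34.1577, -118.3727)" into (latitude, longitude)."""
    lat_text, lng_text = location.strip("() ").split(",")
    return float(lat_text), float(lng_text)


# Hypothetical usage with values in the same format as the raw data.
delay = days_to_report("11/03/2015", "11/04/2015")
lat, lng = split_location("(34.1577, -118.3727)")
```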

Merging:
- Use the sunrise/sunset data to label each crime's time of day (DAY/NIGHT)
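The merging step could look something like the sketch below. It assumes sunrise and sunset are stored as 24-hour integers keyed by date, and that a crime counts as DAY when it falls at or after sunrise and before sunset. The boundary handling and all names here are assumptions made for illustration.

```python
# Sunrise/sunset lookup keyed by date, using 24-hour integers (e.g. 658 = 6:58).
sun_times = {
    "01/01/2016": (658, 1654),
    "01/02/2016": (658, 1655),
    "01/03/2016": (658, 1656),
}


def time_of_day(time_occurred, sunrise, sunset):
    """Label a 24-hour integer time as DAY or NIGHT.

    Assumption: sunrise is inclusive and sunset is exclusive.
    """
    return "DAY" if sunrise <= time_occurred < sunset else "NIGHT"


def label_crime(date_occurred, time_occurred):
    """Look up the sunrise/sunset for the crime's date and label it."""
    sunrise, sunset = sun_times[date_occurred]
    return time_of_day(time_occurred, sunrise, sunset)


# Hypothetical crimes, not rows from the real data set.
early_morning = label_crime("01/01/2016", 230)
midday = label_crime("01/02/2016", 1200)
```

Because both columns use the same integer encoding, a plain numeric comparison is enough and no time parsing is needed at this stage.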

In [4]:
pd.read_csv("sunset_data.csv").head(3)

Unnamed: 0,date,sunrise,sunset
0,01/01/2016,658,1654
1,01/02/2016,658,1655
2,01/03/2016,658,1656


In [5]:
pd.read_csv("crime_data_edited.csv").head(3)

Unnamed: 0,Time Occurred,Crime Code,Victim Age,Day Occurred,Days To Report,Day of Year,Time of Day,Location Lat,Location Lng
0,230,626.0,25.0,Monday,0,150,NIGHT,34.0426,-118.2814
1,1845,626.0,41.0,Sunday,1,149,DAY,34.0698,-118.2528
2,1700,900.0,24.0,Sunday,1,149,DAY,34.0586,-118.2691


## Exploring Data

Three factors and their effect on severity:
- Day of the week
- Time of day
- Days to report

### Crime Severity by Day of the Week

<img src="Images/severity_day.jpeg" width="500"/>

<img src="Images/severity_day_expand.jpeg" width="1000"/>

*The overall average crime severity remained consistent across the days of the week. The average crime code was around 500, which is roughly the middle of the 110-957 scale. The distribution of crime codes per day did not change much, but there is a noticeable decrease in crime on Saturday and Sunday, while Friday has the most crime, specifically more severe crimes than other days.*

### Crime Severity by Time of Day

<img src="Images/severity_time.jpeg" width="500"/>

*Here yellow represents "day", the time between sunrise and sunset, and blue represents "night". The crime severity is not noticeably different between day and night, staying relatively consistent throughout the whole day.*

<img src="Images/day_night_dist.jpeg" width="1000"/>

*We can see that there were more crimes in the crime code range 300-400 for both night and day. One noticeable difference is the increase in crimes between codes 400-500 during the night. Also, the bump at the low end (below ~300) gives credence to the perception that more violent crimes occur at night.*

### Days to Report by Victim Age and Crime Severity

<img src="Images/age_to_report.jpeg" width="500"/>

*The majority of crimes were reported fairly soon after occurrence, and there doesn't seem to be a close relationship between age, days to report, and crime severity.*

## Machine Learning

We used two models: a k-nearest neighbors model, which we had used in class, and a random forest model, which was mentioned after the Kaggle competition.

### K Nearest Neighbors

<img src="Images/kNN_plot.jpeg" width="500"/>

Minimum error of 177.399 at k=97
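For readers unfamiliar with how a best k is chosen, the sketch below shows the general idea: fit a k-nearest neighbors regressor for several values of k and keep the one with the lowest test error. This is a plain-Python illustration on synthetic data, not the model or data behind the plot above. The error metric (root mean squared error), the feature choice, and all names are assumptions.

```python
import math
import random


def knn_predict(train_X, train_y, x, k):
    """Predict by averaging the targets of the k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def rmse(train_X, train_y, test_X, test_y, k):
    """Root mean squared error of kNN predictions on the test points."""
    squared = [(knn_predict(train_X, train_y, x, k) - y) ** 2
               for x, y in zip(test_X, test_y)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic stand-in data: (time occurred, day of year) -> crime code.
rng = random.Random(0)
X = [(rng.uniform(0, 2400), rng.uniform(0, 365)) for _ in range(200)]
y = [rng.uniform(110, 957) for _ in X]
train_X, test_X = X[:150], X[150:]
train_y, test_y = y[:150], y[150:]

# Sweep k and keep the value with the smallest test error.
errors_by_k = {k: rmse(train_X, train_y, test_X, test_y, k)
               for k in range(1, 100, 8)}
best_k = min(errors_by_k, key=errors_by_k.get)
```

In practice the features would usually be scaled first, since raw Euclidean distance lets the column with the largest range dominate.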

### Random Forest

<img src="Images/rf_plot.jpeg" width="500"/>

Minimum error of 172.403 at k=97

## Conclusion

**Crime is complicated, and predicting the severity of a crime from predominantly time data is difficult. Even including factors such as location and victim details does not make the predictions better. A more extensive data set may be necessary to model crime more accurately.**

## Questions?