## DATASCI 207: Machine Learning
#### Project: Crime Classification Model for Community Policing

**Dataset**: Crime Data from 2020 to Present
**Source**: https://catalog.data.gov/dataset/crime-data-from-2020-to-present

#### Project Description:
This project develops a machine learning model to classify crimes as violent 
or non-violent based on feature analysis of crime data. The model aims to 
support data-driven community policing strategies and resource allocation.

#### Team Members:
- Kadin Wilkins
- Matilda Orona
- Vikram Magal
- Anushka Vazirani

In [1]:
import pandas as pd
import numpy as np

In [11]:
# Read data
df = pd.read_csv('data/Crime_Data_from_2020_to_Present.csv')

In [3]:
# Display first 5 rows
df.head()

Unnamed: 0,DR_NO,Date Rptd,DATE OCC,TIME OCC,AREA,AREA NAME,Rpt Dist No,Part 1-2,Crm Cd,Crm Cd Desc,...,Status,Status Desc,Crm Cd 1,Crm Cd 2,Crm Cd 3,Crm Cd 4,LOCATION,Cross Street,LAT,LON
0,211507896,04/11/2021 12:00:00 AM,11/07/2020 12:00:00 AM,845,15,N Hollywood,1502,2,354,THEFT OF IDENTITY,...,IC,Invest Cont,354.0,,,,7800 BEEMAN AV,,34.2124,-118.4092
1,201516622,10/21/2020 12:00:00 AM,10/18/2020 12:00:00 AM,1845,15,N Hollywood,1521,1,230,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",...,IC,Invest Cont,230.0,,,,ATOLL AV,N GAULT,34.1993,-118.4203
2,240913563,12/10/2024 12:00:00 AM,10/30/2020 12:00:00 AM,1240,9,Van Nuys,933,2,354,THEFT OF IDENTITY,...,IC,Invest Cont,354.0,,,,14600 SYLVAN ST,,34.1847,-118.4509
3,210704711,12/24/2020 12:00:00 AM,12/24/2020 12:00:00 AM,1310,7,Wilshire,782,1,331,THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND ...,...,IC,Invest Cont,331.0,,,,6000 COMEY AV,,34.0339,-118.3747
4,201418201,10/03/2020 12:00:00 AM,09/29/2020 12:00:00 AM,1830,14,Pacific,1454,1,420,THEFT FROM MOTOR VEHICLE - PETTY ($950 & UNDER),...,IC,Invest Cont,420.0,,,,4700 LA VILLA MARINA,,33.9813,-118.435


In [4]:
# Look at unique Crime Descriptions where Weapon Used Code is NULL
df[df['Weapon Used Cd'].isnull()]['Crm Cd Desc'].unique()

array(['THEFT OF IDENTITY',
       'THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)',
       'THEFT FROM MOTOR VEHICLE - PETTY ($950 & UNDER)',
       'VEHICLE - STOLEN', 'BURGLARY', 'BURGLARY FROM VEHICLE',
       'THEFT PLAIN - PETTY ($950 & UNDER)',
       'VANDALISM - MISDEAMEANOR ($399 OR UNDER)',
       'VEHICLE - ATTEMPT STOLEN',
       'VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)',
       'FIREARMS RESTRAINING ORDER (FIREARMS RO)', 'BIKE - STOLEN',
       'EMBEZZLEMENT, GRAND THEFT ($950.01 & OVER)',
       'THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LIVESTK,PROD',
       'LETTERS, LEWD  -  TELEPHONE CALLS, LEWD',
       'VIOLATION OF COURT ORDER', 'ARSON',
       'VIOLATION OF RESTRAINING ORDER', 'CONTEMPT OF COURT',
       'OTHER MISCELLANEOUS CRIME', 'TRESPASSING', 'BUNCO, ATTEMPT',
       'SHOPLIFTING - PETTY THEFT ($950 & UNDER)', 'BURGLARY, ATTEMPTED',
       'DOCUMENT FORGERY / STOLEN FELONY',
       'SHOPLIFTING-GRAND THEFT ($950.01 & OVER)', 'FAILURE T

Although none of these records has a weapon code, several descriptions in the full list would still qualify as a "Violent Crime":
- 'CHILD ABUSE (PHYSICAL) - SIMPLE ASSAULT'
- 'INTIMATE PARTNER - SIMPLE ASSAULT'
- 'ROBBERY'
- 'CRIMINAL HOMICIDE'
- 'ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT'
- 'RAPE, FORCIBLE'
- 'MANSLAUGHTER, NEGLIGENT'
- 'BATTERY - SIMPLE ASSAULT'
- 'LYNCHING'
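
To gauge how much these descriptions matter, it helps to count the rows that have no weapon code but a flagged description. The sketch below uses a small toy frame rather than the real CSV; the rows, the weapon code `102.0`, and the shortened `flagged` list are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the crime data (hypothetical rows, not real records).
toy = pd.DataFrame({
    'Crm Cd Desc': ['ROBBERY', 'BURGLARY', 'ROBBERY',
                    'BATTERY - SIMPLE ASSAULT', 'THEFT OF IDENTITY'],
    'Weapon Used Cd': [np.nan, np.nan, 102.0, np.nan, np.nan],
})

# Shortened version of the list above, for illustration only.
flagged = ['ROBBERY', 'BATTERY - SIMPLE ASSAULT', 'CRIMINAL HOMICIDE']

# Rows with no weapon code whose description is still on the flagged list.
mask = toy['Weapon Used Cd'].isnull() & toy['Crm Cd Desc'].isin(flagged)
flagged_counts = toy.loc[mask, 'Crm Cd Desc'].value_counts()
print(flagged_counts)
```

Running the same filter on `df` with the full list would show how many records the weapon-code check alone would miss.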

In [5]:
# Look at unique Crime Descriptions where Weapon Used Code is not NULL
df[df['Weapon Used Cd'].notnull()]['Crm Cd Desc'].unique()

array(['ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT',
       'CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 YRS OLDER)',
       'INTIMATE PARTNER - SIMPLE ASSAULT', 'BATTERY - SIMPLE ASSAULT',
       'BURGLARY', 'ROBBERY', 'CHILD ABUSE (PHYSICAL) - SIMPLE ASSAULT',
       'CRIMINAL THREATS - NO WEAPON DISPLAYED',
       'BATTERY WITH SEXUAL CONTACT', 'THEFT, PERSON',
       'BURGLARY FROM VEHICLE', 'INTIMATE PARTNER - AGGRAVATED ASSAULT',
       'VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)',
       'ATTEMPTED ROBBERY', 'LEWD/LASCIVIOUS ACTS WITH CHILD',
       'BRANDISH WEAPON', 'BATTERY POLICE (SIMPLE)', 'ARSON',
       'BIKE - STOLEN', 'RAPE, ATTEMPTED',
       'VANDALISM - MISDEAMEANOR ($399 OR UNDER)',
       'VIOLATION OF COURT ORDER', 'OTHER ASSAULT',
       'THEFT PLAIN - PETTY ($950 & UNDER)',
       'SODOMY/SEXUAL CONTACT B/W PENIS OF ONE PERS TO ANUS OTH',
       'RAPE, FORCIBLE', 'DISCHARGE FIREARMS/SHOTS FIRED',
       'SHOTS FIRED AT MOVING VEHICLE, TRAIN OR

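Because each of these records has a weapon code, we treat all of them as a "Violent Crime", even where the description alone (for example 'BURGLARY' or 'BIKE - STOLEN') reads as a property crime.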
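
Some descriptions, such as 'BURGLARY', appear in both the NULL and non-NULL weapon lists. A cross-tabulation is one way to see how each description splits between the two groups. This is a minimal sketch on toy data; the rows and weapon codes are made up, and the `weapon` / `no weapon` labels are names chosen for this example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the crime data (hypothetical rows, not real records).
toy = pd.DataFrame({
    'Crm Cd Desc': ['BURGLARY', 'BURGLARY', 'BURGLARY',
                    'ROBBERY', 'BIKE - STOLEN'],
    'Weapon Used Cd': [np.nan, np.nan, 500.0, 400.0, np.nan],
})

# Readable label for whether a weapon code was recorded on the row.
weapon_flag = toy['Weapon Used Cd'].notnull().map(
    {True: 'weapon', False: 'no weapon'}
)

# Rows: crime description; columns: weapon recorded or not; cells: row counts.
weapon_table = pd.crosstab(toy['Crm Cd Desc'], weapon_flag)
print(weapon_table)
```

Descriptions with counts in both columns are the ones whose label depends on the weapon code rather than on the description.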

In [6]:
# Check whether DR_NO has any duplicates (it is supposed to be a unique ID)
df['DR_NO'].duplicated().any()

np.False_

In [7]:
# Display the percentage of NULLs in each Column
df.isnull().sum() / len(df) * 100

DR_NO              0.000000
Date Rptd          0.000000
DATE OCC           0.000000
TIME OCC           0.000000
AREA               0.000000
AREA NAME          0.000000
Rpt Dist No        0.000000
Part 1-2           0.000000
Crm Cd             0.000000
Crm Cd Desc        0.000000
Mocodes           15.086603
Vict Age           0.000000
Vict Sex          14.392567
Vict Descent      14.393761
Premis Cd          0.001592
Premis Desc        0.058508
Weapon Used Cd    67.437818
Weapon Desc       67.437818
Status             0.000100
Status Desc        0.000000
Crm Cd 1           0.001095
Crm Cd 2          93.118346
Crm Cd 3          99.769749
Crm Cd 4          99.993632
LOCATION           0.000000
Cross Street      84.652997
LAT                0.000000
LON                0.000000
dtype: float64
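
With this many columns, it can be easier to list only those that are mostly empty. The sketch below does that on a toy frame. The 50% cutoff is an arbitrary assumption, not a value used elsewhere in this notebook, and the toy rows are made up.

```python
import numpy as np
import pandas as pd

# Toy stand-in with one complete, one sparse, and one mostly-empty column.
toy = pd.DataFrame({
    'DR_NO': [1, 2, 3, 4],
    'Vict Sex': ['M', None, 'F', 'F'],
    'Crm Cd 2': [np.nan, np.nan, np.nan, 998.0],
})

# Percentage of NULLs per column (same quantity as isnull().sum() / len * 100).
null_pct = toy.isnull().mean() * 100

# Hypothetical cutoff: keep only columns that are more than half empty.
threshold = 50
mostly_null = null_pct[null_pct > threshold].sort_values(ascending=False)
print(mostly_null)
```

A high NULL share does not by itself make a column useless: `Weapon Used Cd` is about 67% NULL here, yet the labeling step relies on whether it is present.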

In [8]:
# Make a list of crimes that we identified as Violent from one of the previous steps
violent_crimes = [
    'CHILD ABUSE (PHYSICAL) - SIMPLE ASSAULT', 'INTIMATE PARTNER - SIMPLE ASSAULT',
    'ROBBERY', 'CRIMINAL HOMICIDE', 'ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT',
    'RAPE, FORCIBLE', 'MANSLAUGHTER, NEGLIGENT', 'BATTERY - SIMPLE ASSAULT', 'LYNCHING',
]

# Create a new Crime Type column with Violent and Non-Violent labels
df['Crime Type'] = np.where(
    df['Weapon Used Cd'].notnull() | df['Crm Cd Desc'].isin(violent_crimes),
    'Violent',
    'Non-Violent',
)

In [9]:
# Display the newly created labels alongside their corresponding Crime Descriptions
df[['Crm Cd Desc','Crime Type']]

Unnamed: 0,Crm Cd Desc,Crime Type
0,THEFT OF IDENTITY,Non-Violent
1,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",Violent
2,THEFT OF IDENTITY,Non-Violent
3,THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND ...,Non-Violent
4,THEFT FROM MOTOR VEHICLE - PETTY ($950 & UNDER),Non-Violent
...,...,...
1004986,OTHER MISCELLANEOUS CRIME,Non-Violent
1004987,CHILD NEGLECT (SEE 300 W.I.C.),Non-Violent
1004988,INDECENT EXPOSURE,Non-Violent
1004989,BATTERY - SIMPLE ASSAULT,Violent
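
Before modeling, it is worth checking how balanced the two labels are, since a skewed split affects how accuracy should be read. The sketch below applies the same labeling rule to a toy frame and reports each class's share. The rows, the weapon code, and the one-item `violent` list are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the crime data (hypothetical rows, not real records).
toy = pd.DataFrame({
    'Crm Cd Desc': ['THEFT OF IDENTITY', 'ROBBERY', 'BURGLARY', 'BURGLARY'],
    'Weapon Used Cd': [np.nan, np.nan, 500.0, np.nan],
})

# Shortened violent-crime list, for illustration only.
violent = ['ROBBERY']

# Same rule as above: a weapon code or a flagged description means Violent.
toy['Crime Type'] = np.where(
    toy['Weapon Used Cd'].notnull() | toy['Crm Cd Desc'].isin(violent),
    'Violent',
    'Non-Violent',
)

# Share of each class as a fraction of all rows.
class_share = toy['Crime Type'].value_counts(normalize=True)
print(class_share)
```

On the real data, `df['Crime Type'].value_counts(normalize=True)` gives the same summary.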
