# Crime Rate and Economic Inequality (Group 23)

## STAT 201 Group 23 Project Proposal

Crystal Zhao, Yuhei Arimoto, Tony Lee, Qiantong Huang

## Part 1: Introduction

Crime is a complex issue with various contributing factors. Poverty, economic inequality, and social conflicts such as political exclusion and relative deprivation are some of the identified factors that can lead to the occurrence of crime (Norrie, 2014). Numerous studies have shown a correlation between poverty and crime rates, highlighting the significance of economic inequality as a factor influencing crime rates (Sugiharti et al., 2023; Zhang, 2013).

To expand our understanding of the relationship between crime rates and economic inequality, this project aims to explore the correlation between specific types of crime and economic inequality. We will analyze crime data from the Vancouver Police Department on two Vancouver neighborhoods, Grandview-Woodland and Shaughnessy, to answer two questions: 
* Is the rate of Break and Enter Residential/Other crime higher in a wealthy neighborhood than in a poor neighborhood?
* Is the rate of Offence Against a Person crime lower in a wealthy neighborhood than in a poor neighborhood?

Grandview-Woodland, with a low median household income, will be classified as a poor neighborhood, while Shaughnessy, with a high median household income, will be classified as a wealthy neighborhood ("Grandview-Woodland", 2020; "Shaughnessy", 2020). The dataset includes the type, date, and location of all crimes reported in these two neighborhoods. Population values of 29175 and 8430 ("Grandview-Woodland", 2020; "Shaughnessy", 2020), respectively, will be used to calculate crime rates. Crime rate (location parameter) and standard deviation (scale parameter) will be used as response variables in our analysis.
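As a rough sketch of how the population values enter the calculation, assume the crime rate is defined as reported incidents per resident over the study window, which is how the preliminary code in Part 2 computes it. The symbols below are our own notation for illustration:

$$
r_{c,h} = \frac{n_{c,h}}{N_h}
$$

Here $n_{c,h}$ is the number of reported incidents of crime type $c$ in neighborhood $h$, and $N_h$ is the population of that neighborhood.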


## Part 2: Methods & Results

#### 2.1: Data Exploration

Attach the libraries and set the seed:

In [1]:
# Attach the libraries.
library(tidyverse)
library(tidymodels)
library(cowplot)
library(GGally)
library(RColorBrewer)
set.seed(1000)
options(repr.plot.width = 12, repr.plot.height = 8)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
── Attaching packages ────────────────────────────────────── tidymodels 1.0.0 ──

✔ broom        1.0.1     ✔ rsample      1.1.0
✔ dials        1.0.0     ✔ tune         1.0.0
✔ infer        1.0.3     ✔ workflows    1.0.0

In [2]:
# Load the Grandview-Woodland crime dataset and lower-case the column names.
grandview_woodland_data <- read_csv("crimedata_csv_Grandview-Woodland_AllYears.csv", col_names = TRUE)
names(grandview_woodland_data) <- tolower(names(grandview_woodland_data))

# Drop any rows that contain an NA.
grandview_woodland_clean <- grandview_woodland_data |>
    drop_na()

# View the cleaned dataframe.
glimpse(grandview_woodland_clean)

Rows: 48142 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): TYPE, HUNDRED_BLOCK, NEIGHBOURHOOD
dbl (7): YEAR, MONTH, DAY, HOUR, MINUTE, X, Y

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


Rows: 48,142
Columns: 10
$ type          <chr> "Break and Enter Commercial", "Break and Enter Commercia…
$ year          <dbl> 2006, 2006, 2006, 2022, 2007, 2006, 2003, 2004, 2007, 20…
$ month         <dbl> 1, 11, 11, 4, 10, 4, 10, 12, 10, 10, 4, 3, 8, 11, 11, 8,…
$ day           <dbl> 30, 11, 15, 3, 15, 9, 14, 20, 1, 15, 9, 14, 1, 4, 15, 11…
$ hour          <dbl> 16, 16, 16, 0, 4, 23, 18, 4, 17, 4, 17, 5, 17, 7, 1, 5, …
$ minute        <dbl> 30, 0, 30, 0, 9, 30, 0, 26, 0, 40, 30, 21, 0, 23, 15, 20…
$ hundred_block <chr> "10XX CLARK DR", "10XX CLARK DR", "10XX CLARK DR", "10XX…
$ neighbourhood <chr> "Grandview-Woodland", "Grandview-Woodland", "Grandview-W…
$ x             <dbl> 494382.4, 494382.4, 494382.4, 494382.6, 494937.3, 494937…
$ y             <dbl> 5458077, 5458077, 5458077, 5458098, 5458069

Figure 2.1.1: The initial Grandview-Woodland dataset with lower-cased column names, after basic cleaning.

In [3]:
# Load the Shaughnessy crime dataset and lower-case the column names.
shaughnessy_data <- read_csv("crimedata_csv_Shaughnessy_AllYears.csv", col_names = TRUE)
names(shaughnessy_data) <- tolower(names(shaughnessy_data))

# Drop any rows that contain an NA.
shaughnessy_clean <- shaughnessy_data |>
    drop_na()

# View the cleaned dataframe.
glimpse(shaughnessy_clean)

Rows: 8530 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): TYPE, HUNDRED_BLOCK, NEIGHBOURHOOD
dbl (7): YEAR, MONTH, DAY, HOUR, MINUTE, X, Y

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


Rows: 8,530
Columns: 10
$ type          <chr> "Break and Enter Commercial", "Break and Enter Commercia…
$ year          <dbl> 2022, 2022, 2022, 2018, 2019, 2021, 2021, 2021, 2022, 20…
$ month         <dbl> 2, 2, 4, 2, 6, 2, 12, 12, 4, 11, 4, 3, 8, 1, 2, 5, 5, 5,…
$ day           <dbl> 23, 25, 30, 2, 16, 15, 16, 26, 5, 28, 14, 27, 26, 28, 3,…
$ hour          <dbl> 23, 10, 22, 17, 9, 17, 5, 4, 22, 12, 14, 5, 16, 3, 5, 7,…
$ minute        <dbl> 0, 15, 45, 0, 3, 19, 33, 10, 18, 0, 35, 33, 30, 16, 17, …
$ hundred_block <chr> "10XX BALFOUR AVE", "10XX BALFOUR AVE", "10XX BALFOUR AV…
$ neighbourhood <chr> "Shaughnessy", "Shaughnessy", "Shaughnessy", "Shaughness…
$ x             <dbl> 490699.8, 490699.8, 490699.8, 490704.3, 490704.3, 490704…
$ y             <dbl> 5455444, 5455444, 5455444, 5455350, 5455350,

Figure 2.1.2: The initial Shaughnessy dataset with lower-cased column names, after basic cleaning.
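Both neighbourhood files go through the same three steps: read the CSV, lower-case the column names, and drop rows with missing values. A minimal sketch of how those steps could be wrapped in one function follows. The helper name `load_crime_data()` and the toy file are hypothetical and are not part of the original notebook.

```r
library(readr)
library(dplyr)
library(tidyr)

# Hypothetical helper (not in the original notebook): read one crime data
# export, lower-case the column names, and drop rows containing any NA.
load_crime_data <- function(path) {
  read_csv(path, col_names = TRUE, show_col_types = FALSE) |>
    rename_with(tolower) |>
    drop_na()
}

# Toy file standing in for a real export; the values are made up.
toy_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(
    TYPE = c("Theft of Bicycle", "Mischief", "Mischief"),
    YEAR = c(2012, 2013, NA),
    NEIGHBOURHOOD = c("Shaughnessy", "Shaughnessy", "Shaughnessy")
  ),
  toy_path
)

# The row with the missing year is dropped by drop_na().
toy_clean <- load_crime_data(toy_path)
```

With a helper like this, each real file would need only one call, for example `load_crime_data("crimedata_csv_Shaughnessy_AllYears.csv")`.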

To narrow the focus of this study, we chose 2012 to 2016 (inclusive) as the target years, so both datasets are filtered below to contain only these five years of data. We also focus on two crime types, "Break and Enter Residential/Other" and "Offence Against a Person", so we keep only the type and year columns and then split each dataset into two sub-datasets, one for each crime type.

In [4]:
# Keep only the target years (2012-2016) and the two columns of interest.
grandview_woodland_clean <- grandview_woodland_clean |>
    filter(year %in% 2012:2016) |>
    select(type, year)

# Sub-dataset 1: Break and Enter Residential/Other incidents.
grandview_woodland_break <- grandview_woodland_clean |>
    filter(type == "Break and Enter Residential/Other")

# View the Break and Enter Residential/Other sub-dataset.
glimpse(grandview_woodland_break)

# Sub-dataset 2: Offence Against a Person incidents.
grandview_woodland_offence <- grandview_woodland_clean |>
    filter(type == "Offence Against a Person")

# View the Offence Against a Person sub-dataset.
glimpse(grandview_woodland_offence)

Rows: 876
Columns: 2
$ type <chr> "Break and Enter Residential/Other", "Break and Enter Residential…
$ year <dbl> 2013, 2013, 2013, 2016, 2013, 2015, 2016, 2016, 2016, 2013, 2015,…
Rows: 1,304
Columns: 2
$ type <chr> "Offence Against a Person", "Offence Against a Person", "Offence …
$ year <dbl> 2014, 2016, 2012, 2015, 2012, 2015, 2015, 2014, 2015, 2016, 2012,…


Figure 2.1.3: The two sub-datasets of the Grandview-Woodland dataset, filtered to the target years and crime types, with only the type and year columns selected.

In [5]:
# Keep only the target years (2012-2016) and the two columns of interest.
shaughnessy_clean <- shaughnessy_clean |>
    filter(year %in% 2012:2016) |>
    select(type, year)

# Sub-dataset 1: Break and Enter Residential/Other incidents.
shaughnessy_break <- shaughnessy_clean |>
    filter(type == "Break and Enter Residential/Other")

# View the Break and Enter Residential/Other sub-dataset.
glimpse(shaughnessy_break)

# Sub-dataset 2: Offence Against a Person incidents.
shaughnessy_offense <- shaughnessy_clean |>
    filter(type == "Offence Against a Person")

# View the Offence Against a Person sub-dataset.
glimpse(shaughnessy_offense)

Rows: 574
Columns: 2
$ type <chr> "Break and Enter Residential/Other", "Break and Enter Residential…
$ year <dbl> 2016, 2016, 2013, 2015, 2015, 2013, 2012, 2013, 2014, 2012, 2015,…
Rows: 84
Columns: 2
$ type <chr> "Offence Against a Person", "Offence Against a Person", "Offence …
$ year <dbl> 2012, 2013, 2013, 2015, 2012, 2013, 2015, 2012, 2014, 2016, 2015,…


Figure 2.1.4: The two sub-datasets of the Shaughnessy dataset, filtered to the target years and crime types, with only the type and year columns selected.
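The introduction names the crime rate and its standard deviation as the quantities of interest. One possible way to obtain both from a sub-dataset like the ones above is to compute a rate for each year and then summarize across years. The sketch below assumes that reading, which the proposal does not confirm. The helper `yearly_rates()`, the toy data, and the population of 1000 are all hypothetical.

```r
library(dplyr)

# Hypothetical toy data with the same two columns kept above (type, year).
toy_crimes <- data.frame(
  type = rep("Break and Enter Residential/Other", 10),
  year = c(2012, 2012, 2013, 2013, 2013, 2014, 2015, 2015, 2016, 2016)
)

# Hypothetical helper: incidents per resident for each year.
yearly_rates <- function(crimes, population) {
  crimes |>
    count(year, name = "n_incidents") |>
    mutate(rate = n_incidents / population)
}

# One row per year, with the count and the per-resident rate.
toy_rates <- yearly_rates(toy_crimes, population = 1000)

# Location (mean) and scale (standard deviation) of the yearly rates.
toy_summary <- toy_rates |>
  summarize(mean_rate = mean(rate), sd_rate = sd(rate))
```

The same helper could be applied to any of the four sub-datasets with the matching neighbourhood population. With only five yearly values per neighbourhood, the standard deviation would be a rough estimate.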

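Preliminary wrangling for our own reference: Break and Enter Residential/Other incidents per resident in each neighborhood over 2012 to 2016.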


In [6]:
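# Break and Enter Residential/Other incidents per resident in
# Grandview-Woodland over 2012-2016 (population 29175).
g <- grandview_woodland_clean |>
    filter(type == "Break and Enter Residential/Other") |>
    summarize(n = n()) |>
    mutate(z = n / 29175) |>
    pull(z)

g

# Break and Enter Residential/Other incidents per resident in
# Shaughnessy over 2012-2016 (population 8430).
s <- shaughnessy_clean |>
    filter(type == "Break and Enter Residential/Other") |>
    summarize(n = n()) |>
    mutate(z = n / 8430) |>
    pull(z)

s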
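The printed values of `g` and `s` are not shown above. Assuming the row counts reported in Figures 2.1.3 and 2.1.4 (876 and 574 Break and Enter Residential/Other incidents) and the populations used in the code, the arithmetic works out to approximately:

$$
g = \frac{876}{29175} \approx 0.0300, \qquad s = \frac{574}{8430} \approx 0.0681
$$

These are incidents per resident over the five years combined, not annual rates.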

#### 2.2: Data Analysis

## References

Grandview-Woodland Neighborhood Social Indicators Profile 2020. City of Vancouver. (2020). Retrieved March 19, 2023, from https://vancouver.ca/files/cov/social-indicators-profile-grandview-woodland.pdf 

Norrie, A. W. (2014). Crime, reason and history: A critical introduction to criminal law (Third ed.). Cambridge University Press. https://doi.org/10.1017/CBO9781139031851

Shaughnessy Neighborhood Social Indicators Profile 2020. City of Vancouver. (2020). https://vancouver.ca/files/cov/social-indicators-profile-shaughnessy.pdf

Sugiharti, L., Purwono, R., Esquivias, M. A., & Rohmawati, H. (2023). The nexus between crime rates, poverty, and income inequality: A case study of Indonesia. Economies, 11(2), 62. https://doi.org/10.3390/economies11020062

The Vancouver Police Department. Crime Data [Data set]. https://geodash.vpd.ca/opendata/#

Zhang, W. (2013). The relationships between crime rate and income inequality: Evidence from China. The University of Texas at Austin. https://repositories.lib.utexas.edu/handle/2152/22551