# STA 141B Final Project Report
Fall 2018

Xavier Hung

Weiyi Chen

## Problem: Is there a correlation between speeding and red light camera violations?
### Case study: Chicago

Many drivers speed up as they approach an intersection when the traffic light turns yellow. This is dangerous and can sometimes lead to a fatal crash.

This common occurrence led us to analyze the association between red light violations and speeding.


##  Data Extraction
We obtained our data from the City of Chicago. There are four CSV files: **red light camera locations, red light camera violations, speed camera locations, and speed camera violations** (details of these data can be found [here](https://data.cityofchicago.org/browse?q=red+light+camera&sortBy=relevance)).



`speed_loc` and `redlight_loc` each have latitude and longitude columns. We use the Mercator projection to convert latitude and longitude into map coordinates and plot the locations of the red light cameras and speed cameras.<br>The following interactive map shows the locations of all speed cameras and red light cameras in Chicago.


In [6]:
show(p)
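
The latitude/longitude conversion described above is not shown in the report. As a minimal sketch, assuming the spherical Web Mercator variant that map tile layers commonly expect, it could look like the following. The function name `to_web_mercator` and the sample point are illustrative assumptions, not taken from the original notebook.

```python
import math

# WGS84 equatorial radius in meters, as used by spherical Web Mercator.
EARTH_RADIUS_M = 6378137.0


def to_web_mercator(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to Web Mercator (x, y) in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Approximate point in downtown Chicago (illustrative, not from the dataset).
x, y = to_web_mercator(41.88, -87.63)
```

Applying such a function to the latitude and longitude columns of each location table would give the plotting coordinates for the map.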

In Chicago, 0.001 degrees of latitude is about 111.2 meters when longitude is held constant, and 0.001 degrees of longitude is about 82.69 meters when latitude is held constant.<br>([This page](https://www.movable-type.co.uk/scripts/latlong.html) provides the calculation.)
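
The two distances quoted above can be checked with a short calculation. This is a sketch that assumes a spherical Earth with a mean radius of 6371 km and an approximate Chicago latitude of 41.9 degrees; both values, and the helper name `meters_per_step`, are assumptions made for illustration.

```python
import math

# Mean Earth radius in meters (assumed spherical model).
EARTH_RADIUS_M = 6371000.0


def meters_per_step(lat_deg, step_deg=0.001):
    """Return (north-south, east-west) distance in meters covered by step_deg."""
    step_rad = math.radians(step_deg)
    north_south = EARTH_RADIUS_M * step_rad
    # A circle of constant latitude shrinks by cos(latitude).
    east_west = EARTH_RADIUS_M * step_rad * math.cos(math.radians(lat_deg))
    return north_south, east_west


# 41.9 is an assumed, approximate latitude for Chicago.
ns_m, ew_m = meters_per_step(41.9)
```

The results come out at roughly 111 meters north-south and 83 meters east-west, close to the figures above; the east-west value shifts slightly with the latitude chosen.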

We subset the data to the red light cameras and speed cameras that lie within 0.001 degrees of each other in both latitude and longitude. This yields 18 pairs of speed cameras and red light cameras in close proximity.
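
The proximity filter described above could be sketched as a cross join followed by a tolerance check. The column names (`ADDRESS`, `INTERSECTION`, `LATITUDE`, `LONGITUDE`), the function name `pair_nearby_cameras`, and the coordinates are assumptions for illustration; the original notebook's code is not shown in this report.

```python
import pandas as pd


def pair_nearby_cameras(speed_loc, redlight_loc, tol=0.001):
    """Return speed/red light camera pairs within `tol` degrees on both axes."""
    # Cross join: compare every speed camera with every red light camera.
    candidates = speed_loc.assign(_key=1).merge(
        redlight_loc.assign(_key=1), on="_key", suffixes=("_speed", "_red")
    )
    near = (
        (candidates["LATITUDE_speed"] - candidates["LATITUDE_red"]).abs() <= tol
    ) & (
        (candidates["LONGITUDE_speed"] - candidates["LONGITUDE_red"]).abs() <= tol
    )
    pairs = candidates.loc[near, ["ADDRESS", "INTERSECTION"]]
    pairs = pairs.rename(columns={"ADDRESS": "speed", "INTERSECTION": "redlight"})
    return pairs.reset_index(drop=True)


# Made-up coordinates, for illustration only.
speed_loc = pd.DataFrame({
    "ADDRESS": ["SPEED CAM A", "SPEED CAM B"],
    "LATITUDE": [41.8905, 41.7000],
    "LONGITUDE": [-87.6205, -87.7000],
})
redlight_loc = pd.DataFrame({
    "INTERSECTION": ["INTERSECTION X", "INTERSECTION Y"],
    "LATITUDE": [41.8910, 41.9500],
    "LONGITUDE": [-87.6200, -87.8000],
})
pairs = pair_nearby_cameras(speed_loc, redlight_loc)
```

Because the filter is applied per pair, one red light camera can appear in several rows when more than one speed camera is nearby.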

In [24]:
pairs.sort_values(['redlight'])

| Unnamed: 0 | speed | redlight |
|---|---|---|
| 2 | 3217 W 55TH ST | 55TH AND KEDZIE |
| 3 | 3212 W 55TH ST | 55TH AND KEDZIE |
| 5 | 5532 S KEDZIE AVE | 55TH AND KEDZIE |
| 7 | 5428 S PULASKI | 55TH and PULASKI |
| 13 | 5433 S PULASKI | 55TH and PULASKI |
| 6 | 4123 N CENTRAL AVE | CENTRAL AND IRVING PARK |
| 1 | 450 N COLUMBUS DR | COLUMBUS AND ILLINOIS |
| 15 | 324 E ILLINOIS ST | COLUMBUS AND ILLINOIS |
| 14 | 449 N COLUMBUS DR | COLUMBUS AND ILLINOIS |
| 17 | 819 E 71ST ST | COTTAGE GROVE AND 71ST |


These are the addresses of the 18 pairs of speed cameras and red light cameras, which we display in the following plot. There are 10 red light cameras (<font color="red">red points</font>). At some intersections, more than one speed camera (<font color="blue">blue points</font>) is near the red light camera. Zooming in on the plot makes the location of each speed camera clear.

In [8]:
output_notebook()
show(p)

<div class="alert alert-block alert-warning">
<b>Zoom In Example:</b> This plot shows the location of the red light camera at the COLUMBUS AND ILLINOIS intersection and the three speed cameras near it. </div>

In [9]:
show(p)

## Data Analysis
The speed camera violations and red light camera violations files contain the daily volume of violations since July 2014. We **convert the daily violations to monthly violations** and **plot speeding and red light camera violations over time at each of these 10 intersections.** <br>For each intersection, we compare the trend of speed violations with that of red light camera violations, and calculate **Spearman’s correlation coefficient** and the **p-value to test for non-correlation**. We use Spearman’s coefficient because it is a nonparametric measure and does not require the data to follow a particular distribution.
Spearman’s coefficient measures monotonic association, so it can be low when two series are related in a non-monotonic way.
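
The aggregation and correlation steps described above could be sketched as follows. This is a sketch under assumptions rather than the notebook's actual code: the column names `VIOLATION DATE` and `VIOLATIONS`, the helper names, and the toy records are all illustrative. `scipy.stats.spearmanr` returns both the coefficient and the p-value.

```python
import pandas as pd
from scipy.stats import spearmanr


def monthly_totals(daily, date_col="VIOLATION DATE", count_col="VIOLATIONS"):
    """Sum daily violation counts into one total per calendar month."""
    months = pd.to_datetime(daily[date_col]).dt.to_period("M")
    return daily.groupby(months)[count_col].sum()


def spearman_for_pair(speed_monthly, red_monthly):
    """Spearman's rho and p-value over the months both cameras report."""
    joined = pd.concat(
        [speed_monthly, red_monthly], axis=1, join="inner", keys=["speed", "redlight"]
    )
    rho, p_value = spearmanr(joined["speed"], joined["redlight"])
    return rho, p_value


# Toy daily records (made up) for one speed camera and one red light camera.
speed_daily = pd.DataFrame({
    "VIOLATION DATE": ["2015-01-05", "2015-01-20", "2015-02-03",
                       "2015-03-10", "2015-04-01", "2015-04-28"],
    "VIOLATIONS": [4, 6, 20, 30, 15, 25],
})
red_daily = pd.DataFrame({
    "VIOLATION DATE": ["2015-01-02", "2015-02-02", "2015-03-02", "2015-04-02"],
    "VIOLATIONS": [1, 2, 3, 4],
})

speed_monthly = monthly_totals(speed_daily)
red_monthly = monthly_totals(red_daily)
rho, p_value = spearman_for_pair(speed_monthly, red_monthly)
```

The inner join keeps only the months present in both series, so a camera that was installed later or went offline does not introduce artificial zeros into the comparison.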

We analyze the 10 intersections in order from the most red light camera violations to the fewest. <br>Red light camera violations are represented by the <font color="red">red line</font> in the following plots.
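
The ordering used in the sections below could be produced by totaling violations per intersection and sorting in descending order. The sketch assumes columns named `INTERSECTION` and `VIOLATIONS`; the function name `rank_intersections` and the toy data are hypothetical.

```python
import pandas as pd


def rank_intersections(violations, site_col="INTERSECTION", count_col="VIOLATIONS"):
    """Total violations per intersection, sorted from most to fewest."""
    totals = violations.groupby(site_col)[count_col].sum()
    return totals.sort_values(ascending=False)


# Toy records (made up) with three intersections.
toy = pd.DataFrame({
    "INTERSECTION": ["X", "Y", "X", "Z", "Y", "X"],
    "VIOLATIONS": [5, 2, 7, 1, 4, 3],
})
ranking = rank_intersections(toy)
```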

###  1. Near COLUMBUS AND ILLINOIS   (intersection with the most red light camera violations)
![%E5%9B%BE%E7%89%87.png](attachment:%E5%9B%BE%E7%89%87.png)
>The Spearman correlation coefficient and p-value for speed violations at 449 N COLUMBUS DR and red light violations at COLUMBUS AND ILLINOIS are 0.16296345325527606 and 2.780441920692923e-52, respectively.
>
>The Spearman correlation coefficient and p-value for speed violations at 450 N COLUMBUS DR and red light violations at COLUMBUS AND ILLINOIS are 0.2643607622753515 and 2.7954699071506953e-134, respectively.
>
>The Spearman correlation coefficient and p-value for speed violations at 324 E ILLINOIS ST and red light violations at COLUMBUS AND ILLINOIS are -0.07541332152958609 and 0.0011137050274879702, respectively.

Between the speed violations at 449 N COLUMBUS DR and red light violations at COLUMBUS AND ILLINOIS, there is a very weak, positive correlation (ρ = 0.16296, p < 0.001). The speeding violations (<font color="blue">blue dotted line</font>) rise and then fall within a narrow range. The trend is roughly similar to that of the red light violations.

Between the speed violations at 450 N COLUMBUS DR and red light violations at COLUMBUS AND ILLINOIS, there is a weak, positive correlation (ρ = 0.26436, p < 0.001). Since the Spearman correlation measures monotonic relationships, a low coefficient does not necessarily mean a weak association. In the graph, the trend of red light violations at COLUMBUS AND ILLINOIS is nearly the same as the trend of speed violations at 450 N COLUMBUS DR (<font color="purple">purple dotted line</font>). They increase and decrease at almost the same time, so we believe there is a fairly strong association between them.

However, there is essentially no correlation between speed violations at 324 E ILLINOIS ST and red light violations at COLUMBUS AND ILLINOIS (ρ = -0.07541, p = 0.0011): the p-value is small, but the coefficient is close to zero. The <font color="green">green dotted line</font> shows very few speeding violations at 324 E ILLINOIS ST, and we suspect this is a newly installed speed camera still being tested.

Interestingly, violations peak in summer; perhaps people are more impatient then. Fortunately, overall violations are decreasing over time.
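
The summer peak noted above is read from the plot. One way to check it numerically is to average the monthly counts by calendar month. This sketch assumes a monthly series indexed by period; the function name `seasonal_profile` and the toy numbers are made up for illustration.

```python
import pandas as pd


def seasonal_profile(monthly):
    """Average monthly count for each calendar month (1 = January, 12 = December)."""
    return monthly.groupby(lambda period: period.month).mean()


# Toy series (made up): two years with higher counts in June through August.
index = pd.period_range("2015-01", "2016-12", freq="M")
summer_bonus = {6: 50, 7: 80, 8: 50}
monthly = pd.Series([100 + summer_bonus.get(p.month, 0) for p in index], index=index)

profile = seasonal_profile(monthly)
peak_month = profile.idxmax()
```

With real data, a profile whose largest values fall in June through August would support the seasonal pattern.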

###  2. Near PULASKI AND ARCHER  (intersection with the second most red light camera violations)
![%E5%9B%BE%E7%89%87.png](attachment:%E5%9B%BE%E7%89%87.png)
>The Spearman correlation coefficient and p-value for speed violations at 5030 S PULASKI and red light violations at PULASKI AND ARCHER are 0.0858098835074142 and 8.32305340821259e-06, respectively.
>
>The Spearman correlation coefficient and p-value for speed violations at 4929 S PULASKI and red light violations at PULASKI AND ARCHER are 0.08055678492137777 and 3.1888954495604886e-05, respectively.

Between the speed violations at 5030 S PULASKI and red light violations at PULASKI AND ARCHER, there is a very weak, positive correlation (ρ = 0.08581, p < 0.001). Between the speed violations at 4929 S PULASKI and red light violations at PULASKI AND ARCHER, there is also a very weak, positive correlation (ρ = 0.08056, p < 0.001). Although the coefficients are small, the three lines in the graph follow the same pattern, which suggests a moderate association for both pairs.

There is no big change in red light camera violations or in speed violations at 4929 S PULASKI (<font color="green">green dotted line</font>).
Again, violations peak in summer. For speed violations at 5030 S PULASKI (<font color="blue">blue dotted line</font>), there were nearly 3000 violations per month in summer 2014, about 1800 per month in summer 2015, and around 1000 per month in the summers of 2016, 2017, and 2018. Since 2016, most monthly counts have stayed below 1000.

###  3. Near PULASKI AND 79TH   (intersection with the third most red light camera violations)
![%E5%9B%BE%E7%89%87.png](attachment:%E5%9B%BE%E7%89%87.png)
>The Spearman correlation coefficient and p-value for speed violations at 7833 S PULASKI and red light violations at PULASKI AND 79TH are 0.04217155944082334 and 0.030951470650184894, respectively.
>
>The Spearman correlation coefficient and p-value for speed violations at 7826 S PULASKI and red light violations at PULASKI AND 79TH are 0.014135354613446081 and 0.4688652652608186, respectively.

There is no meaningful correlation between the speed violations at 7833 S PULASKI and red light violations at PULASKI AND 79TH (ρ = 0.04217, p = 0.03): the coefficient is close to zero.
There is also no correlation between the speed violations at 7826 S PULASKI and red light violations at PULASKI AND 79TH (ρ = 0.01414, p = 0.47).

The red light camera violations fluctuate periodically, highest in summer and lowest in winter. The speed violations at 7826 S PULASKI are few and flat (<font color="green">green dotted line</font>). The speed violations at 7833 S PULASKI drop dramatically from July 2014, and overall speed violations decrease slightly over time.

###  4. Near 55TH AND KEDZIE
![%E5%9B%BE%E7%89%87.png](attachment:%E5%9B%BE%E7%89%87.png)
>The Spearman correlation coefficient and p-value for speed violations at 5532 S KEDZIE AVE and red light violations at 55TH AND KEDZIE are 0.019641669561346203 and 0.3038798620398834, respectively.
>
>The Spearman correlation coefficient and p-value for speed violations at 3212 W 55TH ST and red light violations at 55TH AND KEDZIE are -0.042898154058709455 and 0.06566213991596444, respectively.
>
>The Spearman correlation coefficient and p-value for speed violations at 3217 W 55TH ST and red light violations at 55TH AND KEDZIE are 0.019253443838843958 and 0.4503866399393628, respectively.

The p-values are 0.304, 0.066, and 0.450, respectively, so none of these three pairs shows a significant correlation.

Red light camera violations show an increasing trend.
Speed violations at 3212 W 55TH ST (<font color="green">green dotted line</font>) and at 3217 W 55TH ST (<font color="blue">blue dotted line</font>) follow similar patterns, both low and flat. There were a great number of speed violations at 5532 S KEDZIE AVE (<font color="purple">purple dotted line</font>) in 2015 (the first year of use), with a peak of almost 600. After that, the number of speeding violations declined significantly (to below 100); a possible reason is that drivers noticed the newly installed speed camera and stayed alert when passing through the avenue.

###  5. Near COTTAGE GROVE AND 71ST
![%E5%9B%BE%E7%89%87.png](attachment:%E5%9B%BE%E7%89%87.png)
>The Spearman correlation coefficient and p-value for speed violations at 819 E 71ST ST and red light violations at COTTAGE GROVE AND 71ST are -0.2981443785319082 and 0.012181737341665288, respectively.
>
>The Spearman correlation coefficient and p-value for speed violations at 7122 S SOUTH CHICAGO AVE and red light violations at COTTAGE GROVE AND 71ST are -0.06509088557747152 and 0.5979424534067732, respectively.

Since the two speed cameras near COTTAGE GROVE AND 71ST were installed recently, there is not enough information to assess the correlation. The number of red light violations fluctuates within a small range and increases slightly over time.

###  6. Near 55TH and PULASKI
![%E5%9B%BE%E7%89%87.png](attachment:%E5%9B%BE%E7%89%87.png)
>The Spearman correlation coefficient and p-value for speed violations at 5433 S PULASKI and red light violations at 55TH and PULASKI are 0.051366666490227586 and 0.013029076620337369, respectively.
>
>The Spearman correlation coefficient and p-value for speed violations at 5428 S PULASKI and red light violations at 55TH and PULASKI are 0.12640609007441064 and 1.19662717128761e-09, respectively.

Between the speed violations at 5433 S PULASKI and red light violations at 55TH and PULASKI, there is no meaningful correlation (ρ = 0.05137, p = 0.013). Between the speed violations at 5428 S PULASKI and red light violations at 55TH and PULASKI, there is a very weak, positive correlation (ρ = 0.12641, p < 0.001). Although the coefficients are small, the three lines in the graph follow the same pattern, which suggests a moderate association for both pairs.

Overall, violations are decreasing over time.


###  7. Near WESTERN AND ADDISON
![%E5%9B%BE%E7%89%87.png](attachment:%E5%9B%BE%E7%89%87.png)
>The Spearman correlation coefficient and p-value for speed violations at 3534 N WESTERN and red light violations at WESTERN AND ADDISON are 0.08296830544042871 and 0.01019696491485027, respectively.

There is no meaningful relationship between speed violations at 3534 N WESTERN and red light violations at WESTERN AND ADDISON (ρ = 0.08297, p = 0.01).

There is no speed violation data from April 2016 to August 2017, which leaves a gap in the plot. Both types of violations are relatively high in summer.
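
Gaps like the one described above can be located programmatically by comparing the observed months against the full range between the first and last observation. This is a sketch assuming a monthly series indexed by period; the function name `missing_months` and the toy values are illustrative.

```python
import pandas as pd


def missing_months(monthly):
    """List the calendar months absent between the first and last observation."""
    expected = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
    return expected.difference(monthly.index)


# Toy series (made up) with March and April 2016 missing.
monthly = pd.Series(
    [120, 95, 110],
    index=pd.PeriodIndex(["2016-01", "2016-02", "2016-05"], freq="M"),
)
gaps = missing_months(monthly)
```

Listing the gaps before computing correlations makes it clear which months are excluded from the comparison.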

###  8. Near WESTERN AND CERMAK
![%E5%9B%BE%E7%89%87.png](attachment:%E5%9B%BE%E7%89%87.png)
>The Spearman correlation coefficient and p-value for speed violations at 2335 W CERMAK RD and red light violations at WESTERN AND CERMAK are 0.012647836650445315 and 0.6934076333012771, respectively.

There is no relationship between speed violations at 2335 W CERMAK RD and red light violations at WESTERN AND CERMAK (p = 0.69).

Red light camera violations are relatively low in 2015; afterward, they increase in summer and decrease in winter. However, speed violations are highest in 2015, then decline and stay almost entirely below 50 per month in the following years.


###  9. Near PULASKI AND CHICAGO
![%E5%9B%BE%E7%89%87.png](attachment:%E5%9B%BE%E7%89%87.png)
>The Spearman correlation coefficient and p-value for speed violations at 732 N PULASKI RD and red light violations at PULASKI AND CHICAGO are -0.03054809323934059 and 0.2745974660494698, respectively.

There is no relationship between speed violations at 732 N PULASKI RD and red light violations at PULASKI AND CHICAGO (p = 0.27). 

Red light camera violations are rising and the speed camera violations are declining. 

###  10. Near CENTRAL AND IRVING PARK (intersection with the fewest red light camera violations)
![%E5%9B%BE%E7%89%87.png](attachment:%E5%9B%BE%E7%89%87.png)
>The Spearman correlation coefficient and p-value for speed violations at 4123 N CENTRAL AVE and red light violations at CENTRAL AND IRVING PARK are -0.026114602543561617 and 0.1987851408479989, respectively.

There is no relationship between speed violations at 4123 N CENTRAL AVE and red light violations at CENTRAL AND IRVING PARK (ρ = -0.02611, p = 0.19879). The number of red light violations stays mostly between 100 and 200 per month, with no big change. However, the speed violations vary widely and, interestingly, peak in winter, contrary to our earlier finding that summer has the most violations. Moreover, speed violations drop sharply to 0 and then rise again between July 2015 and December 2015. It is implausible that there was no speeding at all during that time; a possible reason is that the speed camera was offline for about 2 months.

## Conclusion