# Lunar Phase and Crime Incidents in Downtown Chicago

DBMS260_Final Project
Tracie Lindquist

There has long been speculation that the number of crimes committed on a particular day is correlated with meteorological and astronomical phenomena, such as precipitation, temperature, and moon phase. Studies attempting to find such a correlation have produced mixed results. One such study, completed in 1984, found that the number of crimes committed on days when the moon was full was significantly higher than on days when the moon was in any other phase. The authors dubbed this rise "human tidal waves" and attributed it to the effect of lunar gravity on the water content of the human body (Thakur & Sharma, 1984).

Later studies, however, have been unable to corroborate these findings. One such study reviewed police, astronomical, and weather data for an unnamed city in the American Southwest. The researchers concluded that

> "With few exceptions, the moon's phase was not related with the level of crime and disorder reported to the police." (Schafer, 2010)

In this project, I plan to examine crime and climatological data from the city of Chicago, IL to test whether the following hypotheses are true:

* Crime decreases on colder days
* Crime increases on warmer days
* Crime decreases on days with precipitation, regardless of precipitation type
* Crime increases on days when the moon is full


## Dataset Analysis

Three datasets will be used to complete this analysis. 

Several of these datasets are quite large, and one requires the addition of significant data in order to link it to the other two sets. I expect some challenges in normalizing and extracting data that will allow me to test my hypotheses. 

Descriptions of these datasets follow.


**Chicago Downtown Wards Crime Data**

The Chicago Downtown Wards Crime Data dataset includes a listing of all crimes for the last 12 calendar months. This dataset has been filtered to include only police wards 3, 4, 42, and 43, as these are the wards relevant to downtown Chicago. The dataset has been further filtered to contain only the following attributes: date, time of occurrence, block number, street, primary description, secondary description, ward, latitude, and longitude.

In [14]:
import pandas as pd
crime_data = pd.read_csv("ChicagoCrime_102023to102024_DowntownWards.csv")
crime_data

| | Date | Time of Occurrence | Block Number | Street | Primary Description | Secondary Description | Ward | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10/27/2023 | 7:00 | 500 | S State St | theft | from building | 4 | 41.875200 | -87.627600 |
| 1 | 10/27/2023 | 7:20 | 1000 | N Lake Shore Dr | battery | simple | 42 | 41.900900 | -87.624200 |
| 2 | 10/27/2023 | 7:45 | 300 | W Illinois St | theft | from building | 42 | 41.890800 | -87.636100 |
| 3 | 10/27/2023 | 8:00 | 2700 | N Lake Shore Dr | motor vehicle theft | attempt - automobile | 43 | 41.932400 | -87.636400 |
| 4 | 10/27/2023 | 8:39 | 3400 | S State St | battery | simple | 3 | 41.832200 | -87.626700 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 26281 | 10/24/2024 | 22:38 | 700 | E Solidarity Dr | other offense | other vehicle offense | 4 | 41.866441 | -87.611786 |
| 26282 | 10/24/2024 | 23:55 | 0 | W Hubbard St | theft | from building | 42 | 41.890052 | -87.628914 |
| 26283 | 10/25/2024 | 0:00 | 1500 | S Wabash Ave | other offense | telephone threat | 3 | 41.861335 | -87.625690 |
| 26284 | 10/25/2024 | 0:00 | 500 | W Fullerton Pkwy | theft | over $500 | 43 | 41.925563 | -87.641815 |


Checking the data integrity reveals that the dataset contains 26,286 rows. Ten rows are missing latitude and longitude data, which will only matter if I choose to include geographic mapping of the dataset, which is not part of the original proposal.

In [9]:
crime_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26286 entries, 0 to 26285
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Date                   26286 non-null  object 
 1   Time of Occurrence     26286 non-null  object 
 2   Block Number           26286 non-null  int64  
 3   Street                 26286 non-null  object 
 4   Primary Description    26286 non-null  object 
 5   Secondary Description  26286 non-null  object 
 6   Ward                   26286 non-null  int64  
 7   Latitude               26276 non-null  float64
 8   Longitude              26276 non-null  float64
dtypes: float64(2), int64(2), object(5)
memory usage: 1.8+ MB
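
Because the hypotheses concern crime per day, the incident-level rows would likely need to be collapsed into daily counts before they can be linked to the other two datasets. The following is a minimal sketch of that step, assuming the `Date` strings follow the month/day/year pattern seen in the preview. The few rows are copied from the preview for illustration only, and `daily_counts` and `Crime Count` are hypothetical names, not part of the dataset.

```python
import pandas as pd

# Illustrative rows copied from the preview above; the real analysis would
# start from the full crime_data frame loaded from the CSV.
sample = pd.DataFrame({
    "Date": ["10/27/2023", "10/27/2023", "10/27/2023", "10/24/2024", "10/25/2024"],
    "Primary Description": ["theft", "battery", "theft", "theft", "other offense"],
})

# Dates load as strings (dtype object), so parse them before grouping.
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y")

# One row per calendar day with the number of reported incidents.
# "Crime Count" is a hypothetical column name chosen for this sketch.
daily_counts = (
    sample.groupby("Date")
    .size()
    .rename("Crime Count")
    .reset_index()
)
print(daily_counts)
```

On the full dataset, a calendar day with no reported incidents would be absent from the result rather than counted as zero, so reindexing against a complete date range may be needed.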


**Moon Phase Data**

This dataset is from the US Navy Astronomical Applications Department [Link](https://aa.usno.navy.mil/calculated/moon/phases?date=2024-11-01&nump=50&format=p&submit=Get+Data). It includes dates/times of all full moons since 1981. This dataset will be filtered to include only dates between October of 2023 and October of 2025, with nominal data for the phase of the moon coded as follows: 1 = New Moon, 2 = First Quarter, 3 = Full, 4 = Third Quarter.

In [18]:
moon_phases = pd.read_csv("MoonPhases_102023to102025.csv")
moon_phases

| | Date | Moon Phase |
|---|---|---|
| 0 | 10/28/2023 | 3 |
| 1 | 10/29/2023 | 3 |
| 2 | 10/30/2023 | 3 |
| 3 | 10/31/2023 | 3 |
| 4 | 11/1/2023 | 3 |
| ... | ... | ... |
| 722 | 10/19/2025 | 4 |
| 723 | 10/20/2025 | 4 |
| 724 | 10/21/2025 | 1 |
| 725 | 10/22/2025 | 1 |


Checking the data integrity reveals that the dataset contains 727 rows with no missing data.

In [23]:
moon_phases.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 727 entries, 0 to 726
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Date        727 non-null    object
 1   Moon Phase  727 non-null    int64 
dtypes: int64(1), object(1)
memory usage: 11.5+ KB
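
The numeric phase codes are nominal, so it may help to translate them into labels and a full-moon flag before any comparison. The sketch below shows one way to do that, using rows copied from the preview. The date bounds are an assumption taken from the first and last dates visible in the crime data preview, and `Phase Label` and `Is Full Moon` are hypothetical column names.

```python
import pandas as pd

# Code-to-label mapping taken from the dataset description above.
PHASE_LABELS = {1: "New Moon", 2: "First Quarter", 3: "Full", 4: "Third Quarter"}

# Illustrative rows copied from the preview above.
moon_sample = pd.DataFrame({
    "Date": ["10/28/2023", "10/29/2023", "10/19/2025", "10/21/2025"],
    "Moon Phase": [3, 3, 4, 1],
})

moon_sample["Date"] = pd.to_datetime(moon_sample["Date"], format="%m/%d/%Y")

# Hypothetical helper columns: a readable label and a boolean full-moon flag.
moon_sample["Phase Label"] = moon_sample["Moon Phase"].map(PHASE_LABELS)
moon_sample["Is Full Moon"] = moon_sample["Moon Phase"].eq(3)

# Assumed bounds: the first and last dates visible in the crime data preview.
start, end = pd.Timestamp("2023-10-27"), pd.Timestamp("2024-10-25")
in_range = moon_sample[moon_sample["Date"].between(start, end)]
print(in_range)
```

The preview shows code 3 on several consecutive days, which suggests each day carries a phase label rather than only the exact date of the full moon; the flag above inherits whatever window the dataset uses.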


**Climatological Data**

This dataset is from NOAA's National Centers for Environmental Information and covers all weather stations in and around Chicago. It has been filtered to include only the data reported from the O'Hare International Airport station, and reduced to the following attributes for each date: precipitation, snow, snow depth, average temperature, max temperature, and min temperature.

In [31]:
weather = pd.read_csv("ChicagoWeather_OhareIntlAirport.csv")
weather

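| | Date | Precipitation | Snow | Snow Depth | Average Temp | Max Temp | Min Temp |
|---|---|---|---|---|---|---|---|
| 0 | 10/27/2023 | 0.07 | 0.0 | 0.0 | 67 | 72 | 45 |
| 1 | 10/28/2023 | 0.00 | 0.0 | 0.0 | 47 | 52 | 39 |
| 2 | 10/29/2023 | 0.01 | 0.0 | 0.0 | 46 | 47 | 39 |
| 3 | 10/30/2023 | 0.00 | 0.0 | 0.0 | 39 | 44 | 31 |
| 4 | 10/31/2023 | 0.03 | 0.9 | 0.0 | 34 | 38 | 30 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 360 | 10/21/2024 | 0.00 | 0.0 | 0.0 | 67 | 81 | 54 |
| 361 | 10/22/2024 | 0.01 | 0.0 | 0.0 | 65 | 73 | 55 |
| 362 | 10/23/2024 | 0.00 | 0.0 | 0.0 | 62 | 65 | 44 |
| 363 | 10/24/2024 | 0.21 | 0.0 | 0.0 | 52 | 64 | 40 |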


Checking the integrity reveals that there are 365 records in the dataset with no nulls. 

In [34]:
weather.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Date           365 non-null    object 
 1   Precipitation  365 non-null    float64
 2   Snow           365 non-null    float64
 3   Snow Depth     365 non-null    float64
 4   Average Temp   365 non-null    int64  
 5   Max Temp       365 non-null    int64  
 6   Min Temp       365 non-null    int64  
dtypes: float64(3), int64(3), object(1)
memory usage: 20.1+ KB
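
With all three datasets keyed by date, one possible way to link them is a pair of merges on a parsed `Date` column, followed by simple grouped comparisons. The sketch below is illustrative only: the weather and moon rows are copied from the previews, but the crime counts are hypothetical values invented for this example, and the `Day Type` column and the "any precipitation or snow" rule are assumptions about how the precipitation hypothesis might be encoded.

```python
import pandas as pd

dates = pd.to_datetime(
    ["10/28/2023", "10/29/2023", "10/30/2023", "10/31/2023"], format="%m/%d/%Y"
)

# HYPOTHETICAL daily crime counts, invented purely to make the sketch runnable.
daily_counts = pd.DataFrame({"Date": dates, "Crime Count": [70, 64, 58, 61]})

# Weather rows copied from the preview above.
weather_sample = pd.DataFrame({
    "Date": dates,
    "Precipitation": [0.00, 0.01, 0.00, 0.03],
    "Snow": [0.0, 0.0, 0.0, 0.9],
    "Average Temp": [47, 46, 39, 34],
})

# Moon phase rows copied from the preview above (3 = Full).
moon_sample = pd.DataFrame({"Date": dates, "Moon Phase": [3, 3, 3, 3]})

# Link the three tables on Date; validate guards against duplicate dates.
merged = (
    daily_counts
    .merge(weather_sample, on="Date", how="inner", validate="one_to_one")
    .merge(moon_sample, on="Date", how="inner", validate="one_to_one")
)

# Assumed encoding: a day is "wet" if any rain or snow was recorded.
is_wet = (merged["Precipitation"] > 0) | (merged["Snow"] > 0)
merged["Day Type"] = is_wet.map({True: "wet", False: "dry"})

# Mean daily crime count for wet versus dry days.
mean_by_day_type = merged.groupby("Day Type")["Crime Count"].mean()

# Pearson correlation between daily crime count and average temperature.
temp_corr = merged["Crime Count"].corr(merged["Average Temp"])

print(mean_by_day_type)
print(temp_corr)
```

Four rows with invented counts say nothing about the hypotheses themselves; the point is only the shape of the joined table. The same pattern would extend to a full-moon comparison once the merged data spans more than one phase.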


## References

Schafer, J.A., Varano, S.P., Jarvis, J.P., Cancino, J.M. (2010). Bad Moon on the Rise? Lunar Cycles and Incidents of Crime. *Journal of Criminal Justice. Vol. 38,* pp. 359-367.

Thakur, C., Sharma, D. (1984, December 22). Full Moon and Crime. *British Medical Journal. Vol. 289,* pp. 1789-1791.