# DATA271 Final Project - California Call for Service Analysis

---

## Research Details

---

### Introducing the Problem
I'll be performing a statistical investigation of police call-for-service data, analyzed alongside meteorological and demographic datasets, to identify trends that can inform alternative policing models and highlight potential overallocation of resources in certain districts. Paired with census data, weather reports, and similar contextual information, this project aims to identify patterns and use them to determine the common stressors and causes behind surges in emergency service requests.

---

### Addressing the Problem
My approach will involve joining datasets on their date fields to analyze time series across years, looking for patterns and distributions in calls and weather readings, and finding correlations between attributes that can help us anticipate problems using data-driven insights. The aim is to avoid overwhelming the public with a police presence that can be counterproductive, especially in an era of police brutality and avoidable encounters with individuals reporting non-violent crimes who may not receive proper care or treatment.
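A minimal sketch of the date join in pandas, assuming a call log with one timestamp per call and a weather table with one row per day. The column names (`call_datetime`, `call_type`, `date`, `max_temp_f`, `precip_in`) and all values are hypothetical placeholders, not the real schemas of the datasets listed below.

```python
import pandas as pd

# Hypothetical toy inputs; column names and values are assumptions.
calls = pd.DataFrame({
    "call_datetime": pd.to_datetime([
        "2019-01-01 08:15", "2019-01-01 22:40", "2019-01-02 13:05",
        "2019-01-03 02:30", "2019-01-03 17:55", "2019-01-03 19:10",
    ]),
    "call_type": ["Welfare Check", "Disturbance", "Traffic",
                  "Disturbance", "Welfare Check", "Theft"],
})
weather = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"]),
    "max_temp_f": [54.0, 58.0, 49.0],
    "precip_in": [0.00, 0.25, 0.10],
})

# Collapse individual calls into one row per calendar day.
daily_calls = (
    calls.assign(date=calls["call_datetime"].dt.normalize())
         .groupby("date")
         .size()
         .rename("call_count")
         .reset_index()
)

# Left-join weather onto the daily counts; validate= guards against
# accidental duplicate dates in either table.
daily = daily_calls.merge(weather, on="date", how="left", validate="one_to_one")
print(daily)
```

Because the join starts from days that had calls, days with zero calls disappear. If those matter, start from the weather calendar instead and fill missing counts with `fillna(0)`.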

The analysis should help municipal governments and police agencies reduce labor and equipment costs by allocating resources more effectively. The goal is to provide actionable statistics that encourage agencies to reconsider how officers are distributed and how certain call types are handled. Identifying correlations between call types and environmental or demographic factors can also help build profiles of neighborhoods and districts, which vary across cities, counties, and states. The articles below elaborate further and show how others have tackled similar problems:

- [Medium article about alternative policing models for non-violent calls](https://londonbreed.medium.com/alternatives-to-police-for-responding-to-non-violent-911-calls-44c7d40ad9b1)
- [CNA article on alternative 911 dispatch models](https://www.cna.org/quick-looks/2022/alternative-911-dispatch-models)
- [Police Chief Magazine article about data-driven policing methods](https://www.policechiefmagazine.org/turning-point-policing-methods/)

---

### Analysis Breakdown
We ask the following questions before conducting our official exploratory data analysis:

- How can we use a data-driven strategy to reduce the physical presence of officers, lower labor and resource costs, and shorten dispatch times when dealing with nonviolent calls?
- How can we combine insights from these calls—including their characteristics and any reported alleged crimes—with general census and weather data to better allocate resources and determine the necessary equipment for specific districts or situations, thereby minimizing unnecessary costs and overly aggressive responses?
- Can we identify a clear correlation between the environmental factors of cities and counties and the nature or frequency of calls for service, enabling us to predict these calls with reasonable accuracy?
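As a starting point for the third question, the sketch below measures the association between one weather variable and daily call volume, then fits a single-predictor least-squares line with `np.polyfit`. The numbers are made-up toy values, not findings. Spearman's coefficient is computed as the Pearson correlation of the ranks so the example needs only pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Toy daily table: hypothetical values chosen for illustration, not project data.
daily = pd.DataFrame({
    "max_temp_f": [50, 55, 60, 65, 70, 75, 80],
    "call_count": [12, 13, 15, 16, 18, 19, 22],
})

# Pearson r: strength of the linear association between temperature and calls.
r = daily["max_temp_f"].corr(daily["call_count"])

# Spearman rho = Pearson r of the ranks; captures monotonic but non-linear
# trends and is less sensitive to outliers.
rho = daily["max_temp_f"].rank().corr(daily["call_count"].rank())

# Ordinary least-squares line: call_count ~ slope * max_temp_f + intercept.
slope, intercept = np.polyfit(daily["max_temp_f"], daily["call_count"], deg=1)

# Naive point prediction for a hypothetical 85 F day.
predicted_calls = slope * 85 + intercept
print(f"r={r:.3f}, rho={rho:.3f}, predicted calls at 85F: {predicted_calls:.1f}")
```

Even a strong coefficient here would only describe association in observational data; a usable forecast would need several predictors and evaluation on held-out dates.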

Our analysis will be broken down into the following stages, with specific techniques designed to help tell a meaningful and data-driven story:

1. **Explore Individual Datasets**  
   Perform exhaustive exploratory data analysis (EDA) on each dataset individually—transforming variables, cleaning data, checking for missing values and outliers, and applying descriptive statistics to uncover underlying patterns. This includes identifying correlations between variables, exploring distribution shapes (e.g., bimodality), and applying statistical tests such as ANOVA, chi-squared tests, and bootstrapping where appropriate.

2. **Analyze Combined Datasets**  
   Join datasets on date fields and other shared keys to analyze time-based and cross-variable relationships. Use visual encodings (e.g., color hues) to compare grouped variables—such as call types across weather conditions or demographic segments. Generate geographic heat maps to visualize spatial distributions of call frequency across U.S. districts or counties, and begin to connect attributes across contexts (e.g., weather with crime type, or demographics with call volume).

3. **Evaluate Significance and Observational Limitations**  
   Assess whether observed trends and correlations meaningfully address our original research questions. This includes evaluating both statistical and practical significance, while recognizing that results are drawn from observational data and cannot confirm causation. Even if certain relationships appear weak or inconclusive, this stage helps clarify the limits of current data and guides interpretation responsibly.

4. **Answer Research Questions**  
   Revisit the original questions in light of the findings. Identify which questions can be confidently addressed with the available data and which remain unresolved due to data limitations or ambiguous patterns. This stage will help form the basis for final insights and conclusions.

5. **Recommendations & Further Exploration**  
   Based on findings, propose actionable recommendations for improving resource allocation or dispatch policies. Highlight areas where deeper or more granular data could yield stronger insights and suggest directions for future research or experimentation.
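A few of the stage 1 checks can be sketched with pandas and NumPy alone: counting missing values, flagging outliers with the 1.5 × IQR rule, and a percentile bootstrap confidence interval for the difference in mean daily calls between two groups. The `rainy` flag, the `bootstrap_mean_diff_ci` helper, and all values are hypothetical. For the chi-squared and ANOVA tests, SciPy's `scipy.stats.chi2_contingency` and `scipy.stats.f_oneway` are the usual tools, though SciPy is not among the libraries listed for this project.

```python
import numpy as np
import pandas as pd

# Hypothetical daily table; values and column names are illustrative only.
daily = pd.DataFrame({
    "call_count": [14, 16, 15, 40, 13, 17, 18, 15, np.nan, 16],
    "rainy": [False, True, False, True, False, True, False, True, False, True],
})

# 1. Missing values per column.
missing = daily.isna().sum()

# 2. IQR rule: flag values more than 1.5 * IQR beyond the quartiles.
#    quantile() skips NaN; NaN comparisons evaluate to False (not flagged).
q1, q3 = daily["call_count"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (daily["call_count"] < q1 - 1.5 * iqr) | (daily["call_count"] > q3 + 1.5 * iqr)

# 3. Percentile bootstrap CI for mean(a) - mean(b).
def bootstrap_mean_diff_ci(a, b, n_boot=5000, level=0.95, seed=0):
    """Resample each group independently with replacement (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Each row is one bootstrap replicate; average across the row.
    boot_a = rng.choice(a, size=(n_boot, a.size), replace=True).mean(axis=1)
    boot_b = rng.choice(b, size=(n_boot, b.size), replace=True).mean(axis=1)
    tail = (1 - level) / 2 * 100
    return np.percentile(boot_a - boot_b, [tail, 100 - tail])

clean = daily.dropna(subset=["call_count"])
rainy = clean.loc[clean["rainy"], "call_count"]
dry = clean.loc[~clean["rainy"], "call_count"]
observed = rainy.mean() - dry.mean()
low, high = bootstrap_mean_diff_ci(rainy, dry)
print(f"observed diff={observed:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

The flagged outlier is deliberately left in; rerunning the bootstrap with it excluded shows how much a single extreme day can move the interval.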

---

### Datasets
1. [Humboldt County Daily Call for Service (CFS) Report](https://humboldtgov.org/2161/Daily-CFS-Report)  
2. [CALMAC Weather Files](https://www.calmac.org/weather.asp)  
3. [Census Bureau Data](https://data.census.gov)  
4. [State of CA DOJ Crime Data](https://oag.ca.gov/crime)

---

### Libraries & Modules
- **Pandas:** Loads and manipulates tabular data retrieved via API or file download  
- **NumPy:** Provides fast element-wise array arithmetic and other numerical routines  
- **Matplotlib:** Core visualization library for basic plots; Seaborn is built on top of it  
- **Seaborn:** Built on Matplotlib, provides additional high-level statistical plots  
- **Sodapy:** Python client for the Socrata Open Data API (SODA), which many open government data portals use  
- **Plotnine:** Implements a grammar of graphics (a ggplot2-style API) for elegant and expressive data visualizations  

---

### Project Resources
- [GitHub Repository](https://github.com/toritotony/Data271FinalProject)

---


## Collecting Data
Here we collect the datasets mentioned above for 2015-2025, a window wide enough that anomalies from the 2020 COVID-19 outbreak do not dominate the observed trends.
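One way to enforce the study window and keep 2020 identifiable, assuming each call record carries a parsed timestamp in a hypothetical `call_datetime` column. Flagging the year rather than dropping it lets the analysis be rerun with and without 2020 as a sensitivity check.

```python
import pandas as pd

# Hypothetical call log; "call_datetime" is an assumed column name.
calls = pd.DataFrame({
    "call_datetime": pd.to_datetime([
        "2014-12-31 23:50", "2015-03-02 10:00", "2020-04-15 09:30",
        "2021-07-04 21:15", "2025-06-30 12:00", "2026-01-01 00:05",
    ]),
})

# Keep only the 2015-2025 study window (both end years inclusive).
in_window = calls["call_datetime"].dt.year.between(2015, 2025)
calls = calls.loc[in_window].copy()

# Flag the 2020 COVID-19 year so it can be compared with, or excluded from,
# the other years instead of being silently blended into the averages.
calls["covid_year"] = calls["call_datetime"].dt.year.eq(2020)

# Calls per year, a quick check for gaps or spikes.
yearly = calls.groupby(calls["call_datetime"].dt.year).size()
print(yearly)
```

The non-2020 subset is then simply `calls.loc[~calls["covid_year"]]`.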

I'll precede each retrieval with a note on the dataset, its source, its purpose, and when it was collected.

After collection, we'll list the available variables, what they mean, and any immediate findings from each dataset's `describe()` and `info()` output in pandas.
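A small helper in the spirit of that step, which puts each column's dtype, non-null count, missing percentage, and number of distinct values in one table, followed by pandas' own `info()` (which prints rather than returns) and `describe()`. The `profile` function name and the sample columns are hypothetical.

```python
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column summary: dtype, non-null count, % missing, distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "missing_pct": df.isna().mean().mul(100).round(1),
        "n_unique": df.nunique(),  # nunique() ignores missing values
    })

# Hypothetical sample standing in for a freshly loaded dataset.
sample = pd.DataFrame({
    "call_type": ["Theft", "Traffic", None, "Theft"],
    "priority": [2, 3, 1, 2],
})

summary = profile(sample)
print(summary)
sample.info()
print(sample.describe(include="all"))
```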

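```python
# Core data handling and numerics
import pandas as pd
import numpy as np
import random

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
from plotnine import *  # grammar-of-graphics API (ggplot, aes, geom_*, ...)

# Socrata Open Data API client
from sodapy import Socrata
```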


## Clean Data

## Gather Statistics

## Analyze Statistics 

## Apply Inferential Statistics and Prediction to the Research Questions

## Answer Questions and Conclude Findings

## References and Citations