# General Framework

## What's the story?
* I was in high school, attending a public school in downtown Chicago. I was sometimes afraid of civil unrest, especially around high-profile court cases and during COVID-19. I lived through teacher strikes, crazy winter storms, and other volatile events, and I kept asking myself: is it still safe to take the Blue Line 'L' train to school today?

## What's the goal?
* To help consumers develop a quantitative understanding of how likely civil unrest is to occur in their target city.
* The idea is to cut through the uncertainty by building a predictor grounded in historical event data rather than in headlines or gut feeling.
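One simple way to make "how likely" concrete is the empirical share of days in a window on which at least one event was recorded. The sketch below is only an illustrative baseline, not the predictor this project will build. `daily_unrest_frequency` and the example dates are hypothetical. Because GDELT is built from news coverage, a number like this measures *reported* unrest.

```python
from datetime import date


def daily_unrest_frequency(event_dates, start, end):
    """Fraction of days in [start, end] (inclusive) with at least one recorded event.

    Hypothetical helper: a naive empirical baseline, not a trained model.
    """
    if end < start:
        raise ValueError("end must not be before start")
    total_days = (end - start).days + 1
    # Count each calendar day once, no matter how many events fell on it
    days_with_events = {d for d in event_dates if start <= d <= end}
    return len(days_with_events) / total_days


# Made-up example: 3 distinct event days in a 10-day window
events = [date(2023, 1, 3), date(2023, 1, 3), date(2023, 1, 4), date(2023, 1, 9)]
print(daily_unrest_frequency(events, date(2023, 1, 1), date(2023, 1, 10)))  # 0.3
```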

## What data will I use?

I will use the GDELT database to identify violent protests. Specifically, I will pull events in the **145X** family from the GDELT Event Database: CAMEO code 145, "Protest violently, riot", and its subcodes 1451-1454.
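GDELT stores CAMEO codes as strings, so "the 145X family" amounts to a prefix match on `EventCode`. The sketch below mirrors the SQL filter `EventCode LIKE '145%'` in plain Python. The subcode labels are as I recall them from the CAMEO codebook and are worth double-checking there. `VIOLENT_PROTEST_CODES` and `is_violent_protest` are hypothetical names.

```python
# CAMEO code 145 and its subcodes (labels should be verified against the CAMEO codebook)
VIOLENT_PROTEST_CODES = {
    "145": "Protest violently, riot",
    "1451": "Engage in violent protest for leadership change",
    "1452": "Engage in violent protest for policy change",
    "1453": "Engage in violent protest for rights",
    "1454": "Engage in violent protest for change in institutions, regime",
}


def is_violent_protest(event_code: str) -> bool:
    """Python equivalent of the SQL filter EventCode LIKE '145%'."""
    return event_code.startswith("145")


print(is_violent_protest("1453"), is_violent_protest("1441"))  # True False
```

Comparing codes as strings rather than integers keeps the prefix logic identical to the SQL `LIKE` pattern.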

---

# Data Collection

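Since we want to avoid downloading the entire GDELT dataset, we'll query GDELT's public tables on **Google BigQuery** (`gdelt-bq.full.events`) and fetch only civil-unrest events located in Chicago.

**Note**: GDELT's web API doesn't directly allow filtering by EventCode, so there we would have to rely on keywords associated with civil unrest plus a location filter. The BigQuery events table has an `EventCode` column, so the query below filters on the CAMEO code and the location directly.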
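Querying instead of downloading still has a cost: on-demand BigQuery queries are billed by the bytes scanned in the referenced columns, and a `LIMIT` clause generally does not reduce that. A dry run reports the scan size without reading any data. The sketch below assumes an existing `bigquery.Client`. `estimate_bytes_scanned` and `human_bytes` are hypothetical helper names, and `google.cloud.bigquery` is imported lazily so the formatter works without it.

```python
def estimate_bytes_scanned(client, sql: str) -> int:
    """Dry-run `sql` with an existing bigquery.Client and return the bytes it would scan."""
    from google.cloud import bigquery  # lazy import: only needed for the dry run

    # dry_run=True validates the query and estimates its size without executing it;
    # disabling the cache makes the estimate reflect a real, uncached run
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed


def human_bytes(n: int) -> str:
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(n)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

With the `client` and `query` defined in the collection cell, `print(human_bytes(estimate_bytes_scanned(client, query)))` shows the scan size before you commit to running the query.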

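```python
%pip install db-dtypes
```

Here `%pip` installs into the user site-packages ("Defaulting to user installation because normal site-packages is not writeable") and warns that you may need to restart the kernel to use updated packages. Restart the kernel before running the next cell; otherwise the running kernel may not see `db-dtypes`.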
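To catch a missing package before the query runs, you can check whether the current interpreter can find it. The sketch below uses only `importlib` from the standard library. `missing_modules` is a hypothetical helper, and note that the importable module name is `db_dtypes` (underscore), not the pip name `db-dtypes`.

```python
import importlib
import importlib.util


def missing_modules(names):
    """Return the module names the current interpreter cannot find (hypothetical helper)."""
    importlib.invalidate_caches()  # refresh finder caches for newly installed files

    def importable(name):
        try:
            return importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:  # raised when a dotted name's parent package is missing
            return False

    return [name for name in names if not importable(name)]


missing = missing_modules(["db_dtypes", "google.cloud.bigquery"])
if missing:
    print("Not importable in this kernel yet:", missing)
```

If `db_dtypes` is still reported missing right after a successful install, restart the kernel and run the check again.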


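```python
from google.cloud import bigquery
import pandas as pd  # pandas backs the DataFrame returned by to_dataframe()

# to_dataframe() needs db-dtypes; importing it here fails fast with a clear error
# if the kernel hasn't been restarted since `%pip install db-dtypes`
import db_dtypes  # noqa: F401

# Initialize the BigQuery client; queries are billed to this project
client = bigquery.Client(project="civil-unrest-predictor")

# Violent protests/riots (CAMEO 145 and subcodes 1451-1454) located in Chicago.
# LIKE '%Chicago%' matches any place name containing "Chicago",
# so places such as East Chicago, Indiana may also be included.
query = """
SELECT
    SQLDATE,
    EventCode,
    ActionGeo_FullName,
    ActionGeo_Lat,
    ActionGeo_Long,
    AvgTone
FROM
    `gdelt-bq.full.events`
WHERE
    EventCode LIKE '145%'
    AND ActionGeo_FullName LIKE '%Chicago%'
ORDER BY
    SQLDATE DESC
LIMIT
    100;
"""

# Submit the query job
query_job = client.query(query)

# Wait for the job to finish and convert the rows to a pandas DataFrame
results = query_job.result().to_dataframe()

# Save the results to a CSV file
results.to_csv("chicago_violent_protests_subset.csv", index=False)
print("Results saved to 'chicago_violent_protests_subset.csv'")
print(results.head())
```

If `db-dtypes` isn't importable in the running kernel, this cell stops at `import db_dtypes` with a `ModuleNotFoundError`. Without that line, the failure surfaces later as `ValueError: Please install the 'db-dtypes' package to use this function.` from `to_dataframe()`. Either way, the fix is the same: restart the kernel after the install cell, then re-run this cell.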
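GDELT's `SQLDATE` column encodes the event date as a `YYYYMMDD` integer, so it needs converting before any time-based analysis. The sketch below parses those values with the standard library and counts events per month. `parse_sqldate` and `events_per_month` are hypothetical helpers, the input values are made up, and the code assumes no missing dates in the column.

```python
from collections import Counter
from datetime import date, datetime


def parse_sqldate(value) -> date:
    """Convert a GDELT SQLDATE such as 20230115 (int or str) to a datetime.date."""
    return datetime.strptime(str(value), "%Y%m%d").date()


def events_per_month(sqldates):
    """Count events per 'YYYY-MM' month, returned in chronological order."""
    counts = Counter(parse_sqldate(v).strftime("%Y-%m") for v in sqldates)
    return dict(sorted(counts.items()))


# Made-up values; in the notebook these would come from results["SQLDATE"]
print(events_per_month([20230115, 20230120, 20230203]))
# {'2023-01': 2, '2023-02': 1}
```

Because the query keeps only the 100 most recent matching events, monthly counts from that subset describe recent activity, not the full history.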