# AI for Redistricting Final Project

## Oregon Data Cleaning

@authors: vcle, bpuhani

All data retrieved in March 2025: <br>
[2020 Population data](https://redistrictingdatahub.org/dataset/oregon-block-pl-94171-2020-by-table/): 2020 Census Redistricting Data (P.L. 94-171) from the decennial census, at the Census Block level

[2020 County data](https://redistrictingdatahub.org/dataset/oregon-county-pl-94171-2020/): from 2020 Census Redistricting Data (P.L. 94-171) Shapefiles

[2020 election data](https://redistrictingdatahub.org/dataset/vest-2020-oregon-precinct-and-election-results/):  VEST 2020 Oregon precinct and election results

[2021 State Senate District plan](https://redistrictingdatahub.org/dataset/2021-oregon-state-senate-adopted-plan/): 2021 Oregon State Senate Approved Plan


<!-- Commented out 2018 Election data for now

[2018 election data](https://redistrictingdatahub.org/dataset/vest-2018-oregon-precinct-and-election-results/)**:**  VEST 2018 Oregon precinct and election results

-->

In [None]:
# imports
import geopandas as gpd
import maup
from maup import smart_repair
import time
import warnings

from ai_for_redistricting_final_project.utilities import load_shapefile

In [None]:
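# show progress bars for long-running maup operations
maup.progress.enabled = True

# silence all warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

# reference point for measuring the notebook's total runtime
start_time = time.time()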

## Import the Data

### Description of the `or_pl2020_b` data
* P1. Race
* P2. Hispanic or Latino, and Not Hispanic or Latino by Race
* P3. Race for the Population 18 Years and Over
* P4. Hispanic or Latino, and Not Hispanic or Latino by Race for the Population 18 Years and Over
* P5. Group Quarters Population by Major Group Quarters Type
* H1. Occupancy Status

Table list taken from the [2020 Census P.L. 94-171 technical documentation](https://www2.census.gov/programs-surveys/decennial/2020/technical-documentation/complete-tech-docs/summary-file/2020Census_PL94_171Redistricting_StatesTechDoc_English.pdf).

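We use the following tables:
* P2: total population by Hispanic or Latino origin and race
* P4: the same breakdown for the population 18 years and over (voting age population)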
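
The P2 and P4 tables share the same line layout, so one mapping can serve both. The sketch below assumes the shapefile fields follow the pattern `P` + three-digit table number + four-digit line number (for example `P0020001`); this should be checked against the printed column names. The readable labels (`nh_white`, `_vap`, and so on) are hypothetical names chosen for illustration, not names used elsewhere in this project.

```python
# Assumed field-name layout: "P" + table number (3 digits) + line number (4 digits).
# Line numbers follow the P2/P4 table structure; the labels are hypothetical.
P2_P4_LINES = {
    1: "total",
    2: "hispanic",
    5: "nh_white",
    6: "nh_black",
    7: "nh_aian",
    8: "nh_asian",
    9: "nh_nhpi",
    10: "nh_other",
    11: "nh_two_or_more",
}


def rename_map(table: int, suffix: str = "") -> dict:
    """Build {census_field: readable_name} for one P table."""
    return {
        f"P{table:03d}{line:04d}": f"{label}{suffix}"
        for line, label in P2_P4_LINES.items()
    }


pop_names = rename_map(2)          # P2: total population
vap_names = rename_map(4, "_vap")  # P4: population 18 years and over
print(pop_names["P0020005"], vap_names["P0040005"])
```

Such a mapping could later be passed to `DataFrame.rename(columns=...)` once the actual field names have been confirmed.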

In [None]:
# Paths to the data
population_path = "./or_data/or_pl2020_b/or_pl2020_p2_b.shp"
vap_path = "./or_data/or_pl2020_b/or_pl2020_p4_b.shp"
vest20_path = "./or_data/or_vest_20/or_vest_20.shp"
# vest18_path = "./or_data/or_vest_18/or_vest_18.shp" # currently not used
county_path = "./or_data/or_pl2020_cnty/or_pl2020_cnty.shp"
sen_path = "./or_data/or_sldu_2021/Senate_LC_Draft_2_-_Revised_.shp"
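
Because every later step depends on these files, it can help to confirm they exist before loading anything. The following is a minimal sketch using only the standard library; the helper name `missing_paths` and the dictionary keys are hypothetical, and the paths are copied from the cell above.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the names whose path does not point to an existing file."""
    return [name for name, p in paths.items() if not Path(p).is_file()]


# Paths copied from the cell above; the dictionary keys are hypothetical labels.
data_paths = {
    "population": "./or_data/or_pl2020_b/or_pl2020_p2_b.shp",
    "vap": "./or_data/or_pl2020_b/or_pl2020_p4_b.shp",
    "vest20": "./or_data/or_vest_20/or_vest_20.shp",
    "county": "./or_data/or_pl2020_cnty/or_pl2020_cnty.shp",
    "senate": "./or_data/or_sldu_2021/Senate_LC_Draft_2_-_Revised_.shp",
}

missing = missing_paths(data_paths)
if missing:
    print(f"Missing input files: {missing}")
```

Note that a shapefile also needs its sidecar files (`.shx`, `.dbf`, and usually `.prj`) next to the `.shp`; this check only covers the `.shp` itself.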

### Loading data
Loading the total population data

In [None]:
population_df = load_shapefile(population_path)

Loading the voting age population data

In [None]:
vap_df = load_shapefile(vap_path)

Loading the VEST 2020 election data

In [None]:
vest20_df = load_shapefile(vest20_path)

Loading the County data

In [None]:
county_df = gpd.read_file(county_path)

Loading the approved 2021 State Senate District plan

In [None]:
sen_df = gpd.read_file(sen_path)

In [None]:
nr_of_districts = sen_df.shape[0]
print(f"Number of State Senate Seats in Oregon: {nr_of_districts}")

## Exploring the data
Print the column names of each dataset

In [None]:
print(population_df.columns)
print(vap_df.columns)
print(vest20_df.columns)
print(county_df.columns)
print(sen_df.columns)
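
When comparing the printed column lists, it is often useful to see which columns several datasets have in common, for instance to find a shared identifier. The sketch below works on plain lists of column names; the helper `shared_columns` is hypothetical, and the example lists are made-up stand-ins rather than the real output of the cell above.

```python
def shared_columns(columns_by_name):
    """Return the column names present in every dataset, sorted alphabetically."""
    column_sets = [set(cols) for cols in columns_by_name.values()]
    return sorted(set.intersection(*column_sets)) if column_sets else []


# Hypothetical column lists standing in for population_df.columns and vap_df.columns
example = {
    "population": ["GEOID20", "P0020001", "geometry"],
    "vap": ["GEOID20", "P0040001", "geometry"],
}
print(shared_columns(example))  # prints ['GEOID20', 'geometry']
```

In the notebook this could be called as `shared_columns({"population": population_df.columns, "vap": vap_df.columns})`.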