# No crime-related coverages analysis

Author: Mo Al Elew

**What this notebook does/produces:**

Reruns our quintile racial distribution analysis after factoring out the insurance coverage types that could pay out following a car theft or break-in. The goal is to investigate whether the racial gaps could be related to differences in rates of vehicular crime.

**Approach:**

The analysis follows these steps:
1. Identify coverages that could pay out following a car theft or break-in
2. Filter out those coverages and calculate a generic base rate premium excluding crime-related coverages
3. Recalculate the location effect using the non-crime generic base rate premium
4. Sort the non-crime premium data into quintiles
5. Rerun the racial population distribution analysis with the non-crime premium data
6. Compare the crime-coverage-inclusive population distribution to the non-crime population distribution

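**Conclusion:**

Little to no change in population distribution. After excluding crime-related coverages, the largest change in any quintile's share is 1.7 percentage points (Black population, "middle high" quintile), and the "highest effect" quintile still holds 56.6% of the Black population, compared with 57.0% when all coverages are included.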

In [1]:
import geopandas as gpd
import pandas as pd

# Constants and helper functions

In [2]:
DATA_FP = "./outputs/citizens_auto_gis.geojson"

In [3]:
CRIME_RELATED_COVERAGES = [
    "rate_comp",
    "rate_excess_electronic_equipment",
    "rate_roadside_assistance",
]

RATE_Q_LABELS = [
    "lowest effect",
    "middle low",
    "median",
    "middle high",
    "highest effect",
]

QUANTILE_GROUP_BY_COLS = ["black_tot", "white_tot", "tot_pop"]

In [4]:
ROUNDING_PRECISION = 2

In [5]:
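def get_rate_columns(df_rate_table):
    """Return the column names that start with the "rate_" prefix."""
    RATE_PREFIX = "rate_"
    return [col for col in df_rate_table.columns if col.startswith(RATE_PREFIX)]


def prptn_to_pct(val, precision=ROUNDING_PRECISION):
    """Round a proportion to `precision` decimals, then scale it to a percentage."""
    return round(val, precision) * 100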

# Read data

In [6]:
GDF = gpd.read_file(DATA_FP)
GDF = GDF.dropna(how="all", axis=1)

## Preprocess

Retain the row with the lowest generic location-based premium for each geography (`geo_id`)

In [7]:
gdf_min_rate = GDF.sort_values(by="generic_location_based_premium").drop_duplicates(
    subset=["geo_id"], keep="first", ignore_index=True
)
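
The sort-then-deduplicate pattern above can be checked on a small example. This is a minimal sketch on hypothetical toy data (the `geo_id` values and premiums are made up); it only illustrates that sorting ascending and keeping the first row per `geo_id` matches a per-group minimum.

```python
import pandas as pd

# Hypothetical toy data: two geographies with several candidate premiums each.
toy = pd.DataFrame(
    {
        "geo_id": ["A", "A", "B", "B", "B"],
        "generic_location_based_premium": [1200.0, 950.0, 800.0, 1100.0, 790.0],
    }
)

# Sort ascending so the cheapest row per geo_id comes first, then keep that row.
lowest = toy.sort_values(by="generic_location_based_premium").drop_duplicates(
    subset=["geo_id"], keep="first", ignore_index=True
)

# Cross-check against a plain groupby minimum.
expected = toy.groupby("geo_id")["generic_location_based_premium"].min()
print(lowest)
print(expected)
```

Unlike the `groupby(...).min()` cross-check, the deduplication keeps the whole row, so the other columns of the winning row stay attached to the geography.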

In [8]:
gdf = gdf_min_rate.copy()

# Process

## Factor out crime-related coverages

In [9]:
non_crime_coverages = [
    col for col in get_rate_columns(GDF) if col not in CRIME_RELATED_COVERAGES
]

## Recalculate location effect

In [10]:
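# Non-crime premium: sum of the remaining coverage rates, rounded to whole dollars
gdf["non_crime_generic_premium"] = gdf[non_crime_coverages].sum(axis=1).round()

# Location effect: each geography's premium relative to the median premium
gdf["non_crime_location_effect"] = (
    gdf["non_crime_generic_premium"] / gdf["non_crime_generic_premium"].median()
).round(ROUNDING_PRECISION)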
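
To make the recalculation concrete, here is a small worked sketch on hypothetical data. The `rate_bi` column and all dollar amounts are assumptions made for illustration; only `rate_comp` and `rate_roadside_assistance` come from the list of crime-related coverages defined earlier.

```python
import pandas as pd

# Hypothetical rate table: "rate_bi" and all amounts are made up for this sketch.
toy = pd.DataFrame(
    {
        "geo_id": ["A", "B", "C"],
        "rate_bi": [300.0, 400.0, 500.0],
        "rate_comp": [100.0, 250.0, 50.0],
        "rate_roadside_assistance": [10.0, 10.0, 10.0],
    }
)
crime_related = ["rate_comp", "rate_roadside_assistance"]

# Keep every "rate_" column that is not flagged as crime-related.
kept = [
    col
    for col in toy.columns
    if col.startswith("rate_") and col not in crime_related
]

# Premium without crime-related coverages, then its ratio to the median premium.
premium = toy[kept].sum(axis=1).round()
effect = (premium / premium.median()).round(2)
print(effect.tolist())  # [0.75, 1.0, 1.25]
```

A location effect of 1.0 marks the median geography; values above 1.0 mean the geography pays more than the median for the same non-crime coverages.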

# Analysis

## Rate quantiles

In [11]:
gdf["non_crime_effect_quantile"] = pd.qcut(
    gdf["non_crime_generic_premium"], q=len(RATE_Q_LABELS), labels=RATE_Q_LABELS
)

gdf["effect_quantile"] = pd.qcut(
    gdf["generic_location_based_premium"], q=len(RATE_Q_LABELS), labels=RATE_Q_LABELS
)
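
As a quick illustration of how `pd.qcut` assigns the labels, the sketch below bins ten hypothetical premiums (made-up values) into five equal-count groups using the same label list.

```python
import pandas as pd

labels = ["lowest effect", "middle low", "median", "middle high", "highest effect"]

# Hypothetical premiums for ten geographies.
premiums = pd.Series([510, 620, 480, 900, 750, 560, 1010, 830, 590, 700])

# q=5 splits at the 20th, 40th, 60th and 80th percentiles of the premiums.
bins = pd.qcut(premiums, q=len(labels), labels=labels)
counts = bins.value_counts(sort=False)
print(counts)
```

Each label receives two geographies here. Because `pd.qcut` balances the number of rows rather than the number of residents, the `tot_pop` share of a quintile does not have to equal 20%.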

In [12]:
gdf_groupby_quantiles = gdf.groupby("effect_quantile", observed=False)[
    QUANTILE_GROUP_BY_COLS
].sum()
df_distribution = prptn_to_pct(
    gdf_groupby_quantiles.div(gdf_groupby_quantiles.sum(axis=0), axis=1), 3
)
df_distribution

effect_quantile,black_tot,white_tot,tot_pop
lowest effect,5.3,24.8,21.6
middle low,8.0,23.9,21.3
median,9.7,23.4,21.2
middle high,20.0,21.0,21.2
highest effect,57.0,6.9,14.8
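
The share calculation can be traced by hand on a small example. This is a minimal sketch with hypothetical population counts and only two bins; it sums each population column per bin and divides by the column total, so every column adds up to 100%.

```python
import pandas as pd

# Hypothetical counts for four geographies in two bins.
toy = pd.DataFrame(
    {
        "effect_quantile": pd.Categorical(
            ["low", "low", "high", "high"], categories=["low", "high"], ordered=True
        ),
        "black_tot": [10, 30, 50, 110],
        "white_tot": [200, 100, 60, 40],
        "tot_pop": [250, 150, 130, 170],
    }
)
pop_cols = ["black_tot", "white_tot", "tot_pop"]

# Total population per bin, then each bin's share of the column total.
totals = toy.groupby("effect_quantile", observed=False)[pop_cols].sum()
shares = totals.div(totals.sum(axis=0), axis=1)
pct = (shares * 100).round(1)
print(pct)
```

In this toy table, 80.0% of the Black population but only 25.0% of the white population falls in the "high" bin, which is the kind of gap the distribution tables are meant to surface.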


In [13]:
gdf_groupby_quantiles = gdf.groupby("non_crime_effect_quantile", observed=False)[
    QUANTILE_GROUP_BY_COLS
].sum()
column_sums = gdf_groupby_quantiles.sum(axis=0)
df_non_crime_distribution = prptn_to_pct(
    gdf_groupby_quantiles.div(column_sums, axis=1), 3
)
df_non_crime_distribution

non_crime_effect_quantile,black_tot,white_tot,tot_pop
lowest effect,4.1,24.7,21.1
middle low,6.9,24.2,21.2
median,10.8,23.6,21.7
middle high,21.7,19.8,20.7
highest effect,56.6,7.6,15.3


In [14]:
df_diff = df_non_crime_distribution - df_distribution
df_diff

non_crime_effect_quantile,black_tot,white_tot,tot_pop
lowest effect,-1.2,-0.1,-0.5
middle low,-1.1,0.3,-0.1
median,1.1,0.2,0.5
middle high,1.7,-1.2,-0.5
highest effect,-0.4,0.7,0.5
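
One way to summarize the difference table is to take the largest absolute shift per population column. The sketch below re-enters the values printed above by hand (in percentage points) rather than reusing the notebook's `df_diff`, so that it runs on its own.

```python
import pandas as pd

labels = ["lowest effect", "middle low", "median", "middle high", "highest effect"]

# Values copied from the df_diff output above (percentage points).
df_diff_copy = pd.DataFrame(
    {
        "black_tot": [-1.2, -1.1, 1.1, 1.7, -0.4],
        "white_tot": [-0.1, 0.3, 0.2, -1.2, 0.7],
        "tot_pop": [-0.5, -0.1, 0.5, -0.5, 0.5],
    },
    index=labels,
)

# Largest absolute change per column, and the quintile where it occurs.
largest_shift = df_diff_copy.abs().max()
where = df_diff_copy.abs().idxmax()
print(largest_shift)
print(where)
```

The largest movement is 1.7 percentage points for `black_tot` in the "middle high" quintile; no column moves by more than that.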


## Detroit

In [15]:
gdf[gdf["is_in_detroit"]]["non_crime_effect_quantile"].value_counts()

non_crime_effect_quantile
highest effect    874
middle high         5
lowest effect       0
middle low          0
median              0
Name: count, dtype: int64

In [16]:
gdf[gdf["is_in_detroit"]]["effect_quantile"].value_counts()

effect_quantile
highest effect    874
middle high         5
lowest effect       0
middle low          0
median              0
Name: count, dtype: int64
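
Matching counts do not by themselves show that the same geographies stay in the same quintile. A cross-tabulation of the two assignments would confirm it. The sketch below uses hypothetical toy rows, not the notebook's data, and stores the labels as plain strings.

```python
import pandas as pd

# Hypothetical rows: three Detroit geographies and two outside Detroit.
toy = pd.DataFrame(
    {
        "is_in_detroit": [True, True, True, False, False],
        "effect_quantile": [
            "highest effect", "highest effect", "middle high",
            "median", "lowest effect",
        ],
        "non_crime_effect_quantile": [
            "highest effect", "highest effect", "middle high",
            "middle low", "lowest effect",
        ],
    }
)
detroit = toy[toy["is_in_detroit"]]

# Count geographies whose quintile changes once crime coverages are excluded.
moved = int(
    (detroit["effect_quantile"] != detroit["non_crime_effect_quantile"]).sum()
)

# Rows: original quintile. Columns: non-crime quintile.
table = pd.crosstab(detroit["effect_quantile"], detroit["non_crime_effect_quantile"])
print(moved)
print(table)
```

If every count sits on the diagonal of the cross-tabulation, no geography changed quintile. When applying this to categorical columns, casting both to `str` first is one way to avoid comparison errors if their category sets differ.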