# No crime-related coverages analysis

Author: Mo Al Elew

**What this notebook does/produces:**

Reruns our quintile racial distribution analysis after factoring out insurance coverage types that could pay out following a car theft or break-in. The goal is to investigate whether the racial gaps could be related to differences in rates of vehicular crime.

**Approach:**

The general steps are:
1. Identify coverages that could pay out following a car theft or break-in
2. Filter out those coverages and calculate a generic base rate premium excluding crime-related coverages
3. Recalculate the location effect using the non-crime generic base rate premium
4. Sort the non-crime premium data into quintiles
5. Rerun the racial population distribution analysis with the non-crime premium data
6. Compare the crime-coverage-inclusive population distribution to the non-crime population distribution
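
The steps above can be sketched end to end on a toy table. This is a minimal illustration, not the notebook's actual data: the `toy` frame, its values, and the `rate_collision` and `rate_liability` column names are assumptions made up for the example; only `rate_comprehensive` and the quintile labels come from this notebook.

```python
import pandas as pd

# Hypothetical rate table: one row per geography, made-up values.
toy = pd.DataFrame(
    {
        "geo_id": ["a", "b", "c", "d", "e"],
        "rate_comprehensive": [10, 20, 30, 40, 50],
        "rate_collision": [100, 200, 300, 400, 500],
        "rate_liability": [50, 60, 70, 80, 90],
        "black_tot": [10, 20, 30, 40, 100],
        "white_tot": [100, 80, 60, 40, 20],
    }
)
crime_related = ["rate_comprehensive"]
labels = ["lowest effect", "middle low", "median", "middle high", "highest effect"]

# Steps 1-2: drop crime-related coverages and sum what is left.
rate_cols = [c for c in toy.columns if c.startswith("rate_")]
non_crime_cols = [c for c in rate_cols if c not in crime_related]
toy["premium"] = toy[rate_cols].sum(axis=1)
toy["non_crime_premium"] = toy[non_crime_cols].sum(axis=1)

# Step 3: location effect is the premium relative to the median premium.
toy["non_crime_location_effect"] = (
    toy["non_crime_premium"] / toy["non_crime_premium"].median()
)

# Step 4: sort both premiums into quintiles.
toy["quantile"] = pd.qcut(toy["premium"], q=len(labels), labels=labels)
toy["non_crime_quantile"] = pd.qcut(
    toy["non_crime_premium"], q=len(labels), labels=labels
)


def share_by_quantile(df, quantile_col):
    """Percent of each group's total population that falls in each quintile."""
    totals = df.groupby(quantile_col, observed=False)[["black_tot", "white_tot"]].sum()
    return totals.div(totals.sum(axis=0), axis=1) * 100


# Steps 5-6: compare the two distributions (percentage-point difference).
diff = share_by_quantile(toy, "non_crime_quantile") - share_by_quantile(toy, "quantile")
print(diff)
```

In this toy case the ranking of geographies is the same with and without the crime-related coverage, so every difference is zero.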

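**Conclusion**

Little to no change in population distribution. Excluding the crime-related coverage shifts each group's share of any quintile by at most 0.5 percentage points, and all 189 Detroit rows remain in the highest effect quintile.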

In [1]:
import geopandas as gpd
import pandas as pd

# Constants and helper functions

In [2]:
DATA_FP = "./outputs/allstate_auto_gis.geojson"

In [3]:
CRIME_RELATED_COVERAGES = [
    "rate_comprehensive",
]

RATE_Q_LABELS = [
    "lowest effect",
    "middle low",
    "median",
    "middle high",
    "highest effect",
]

GEOID_GROUP_BY_COLS = [
    "generic_location_based_premium",
    "non_crime_generic_premium",
    "white_tot",
    "black_tot",
    "tot_pop",
    "density",
    "median_income",
]

QUANTILE_GROUP_BY_COLS = ["black_tot", "white_tot", "tot_pop"]

In [4]:
ROUNDING_PRECISION = 2

In [5]:
def get_rate_columns(df_rate_table):
    RATE_PREFIX = "rate_"
    return [col for col in df_rate_table.columns if col.startswith(RATE_PREFIX)]


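def prptn_to_pct(val, precision=ROUNDING_PRECISION):
    """Convert a proportion (0-1) to a percentage (0-100).

    Rounding happens before scaling, so `precision` counts decimal places
    of the proportion: precision=3 leaves one decimal place of percent.
    """
    return round(val, precision) * 100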

# Read data

In [6]:
GDF = gpd.read_file(DATA_FP)
GDF["tot_pop"] = GDF["total_pop"]

# Process

## Factor out crime-related coverages

In [7]:
non_crime_coverages = [
    col for col in get_rate_columns(GDF) if col not in CRIME_RELATED_COVERAGES
]
gdf = GDF.copy()

## Recalculate location effect

In [8]:
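# Generic premium without crime-related coverages, rounded to whole dollars
gdf["non_crime_generic_premium"] = gdf[non_crime_coverages].sum(axis=1).round()

# Location effect: each premium relative to the median non-crime premium
gdf["non_crime_location_effect"] = (
    gdf["non_crime_generic_premium"] / gdf["non_crime_generic_premium"].median()
).round(ROUNDING_PRECISION)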

# Analysis

## Rate quantiles

I average the generic rates by ZCTA `geo_id` so that each geography is counted only once, even when it appears in more than one row.
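
As a small illustration of why the averaging matters, the sketch below uses a made-up table in which one ZCTA appears twice. The `geo_id` values and numbers are hypothetical, and it assumes that repeated rows for a ZCTA carry the same population counts, which is what makes the mean of the population columns safe.

```python
import pandas as pd

# Hypothetical rows: ZCTA "48201" appears twice with different premiums.
rows = pd.DataFrame(
    {
        "geo_id": ["48201", "48201", "48226"],
        "generic_location_based_premium": [3000.0, 3200.0, 2800.0],
        "tot_pop": [1000, 1000, 500],
    }
)

# Summing the raw rows counts the residents of "48201" twice.
naive_population = rows["tot_pop"].sum()

# Averaging by geo_id collapses the repeats into a single row per ZCTA.
by_geo = rows.groupby("geo_id")[["generic_location_based_premium", "tot_pop"]].mean()
deduplicated_population = by_geo["tot_pop"].sum()

print(naive_population, deduplicated_population)
print(by_geo)
```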

In [9]:
gdf_groupby_geo_id = gdf.groupby("geo_id")[GEOID_GROUP_BY_COLS].mean()

gdf_groupby_geo_id["effect_quantile"] = pd.qcut(
    gdf_groupby_geo_id["generic_location_based_premium"],
    q=len(RATE_Q_LABELS),
    labels=RATE_Q_LABELS,
)
gdf_groupby_geo_id["non_crime_effect_quantile"] = pd.qcut(
    gdf_groupby_geo_id["non_crime_generic_premium"],
    q=len(RATE_Q_LABELS),
    labels=RATE_Q_LABELS,
)

In [10]:
gdf_groupby_quantiles = gdf_groupby_geo_id.groupby("effect_quantile", observed=False)[
    QUANTILE_GROUP_BY_COLS
].sum()


print("This calculates (group subset in quantile / total group population)")
df_distribution = prptn_to_pct(
    gdf_groupby_quantiles.div(gdf_groupby_quantiles.sum(axis=0), axis=1), 3
)
df_distribution

This calculates (group subset in quantile / total group population)


| effect_quantile | black_tot | white_tot | tot_pop |
|---|---|---|---|
| lowest effect | 8.0 | 20.8 | 19.0 |
| middle low | 4.6 | 14.4 | 12.8 |
| median | 8.6 | 16.3 | 15.3 |
| middle high | 9.9 | 20.1 | 18.4 |
| highest effect | 68.9 | 28.3 | 34.4 |


In [11]:
gdf_groupby_quantiles = gdf_groupby_geo_id.groupby(
    "non_crime_effect_quantile", observed=False
)[QUANTILE_GROUP_BY_COLS].sum()


print("This calculates (group subset in quantile / total group population)")
df_non_crime_distribution = prptn_to_pct(
    gdf_groupby_quantiles.div(gdf_groupby_quantiles.sum(axis=0), axis=1), 3
)
df_non_crime_distribution

This calculates (group subset in quantile / total group population)


| non_crime_effect_quantile | black_tot | white_tot | tot_pop |
|---|---|---|---|
| lowest effect | 8.0 | 20.8 | 19.0 |
| middle low | 4.6 | 14.4 | 12.9 |
| median | 8.5 | 16.0 | 15.1 |
| middle high | 9.7 | 20.0 | 18.3 |
| highest effect | 69.2 | 28.8 | 34.9 |


In [12]:
df_diff = df_non_crime_distribution - df_distribution
df_diff

| non_crime_effect_quantile | black_tot | white_tot | tot_pop |
|---|---|---|---|
| lowest effect | 0.0 | 0.0 | 0.0 |
| middle low | 0.0 | 0.0 | 0.1 |
| median | -0.1 | -0.3 | -0.2 |
| middle high | -0.2 | -0.1 | -0.1 |
| highest effect | 0.3 | 0.5 | 0.5 |
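
One way to put a single number on the size of the change is the largest absolute shift across all quintiles and groups. The sketch below re-enters the percentages printed in the two distribution tables above rather than reusing the notebook's variables; the names `with_crime`, `without_crime`, and `max_abs_shift` are chosen for this example only.

```python
import pandas as pd

quintiles = ["lowest effect", "middle low", "median", "middle high", "highest effect"]

# Percentages copied from the two distribution tables printed above.
with_crime = pd.DataFrame(
    {
        "black_tot": [8.0, 4.6, 8.6, 9.9, 68.9],
        "white_tot": [20.8, 14.4, 16.3, 20.1, 28.3],
        "tot_pop": [19.0, 12.8, 15.3, 18.4, 34.4],
    },
    index=quintiles,
)
without_crime = pd.DataFrame(
    {
        "black_tot": [8.0, 4.6, 8.5, 9.7, 69.2],
        "white_tot": [20.8, 14.4, 16.0, 20.0, 28.8],
        "tot_pop": [19.0, 12.9, 15.1, 18.3, 34.9],
    },
    index=quintiles,
)

# Percentage-point shift; round to one decimal to drop floating-point noise.
shift = (without_crime - with_crime).round(1)
max_abs_shift = shift.abs().max().max()
print(max_abs_shift)
```

The largest shift is half a percentage point, which is small next to the gap of roughly 40 points between the Black and white shares in the highest effect quintile.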


## Detroit

In [13]:
gdf["effect_quantile"] = pd.qcut(
    gdf["generic_location_based_premium"],
    q=len(RATE_Q_LABELS),
    labels=RATE_Q_LABELS,
)
gdf["non_crime_effect_quantile"] = pd.qcut(
    gdf["non_crime_generic_premium"],
    q=len(RATE_Q_LABELS),
    labels=RATE_Q_LABELS,
)

In [14]:
gdf[gdf["is_in_detroit"]]["non_crime_effect_quantile"].value_counts()

non_crime_effect_quantile
highest effect    189
lowest effect       0
middle low          0
median              0
middle high         0
Name: count, dtype: int64

In [15]:
gdf[gdf["is_in_detroit"]]["effect_quantile"].value_counts()

effect_quantile
highest effect    189
lowest effect       0
middle low          0
median              0
middle high         0
Name: count, dtype: int64
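
The two value counts match, but matching counts alone would not show rows swapping quintiles in opposite directions. A row-level comparison is a slightly stronger check. The sketch below runs on a small made-up frame that reuses this notebook's column names; the same expression should carry over to `gdf`, though that is an assumption, not something run here.

```python
import pandas as pd

labels = ["lowest effect", "middle low", "median", "middle high", "highest effect"]

# Hypothetical rows standing in for gdf; values are made up.
toy = pd.DataFrame(
    {
        "is_in_detroit": [True, True, False, False, False],
        "effect_quantile": pd.Categorical(
            ["highest effect", "highest effect", "median", "middle low", "lowest effect"],
            categories=labels,
            ordered=True,
        ),
        "non_crime_effect_quantile": pd.Categorical(
            ["highest effect", "highest effect", "middle high", "middle low", "lowest effect"],
            categories=labels,
            ordered=True,
        ),
    }
)

detroit = toy[toy["is_in_detroit"]]

# Count Detroit rows whose quintile changes once crime coverage is removed.
# Comparing as strings avoids any dependence on categorical dtype details.
moved = (
    detroit["effect_quantile"].astype(str)
    != detroit["non_crime_effect_quantile"].astype(str)
).sum()

# Transition table: rows are the original quintile, columns the non-crime one.
transitions = pd.crosstab(
    detroit["effect_quantile"], detroit["non_crime_effect_quantile"]
)
print(moved)
print(transitions)
```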