# Analysis excluding crime-related coverages

Author: Mo Al Elew

**What this notebook does/produces:**

Reruns our quintile racial distribution analysis after factoring out insurance coverage types that could pay out following a car theft or break-in, to investigate whether the racial gaps could be related to differences in rates of vehicular crime.

**Approach:**

The general pattern includes:
1. Identify coverages that could pay out following a car theft or break-in
2. Filter out those coverages and calculate a generic base rate premium that excludes crime-related coverages
3. Recalculate the location effect using the non-crime generic base rate premium
4. Sort the non-crime premium data into quintiles
5. Rerun the racial population distribution analysis with the non-crime premium data
6. Compare the population distribution that includes crime-related coverages with the non-crime population distribution

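**Conclusion:**

Excluding crime-related coverages produces a minor increase in the share of the population in the higher quintiles: the share of the total population in the highest-effect quintile rises from 46.7% to 55.6%. Black residents remain more concentrated there than white residents (80.8% versus 50.4%).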

In [1]:
import geopandas as gpd
import pandas as pd

# Constants and helper functions

In [2]:
DATA_FP = "./outputs/libertymutual_auto_gis.geojson"

In [3]:
CRIME_RELATED_COVERAGES = [
    "rate_comp",
]

RATE_Q_LABELS = [
    "lowest effect",
    "middle low",
    "median",
    "middle high",
    "highest effect",
]

QUANTILE_GROUP_BY_COLS = ["black_tot", "white_tot", "tot_pop"]

In [4]:
ROUNDING_PRECISION = 2

In [5]:
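def get_rate_columns(df_rate_table):
    """Return the names of the per-coverage rate columns (prefixed with "rate_")."""
    RATE_PREFIX = "rate_"
    return [col for col in df_rate_table.columns if col.startswith(RATE_PREFIX)]


def prptn_to_pct(val, precision=ROUNDING_PRECISION):
    """Convert a proportion to a percentage, rounding the proportion first."""
    return round(val, precision) * 100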
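
Because `prptn_to_pct` rounds the proportion before multiplying by 100, the result can carry binary floating-point residue. The sketch below contrasts it with a hypothetical variant, `pct_round_after`, which is an assumption for illustration and not part of this notebook: it multiplies first and then rounds to `precision - 2` decimal places.

```python
ROUNDING_PRECISION = 2


def prptn_to_pct(val, precision=ROUNDING_PRECISION):
    # Same logic as the notebook helper: round the proportion, then scale.
    return round(val, precision) * 100


def pct_round_after(val, precision=ROUNDING_PRECISION):
    # Hypothetical variant (an assumption, not the notebook's method):
    # scale to a percentage first, then round to the equivalent precision.
    return round(val * 100, precision - 2)


# The first call may show floating-point residue; the others should not.
print(prptn_to_pct(0.57))
print(pct_round_after(0.57))
print(pct_round_after(0.1234, 3))
```

Both versions agree to within floating-point error, so the choice only affects how the values print or export.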

# Read data

In [6]:
GDF = gpd.read_file(DATA_FP)
GDF["tot_pop"] = GDF["total_pop"]

# Process

## Factor out crime-related coverages

In [7]:
non_crime_coverages = [
    col for col in get_rate_columns(GDF) if col not in CRIME_RELATED_COVERAGES
]
gdf = GDF.copy()

## Recalculate location effect

In [8]:
gdf["non_crime_generic_premium"] = round(gdf[non_crime_coverages].sum(axis=1))

gdf["non_crime_location_effect"] = round(
    gdf["non_crime_generic_premium"] / gdf["non_crime_generic_premium"].median(),
    ROUNDING_PRECISION,
)
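
As a minimal worked example of the recalculation above, the sketch below uses a toy rate table. Only `rate_comp` comes from this notebook; the other column names (`rate_bi`, `rate_pd`) and all values are made up for illustration.

```python
import pandas as pd

# Toy rate table. Only "rate_comp" comes from the notebook; the other
# column names and all values are hypothetical.
df = pd.DataFrame(
    {
        "rate_bi": [100, 120, 150],
        "rate_pd": [50, 60, 90],
        "rate_comp": [30, 80, 40],
    }
)

crime_related = ["rate_comp"]
non_crime_cols = [
    col for col in df.columns if col.startswith("rate_") and col not in crime_related
]

# Sum the remaining coverages into a generic premium per row.
df["non_crime_generic_premium"] = df[non_crime_cols].sum(axis=1)

# Location effect: each premium relative to the median premium.
median_premium = df["non_crime_generic_premium"].median()
df["non_crime_location_effect"] = (
    df["non_crime_generic_premium"] / median_premium
).round(2)

print(df[["non_crime_generic_premium", "non_crime_location_effect"]])
```

A location effect of 1.0 marks the median premium; values above 1.0 indicate locations priced above the median.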

# Analysis

## Rate quantiles

In [9]:
gdf["non_crime_effect_quantile"] = pd.qcut(
    gdf["non_crime_location_effect"], q=len(RATE_Q_LABELS), labels=RATE_Q_LABELS
)

gdf["effect_quantile"] = pd.qcut(
    gdf["location_effect"], q=len(RATE_Q_LABELS), labels=RATE_Q_LABELS
)
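
To illustrate how `pd.qcut` assigns the quintile labels, here is a small sketch on made-up location effects for ten hypothetical geographies. The label list mirrors `RATE_Q_LABELS`; the values are assumptions chosen so that no ties fall on a bin edge.

```python
import pandas as pd

labels = ["lowest effect", "middle low", "median", "middle high", "highest effect"]

# Made-up location effects for ten hypothetical geographies.
effect = pd.Series([0.80, 0.85, 0.90, 0.95, 1.00, 1.05, 1.10, 1.20, 1.30, 1.50])

# qcut places bin edges at the 20th, 40th, 60th, and 80th percentiles.
quantiles = pd.qcut(effect, q=len(labels), labels=labels)

# Each label covers two of the ten rows.
print(quantiles.value_counts(sort=False))
```

Because the split is over rows rather than people, the population share in each quintile need not be 20%, which is consistent with the uneven `tot_pop` shares in this notebook's tables.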

In [10]:
gdf_groupby_quantiles = gdf.groupby("effect_quantile", observed=False)[
    QUANTILE_GROUP_BY_COLS
].sum()
column_sums = gdf_groupby_quantiles.sum(axis=0)
df_distribution = prptn_to_pct(gdf_groupby_quantiles.div(column_sums, axis=1), 3)
df_distribution

effect_quantile,black_tot,white_tot,tot_pop
lowest effect,11.2,23.9,21.9
middle low,3.0,6.3,5.9
median,8.8,19.7,18.1
middle high,0.9,9.1,7.4
highest effect,76.1,41.1,46.7
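
The share calculation behind this table can be sketched on toy data as follows. The counts and the two group names (`low`, `high`) are made up; only the column names `black_tot` and `white_tot` come from this notebook.

```python
import pandas as pd

# Made-up counts for four hypothetical geographies in two groups.
df = pd.DataFrame(
    {
        "effect_quantile": ["low", "low", "high", "high"],
        "black_tot": [10, 30, 40, 120],
        "white_tot": [100, 100, 50, 150],
    }
)

# Total population of each racial group within each quantile.
group_sums = df.groupby("effect_quantile", sort=False)[
    ["black_tot", "white_tot"]
].sum()

# Divide each column by its own total so every column sums to 1.
column_sums = group_sums.sum(axis=0)
shares = group_sums.div(column_sums, axis=1)

# Express the shares as percentages.
pct = (shares * 100).round(1)
print(pct)
```

Each column describes how one group is spread across the quantiles, so columns sum to 100 while rows do not.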


In [11]:
gdf_groupby_quantiles = gdf.groupby("non_crime_effect_quantile", observed=False)[
    QUANTILE_GROUP_BY_COLS
].sum()
column_sums = gdf_groupby_quantiles.sum(axis=0)
df_non_crime_distribution = prptn_to_pct(
    gdf_groupby_quantiles.div(column_sums, axis=1), 3
)
df_non_crime_distribution

non_crime_effect_quantile,black_tot,white_tot,tot_pop
lowest effect,3.3,10.0,8.6
middle low,0.6,3.8,3.2
median,11.6,22.2,21.0
middle high,3.7,13.6,11.6
highest effect,80.8,50.4,55.6


In [12]:
df_diff = df_non_crime_distribution - df_distribution
df_diff

non_crime_effect_quantile,black_tot,white_tot,tot_pop
lowest effect,-7.9,-13.9,-13.3
middle low,-2.4,-2.5,-2.7
median,2.8,2.5,2.9
middle high,2.8,4.5,4.2
highest effect,4.7,9.3,8.9
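
One way to read these tables is to compare the Black and white shares in the highest-effect quintile before and after excluding crime-related coverages. The sketch below copies those four values from the outputs above; the helper name `black_white_gap` is hypothetical.

```python
# Shares in the "highest effect" quintile, copied from the tables above.
all_coverages = {"black_tot": 76.1, "white_tot": 41.1}
non_crime = {"black_tot": 80.8, "white_tot": 50.4}


def black_white_gap(shares):
    # Hypothetical helper: percentage-point difference between the
    # Black and white shares, rounded to one decimal place.
    return round(shares["black_tot"] - shares["white_tot"], 1)


gap_all = black_white_gap(all_coverages)
gap_non_crime = black_white_gap(non_crime)

print(gap_all)        # gap with all coverages
print(gap_non_crime)  # gap with crime-related coverages excluded
```

On these figures the gap narrows from 35.0 to 30.4 percentage points but does not close.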


# Chart data

- Reshape both distributions to long format and merge them into a single dataset for the chart

In [13]:
def prep_datawrapper_data(df_dist, effect_type):
    """Reshape a distribution table to long format, one row per quintile and group."""
    df_temp = df_dist.copy()
    df_temp = df_temp.reset_index()
    df_temp.columns = ["Location effect quintile", "Black", "White", "Total"]
    df_temp = df_temp.melt(id_vars=["Location effect quintile"], value_name=effect_type)
    return df_temp


df_all = prep_datawrapper_data(df_distribution, "All coverages")
df_noncrime = prep_datawrapper_data(df_non_crime_distribution, "Noncrime coverages")
# Join on the shared key columns explicitly rather than relying on the default.
df_merge = pd.merge(df_all, df_noncrime, on=["Location effect quintile", "variable"])
df_merge

,Location effect quintile,variable,All coverages,Noncrime coverages
0,lowest effect,Black,11.2,3.3
1,middle low,Black,3.0,0.6
2,median,Black,8.8,11.6
3,middle high,Black,0.9,3.7
4,highest effect,Black,76.1,80.8
5,lowest effect,White,23.9,10.0
6,middle low,White,6.3,3.8
7,median,White,19.7,22.2
8,middle high,White,9.1,13.6
9,highest effect,White,41.1,50.4
