## Even more Iowa 2D Pareto fronts
This notebook explores the tradeoffs between districting criteria using a dataset of 5,000 county-level Iowa districting plans generated with GerryChain's random ReCom algorithm under a population tolerance of ±0.2%. This bound is ten times tighter than in previous runs, which used a population tolerance of ±2%.
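
To make the tolerance concrete, here is a minimal sketch of what a ±0.2% bound means for a single plan. It assumes the tolerance is applied to each district relative to the ideal (average) district population. The helper `within_tolerance` and the district populations are hypothetical illustrations, not values from this dataset.

```python
def within_tolerance(district_pops, epsilon):
    """Return True if every district is within +/- epsilon of the ideal population."""
    ideal = sum(district_pops) / len(district_pops)
    return all(abs(pop - ideal) <= epsilon * ideal for pop in district_pops)


# Hypothetical district populations (illustrative only).
example_plan = [731_500, 735_400, 727_900, 731_200]

loose_ok = within_tolerance(example_plan, epsilon=0.02)   # previous runs: ±2%
tight_ok = within_tolerance(example_plan, epsilon=0.002)  # this run: ±0.2%
print(loose_ok, tight_ok)
```

A plan like this one would have been admissible in the earlier runs but is rejected under the tighter bound, which is why the population deviation axis covers a much narrower range in the plots of this notebook.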

In our last experiment, we identified these pairings as particularly interesting examples of tradeoffs:
- Absolute/squared population deviation vs. cut edges
- Absolute mean-median gap vs. cut edges

This notebook continues our exploration of these tradeoffs with a different sample. The dataset has the following columns:
- `cut_edges`: Percent of cut edges (relative to total edges)
- `pop_pct`: Average percent population deviation across districts
- `egs`: Efficiency gap (2000 Presidential election)
- `mms`: Mean-median score (2000 Presidential election)
- `polpop`: Polsby-Popper score
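
For readers unfamiliar with these scores, the sketch below shows common textbook definitions of each. It is an illustration under stated assumptions, not the code that produced the dataset: the function names are hypothetical, sign conventions for the mean-median score and efficiency gap differ between sources, and the scores in the CSV may have been computed differently (for example, by GerryChain updaters).

```python
import math
from statistics import mean, median


def cut_edge_fraction(edges, assignment):
    """Fraction of graph edges whose endpoints lie in different districts."""
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
    return cut / len(edges)


def mean_median(shares):
    """Median minus mean of one party's district vote shares (one common convention)."""
    return median(shares) - mean(shares)


def efficiency_gap(votes_a, votes_b):
    """(Wasted votes for A - wasted votes for B) / total votes, two-party, ties ignored."""
    wasted_a = wasted_b = 0.0
    for a, b in zip(votes_a, votes_b):
        needed = (a + b) / 2  # votes needed to win the district
        if a > b:
            wasted_a += a - needed  # winner's surplus votes
            wasted_b += b           # all of the loser's votes
        else:
            wasted_a += a
            wasted_b += b - needed
    return (wasted_a - wasted_b) / (sum(votes_a) + sum(votes_b))


def polsby_popper(area, perimeter):
    """Compactness score 4*pi*A / P^2; equals 1 for a circle."""
    return 4 * math.pi * area / perimeter ** 2
```

As a quick sanity check, `polsby_popper(math.pi, 2 * math.pi)` describes a unit circle and evaluates to 1 (up to floating-point error), the maximum possible compactness.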

In [None]:
%config InlineBackend.figure_formats = ['svg']

import os
import matplotlib.pyplot as plt
import pandas as pd
import pareto

plt.style.use('ggplot')
prefix = 'results/IA_counties_run_3_recom_tight_5000'
os.makedirs(prefix, exist_ok=True)  # savefig fails if the output directory is missing

In [None]:
data = pd.read_csv('data/IA_counties_run_3_recom_tight_5000.csv')
data['pop_dev_pct_abs'] = abs(data['pop_pct'])  # Absolute average population deviation
data['pop_dev_pct_squared'] = data['pop_pct']**2
data['mms_abs'] = abs(data['mms'])
collection = pareto.ParetoCollection(updaters=list(data.columns))

In [None]:
collection.add(data.to_dict(orient='records'))

In [None]:
data.columns

In [None]:
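def plot_front(x_col, y_col, maxima=False):
    """Scatter all plans on (x_col, y_col) and highlight the Pareto front.

    By default the front of minima is shown; pass maxima=True for the front of maxima.
    """
    front = collection.front([x_col, y_col], maxima=maxima)
    # Coordinates of every plan in the collection.
    x = [plan[x_col] for plan in collection.points]
    y = [plan[y_col] for plan in collection.points]
    # Coordinates of the plans on the Pareto front only.
    pareto_x = [plan[x_col] for plan in front]
    pareto_y = [plan[y_col] for plan in front]

    plt.scatter(x, y, label='All plans')
    front_type = 'maxima' if maxima else 'minima'
    # Drawn second so the front points sit on top of the full scatter.
    plt.scatter(pareto_x, pareto_y, label=f'Pareto front ({front_type})')
    plt.legend()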
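
The `pareto` module is not shown in this notebook, so as a point of reference here is a minimal brute-force sketch of what a two-criterion Pareto front computes. The function `pareto_front` and the sample plans are hypothetical, and this is not necessarily how `collection.front` is implemented. A plan is on the front when no other plan is at least as good on every criterion and strictly better on at least one.

```python
def pareto_front(points, keys, maxima=False):
    """Return the points that no other point dominates on the given keys (O(n^2))."""
    sign = -1 if maxima else 1  # flip signs so that "smaller is better" in both modes

    def dominates(p, q):
        no_worse = all(sign * p[k] <= sign * q[k] for k in keys)
        strictly_better = any(sign * p[k] < sign * q[k] for k in keys)
        return no_worse and strictly_better

    return [q for q in points if not any(dominates(p, q) for p in points)]


# Hypothetical plans (illustrative values only).
plans = [
    {'cut_edges': 0.30, 'mms_abs': 0.05},
    {'cut_edges': 0.25, 'mms_abs': 0.08},
    {'cut_edges': 0.35, 'mms_abs': 0.06},  # dominated by the first plan when minimizing
    {'cut_edges': 0.40, 'mms_abs': 0.01},
]
minima_front = pareto_front(plans, ['cut_edges', 'mms_abs'])
maxima_front = pareto_front(plans, ['cut_edges', 'mms_abs'], maxima=True)
```

Here `minima_front` keeps the first, second, and fourth plans. The quadratic scan is fine for a few thousand plans, though a sort-based sweep would scale better for larger samples.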

In [None]:
plot_front('pop_dev_pct_squared', 'cut_edges')
plt.xlabel('Population deviation (squared)')
plt.ylabel('% cut edges')
plt.title('Population deviation² vs. cut edges in Iowa')
plt.xlim(-0.0000005, 0.000004)
plt.ticklabel_format(style='sci', axis='x', scilimits=(0,0))
# Save after setting limits and tick format so the file matches the displayed figure.
plt.savefig(f'{prefix}/squared_population_deviation_vs_cut_edges.png', dpi=300)
plt.show()


In [None]:
plot_front('pop_dev_pct_abs', 'cut_edges')
plt.xlabel('Absolute population deviation')
plt.ylabel('% cut edges')
plt.title('Absolute population deviation vs. cut edges in Iowa')
plt.xlim(-0.0005, 0.0025)
# Save after setting limits so the file matches the displayed figure.
plt.savefig(f'{prefix}/absolute_population_deviation_vs_cut_edges.png', dpi=300)
plt.show()

In [None]:
plot_front('mms_abs', 'cut_edges')
plt.xlabel('Absolute mean-median score (2000 Presidential election)')
plt.ylabel('% cut edges')
plt.title('Absolute mean-median score vs. % cut edges in Iowa')
plt.savefig(f'{prefix}/absolute_mean_median_vs_cut_edges.png', dpi=300)
plt.show()