<a href="https://colab.research.google.com/github/wilson428/100K_stars/blob/master/Calculating_Apportionment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [13]:
from IPython.display import HTML, Latex
import numpy as np
import pandas as pd

# How to Hack On Apportionment in the U.S. House of Representatives

The House of Representatives has stood at 435 members for nearly 100 years as the population of the country has tripled. How much larger should it be to minimize the difference in how many people each Member represents?

Before we explore that question in the notebook [EnlargeTheHouse.ipynb](https://colab.research.google.com/drive/1TjyO-2bO4z5X_Hblw0AoINpxTMjWDkIS?usp=sharing), we need to replicate the existing apportionment method in Python and make sure it matches the official tally for the [2020 Census decennial population count](https://www.census.gov/data/tables/2020/dec/2020-apportionment-data.html).

## Loading the Data

The [GitHub repo for this demo](https://github.com/TimeMagazineLabs/CongressionalApportionment) has data files from Census.gov for the 2010 and 2020 decennial Census counts as well as the official apportionments so that we can check our work. We'll use [Pandas](https://pandas.pydata.org/). Here's what the data looks like:

In [14]:
data_2020 = pd.read_csv('https://raw.githubusercontent.com/TimeMagazineLabs/CongressionalApportionment/main/data/apportionment_2020.csv',dtype={'State': 'string', 'Abbr': 'string', 'Reps': 'Int64'})
data_2020['Per_Rep'] = np.int64(data_2020['Population'] / data_2020['Reps'])
data_2020.head()

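|   | State | Abbr | Year | Population | Reps | Per_Rep |
|---|---|---|---|---|---|---|
| 0 | Alabama | AL | 2020 | 5030053 | 7 | 718579 |
| 1 | Alaska | AK | 2020 | 736081 | 1 | 736081 |
| 2 | Arizona | AZ | 2020 | 7158923 | 9 | 795435 |
| 3 | Arkansas | AR | 2020 | 3013756 | 4 | 753439 |
| 4 | California | CA | 2020 | 39576757 | 52 | 761091 |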


## How Apportionment Works: The Equal Proportions Method

Since 1940, Congress [has used the "Equal Proportions Method"](https://www.census.gov/topics/public-sector/congressional-apportionment/about/computing.html) to apportion the 435 House seats among the states. That total of 435 is fixed by law, not by mathematics or logic.

First, as the Constitution requires, every state gets one seat. For the remaining 385, each state's "priority" is computed with a simple formula: its population *P* from the most recent decennial Census, divided by the square root of the product of the number of seats *n* it currently holds and *n* + 1 (that is, the [geometric mean](https://mathworld.wolfram.com/GeometricMean.html) of *n* and *n* + 1):
\begin{equation}
\text{priority} = \frac{P}{\sqrt{n(n+1)}}
\end{equation}
Seats are awarded one at a time, each to the state with the highest priority. That state's priority is then recalculated with its new, larger *n*, which moves it further back in line. Let's write a Python function to calculate the priority of a state, represented as a row in the table (a pandas "DataFrame") above. (We'll be adding a `RepsCalculated` column to compare to the official tally.)

In [15]:
def EqualProportionsMethod(st):
  reps = st["RepsCalculated"]
  priority = st["Population"] / np.sqrt(reps * (reps + 1))
  return priority
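To get a feel for the formula, it helps to look at the multiplier 1/sqrt(n(n+1)) that scales a state's population as it competes for each successive seat. The sketch below uses only Python's `math` module; the helper name `priority` is ours (it mirrors `EqualProportionsMethod` without the DataFrame row). As a check, it reproduces Alabama's starting priority, i.e. its priority for a second seat, which shows up as 3556785.0 once we compute the `Priority` column below.

```python
import math

def priority(population, seats_held):
    """Equal Proportions priority for a state's next seat: P / sqrt(n * (n + 1))."""
    return population / math.sqrt(seats_held * (seats_held + 1))

# Multiplier applied to the population when competing for seat n + 1
multipliers = {n + 1: 1 / math.sqrt(n * (n + 1)) for n in range(1, 5)}
for seat, m in multipliers.items():
    print(f"seat {seat}: multiplier {m:.4f}")

# Alabama's priority for its 2nd seat (population from the table above)
print(round(priority(5030053, 1)))
```

The multipliers fall quickly at first (from about 0.71 for a second seat to about 0.22 for a fifth), so a state's claim on each additional seat weakens as it gains seats.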

## Recalculating the apportionments

Let's recalculate the apportionment to make sure it matches the official tally. We'll make a copy of the data frame for each trial run and add columns for the calculated reps and the priority from the Equal Proportions Method algorithm, using the Pandas `.apply` method for DataFrames. Remember, each state starts with 1 seat, so we'll initialize `RepsCalculated` to 1 and start the apportionment with 385 seats remaining.

The data also includes Puerto Rico and D.C., neither of which is a state or receives seats in the apportionment, so we'll need to remove them.

In [16]:
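# Keep only the 50 states; D.C. and Puerto Rico are not part of the apportionment
data_2020_states_only = data_2020[~data_2020['State'].isin(["District of Columbia", "Puerto Rico"])]
# Row labels from data_2020 are kept; the code below looks rows up by label, so no reindexing is needed
print(data_2020.shape[0], data_2020_states_only.shape[0])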

52 50


In [17]:
test_2020 = data_2020_states_only.copy()
test_2020['RepsCalculated'] = 1  # every state starts with its guaranteed seat
test_2020['Priority'] = test_2020.apply(EqualProportionsMethod, axis=1)  # axis=1: apply row by row
test_2020.head()

|   | State | Abbr | Year | Population | Reps | Per_Rep | RepsCalculated | Priority |
|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | AL | 2020 | 5030053 | 7 | 718579 | 1 | 3556785.0 |
| 1 | Alaska | AK | 2020 | 736081 | 1 | 736081 | 1 | 520487.9 |
| 2 | Arizona | AZ | 2020 | 7158923 | 9 | 795435 | 1 | 5062123.0 |
| 3 | Arkansas | AR | 2020 | 3013756 | 4 | 753439 | 1 | 2131047.0 |
| 4 | California | CA | 2020 | 39576757 | 52 | 761091 | 1 | 27984990.0 |


As expected, California has the highest priority. We can confirm this with the `.idxmax()` method, which returns the index label of the row with the highest value in a given column:

In [18]:
test_2020.loc[test_2020['Priority'].idxmax(),'State']

'California'
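Two properties of `.idxmax()` matter for the seat-by-seat loop. It returns the row's index *label* rather than its position, which is what `.loc` expects, and when several rows share the maximum it returns the first one, so an exact tie in priority would go to whichever state comes first in the table. A toy illustration with made-up numbers (not Census data):

```python
import pandas as pd

# Made-up priorities for illustration only; index labels are state abbreviations
priorities = pd.Series([5.0, 9.0, 9.0, 2.0], index=['AL', 'CA', 'TX', 'WY'])

label = priorities.idxmax()     # row label of the maximum; on a tie, the first such label
position = priorities.argmax()  # integer position of that same row
print(label, position)
```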

Great. Now we just need a function that finds the state with the highest priority, adds a seat to that state, and recalculates its priority. (There's no need to recompute every state's priority after each assignment, since a state's priority only changes when it gains a seat.) While we're at it, let's record each winner in a list called `ORDER` to see the order in which states get a representative. This list should be initialized once per complete trial.

In [19]:
def addNextSeat(df, ORDER=None):
  # Use None rather than [] as the default so calls don't share one list across trials
  if ORDER is None:
    ORDER = []
  indexNext = df['Priority'].idxmax() # The index label of the row with the highest priority
  # Add a seat to the state in that row
  df.loc[indexNext,'RepsCalculated'] += 1
  # Recompute the priority for this state only; no other state's priority changes
  df.loc[indexNext,'Priority'] = EqualProportionsMethod(df.loc[indexNext])
  # Record which state received this seat
  ORDER.append(df.loc[indexNext,'Abbr'])

Let's check that the function works for just the first seat, since updating DataFrames inside functions can be tricky.

In [20]:
ORDER_TEST = []  
addNextSeat(test_2020, ORDER_TEST)
print(ORDER_TEST)
test_2020.head()

['CA']


|   | State | Abbr | Year | Population | Reps | Per_Rep | RepsCalculated | Priority |
|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | AL | 2020 | 5030053 | 7 | 718579 | 1 | 3556785.0 |
| 1 | Alaska | AK | 2020 | 736081 | 1 | 736081 | 1 | 520487.9 |
| 2 | Arizona | AZ | 2020 | 7158923 | 9 | 795435 | 1 | 5062123.0 |
| 3 | Arkansas | AR | 2020 | 3013756 | 4 | 753439 | 1 | 2131047.0 |
| 4 | California | CA | 2020 | 39576757 | 52 | 761091 | 2 | 16157140.0 |


Great! California got a seat, and there are now 384 seats left. Let's give the remaining ones a go.

In [21]:
SEATS_LEFT = 384
while SEATS_LEFT > 0:
  addNextSeat(test_2020, ORDER_TEST)
  SEATS_LEFT -= 1

test_2020.head()

|   | State | Abbr | Year | Population | Reps | Per_Rep | RepsCalculated | Priority |
|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | AL | 2020 | 5030053 | 7 | 718579 | 7 | 672169.105833 |
| 1 | Alaska | AK | 2020 | 736081 | 1 | 736081 | 1 | 520487.866603 |
| 2 | Arizona | AZ | 2020 | 7158923 | 9 | 795435 | 9 | 754616.742459 |
| 3 | Arkansas | AR | 2020 | 3013756 | 4 | 753439 | 4 | 673896.32836 |
| 4 | California | CA | 2020 | 39576757 | 52 | 761091 | 52 | 753877.180693 |


In [22]:
print(ORDER_TEST[0:10])
print(ORDER_TEST[-10:])
print("Last state to get a seat is", ORDER_TEST[-1])

['CA', 'TX', 'CA', 'FL', 'NY', 'TX', 'CA', 'PA', 'IL', 'CA']
['TX', 'IL', 'RI', 'AL', 'NC', 'OR', 'CO', 'CA', 'MT', 'MN']
Last state to get a seat is MN


Looks good! We can easily double-check every state by taking the absolute value of the difference between our calculations and the official apportionment:

In [23]:
test_2020['Error'] = np.abs(test_2020['Reps'] - test_2020['RepsCalculated'])
print("ERROR:", sum(test_2020['Error']))
test_2020.head()

ERROR: 0


|   | State | Abbr | Year | Population | Reps | Per_Rep | RepsCalculated | Priority | Error |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | AL | 2020 | 5030053 | 7 | 718579 | 7 | 672169.105833 | 0 |
| 1 | Alaska | AK | 2020 | 736081 | 1 | 736081 | 1 | 520487.866603 | 0 |
| 2 | Arizona | AZ | 2020 | 7158923 | 9 | 795435 | 9 | 754616.742459 | 0 |
| 3 | Arkansas | AR | 2020 | 3013756 | 4 | 753439 | 4 | 673896.32836 | 0 |
| 4 | California | CA | 2020 | 39576757 | 52 | 761091 | 52 | 753877.180693 | 0 |
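The seat-by-seat loop also has an equivalent one-shot formulation. Because each state's priority values P/sqrt(n(n+1)) shrink as n grows, repeatedly taking the current maximum ends up selecting exactly the largest (total seats minus number of states) values from the combined list of every state's priorities for its 2nd, 3rd, ... seats, barring exact ties. Below is a minimal stdlib sketch of that idea, assuming populations are passed as a plain `dict` of state to population; `priorityListApportionment` is a hypothetical helper, not part of the notebook.

```python
import math

def priorityListApportionment(populations, total_seats=435):
    """Sketch: list every priority value, sort once, award the top ones."""
    seats_to_award = total_seats - len(populations)
    candidates = []
    for state, pop in populations.items():
        # n = seats already held when competing for the next one; a state can win
        # at most seats_to_award extra seats, so n never needs to go higher
        for n in range(1, seats_to_award + 1):
            candidates.append((pop / math.sqrt(n * (n + 1)), state))
    # Highest priorities first
    candidates.sort(key=lambda c: c[0], reverse=True)
    seats = {state: 1 for state in populations}  # constitutional minimum
    for _, state in candidates[:seats_to_award]:
        seats[state] += 1
    return seats

# Toy example with made-up populations: 3 states, 6 seats, so 3 seats to award
print(priorityListApportionment({'A': 1000, 'B': 500, 'C': 100}, total_seats=6))
```

With the notebook's data, something like `priorityListApportionment(dict(zip(test_2020['Abbr'], test_2020['Population'])))` should reproduce the `RepsCalculated` column, though that run isn't shown here.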


Let's wrap this whole process in a function for easy manipulation:

In [36]:
def calculateApportionment(popData=data_2020, ignoreStates=['District of Columbia', 'Puerto Rico'], TOTAL_SEATS=435):
  ORDER_SEATS = []

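  # Drop anything we don't want to treat as a state (row labels from popData are kept;
  # the seat loop looks rows up by label, so no reindexing is needed)
  data_filtered = popData[~popData['State'].isin(ignoreStates)]

  test_apportionment = data_filtered.copy()
  test_apportionment['RepsCalculated'] = 1  # every state starts with one seat
  test_apportionment['Priority'] = test_apportionment.apply(EqualProportionsMethod, axis=1)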

  SEATS_LEFT = TOTAL_SEATS - test_apportionment.shape[0] # This is the number of rows, which will account for any hypothetical states we add
  while SEATS_LEFT > 0:
    addNextSeat(test_apportionment, ORDER_SEATS)
    SEATS_LEFT -= 1

  return test_apportionment, ORDER_SEATS 
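The loop inside `calculateApportionment` rescans every state with `.idxmax()` for each seat. Since only the winning state's priority changes, a priority queue is a natural alternative. Below is a minimal stdlib sketch using `heapq`, assuming populations arrive as a plain `dict` mapping state to population; `apportionWithHeap` is a hypothetical name, not part of the notebook, and it returns seat counts plus the award order, like `ORDER_SEATS`.

```python
import heapq
import math

def apportionWithHeap(populations, total_seats=435):
    """Sketch of the seat-by-seat Equal Proportions loop using a max-heap."""
    seats = {state: 1 for state in populations}  # every state starts with one seat
    # heapq is a min-heap, so store negated priorities to pop the largest first.
    # With one seat held, n = 1, so the starting priority is P / sqrt(1 * 2).
    heap = [(-pop / math.sqrt(2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    order = []
    for _ in range(total_seats - len(populations)):
        _, state = heapq.heappop(heap)  # highest-priority state wins this seat
        seats[state] += 1
        order.append(state)
        n = seats[state]
        # Only the winner's priority changes; push its new value back
        heapq.heappush(heap, (-populations[state] / math.sqrt(n * (n + 1)), state))
    return seats, order

# Toy example with made-up populations
print(apportionWithHeap({'A': 1000, 'B': 500, 'C': 100}, total_seats=6))
```

For 50 states the speedup is negligible; the appeal is that each award costs O(log S) instead of a full scan. One behavioral difference: an exact tie is broken by comparing state names in the heap tuples, not by table order as with `.idxmax()`.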

## Now to Have Some Fun!

We'll get fancy in the next notebook, but let's start messing around with statehood and the total number of seats just to get our feet wet. What if D.C. were a state, but the total number of seats remained at 435?

In [38]:
demo_2020_with_dc, _ = calculateApportionment(ignoreStates=['Puerto Rico'])
demo_2020_with_dc.head()

|   | State | Abbr | Year | Population | Reps | Per_Rep | RepsCalculated | Priority |
|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | AL | 2020 | 5030053 | 7 | 718579 | 7 | 672169.105833 |
| 1 | Alaska | AK | 2020 | 736081 | 1 | 736081 | 1 | 520487.866603 |
| 2 | Arizona | AZ | 2020 | 7158923 | 9 | 795435 | 9 | 754616.742459 |
| 3 | Arkansas | AR | 2020 | 3013756 | 4 | 753439 | 4 | 673896.32836 |
| 4 | California | CA | 2020 | 39576757 | 52 | 761091 | 52 | 753877.180693 |


## What's the difference?

Let's write a function to find which states would gain or lose seats if D.C. were a state:

In [42]:
def compareToReality(df):
  # Positive = seats gained vs. the official 2020 apportionment; negative = seats lost
  df['Difference'] = df['RepsCalculated'] - df['Reps']
  # Average population per seat under the calculated apportionment
  df['Per_Rep_Computed'] = df['Population'] / df['RepsCalculated']

  # Report only the states whose seat count changed
  changes = df[df['Difference'] != 0]
  for index, row in changes.iterrows():
    if row['Difference'] < 0:
      print('%s LOST %s seat(s)' % (row['State'], -row['Difference']))
    else:
      print('%s GAINED %s seat(s)' % (row['State'], row['Difference']))

compareToReality(demo_2020_with_dc)

District of Columbia GAINED 1 seat(s)
Minnesota LOST 1 seat(s)


Adding D.C., which gets one seat, takes that seat from Minnesota. That makes sense, since Minnesota received the last (435th) seat in the actual apportionment, as we saw in `ORDER_TEST` above.
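How close was that last seat? A state holding n seats competes for its next one with priority P/sqrt(n(n+1)), so it beats a threshold priority T exactly when P > T * sqrt(n(n+1)). Below is a minimal stdlib sketch of that rearrangement; `populationNeeded` is a hypothetical helper, not part of the notebook.

```python
import math

def populationNeeded(threshold_priority, seats_held):
    """Smallest whole population whose priority for seat number seats_held + 1
    strictly exceeds threshold_priority: P / sqrt(n(n+1)) > T  <=>  P > T * sqrt(n(n+1))."""
    n = seats_held
    return math.floor(threshold_priority * math.sqrt(n * (n + 1))) + 1

# Toy check with made-up numbers: to beat a priority of 100 with one seat held,
# a state needs more than 100 * sqrt(2), i.e. about 141.42, residents
print(populationNeeded(100.0, 1))
```

Applied to the notebook's data, one could take T as the priority with which Minnesota won its final seat (computed with n equal to its seat count minus one) and `seats_held` as New York's seat count; `populationNeeded(T, ...)` minus New York's population should then land close to the narrow margin discussed at the end of this notebook.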

What if we also add a 436th seat to make room for D.C.?

In [44]:
order_436 = []
demo_2020_with_dc_436, order_436 = calculateApportionment(ignoreStates=['Puerto Rico'], TOTAL_SEATS=436)
compareToReality(demo_2020_with_dc_436)
print("Last seat went to %s" % order_436[-1])

District of Columbia GAINED 1 seat(s)
Last seat went to MN


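So adding D.C. _plus_ an extra seat for D.C. wouldn't alter any other state's apportionment. That's what we'd expect: D.C. would be the third-smallest state, so its priority for a second seat is too low to win one, and the extra seat simply covers its guaranteed first seat: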

In [46]:
demo_2020_with_dc_436.sort_values("Population").head(10)[["State", "Population", "Year", "RepsCalculated"]]

|   | State | Population | Year | RepsCalculated |
|---|---|---|---|---|
| 51 | Wyoming | 577719 | 2020 | 1 |
| 46 | Vermont | 643503 | 2020 | 1 |
| 8 | District of Columbia | 689545 | 2020 | 1 |
| 1 | Alaska | 736081 | 2020 | 1 |
| 34 | North Dakota | 779702 | 2020 | 1 |
| 42 | South Dakota | 887770 | 2020 | 1 |
| 7 | Delaware | 990837 | 2020 | 1 |
| 26 | Montana | 1085407 | 2020 | 2 |
| 40 | Rhode Island | 1098163 | 2020 | 2 |
| 19 | Maine | 1363582 | 2020 | 2 |


And what if we add Puerto Rico but not DC?

In [47]:
demo_2020_with_pr, _ = calculateApportionment(ignoreStates=['District of Columbia'])
compareToReality(demo_2020_with_pr)

California LOST 1 seat(s)
Colorado LOST 1 seat(s)
Minnesota LOST 1 seat(s)
Montana LOST 1 seat(s)
Puerto Rico GAINED 4 seat(s)


Likewise, adding four seats for Puerto Rico leaves every other state's apportionment unchanged:

In [48]:
demo_2020_with_pr_439, _ = calculateApportionment(ignoreStates=['District of Columbia'], TOTAL_SEATS=439)
compareToReality(demo_2020_with_pr_439)

Puerto Rico GAINED 4 seat(s)


Here's the result with both D.C. and Puerto Rico as states, without expanding the House:

In [49]:
demo_2020_with_dc_and_pr, _ = calculateApportionment(ignoreStates=[])
compareToReality(demo_2020_with_dc_and_pr)

California LOST 1 seat(s)
Colorado LOST 1 seat(s)
District of Columbia GAINED 1 seat(s)
Minnesota LOST 1 seat(s)
Montana LOST 1 seat(s)
Oregon LOST 1 seat(s)
Puerto Rico GAINED 4 seat(s)


And here it is with five extra seats to make room at the table:

In [50]:
demo_2020_with_dc_and_pr_440, _ = calculateApportionment(ignoreStates=[], TOTAL_SEATS=440)
compareToReality(demo_2020_with_dc_and_pr_440)

District of Columbia GAINED 1 seat(s)
Puerto Rico GAINED 4 seat(s)


One last thing we can check: It's been reported that New York [very narrowly missed out on a seat, by a difference of 89 people](https://www.nytimes.com/2021/04/26/nyregion/new-york-census-congress.html). Would it have gotten the 436th seat?

In [51]:
demo_2020_436, ORDER_436 = calculateApportionment(TOTAL_SEATS=436)
print(ORDER_436[-1])

NY


Indeed! And what if we give it an extra 90 people?

In [52]:
data_2020_ny_plus90 = data_2020_states_only.copy()
nyIdx = data_2020_ny_plus90.loc[data_2020_ny_plus90['Abbr'] == 'NY'].index[0]
data_2020_ny_plus90.loc[nyIdx, 'Population'] += 90

test_NY90, _ = calculateApportionment(popData = data_2020_ny_plus90)
compareToReality(test_NY90)

Minnesota LOST 1 seat(s)
New York GAINED 1 seat(s)


It's true! Now let's move on to the [next notebook](https://colab.research.google.com/drive/1TjyO-2bO4z5X_Hblw0AoINpxTMjWDkIS?usp=sharing) to explore different numbers of seats in more depth.