# Iowa Caucus Results Analysis

*Created by Andrew Therriault (https://andrewtherriault.com, https://twitter.com/therriaultphd). <br>
Last modified February 4, 2020.*

This notebook downloads and parses the preliminary Iowa Caucus precinct results from the Iowa Democratic Party website, then performs analysis to look at how voters moved between candidates from the initial counts to the final assignments.

Thanks to Tom Augspurger for his work to parse these results from the IDP's public-facing results page. You can see his original code (which is replicated in the first section of this notebook) at https://github.com/TomAugspurger/idp-results/blob/master/idb.ipynb.



## Pulling and parsing the caucus results data
Most of this code is adapted from Tom Augspurger's script, linked above. I've added a few additional comments for clarity but haven't changed much otherwise.

In [145]:
import io
import requests
import lxml.html
import pandas as pd
import statsmodels.api as sm
from scipy.optimize import nnls, lsq_linear

url = "https://results.thecaucuses.org"
r = requests.get(url)

root = lxml.html.parse(io.StringIO(r.text)).getroot()

#### Generating lists of candidates, counties, etc.

In [3]:
# Bennet, Biden, etc.
head = root.find_class("thead")[0]
header = [x.text for x in list(head.iterchildren())]

# First Expression, Final Expression, SDE, ...
subhead = root.find_class("sub-head")[0]
subheader = [x.text for x in list(subhead.iterchildren())]

In [4]:
columns = pd.MultiIndex.from_arrays([
    pd.Series(header).ffill(),
    pd.Series(subheader).ffill().fillna('')
], names=['candidate', 'round'])
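
To see what the forward fill is doing, here is a minimal sketch on made-up header lists. The lists below are hypothetical stand-ins for what the page returns; I'm assuming each candidate's name appears once and the cells after it come back empty (`None`).

```python
import pandas as pd

# Hypothetical header rows: each candidate name appears once, followed by
# empty cells (None) covering that candidate's remaining sub-columns.
toy_header = ['County', 'Precinct', 'Bennet', None, None, 'Biden', None, None]
toy_subheader = [None, None, 'First Expression', 'Final Expression', 'SDE',
                 'First Expression', 'Final Expression', 'SDE']

toy_columns = pd.MultiIndex.from_arrays([
    pd.Series(toy_header).ffill(),                # carry each name rightward
    pd.Series(toy_subheader).ffill().fillna(''),  # leading blanks become ''
], names=['candidate', 'round'])

print(list(toy_columns))
```

Each column ends up labeled with a (candidate, round) pair, which is what lets the later cells pull out a single round with `.xs(..., level='round', axis=1)`.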

In [5]:
counties = root.find_class("precinct-county")
county_names = [x[0].text for x in counties]
counties_data = root.find_class("precinct-data")
county = counties_data[0]
rows = []

#### Looping over counties and precincts to pull just the rows with individual caucus results (dropping totals for each county)

In [6]:
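for name, county in zip(county_names, counties_data):
    if len(county) > 1:
        # Multi-row counties end with a county-total row, so drop it.
        # (Satellites only have a total, so their single row is kept.)
        county = county[:-1]

    for precinct in county:
        # One tuple per remaining row: county name, then the text of each cell
        rows.append((name,) + tuple(x.text for x in precinct))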
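
The same logic on a toy structure, as a rough sketch. `toy_counties` and its values are made up for illustration and are not the real parsed page.

```python
# Hypothetical stand-in for the parsed page: each county is a list of rows,
# and every multi-row county ends with a county-total row.
toy_counties = {
    'Adair': [['1NW ADAIR', '6'], ['5GF GREENFIELD', '8'], ['Total', '14']],
    'Satellite': [['Total', '0']],  # single-row case: nothing is dropped
}

toy_rows = []
for name, county in toy_counties.items():
    if len(county) > 1:
        county = county[:-1]  # drop the trailing total row
    for precinct in county:
        toy_rows.append((name,) + tuple(precinct))

print(toy_rows)
```

Note that a single-row county keeps its one row, which is why the satellite totals still show up in the dataframe.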

#### Creating dataframe of results
Tom's original code stacked these into a longer dataframe (with one candidate per row) but I'm keeping it in wide format for my own purposes. 

In [60]:
results = (
    pd.DataFrame(rows, columns=columns)
      .set_index(['County', 'Precinct'])
      .apply(pd.to_numeric)
)
results

| County | Precinct | Bennet First Expression | Bennet Final Expression | Bennet SDE | Biden First Expression | Biden Final Expression | Biden SDE | Bloomberg First Expression | Bloomberg Final Expression | Bloomberg SDE | Buttigieg First Expression | ... | Warren SDE | Yang First Expression | Yang Final Expression | Yang SDE | Other First Expression | Other Final Expression | Other SDE | Uncommitted First Expression | Uncommitted Final Expression | Uncommitted SDE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Adair | 1NW ADAIR | 0 | 0 | 0 | 6 | 6 | 0.0784 | 0 | 0 | 0.0000 | 8 | ... | 0.1569 | 0 | 0 | 0.0000 | 0 | 0 | 0.0 | 0 | 0 | 0.0 |
| Adair | 5GF GREENFIELD | 0 | 0 | 0 | 8 | 0 | 0.0000 | 0 | 0 | 0.0000 | 10 | ... | 0.0000 | 12 | 13 | 0.1569 | 0 | 0 | 0.0 | 0 | 0 | 0.0 |
| Adams | Adams 1 | 0 | 0 | 0 | 5 | 5 | 0.0857 | 0 | 0 | 0.0000 | 7 | ... | 0.0857 | 0 | 0 | 0.0000 | 0 | 0 | 0.0 | 0 | 0 | 0.0 |
| Adams | Adams 4 | 0 | 0 | 0 | 6 | 6 | 0.1714 | 0 | 0 | 0.0000 | 3 | ... | 0.1714 | 0 | 0 | 0.0000 | 0 | 0 | 0.0 | 0 | 0 | 0.0 |
| Adams | Adams 5 | 0 | 0 | 0 | 4 | 4 | 0.0857 | 0 | 0 | 0.0000 | 5 | ... | 0.0000 | 0 | 0 | 0.0000 | 0 | 0 | 0.0 | 0 | 0 | 0.0 |
| Allamakee | Pct 01 - WL/HV | 0 | 0 | 0 | 1 | 0 | 0.0000 | 0 | 0 | 0.0000 | 4 | ... | 0.0000 | 0 | 0 | 0.0000 | 0 | 0 | 0.0 | 0 | 0 | 0.0 |
| Allamakee | Pct 11 - Waukon 3 | 1 | 0 | 0 | 8 | 12 | 0.2333 | 0 | 0 | 0.0000 | 9 | ... | 0.0000 | 6 | 0 | 0.0000 | 0 | 0 | 0.0 | 0 | 1 | 0.0 |
| Allamakee | Pct 02 - FC/JF/LL/MK/UP | 0 | 0 | 0 | 6 | 0 | 0.0000 | 0 | 0 | 0.0000 | 11 | ... | 0.1556 | 9 | 13 | 0.2333 | 0 | 2 | 0.0 | 0 | 0 | 0.0 |
| Allamakee | Pct 04 - PV City | 0 | 0 | 0 | 7 | 7 | 0.1556 | 0 | 0 | 0.0000 | 15 | ... | 0.0000 | 10 | 10 | 0.1556 | 0 | 0 | 0.0 | 0 | 0 | 0.0 |
| Allamakee | Pct 05 - LT/PC/WV City | 0 | 0 | 0 | 2 | 0 | 0.0000 | 0 | 0 | 0.0000 | 5 | ... | 0.0778 | 0 | 0 | 0.0000 | 0 | 0 | 0.0 | 0 | 0 | 0.0 |
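
For anyone who prefers the longer layout (one candidate per row), here is a rough sketch of moving between the two shapes with `stack`. The toy frame and its values are made up; only the two-level column layout mirrors `results`.

```python
import pandas as pd

# Toy wide frame with the same two-level column layout (values are made up).
cols = pd.MultiIndex.from_product(
    [['Biden', 'Buttigieg'], ['First Expression', 'Final Expression']],
    names=['candidate', 'round'])
idx = pd.MultiIndex.from_tuples(
    [('Adair', 'P1'), ('Adair', 'P2')], names=['County', 'Precinct'])
wide = pd.DataFrame([[6, 6, 8, 8], [8, 0, 10, 12]], index=idx, columns=cols)

# Move the candidate level from the columns into the row index.
long = wide.stack('candidate')
print(long)
```

The wide layout is handy here because the later cells compare whole rounds at once (one column per candidate), which is awkward in the long shape.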


In [59]:
total_votes_by_precinct = results.xs('Final Expression', level='round', axis=1).sum(axis=1)
print(total_votes_by_precinct.describe())
print(total_votes_by_precinct.value_counts().sort_index().head())

count    1104.000000
mean       97.369565
std       113.964347
min         0.000000
25%        26.000000
50%        59.000000
75%       120.250000
max       830.000000
dtype: float64
0    5
1    1
2    2
3    4
4    6
dtype: int64


This shows us 1,104 precincts with results, but 5 of those (the out-of-state and CD1 / CD2 / CD3 / CD4 satellite caucuses) have 0s for their totals, so this matches the 1,099 reported precincts.

#### Confirming that the data looks right
Spot-checked the SDE results against the NY Times's reported results (https://www.nytimes.com/interactive/2020/02/04/us/elections/results-iowa-caucus.html). The SDE numbers per candidate look spot on, while the total vote counts (overall and per candidate) are slightly under what the Times is reporting. Our data has 110,666 first expression votes and 107,496 final expression votes, while the Times shows 111,237 and 108,050 respectively. The extra votes shown there are less than 1% and seem fairly evenly distributed across the 5 major candidates (Biden is +88 in the Times numbers, Buttigieg +97, Klobuchar +129, etc.) so we'll run with it for now - these are preliminary results anyway so not authoritative.

In [159]:
results.sum()

candidate    round           
Bennet       First Expression       96.0000
             Final Expression        1.0000
             SDE                     0.0000
Biden        First Expression    16179.0000
             Final Expression    14176.0000
             SDE                   210.3439
Bloomberg    First Expression      112.0000
             Final Expression        6.0000
             SDE                     0.1333
Buttigieg    First Expression    23666.0000
             Final Expression    27030.0000
             SDE                   362.6366
Delaney      First Expression        0.0000
             Final Expression        0.0000
             SDE                     0.0000
Gabbard      First Expression      231.0000
             Final Expression       12.0000
             SDE                     0.0000
Klobuchar    First Expression    14032.0000
             Final Expression    13357.0000
             SDE                   169.6938
Patrick      First Expression       46.0000
  

In [18]:
results.sum().groupby('round').sum()

round
Final Expression    107496.0000
First Expression    110666.0000
SDE                   1347.2652
dtype: float64
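
As a quick check on the "less than 1%" claim, the gap against the Times totals quoted above works out as follows (the four totals are the ones stated in the text; nothing else is assumed):

```python
# Totals from this notebook's data and the Times figures quoted above.
ours = {'First Expression': 110666, 'Final Expression': 107496}
times = {'First Expression': 111237, 'Final Expression': 108050}

gap_pct = {}
for rnd in ours:
    missing = times[rnd] - ours[rnd]
    gap_pct[rnd] = 100 * missing / times[rnd]
    print(f"{rnd}: {missing} fewer votes ({gap_pct[rnd]:.2f}% of the Times total)")
```

Both rounds come in at roughly half a percent under the Times numbers.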

#### Saving the data for later use

In [61]:
results.to_csv('iowa_preliminary_results_20200204.csv')

## Analysis of viability across rounds

Number of precincts in which each candidate received more than one vote during the First and Final Expressions. Note that the code below uses a `> 1` threshold, so a single stray vote doesn't count. (Clearing that threshold in the final expression means they were either viable at the time of the first or were able to meet the threshold with help from other candidates' supporters.)

In [81]:
viable_first = (results.xs('First Expression', level='round', axis=1) > 1)
viable_final = (results.xs('Final Expression', level='round', axis=1) > 1)

In [82]:
viable_first.sum()

candidate
Bennet           18
Biden           984
Bloomberg        24
Buttigieg      1048
Delaney           0
Gabbard          62
Klobuchar       885
Patrick           1
Sanders         972
Steyer          364
Warren          929
Yang            578
Other            25
Uncommitted     157
dtype: int64

In [83]:
viable_final.sum()

candidate
Bennet           0
Biden          744
Bloomberg        2
Buttigieg      959
Delaney          0
Gabbard          2
Klobuchar      573
Patrick          0
Sanders        823
Steyer          40
Warren         686
Yang            99
Other           18
Uncommitted    126
dtype: int64

In [86]:
viable_diff = viable_final.sum() - viable_first.sum()
viable_diff

candidate
Bennet         -18
Biden         -240
Bloomberg      -22
Buttigieg      -89
Delaney          0
Gabbard        -60
Klobuchar     -312
Patrick         -1
Sanders       -149
Steyer        -324
Warren        -243
Yang          -479
Other           -7
Uncommitted    -31
dtype: int64

In [87]:
diff_pct = (100 * viable_diff / viable_first.sum()).round(1)
diff_pct.sort_values()

candidate
Bennet        -100.0
Patrick       -100.0
Gabbard        -96.8
Bloomberg      -91.7
Steyer         -89.0
Yang           -82.9
Klobuchar      -35.3
Other          -28.0
Warren         -26.2
Biden          -24.4
Uncommitted    -19.7
Sanders        -15.3
Buttigieg       -8.5
Delaney          NaN
dtype: float64
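
The `NaN` for Delaney comes from dividing 0 by 0 (no precincts in either round). A minimal sketch on made-up counts, using the same `> 1` threshold as above:

```python
import pandas as pd

# Made-up counts for three precincts; Delaney never receives a vote.
first = pd.DataFrame({'Biden': [8, 1, 5], 'Yang': [12, 0, 3], 'Delaney': [0, 0, 0]})
final = pd.DataFrame({'Biden': [0, 0, 6], 'Yang': [13, 0, 0], 'Delaney': [0, 0, 0]})

toy_first = (first > 1).sum()   # precincts above the threshold, first round
toy_final = (final > 1).sum()   # same for the final round
toy_pct = (100 * (toy_final - toy_first) / toy_first).round(1)
print(toy_pct)
```

In this toy data Biden and Yang each drop from two precincts to one, while Delaney's 0/0 shows up as `NaN` rather than as a 0% change.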

## Analysis of switching

First, calculating shifts between rounds, then using that to calculate the number of "up for grabs" voters per candidate in precincts where the candidate was not viable (i.e., did not clear the `> 1` vote threshold in the final expression) and the number of voters added in precincts where the candidate was viable.

In [92]:
shifts = (results.xs('Final Expression', level='round', axis=1) - results.xs('First Expression', level='round', axis=1))
shifts.head(10)

| County | Precinct | Bennet | Biden | Bloomberg | Buttigieg | Delaney | Gabbard | Klobuchar | Patrick | Sanders | Steyer | Warren | Yang | Other | Uncommitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Adair | 1NW ADAIR | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Adair | 5GF GREENFIELD | 0 | -8 | 0 | 2 | 0 | 0 | 6 | 0 | 5 | 0 | -6 | 1 | 0 | 0 |
| Adams | Adams 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Adams | Adams 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | -2 | 0 | 2 | 0 | 0 | 0 |
| Adams | Adams 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Allamakee | Pct 01 - WL/HV | 0 | -1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Allamakee | Pct 11 - Waukon 3 | -1 | 4 | 0 | 3 | 0 | 0 | 0 | 0 | 1 | -2 | 0 | -6 | 0 | 1 |
| Allamakee | Pct 02 - FC/JF/LL/MK/UP | 0 | -6 | 0 | 3 | 0 | 0 | 5 | 0 | 0 | -8 | 0 | 4 | 2 | 0 |
| Allamakee | Pct 04 - PV City | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Allamakee | Pct 05 - LT/PC/WV City | 0 | -2 | 0 | 2 | 0 | -1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |


In [93]:
gains = shifts[viable_final]
gains.head(10)

| County | Precinct | Bennet | Biden | Bloomberg | Buttigieg | Delaney | Gabbard | Klobuchar | Patrick | Sanders | Steyer | Warren | Yang | Other | Uncommitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Adair | 1NW ADAIR | NaN | 0.0 | NaN | 0.0 | NaN | NaN | 0.0 | NaN | 0.0 | NaN | 0.0 | NaN | NaN | NaN |
| Adair | 5GF GREENFIELD | NaN | NaN | NaN | 2.0 | NaN | NaN | 6.0 | NaN | 5.0 | NaN | NaN | 1.0 | NaN | NaN |
| Adams | Adams 1 | NaN | 0.0 | NaN | 0.0 | NaN | NaN | 0.0 | NaN | 0.0 | NaN | 0.0 | NaN | NaN | NaN |
| Adams | Adams 4 | NaN | 0.0 | NaN | 0.0 | NaN | NaN | 0.0 | NaN | NaN | NaN | 2.0 | NaN | NaN | NaN |
| Adams | Adams 5 | NaN | 0.0 | NaN | 0.0 | NaN | NaN | 0.0 | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN |
| Allamakee | Pct 01 - WL/HV | NaN | NaN | NaN | 0.0 | NaN | NaN | 1.0 | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN |
| Allamakee | Pct 11 - Waukon 3 | NaN | 4.0 | NaN | 3.0 | NaN | NaN | 0.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN |
| Allamakee | Pct 02 - FC/JF/LL/MK/UP | NaN | NaN | NaN | 3.0 | NaN | NaN | 5.0 | NaN | 0.0 | NaN | 0.0 | 4.0 | 2.0 | NaN |
| Allamakee | Pct 04 - PV City | NaN | 0.0 | NaN | 0.0 | NaN | NaN | NaN | NaN | 0.0 | NaN | NaN | 0.0 | NaN | NaN |
| Allamakee | Pct 05 - LT/PC/WV City | NaN | NaN | NaN | 2.0 | NaN | NaN | 0.0 | NaN | 0.0 | NaN | 1.0 | NaN | NaN | NaN |


In [95]:
up_for_grabs = -shifts[~viable_final]
up_for_grabs.head(10)

| County | Precinct | Bennet | Biden | Bloomberg | Buttigieg | Delaney | Gabbard | Klobuchar | Patrick | Sanders | Steyer | Warren | Yang | Other | Uncommitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Adair | 1NW ADAIR | -0.0 | NaN | -0.0 | NaN | -0.0 | -0.0 | NaN | -0.0 | NaN | -0.0 | NaN | -0.0 | -0.0 | -0.0 |
| Adair | 5GF GREENFIELD | -0.0 | 8.0 | -0.0 | NaN | -0.0 | -0.0 | NaN | -0.0 | NaN | -0.0 | 6.0 | NaN | -0.0 | -0.0 |
| Adams | Adams 1 | -0.0 | NaN | -0.0 | NaN | -0.0 | -0.0 | NaN | -0.0 | NaN | -0.0 | NaN | -0.0 | -0.0 | -0.0 |
| Adams | Adams 4 | -0.0 | NaN | -0.0 | NaN | -0.0 | -0.0 | NaN | -0.0 | 2.0 | -0.0 | NaN | -0.0 | -0.0 | -0.0 |
| Adams | Adams 5 | -0.0 | NaN | -0.0 | NaN | -0.0 | -0.0 | NaN | -0.0 | NaN | -0.0 | -0.0 | -0.0 | -0.0 | -0.0 |
| Allamakee | Pct 01 - WL/HV | -0.0 | 1.0 | -0.0 | NaN | -0.0 | -0.0 | NaN | -0.0 | NaN | -0.0 | -0.0 | -0.0 | -0.0 | -0.0 |
| Allamakee | Pct 11 - Waukon 3 | 1.0 | NaN | -0.0 | NaN | -0.0 | -0.0 | NaN | -0.0 | NaN | 2.0 | -0.0 | 6.0 | -0.0 | -1.0 |
| Allamakee | Pct 02 - FC/JF/LL/MK/UP | -0.0 | 6.0 | -0.0 | NaN | -0.0 | -0.0 | NaN | -0.0 | NaN | 8.0 | NaN | NaN | NaN | -0.0 |
| Allamakee | Pct 04 - PV City | -0.0 | NaN | -0.0 | NaN | -0.0 | -0.0 | -0.0 | -0.0 | NaN | -0.0 | -0.0 | NaN | -0.0 | -0.0 |
| Allamakee | Pct 05 - LT/PC/WV City | -0.0 | 2.0 | -0.0 | NaN | -0.0 | 1.0 | NaN | -0.0 | NaN | -0.0 | NaN | -0.0 | -0.0 | -0.0 |
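
To make the masking step concrete, here is a small sketch on made-up counts (the `toy_*` names and numbers are hypothetical). Indexing a dataframe with a boolean dataframe keeps values where the mask is `True` and fills `NaN` elsewhere, and `~` flips the mask.

```python
import pandas as pd

# Made-up first/final counts for two precincts.
toy_first = pd.DataFrame({'Biden': [8, 6], 'Buttigieg': [10, 8], 'Warren': [6, 5]},
                         index=['P1', 'P2'])
toy_final = pd.DataFrame({'Biden': [0, 6], 'Buttigieg': [18, 13], 'Warren': [6, 0]},
                         index=['P1', 'P2'])

toy_shifts = toy_final - toy_first
toy_viable = toy_final > 1

toy_gains = toy_shifts[toy_viable]     # NaN wherever the candidate was not viable
toy_grabs = -toy_shifts[~toy_viable]   # NaN wherever the candidate was viable

# No one leaves in this toy data, so gains and released voters balance.
print(toy_gains.sum(axis=1) == toy_grabs.sum(axis=1))
```

In the real data the two sides won't balance exactly in every precinct, since total votes fall between rounds.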


#### Gains by candidate from switching

In [103]:
gains.describe(percentiles=[0.01,0.05,0.1]).T

| candidate | count | mean | std | min | 1% | 5% | 10% | 50% | max |
|---|---|---|---|---|---|---|---|---|---|
| Bennet | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| Biden | 744.0 | 1.735215 | 3.529693 | -23.0 | -10.0 | 0.0 | 0.0 | 1.0 | 17.0 |
| Bloomberg | 2.0 | 1.0 | 1.414214 | 0.0 | 0.02 | 0.1 | 0.2 | 1.0 | 2.0 |
| Buttigieg | 959.0 | 4.574557 | 6.98331 | -61.0 | 0.0 | 0.0 | 0.0 | 2.0 | 56.0 |
| Delaney | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| Gabbard | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Klobuchar | 573.0 | 4.415358 | 8.207657 | -33.0 | 0.0 | 0.0 | 0.0 | 2.0 | 64.0 |
| Patrick | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| Sanders | 823.0 | 2.967193 | 4.712167 | -24.0 | 0.0 | 0.0 | 0.0 | 2.0 | 46.0 |
| Steyer | 40.0 | 0.975 | 1.290746 | -1.0 | -0.61 | 0.0 | 0.0 | 0.0 | 4.0 |


In [97]:
gains.sum()

candidate
Bennet            0.0
Biden          1291.0
Bloomberg         2.0
Buttigieg      4387.0
Delaney           0.0
Gabbard           0.0
Klobuchar      2530.0
Patrick           0.0
Sanders        2442.0
Steyer           39.0
Warren         3099.0
Yang            -10.0
Other           121.0
Uncommitted     767.0
dtype: float64

Some of those results seem strange because you're not supposed to lose voters if you're viable, but it's a small enough number that maybe some people just went home or snuck off against the rules. Caucuses are messy and these things happen, so let's just go with it.

In any case, the interesting thing there is that Sanders and Biden gained fewer switchers (overall and on average across precincts) than Buttigieg, Warren, or even Klobuchar.

## Who switched to whom?

This is the fun part. Now that we know how many voters each viable candidate gained and how many each non-viable candidate released, we can model which candidates' supporters go where. To do this, I'm going to build a linear regression for each candidate and predict the number of supporters gained as a function of the up-for-grabs voters from each non-viable candidate after the first round. The coefficient on each other candidate's up-for-grabs numbers will be the estimated proportion of that non-viable candidate's voters who switch to the candidate being modeled (*when the candidate being modeled is viable*). There may be some extra uncertainty in this because of the negative numbers for a handful of viable candidates' "gains", but we'll power through - it's only a tiny fraction of the data.

I use a bounded least squares regression here, which constrains the coefficients to be between 0 and 1 (since the results wouldn't make sense otherwise, given the interpretation). This is a quick-and-dirty approach - there's probably a better way to solve this as a more general constrained optimization problem, but I don't have a better idea off the top of my head (waiting for results made for a late night last night!).
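
As a sanity check on the approach, here is a minimal sketch on synthetic data: made-up up-for-grabs counts, assumed "true" switching shares, and noise-free gains built from them. None of these numbers come from the caucus results.

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)

# Synthetic "up for grabs" counts from three non-viable candidates in 200 precincts.
X = rng.integers(0, 20, size=(200, 3)).astype(float)

# Assumed true shares of each candidate's released voters that move to the target.
true_share = np.array([0.45, 0.15, 0.30])
y = X @ true_share  # noise-free gains for the target candidate

# Least squares with every coefficient constrained to [0, 1].
fit = lsq_linear(X, y, bounds=(0, 1))
est_share = fit.x
print(est_share.round(3))
```

With clean data the bounded fit recovers the assumed shares. On the real data the bounds matter more, because noise and the occasional negative "gain" could otherwise push a coefficient below 0 or above 1.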

In [166]:
switches = pd.DataFrame(index=gains.columns)
for i in ['Biden', 'Buttigieg', 'Klobuchar', 'Sanders', 'Warren']:
    # Only use precincts where candidate i was viable in the final expression
    y = gains.loc[viable_final[i], i]
    # Predictors: voters released by every other candidate (0 where that candidate stayed viable)
    X = up_for_grabs[viable_final[i]].fillna(0).drop(i, axis=1)
    # Least squares with each coefficient bounded to [0, 1]
    regression = lsq_linear(X, y, bounds=(0, 1))
    props = 100 * pd.Series(regression['x'], index=X.columns).round(3)
    switches.loc[:, i] = props
# Not showing candidates with < 200 first expression voters, because those results are just too noisy
switches.drop(gains.columns[(results.sum().xs('First Expression', level='round') < 200)], axis=0)

| candidate | Biden | Buttigieg | Klobuchar | Sanders | Warren |
|---|---|---|---|---|---|
| Biden | NaN | 14.8 | 44.3 | 9.6 | 24.7 |
| Buttigieg | 18.1 | NaN | 31.2 | 11.8 | 28.5 |
| Gabbard | 22.1 | 0.0 | 72.2 | 57.6 | 42.2 |
| Klobuchar | 27.9 | 47.1 | NaN | 3.2 | 24.5 |
| Sanders | 7.8 | 19.1 | 2.7 | NaN | 44.4 |
| Steyer | 16.3 | 20.8 | 19.2 | 8.6 | 12.0 |
| Warren | 11.7 | 32.6 | 21.6 | 29.5 | NaN |
| Yang | 0.0 | 26.3 | 21.2 | 23.4 | 13.0 |
| Uncommitted | 42.0 | 0.0 | 21.3 | 27.4 | 8.6 |


The results here are a little tough to interpret, because what they show is *"what percentage of candidate A's voters go to candidate B when candidate A is not viable but candidate B is?"*. (Candidate A is the row, candidate B the column.) So for example, when Biden is not viable but Buttigieg is, Buttigieg is estimated to get 14.8% of Biden's supporters. But when Klobuchar is viable and Biden's not, she gets 44.3% of his voters. The numbers don't sum to 100 in either direction because each cell is a different scenario - that's why we can't just analyze these as "second choices" the way we would in a poll, because not all the second choices are available in every case.
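
To illustrate the point about sums, here is a small sketch that re-enters two rows of the table above by hand and adds them up. The values are copied from the table; the `rows` frame itself is just for illustration.

```python
import pandas as pd

# Two rows copied from the table above: percent of the row candidate's
# up-for-grabs voters estimated to move to each column candidate.
rows = pd.DataFrame(
    {'Biden': [float('nan'), 22.1], 'Buttigieg': [14.8, 0.0],
     'Klobuchar': [44.3, 72.2], 'Sanders': [9.6, 57.6], 'Warren': [24.7, 42.2]},
    index=['Biden', 'Gabbard'])

row_totals = rows.sum(axis=1).round(1)  # NaN cells are skipped by default
print(row_totals)
```

Biden's row adds up to about 93.4 and Gabbard's to about 194.1, which only makes sense if each cell is read as its own scenario rather than as a share of a single split.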

Some of the more interesting results here, in no particular order (*note that all these are based on estimated numbers, even if I don't give that caveat every time!*):
* Biden and Sanders have very distinct voters. When Biden's not viable and Sanders is, only 9.6% of Biden's voters go to Sanders, and in the reverse scenario only 7.8% of Sanders voters go to Biden.
* Biden and Buttigieg are likewise pretty highly separated. When Biden isn't viable, Buttigieg gets 14.8% of his voters, and in the reverse case Biden gets 18.1%.
* There's a fairly strong affinity between Biden voters and Klobuchar voters. When Biden isn't viable, she gets 44.3% of his voters, and in the reverse case he gets 27.9% of hers.
* Likewise Buttigieg and Klobuchar - when he's viable and she isn't, he gets 47.1% of hers, and she gets 31.2% of his when he's not viable and she is.
* On the other end of things, Klobuchar and Sanders had almost no overlap at all. When he was out and she was in, she got 2.7% of his voters, and in the reverse case he got just 3.2%.
* When she wasn't viable, Warren's voters shifted most enthusiastically to Buttigieg (32.6%), and to a lesser extent Sanders (29.5%) and Klobuchar (21.6%) but not as much to Biden (11.7%)
* When Sanders wasn't viable, his voters went largely for Warren (she got 44.4% of his up-for-grabs voters when she was still in it), even if the reverse wasn't true.
* When Tulsi Gabbard's not viable (which was almost always), her voters seemed to go to Klobuchar when she was viable, and otherwise Sanders and Warren. (Not many datapoints, though.)
* Biden didn't seem to get any traction from Yang voters, and Warren didn't do especially well with them either - Buttigieg, Klobuchar, and Sanders all got more of a boost there.
* Steyer's voters seemed to lean more toward the more moderate candidates (Biden, Buttigieg, and Klobuchar) and away from Warren and Sanders (just 8.6% went to Sanders when he was viable and Steyer wasn't, and 12% for Warren when she was viable).
* Uncommitted voters tended to break largely for Biden, and didn't show much love at all for Buttigieg or Warren.

## So what does it all mean?

I'll first caution that this is a quick analysis of preliminary data, so giant mounds of salt should be taken with this. But all that said... Putting on my pundit hat, here are my big takeaways as we look ahead:
* If Klobuchar gets out after this, her voters are likely to split their support among everyone but Sanders. This could make NH more interesting, since Sanders seems to be in the lead there, but a few points of support for any of the three other major contenders would make it a virtual tie at this point.
* If we don't factor in Klobuchar or Yang, we end up with 4 candidates without a clear alignment. While some pundits are tempted to lump Biden and Buttigieg together as the moderates and Warren / Sanders as the progressives, there isn't as much overlap among their supporters as you'd expect from that simplistic view.
* More generally, Joe Biden and Bernie Sanders are very few voters' second choices. So that could put a ceiling on their potential if/when the field narrows further. Buttigieg and Warren each do fairly well with a broader set of other candidates' supporters, so they could benefit more from other candidates dropping out. (Or at the very least, this should boost their potential VP credentials if they don't pull out a win in the primary - they don't seem to have particular negatives with specific other parts of the party.)