# Iowa Caucus Results Analysis

*Created by Andrew Therriault (https://andrewtherriault.com, https://twitter.com/therriaultphd). <br>
Last modified February 5, 2020.*

This notebook downloads and parses the preliminary Iowa Caucus precinct results from the Iowa Democratic Party website, then performs analysis to look at how voters moved between candidates from the initial counts to the final assignments.

Thanks to Tom Augspurger for his work to parse these results from the IDP's public-facing results page. You can see his original code (which is replicated in the first section of this notebook) at https://github.com/TomAugspurger/idp-results/blob/master/idb.ipynb.

#### Notes:
* 2/4: Started with 62% of results in (1,099 precincts)
* 2/5: 
   * Updated using data as of 3pm EST (1,320 precincts)
   * Updated again using data as of 11:20pm EST (



## Pulling and parsing the caucus results data
Most of this code is adapted from Tom Augspurger's script, linked above. I've added a few additional comments for clarity but haven't changed much otherwise.

In [1]:
import io
import requests
import lxml.html
import pandas as pd
import statsmodels.api as sm
from scipy.optimize import nnls, lsq_linear

url = "https://results.thecaucuses.org"
r = requests.get(url)

root = lxml.html.parse(io.StringIO(r.text)).getroot()

#### Generating lists of candidates, counties, etc.

In [2]:
# Bennet, Biden, etc.
head = root.find_class("thead")[0]
header = [x.text for x in list(head.iterchildren())]

# First Expression, Final Expression, SDE, ...
subhead = root.find_class("sub-head")[0]
subheader = [x.text for x in list(subhead.iterchildren())]

In [3]:
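# Forward-fill blank header cells so every column gets a (candidate, round) label.
# `.ffill()` replaces the deprecated `fillna(method='ffill')` spelling.
columns = pd.MultiIndex.from_arrays([
    pd.Series(header).ffill(),
    pd.Series(subheader).ffill().fillna('')
], names=['candidate', 'round'])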
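
To see what the forward-fill is doing, here is a minimal sketch on made-up header cells. The two lists are assumptions standing in for the scraped `header` and `subheader`; the real page may lay its cells out differently.

```python
import pandas as pd

# Hypothetical header cells: a candidate name appears once, followed by blanks.
toy_header = ["County", "Precinct", "Bennet", None, None, "Biden", None, None]
toy_subheader = [None, None,
                 "First Expression", "Final Expression", "SDE",
                 "First Expression", "Final Expression", "SDE"]

toy_columns = pd.MultiIndex.from_arrays(
    [
        pd.Series(toy_header).ffill(),                # carry each name rightward
        pd.Series(toy_subheader).ffill().fillna(""),  # leading blanks become ''
    ],
    names=["candidate", "round"],
)

for col in toy_columns:
    print(col)
```

Each candidate name ends up paired with all three of its sub-columns, while the two index columns get an empty second level.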

In [4]:
counties = root.find_class("precinct-county")
county_names = [x[0].text for x in counties]
counties_data = root.find_class("precinct-data")
county = counties_data[0]
rows = []

#### Looping over counties and precincts to pull just the rows with individual caucus results (dropping totals for each county)

In [5]:
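for name, county in zip(county_names, counties_data):
    if len(county) > 1:
        # The last row of a county is its total, so drop it.
        # Satellite entries have only a single (total) row, which is kept.
        county = county[:-1]

    for precinct in county:
        # One tuple per precinct: county name followed by each cell's text
        rows.append((name,) + tuple(x.text for x in precinct))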
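
The same drop-the-total logic can be sketched without the HTML. The `toy_counties` dictionary below is hypothetical data standing in for the parsed page elements, not the real structure returned by `lxml`.

```python
# Hypothetical scraped data: each county is a list of rows, the last being a total.
toy_counties = {
    "Adair": [("P1", 6, 10), ("P2", 7, 6), ("Total", 13, 16)],
    "Satellite": [("Total", 0, 0)],  # a single row, so nothing is dropped
}

toy_rows = []
for name, county_rows in toy_counties.items():
    if len(county_rows) > 1:
        county_rows = county_rows[:-1]  # drop the trailing total row
    for row in county_rows:
        toy_rows.append((name,) + row)

print(toy_rows)
```

Only the two Adair precinct rows and the lone satellite row survive, which is why satellites still show up in the precinct count later on.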

#### Creating dataframe of results
Tom's original code stacked these into a longer dataframe (with one candidate per row) but I'm keeping it in wide format for my own purposes. 

In [6]:
results = (
    pd.DataFrame(rows, columns=columns)
      .set_index(['County', 'Precinct'])
      .apply(pd.to_numeric)
)
results

Unnamed: 0_level_0,candidate,Bennet,Bennet,Bennet,Biden,Biden,Biden,Bloomberg,Bloomberg,Bloomberg,Buttigieg,...,Warren,Yang,Yang,Yang,Other,Other,Other,Uncommitted,Uncommitted,Uncommitted
Unnamed: 0_level_1,round,First Expression,Final Expression,SDE,First Expression,Final Expression,SDE,First Expression,Final Expression,SDE,First Expression,...,SDE,First Expression,Final Expression,SDE,First Expression,Final Expression,SDE,First Expression,Final Expression,SDE
County,Precinct,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2
Adair,2NE STUART,0,0,0,6,0,0.0000,0,0,0.0000,10,...,0.0000,1,1,0.0000,0,0,0.0,0,0,0.0000
Adair,1NW ADAIR,0,0,0,6,6,0.0784,0,0,0.0000,8,...,0.1569,0,0,0.0000,0,0,0.0,0,0,0.0000
Adair,3SW FONTANELLE,0,0,0,9,9,0.1569,0,0,0.0000,0,...,0.0000,15,15,0.3922,0,0,0.0,0,0,0.0000
Adair,4SE ORIENT,0,0,0,7,7,0.1569,0,0,0.0000,6,...,0.2353,0,0,0.0000,0,0,0.0,0,0,0.0000
Adair,5GF GREENFIELD,0,0,0,8,0,0.0000,0,0,0.0000,10,...,0.0000,12,13,0.1569,0,0,0.0,0,0,0.0000
Adams,Adams 2,0,0,0,4,4,0.0857,0,0,0.0000,3,...,0.1714,0,0,0.0000,0,0,0.0,0,0,0.0000
Adams,Adams 5,0,0,0,4,4,0.0857,0,0,0.0000,5,...,0.0000,0,0,0.0000,0,0,0.0,0,0,0.0000
Adams,Adams 4,0,0,0,6,6,0.1714,0,0,0.0000,3,...,0.1714,0,0,0.0000,0,0,0.0,0,0,0.0000
Adams,Adams 1,0,0,0,5,5,0.0857,0,0,0.0000,7,...,0.0857,0,0,0.0000,0,0,0.0,0,0,0.0000
Adams,Adams 3,0,0,0,11,12,0.3429,0,0,0.0000,0,...,0.0000,0,0,0.0000,0,0,0.0,0,0,0.0000


In [7]:
total_votes_by_precinct = results.xs('Final Expression', level='round', axis=1).sum(axis=1)
print(total_votes_by_precinct.describe())
print(total_votes_by_precinct.value_counts().sort_index().head())

count    1688.000000
mean       98.938389
std       116.018511
min         0.000000
25%        25.000000
50%        59.000000
75%       124.000000
max       841.000000
dtype: float64
0     6
1     6
2     5
3    12
4    12
dtype: int64


##### As of 2/4:

This shows us 1,104 precincts with results, but 5 of those (the out-of-state and CD1 / CD2 / CD3 / CD4 satellite caucuses) have 0s for their totals, so this matches the 1,099 reported precincts.

##### 2/5 update:

Now up to 1,320 precincts, not including the 5 satellites, so that aligns with what the NY Times site shows as released.

##### 2/5 update 2:

Now up to 1,688 including satellites (turns out there's >1 per CD now). NYT shows 1,686 but they're trickling in tonight so that's fine.

#### Confirming that the data looks right

In [8]:
results.sum()

candidate    round           
Bennet       First Expression      145.0000
             Final Expression        1.0000
             SDE                     0.0000
Biden        First Expression    25391.0000
             Final Expression    22759.0000
             SDE                   327.5778
Bloomberg    First Expression      213.0000
             Final Expression       20.0000
             SDE                     0.2096
Buttigieg    First Expression    36629.0000
             Final Expression    42175.0000
             SDE                   549.0736
Delaney      First Expression        9.0000
             Final Expression        0.0000
             SDE                     0.0000
Gabbard      First Expression      326.0000
             Final Expression       15.0000
             SDE                     0.1143
Klobuchar    First Expression    21773.0000
             Final Expression    20398.0000
             SDE                   253.8542
Patrick      First Expression       49.0000
  

In [9]:
results.sum().groupby('round').sum()

round
Final Expression    167008.0000
First Expression    170784.0000
SDE                   2076.5027
dtype: float64

##### As of 2/4:

Spot-checked the SDE results against the NY Times's reported results (https://www.nytimes.com/interactive/2020/02/04/us/elections/results-iowa-caucus.html). The SDE numbers per candidate look spot on, while the total vote counts (overall and per candidate) are slightly under what the Times is reporting. Our data has 110,666 first expression votes and 107,496 final expression votes, while the Times shows 111,237 and 108,050 respectively. The extra votes shown there are less than 1% and seem fairly evenly distributed across the 5 major candidates (Biden is +88 in the Times numbers, Buttigieg +97, Klobuchar +129, etc.) so we'll run with it for now - these are preliminary results anyway so not authoritative.
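
As a quick check of the "less than 1%" claim, the gap can be computed directly from the totals quoted above. This is just arithmetic on those four numbers; the dictionary names are made up for the example.

```python
# Vote totals quoted in the paragraph above (ours vs. the NY Times).
ours = {"first": 110_666, "final": 107_496}
nyt = {"first": 111_237, "final": 108_050}

gaps = {}
for rnd in ("first", "final"):
    gap = nyt[rnd] - ours[rnd]
    pct = 100 * gap / nyt[rnd]
    gaps[rnd] = (gap, pct)
    print(f"{rnd}: {gap} votes short of the Times ({pct:.2f}%)")
```

Both gaps come out to roughly half a percent of the Times totals.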

##### 2/5 update:

Checked again against the NYT results, and now they match exactly, so game on.

##### 2/5 update 2:

Matches the current NYT numbers exactly.

#### Saving the data for later use

In [10]:
results.to_csv('iowa_preliminary_results_20200205.csv')

## Analysis of viability across rounds

Number of precincts in which each candidate received more than one vote during the First and Final Expressions (the code below uses a `> 1` cutoff). Clearing that cutoff in the final expression means the candidate was either viable at the time of the first or was able to meet the threshold with help from other candidates' supporters.

In [11]:
viable_first = (results.xs('First Expression', level='round', axis=1) > 1)
viable_final = (results.xs('Final Expression', level='round', axis=1) > 1)

In [12]:
viable_first.sum()

candidate
Bennet           29
Biden          1472
Bloomberg        44
Buttigieg      1568
Delaney           1
Gabbard          84
Klobuchar      1319
Patrick           1
Sanders        1466
Steyer          563
Warren         1400
Yang            870
Other            35
Uncommitted     234
dtype: int64

In [13]:
viable_final.sum()

candidate
Bennet            0
Biden          1134
Bloomberg         4
Buttigieg      1432
Delaney           0
Gabbard           3
Klobuchar       855
Patrick           0
Sanders        1258
Steyer           66
Warren         1049
Yang            150
Other            27
Uncommitted     171
dtype: int64

In [14]:
viable_diff = viable_final.sum() - viable_first.sum()
viable_diff

candidate
Bennet         -29
Biden         -338
Bloomberg      -40
Buttigieg     -136
Delaney         -1
Gabbard        -81
Klobuchar     -464
Patrick         -1
Sanders       -208
Steyer        -497
Warren        -351
Yang          -720
Other           -8
Uncommitted    -63
dtype: int64

In [15]:
diff_pct = (100 * viable_diff / viable_first.sum()).round(1)
diff_pct.sort_values()

candidate
Bennet        -100.0
Delaney       -100.0
Patrick       -100.0
Gabbard        -96.4
Bloomberg      -90.9
Steyer         -88.3
Yang           -82.8
Klobuchar      -35.2
Uncommitted    -26.9
Warren         -25.1
Biden          -23.0
Other          -22.9
Sanders        -14.2
Buttigieg       -8.7
dtype: float64

##### 2/5 update:

As in the last analysis, the Buttigieg viability is impressive here. At the other end, Yang's numbers are as bad as they were in the previous version (83% non-viable).

##### 2/5 update 2:

Same.

## Analysis of switching

First, I calculate the shift in each candidate's count between rounds, then use that to calculate the number of "up for grabs" voters per candidate in precincts where the candidate was not viable (i.e., did not clear the `viable_final` cutoff in the final expression) and the number of voters gained in precincts where the candidate was viable.

In [16]:
shifts = (results.xs('Final Expression', level='round', axis=1) - results.xs('First Expression', level='round', axis=1))
shifts.head(10)

Unnamed: 0_level_0,candidate,Bennet,Biden,Bloomberg,Buttigieg,Delaney,Gabbard,Klobuchar,Patrick,Sanders,Steyer,Warren,Yang,Other,Uncommitted
County,Precinct,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
Adair,2NE STUART,0,-6,0,5,0,0,1,0,0,0,0,0,0,0
Adair,1NW ADAIR,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Adair,3SW FONTANELLE,0,0,0,0,0,0,-1,0,3,-1,-4,0,0,0
Adair,4SE ORIENT,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Adair,5GF GREENFIELD,0,-8,0,2,0,0,6,0,5,0,-6,1,0,0
Adams,Adams 2,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Adams,Adams 5,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Adams,Adams 4,0,0,0,0,0,0,0,0,-2,0,2,0,0,0
Adams,Adams 1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Adams,Adams 3,0,1,0,0,0,0,1,0,0,0,-3,0,0,0


In [17]:
gains = shifts[viable_final]
gains.head(10)

Unnamed: 0_level_0,candidate,Bennet,Biden,Bloomberg,Buttigieg,Delaney,Gabbard,Klobuchar,Patrick,Sanders,Steyer,Warren,Yang,Other,Uncommitted
County,Precinct,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
Adair,2NE STUART,,,,5.0,,,1.0,,0.0,0.0,,,,
Adair,1NW ADAIR,,0.0,,0.0,,,0.0,,0.0,,0.0,,,
Adair,3SW FONTANELLE,,0.0,,,,,,,3.0,,,0.0,,
Adair,4SE ORIENT,,0.0,,0.0,,,0.0,,0.0,,0.0,,,
Adair,5GF GREENFIELD,,,,2.0,,,6.0,,5.0,,,1.0,,
Adams,Adams 2,,0.0,,0.0,,,0.0,,0.0,,0.0,,,
Adams,Adams 5,,0.0,,0.0,,,0.0,,0.0,,,,,
Adams,Adams 4,,0.0,,0.0,,,0.0,,,,2.0,,,
Adams,Adams 1,,0.0,,0.0,,,0.0,,0.0,,0.0,,,
Adams,Adams 3,,1.0,,,,,1.0,,0.0,,,,,


In [18]:
# Use `~` to invert the boolean mask; unary minus on booleans fails in recent NumPy/pandas.
up_for_grabs = -shifts[~viable_final]
up_for_grabs.head(10)

Unnamed: 0_level_0,candidate,Bennet,Biden,Bloomberg,Buttigieg,Delaney,Gabbard,Klobuchar,Patrick,Sanders,Steyer,Warren,Yang,Other,Uncommitted
County,Precinct,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
Adair,2NE STUART,-0.0,6.0,-0.0,,-0.0,-0.0,,-0.0,,,-0.0,-0.0,-0.0,-0.0
Adair,1NW ADAIR,-0.0,,-0.0,,-0.0,-0.0,,-0.0,,-0.0,,-0.0,-0.0,-0.0
Adair,3SW FONTANELLE,-0.0,,-0.0,-0.0,-0.0,-0.0,1.0,-0.0,,1.0,4.0,,-0.0,-0.0
Adair,4SE ORIENT,-0.0,,-0.0,,-0.0,-0.0,,-0.0,,-0.0,,-0.0,-0.0,-0.0
Adair,5GF GREENFIELD,-0.0,8.0,-0.0,,-0.0,-0.0,,-0.0,,-0.0,6.0,,-0.0,-0.0
Adams,Adams 2,-0.0,,-0.0,,-0.0,-0.0,,-0.0,,-0.0,,-0.0,-0.0,-0.0
Adams,Adams 5,-0.0,,-0.0,,-0.0,-0.0,,-0.0,,-0.0,-0.0,-0.0,-0.0,-0.0
Adams,Adams 4,-0.0,,-0.0,,-0.0,-0.0,,-0.0,2.0,-0.0,,-0.0,-0.0,-0.0
Adams,Adams 1,-0.0,,-0.0,,-0.0,-0.0,,-0.0,,-0.0,,-0.0,-0.0,-0.0
Adams,Adams 3,-0.0,,-0.0,-0.0,-0.0,-0.0,,-0.0,,-0.0,3.0,-0.0,-0.0,-0.0
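
The masking behind `gains` and `up_for_grabs` is easier to follow on a tiny example. The frames below are hypothetical; indexing a DataFrame with a boolean DataFrame keeps values where the mask is `True` and leaves `NaN` elsewhere.

```python
import pandas as pd

# Hypothetical shifts (final minus first) for two candidates in three precincts.
toy_shifts = pd.DataFrame({"A": [2, 0, -1], "B": [-3, 0, 1]},
                          index=["p1", "p2", "p3"])
# Hypothetical viability flags for the final expression.
toy_viable = pd.DataFrame({"A": [True, True, False], "B": [False, True, True]},
                          index=["p1", "p2", "p3"])

toy_gains = toy_shifts[toy_viable]           # NaN wherever the candidate was not viable
toy_up_for_grabs = -toy_shifts[~toy_viable]  # sign flipped so losses read as positive

print(toy_gains)
print(toy_up_for_grabs)
```

In precinct `p1`, candidate `B` was not viable and lost 3 voters, so those 3 show up as up for grabs, while `A` gained 2.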


#### Gains by candidate from switching

In [19]:
gains.describe(percentiles=[0.01,0.05,0.1]).T

Unnamed: 0_level_0,count,mean,std,min,1%,5%,10%,50%,max
candidate,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Bennet,0.0,,,,,,,,
Biden,1134.0,2.025573,4.187808,-23.0,-7.0,0.0,0.0,1.0,65.0
Bloomberg,4.0,0.5,1.0,0.0,0.0,0.0,0.0,0.0,2.0
Buttigieg,1432.0,4.947626,8.32587,-61.0,0.0,0.0,0.0,2.0,108.0
Delaney,0.0,,,,,,,,
Gabbard,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Klobuchar,855.0,4.203509,7.44351,-33.0,0.0,0.0,0.0,2.0,64.0
Patrick,0.0,,,,,,,,
Sanders,1258.0,3.148649,5.740143,-24.0,-1.0,0.0,0.0,2.0,101.0
Steyer,66.0,1.121212,1.687193,-1.0,-0.35,0.0,0.0,0.0,9.0


In [20]:
gains.sum()

candidate
Bennet            0.0
Biden          2297.0
Bloomberg         2.0
Buttigieg      7085.0
Delaney           0.0
Gabbard           0.0
Klobuchar      3594.0
Patrick           0.0
Sanders        3961.0
Steyer           74.0
Warren         4936.0
Yang             41.0
Other           163.0
Uncommitted    1125.0
dtype: float64

##### As of 2/4:

Some of those results seem strange because you're not supposed to lose voters if you're viable, but it's a small enough number that maybe some people just went home or snuck off against the rules. Caucuses are messy and these things happen, so let's just go with it.

In any case, the interesting thing there is that Sanders and Biden gained fewer switchers (overall and on average across precincts) than Buttigieg, Warren, or even Klobuchar.

##### 2/5 update:

Still see similar results to last time re: switching. In fact, even in the places where Yang was viable, he lost voters. It was 3 in total, so not a substantial loss, but this suggests his precinct captains weren't doing a great job of keeping voters from going home early or shifting to other candidates after the initial allocation (even if they weren't supposed to - my guess is these rules probably weren't consistently applied).

As before, Buttigieg and Warren were the big winners from reallocation, along with Klobuchar to a lesser extent (mostly because her viability rate was lower - the average gains were comparable), while the others (especially Biden) didn't gain a whole lot of second-choice votes from supporters of non-viable candidates.

##### 2/5 update 2:
Basically the same as before, though Sanders seems to be doing a bit better at picking up reallocated voters and Klobuchar a little worse. Sanders got more in total, but was viable more often too, so his average is still lower than hers. Yang is now net positive, so that seems a bit more reasonable.

## Who switched to whom?

This is the fun part. Now that we know how many voters each candidate lost or gained in each precinct, we can model which candidates' supporters go where. To do this, I'm going to build a linear regression for each candidate and predict the number of supporters gained as a function of the up-for-grabs voters from each non-viable candidate after the first round. The coefficient on each candidate's up-for-grabs numbers is the estimated proportion of that non-viable candidate's voters who switch to the candidate being modeled (*when the former is non-viable and the latter is viable*). There may be some error in this because of the negative "gains" for a handful of viable candidates, but we'll power through, since it's only a tiny fraction of the data.

I use a bounded least squares regression here, which constrains the coefficients to be between 0 and 1 (since the results wouldn't make sense otherwise, given the interpretation). This is a quick-and-dirty approach - there's probably a better way to solve this as a more general constrained optimization problem, but I don't have a better idea off the top of my head (waiting for results made for a late night last night!).
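
To illustrate what the bounded fit recovers, here is a minimal sketch on synthetic data. The counts, the three "non-viable" groups, and the `true_shares` values are all assumptions made up for the example; only the `lsq_linear` call with `(0, 1)` bounds mirrors the cell below.

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)

# Hypothetical up-for-grabs counts from three non-viable candidates in 50 precincts.
X_toy = rng.integers(0, 20, size=(50, 3)).astype(float)
# Assumed "true" shares of each group that move to the candidate being modeled.
true_shares = np.array([0.2, 0.5, 0.1])
y_toy = X_toy @ true_shares  # noise-free gains, for illustration only

# Constrain every coefficient to lie between 0 and 1, as in the notebook.
fit = lsq_linear(X_toy, y_toy, bounds=(0, 1))
print(fit.x.round(3))
```

With noise-free data the fit should land on the assumed shares. The real caucus data is much noisier, so the estimated proportions there are only rough.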

In [21]:
switches = pd.DataFrame(index=gains.columns)
for i in ['Biden', 'Buttigieg', 'Klobuchar', 'Sanders', 'Warren']:
    y = gains.loc[viable_final[i],i]
    X = up_for_grabs[viable_final[i]].fillna(0).drop(i, axis=1)
    regression = lsq_linear(X,y, (0,1))
    props = 100*pd.Series(regression['x'], index=X.columns).round(3)
    switches.loc[:,i] = props
#not showing candidates with < 200 first expression voters, because those results are just too noisy
switches = switches.drop(gains.columns[(results.sum().xs('First Expression',level='round') < 200)], axis=0)
switches

| candidate | Biden | Buttigieg | Klobuchar | Sanders | Warren |
|---|---|---|---|---|---|
| Biden | NaN | 19.4 | 37.0 | 7.6 | 22.6 |
| Bloomberg | 11.8 | 56.6 | 100.0 | 0.1 | 0.0 |
| Buttigieg | 17.7 | NaN | 31.2 | 11.2 | 25.9 |
| Gabbard | 18.9 | 0.0 | 12.2 | 65.5 | 0.0 |
| Klobuchar | 32.4 | 57.1 | NaN | 9.7 | 31.8 |
| Sanders | 9.2 | 20.1 | 6.8 | NaN | 42.9 |
| Steyer | 17.0 | 19.7 | 18.5 | 6.6 | 17.2 |
| Warren | 11.9 | 33.6 | 24.2 | 32.8 | NaN |
| Yang | 1.8 | 21.7 | 19.6 | 23.3 | 13.2 |
| Uncommitted | 25.1 | 0.0 | 39.0 | 22.3 | 22.7 |


The results here are a little tough to interpret, because what they show is *"what percentage of candidate A's voters go to candidate B when candidate A is not viable but candidate B is?"*. (Candidate A is the row, candidate B the column.) So for example, in the table as shown, when Biden is not viable but Buttigieg is, Buttigieg is estimated to get 19.4% of Biden's supporters. But when Klobuchar is viable and Biden's not, she gets 37.0% of his voters. The numbers don't sum to 100 across rows or down columns because each cell is a different scenario - that's why we can't just analyze these as "second choices" the way we would in a poll, because not all the second choices are available in every case.

##### As of 2/4:

Some of the more interesting results here, in no particular order (*note that all these are based on estimated numbers, even if I don't give that caveat every time!*):
* Biden and Sanders have very distinct voters. When Biden's not viable and Sanders is, only 9.6% of Biden's voters go to Sanders, and in the reverse scenario only 7.8% of Sanders voters go to Biden.
* Biden and Buttigieg are likewise pretty highly separated. When Biden isn't viable, Buttigieg gets 14.8% of his voters, and in the reverse case Biden gets 18.1%.
* There's a fairly strong affinity between Biden voters and Klobuchar voters. When Biden isn't viable, she gets 44.3% of his voters, and in the reverse case he gets 27.9% of hers.
* Likewise Buttigieg and Klobuchar - when he's viable and she isn't, he gets 47.1% of hers, and she gets 31.2% of his when he's not viable and she is.
* On the other end of things, Klobuchar and Sanders had almost no overlap at all. When he was out and she was in, she got 2.7% of his voters, and in the reverse case he got just 3.2%.
* When she wasn't viable, Warren's voters shifted most enthusiastically to Buttigieg (32.6%), and to a lesser extent Sanders (29.5%) and Klobuchar (21.6%), but not as much to Biden (11.7%).
* When Sanders wasn't viable, his voters went largely for Warren (she got 44.4% of his up-for-grabs voters when she was still in it), even if the reverse wasn't true.
* When Tulsi Gabbard's not viable (which was almost always), her voters seemed to go to Klobuchar when she was viable, and otherwise Sanders and Warren. (Not many datapoints, though.)
* Biden didn't seem to get any traction from Yang voters, and Warren didn't do especially well with them either - Buttigieg, Klobuchar, and Sanders all got more of a boost there.
* Steyer's voters seemed to lean more toward the more moderate candidates (Biden, Buttigieg, and Klobuchar) and away from Warren and Sanders (just 8.6% went to Sanders when he was viable and Steyer wasn't, and 12% for Warren when she was viable).
* Uncommitted voters tended to break largely for Biden, and didn't show much love at all for Buttigieg or Warren.

##### 2/5 update:
* Biden and Sanders pattern still holds. Those camps don't like each other at all it seems.
* There's a little more Biden/Buttigieg switching than last time we looked (16.4% / 19.7% vs 14.8% / 18.1% from the last estimates).
* Biden and Klobuchar pattern still holds.
* The Buttigieg / Klobuchar link looks even stronger. When she's not viable, he now gets almost half (49.3%) of her voters, while she gets about a third (32.2%) of his in the handful of cases where he's not viable.
* Sanders / Klobuchar crossover still looks basically nonexistent (under 5% in each direction).
* Warren's voters continue to go over to Buttigieg and Sanders most, then Klobuchar, and only rarely Biden.
* As before, Sanders voters did go over to Warren pretty often (she got 46.2% of his up-for-grabs voters when she was viable), while her voters split their votes more often and only went to him 32.6% of the time when he was viable.
* Gabbard's voters now tend to shift toward Sanders and Klobuchar in equal amounts, though with only 271 of those at this point it's not worth reading too much into. (Does make sense that they're the most outsider-y of the top five, at least if we count Klobuchar as such because she's not a "top tier" candidate.)
* Same pattern as before for Yang voters, though it's interesting that they're going with Sanders / Buttigieg / Klobuchar but not Warren or Biden. The two predominant explanations for differences in the field (ideology and gender) both cut across those groups, so neither really fits (though I suppose it could be a combination of the two, but we can't tell that since we don't know the characteristics of the individual voters).
* Steyer voters continue to lean towards moderates, though it seems that a bunch of them just go home instead (about 20%).
* Uncommitteds still largely go for Biden and don't like Buttigieg or Warren. The fact that it seems like almost no uncommitteds (there were 747 in the first count) wind up with Buttigieg is a bit surprising, but maybe his ground game there was such that they already pitched all the uncommitteds before the count and pulled over whichever ones they could?

##### 2/5 update 2:
* Bloomberg crossed the 200 voter mark so is included here for the first time. Seems his people like Buttigieg and Klobuchar, which isn't that surprising - if you like Bloomberg, you're probably a moderate who's not a big Biden fan, and those are the two clear alternatives.
* Buttigieg seems to be doing even better with Klobuchar's initial supporters than before.
* The Warren to Sanders / Sanders to Warren gap seems to have shrunk to just 10 points now, so it's not as big a story as it seemed like before.
* Gabbard's supporters seem more clearly to be falling now into the Sanders camp, which should surprise nobody.
* Uncommitteds now seem to be less Biden and more Klobuchar, but they're a relatively small group so I wouldn't read too much into that regardless.

So not a ton of change from yesterday, though there is one interesting finding I hadn't noticed yesterday: 

In [22]:
switches.sum(axis=1)

candidate
Biden           86.6
Bloomberg      168.5
Buttigieg       86.0
Gabbard         96.6
Klobuchar      131.0
Sanders         79.0
Steyer          79.0
Warren         102.5
Yang            79.6
Uncommitted    109.1
dtype: float64

This is a rough approximation of the percentage of each candidate's initial supporters who went to *any* of the top five candidates in the final count. It's an over-estimate, because it's the proportion going to each other candidate in the cases where the other candidate is viable, and that doesn't apply in all cases. That's why the least frequently viable options - Klobuchar, Gabbard, and Uncommitted - all total over 100%, since it's rare that the remaining 4 (in Klobuchar's case) or 5 major candidates are all viable in the final count.

But the cool thing here is that it's also an *upper-bound* estimate. That is, if (and for simplicity, we're talking now just about the top 5 candidates' initial supporters) all 4 other candidates were viable, the figures would be lower because the votes would be split across more candidates than they were in reality. So the sum of all these numbers serves as an upper bound for the percentage of each candidate's voters who reallocated to one of the other major candidates' camps.

And now for the punchline: *look at the Sanders number*. At most, this data suggests that 77% of his voters reallocated (as opposed to going to a minor candidate, becoming uncommitted, or just going home), whereas that figure is over 90% for all the other major candidates. So what this evidence tells us is that the "Bernie or bust" mindset that was a major story in 2016 seems to still be present in 2020.

Given that these are caucus results, maybe this is some kind of strategic choice (e.g., not wanting to add to the popular vote totals of another candidate, which would hurt Sanders' overall proportion of the vote share). But even if so, that's yet another reason why the caucus system is problematic - if the goal of reallocation is to produce more consensus, it fails when there's an incentive to game the popular vote in this way.

##### 2/5 update 2:
That Sanders result persists, though now it's down to 21 points of drop-off, and both Buttigieg and Biden's supporters seem to drop off a bit more than before (13-14% at least). So less of a clear story there than before, but the general pattern still holds.
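
The row sums and the implied drop-off can be checked by hand against the switching table shown earlier. A minimal sketch, with the Sanders and Biden rows copied from that table (the dictionary layout is just for illustration):

```python
# Rows copied from the switching table above (percent going to each viable rival).
switch_rows = {
    "Sanders": {"Biden": 9.2, "Buttigieg": 20.1, "Klobuchar": 6.8, "Warren": 42.9},
    "Biden": {"Buttigieg": 19.4, "Klobuchar": 37.0, "Sanders": 7.6, "Warren": 22.6},
}

upper_bounds = {}
for candidate, shares in switch_rows.items():
    upper_bounds[candidate] = sum(shares.values())
    print(f"{candidate}: at most {upper_bounds[candidate]:.1f}% reallocated, "
          f"so at least {100 - upper_bounds[candidate]:.1f}% dropped off")
```

The Sanders row sums to 79.0, which is where the 21 points of drop-off comes from, and the Biden row sums to 86.6.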


## So what does it all mean?

##### As of 2/4:

I'll first caution that this is a quick analysis of preliminary data, so giant mounds of salt should be taken with this. But all that said... Putting on my pundit hat, here are my big takeaways as we look ahead:
* If Klobuchar gets out after this, her voters are likely to split their support among everyone but Sanders. This could make NH more interesting, since Sanders seems to be in the lead there, but a few points of support for any of the three other major contenders would make it a virtual tie at this point.
* If we don't factor in Klobuchar or Yang, we end up with 4 candidates without a clear alignment. While some pundits are tempted to lump Biden and Buttigieg together as the moderates and Warren / Sanders as the progressives, there isn't as much overlap among their supporters as you'd expect from that simplistic view.
* More generally, Joe Biden and Bernie Sanders are very few voters' second choices. So that could put a ceiling on their potential if/when the field narrows further. Buttigieg and Warren each do fairly well with a broader set of other candidates' supporters, so they could benefit more from other candidates dropping out. (Or at the very least, this should boost their potential VP credentials if they don't pull out a win in the primary---they don't seem to have particular negatives with specific other parts of the party.)

##### 2/5 update:
* The even stronger ties between Klobuchar and Buttigieg could suggest that her actions over the next few days (and the media coverage of her campaign's viability) could particularly affect his fate in NH. If she falls off, he seems best poised to benefit, and if he can pull off an upset in NH to add to his (as of now) Iowa victory, that could have some really interesting implications for the rest of the race. (That's not to say that these two states, which are not very representative of the country as a whole or the Democratic base in particular, should have that much sway, but that's where the attention is right now, so.)
* The evidence that Sanders voters aren't going over to other candidates could just be a strategic choice from the campaign, but if it's actually a sign of an unwillingness to compromise, that could cause a lot of headaches down the road if he's not the eventual nominee. Hopefully in that scenario it wouldn't be as bad as it was in 2016, but still, it's something to be aware of as we watch these intra-party fights drag out for another month or two at least. And god help us if it's a contested convention.

##### 2/5 update 2:

Same big picture pattern as before, with almost all the results in.