# Cluster Visualization Calculations

This notebook takes precalculated cluster membership for the US House of Representatives and uses it to do the following:
- Determine the size of each cluster
- Determine the "polarity" (the balance of Democratic and Republican membership) of a given cluster
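
As a minimal sketch of the second quantity, assuming "polarity" means the Republican share of a cluster's party-labelled members (the measure the color mapping later in this notebook relies on), the hypothetical helper `republican_fraction` below works on plain counts. The function name and the choice to return `None` for an empty cluster are illustrative assumptions, not part of the original notebook.

```python
def republican_fraction(d, i, id_, r):
    """Return R / (D + I + ID + R), or None when the cluster has no members."""
    total = d + i + id_ + r
    if total == 0:
        return None  # assumption: an empty cluster has no defined polarity
    return r / total

# Counts copied from the second row of the table shown further down (D=36, I=2, ID=0, R=43)
mixed = republican_fraction(36, 2, 0, 43)
print(f"{mixed:.3f}")  # prints 0.531
```

A value near 0 indicates a mostly Democratic cluster, a value near 1 a mostly Republican one.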

In [2]:
import pandas as pd

In [3]:
cluster_data_fp = "../../data/results/q1_party_distribution.csv"
cluster_df = pd.read_csv(cluster_data_fp)

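This file contains the cluster membership of House members, summarized as per-party counts for each cluster, for each combination of subject and topic among the top 5 overall legislative subjects and their associated topics. In the table below, the `topic` column appears to hold the subject and the `subtopic` column the associated topic. The number of topics for a given subject may vary.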

In [4]:
cluster_df.head()

Unnamed: 0,topic,subtopic,cluster_id,cluster_count,D,I,ID,R
0,Government operations and politics,Government operations and politics,0,212,1,0,0,165
1,Government operations and politics,Government operations and politics,1,85,36,2,0,43
2,Government operations and politics,Government operations and politics,2,219,204,0,0,0
3,Government operations and politics,Government information and archives,0,234,234,0,0,1
4,Government operations and politics,Government information and archives,1,197,0,2,0,197


In [5]:
# Five colors ordered from bluest (most Democratic) to reddest (most Republican)
party_colors = ["#092573", "#250973", "#500973", "#730950", "#8f0303"]

In [6]:
# Sum the per-party counts to get the number of party-labelled members in each cluster
cluster_df["total_members"] = cluster_df["D"] + cluster_df["I"] + cluster_df["ID"] + cluster_df["R"]

In [7]:
cluster_df.head()

Unnamed: 0,topic,subtopic,cluster_id,cluster_count,D,I,ID,R,total_members
0,Government operations and politics,Government operations and politics,0,212,1,0,0,165,166
1,Government operations and politics,Government operations and politics,1,85,36,2,0,43,81
2,Government operations and politics,Government operations and politics,2,219,204,0,0,0,204
3,Government operations and politics,Government information and archives,0,234,234,0,0,1,235
4,Government operations and politics,Government information and archives,1,197,0,2,0,197,199
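
In the rows displayed above, `total_members` does not always equal `cluster_count` (for example, 166 versus 212 in the first row), and the notebook does not say why. A minimal sketch for surfacing such rows, using a small hand-copied frame in place of the real CSV, might look like this. The `sample` frame and the `count_gap` column name are illustrative assumptions.

```python
import pandas as pd

# Three rows copied by hand from the output above; the real data comes from the CSV
sample = pd.DataFrame({
    "cluster_count": [212, 85, 234],
    "D": [1, 36, 234],
    "I": [0, 2, 0],
    "ID": [0, 0, 0],
    "R": [165, 43, 1],
})
sample["total_members"] = sample[["D", "I", "ID", "R"]].sum(axis=1)

# Positive gap: fewer party-labelled members than cluster_count; negative gap: more
sample["count_gap"] = sample["cluster_count"] - sample["total_members"]
mismatched = sample[sample["count_gap"] != 0]
print(mismatched[["cluster_count", "total_members", "count_gap"]])
```

Whether a gap matters depends on how `cluster_count` was produced upstream, which this notebook does not show.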


In [16]:
colors = []

for row in cluster_df.itertuples():
    # Fraction of the cluster's party-labelled members who are Republican
    # (named fields are used so the lookup does not depend on column order)
    frac_rep = row.R / row.total_members
    # Retrieve the color from a set list of colors that go from bluest (most Democratic) to reddest (most Republican)
    color_ind = int(round(frac_rep * (len(party_colors) - 1)))  # Subtract 1 because the last valid index is len(party_colors) - 1
    color = party_colors[color_ind]
    colors.append(color)

cluster_df["color"] = colors
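
The loop above can also be written without iterating over rows. The following is a sketch of a vectorized equivalent, not the notebook's own code. It assumes the same five-color palette and, as an added assumption, sends clusters with zero members to the middle color instead of raising a division error. `np.rint` rounds halves to the nearest even integer, which matches Python's built-in `round`.

```python
import numpy as np
import pandas as pd

party_colors = ["#092573", "#250973", "#500973", "#730950", "#8f0303"]

# Hand-copied sample rows; in the notebook this would be cluster_df
sample = pd.DataFrame({
    "R": [165, 43, 0, 1, 197],
    "total_members": [166, 81, 204, 235, 199],
})

# Republican fraction; zero-member clusters become NaN, then 0.5 (assumption)
frac_rep = (sample["R"] / sample["total_members"].replace(0, np.nan)).fillna(0.5)

# Scale to the palette's index range and round to the nearest index
color_ind = np.rint(frac_rep * (len(party_colors) - 1)).astype(int)
sample["color"] = np.array(party_colors)[color_ind.to_numpy()]
print(sample["color"].tolist())
```

For these five rows the result agrees with the colors shown in the notebook's output for the same counts.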

In [17]:
cluster_df

Unnamed: 0,topic,subtopic,cluster_id,cluster_count,D,I,ID,R,total_members,color
0,Government operations and politics,Government operations and politics,0,212,1,0,0,165,166,#8f0303
1,Government operations and politics,Government operations and politics,1,85,36,2,0,43,81,#500973
2,Government operations and politics,Government operations and politics,2,219,204,0,0,0,204,#092573
3,Government operations and politics,Government information and archives,0,234,234,0,0,1,235,#092573
4,Government operations and politics,Government information and archives,1,197,0,2,0,197,199,#8f0303
...,...,...,...,...,...,...,...,...,...,...
145,Health,Government information and archives,1,86,9,0,0,9,18,#500973
146,Health,Government information and archives,2,233,232,0,0,2,234,#092573
147,Health,Government studies and investigations,0,89,10,0,0,11,21,#500973
148,Health,Government studies and investigations,1,197,2,2,0,195,199,#8f0303


In [18]:
cluster_df.to_csv("viz_clusters.csv")