# Congressional Activity
<font size=4 color='blue'>Understand and Prep Data - Demographics</font>
***

**Project Summary:**  
The Resume of Congressional Activity has been published since 1947. It includes statistics on the number of measures introduced, bills passed, the outcome of confirmations, etc.  
This project analyzes activity trends and factors that affect the productivity of Congress.  

**Notebook Scope:**  
This notebook loads and previews raw demographic data from an Excel spreadsheet, then tidies it for analysis. Source data can be downloaded from the <a href="https://www.brookings.edu/articles/vital-statistics-on-congress/">Brookings Institution</a>.  

**Output:**  
An Excel file containing scrubbed demographic data is generated.  
***

***
# Notebook Setup
***

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import re

In [2]:
# Set options
pd.options.display.multi_sparse = False
pd.options.display.max_colwidth = 25

In [3]:
%%html
<!-- Prevent text wrapping in dataframe displays for a cleaner print -->
<style> .dataframe td {white-space: nowrap;}</style>

***  
# Read and Tidy Party Data
***

In [4]:
# Read party data for the Senate
file_name = '../Data/Vital Statistics on Congress.xlsx'
senate_party_df = pd.read_excel(file_name, sheet_name='1-20', skiprows=4, header=None, usecols='A,C:G', skipfooter=3)
senate_party_df.columns = ['Congress', 'Members', 'Democrats', 'Republicans', 'Other Parties', 'Vacant']
senate_party_df.insert(loc=2, column='Chamber', value='Senate')
senate_party_df.head()

Unnamed: 0,Congress,Members,Chamber,Democrats,Republicans,Other Parties,Vacant
0,34th,62,Senate,42,15,5.0,
1,35th,64,Senate,39,20,5.0,
2,36th,66,Senate,38,26,2.0,
3,37th,50,Senate,11,31,7.0,1.0
4,38th,51,Senate,12,39,,


In [5]:
# Read party data for the House
file_name = '../Data/Vital Statistics on Congress.xlsx'
house_party_df = pd.read_excel(file_name, sheet_name='1-20', skiprows=4, header=None, usecols='A,I:M', skipfooter=3)
house_party_df.columns = ['Congress', 'Members', 'Democrats', 'Republicans', 'Other Parties', 'Vacant']
house_party_df.insert(loc=2, column='Chamber', value='House')
house_party_df.head()

Unnamed: 0,Congress,Members,Chamber,Democrats,Republicans,Other Parties,Vacant
0,34th,234,House,83,108,43.0,
1,35th,237,House,131,92,14.0,
2,36th,237,House,101,113,23.0,
3,37th,178,House,42,106,28.0,2.0
4,38th,183,House,80,103,,


In [6]:
# Consolidate Senate and House party data
party_df = pd.concat([senate_party_df, house_party_df]).reset_index(drop=True)

In [7]:
# Infer datatypes and review
party_df = party_df.convert_dtypes()
party_df.dtypes

Congress         string[python]
Members                  object
Chamber          string[python]
Democrats                object
Republicans               Int64
Other Parties             Int64
Vacant                    Int64
dtype: object

In [8]:
# All columns, except Chamber, should be int
# For string columns, remove any non-numeric characters and convert to int
party_df['Congress'] = party_df['Congress'].str.extract(r'(\d*)').astype(int)

# For object columns, remove any non-numeric characters and convert to int
obj_cols = party_df.select_dtypes(include=['object']).columns
for col in obj_cols:
    party_df[col] = party_df[col].astype('str').str.extract(r'(\d*)').astype(int)

# For Int64 columns, replace NA with zero and convert to int
int64_cols = party_df.select_dtypes(include=['Int64']).columns
party_df[int64_cols] = party_df[int64_cols].fillna(0).astype(int)
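
The digit-extraction step above recurs in most of the later sections, so it could be wrapped in a small helper. The following is a minimal sketch, not part of the original notebook; `to_congress_number` is a hypothetical name, and the sample labels are copied from headers that appear elsewhere in this workbook.

```python
import pandas as pd

def to_congress_number(labels: pd.Series) -> pd.Series:
    """Return the leading digits of labels such as '84tha 1955' as integers."""
    # expand=False yields a Series instead of a one-column DataFrame
    return labels.astype(str).str.extract(r'^(\d+)', expand=False).astype(int)

# Label styles seen in different sheets of the workbook
sample = pd.Series(['34th', '83rd (1953)', '84tha 1955', '115th\n2017'])
congress_numbers = to_congress_number(sample)
print(congress_numbers.tolist())
```

A label with no leading digits would make the final `astype(int)` raise, which is probably the desired behavior here since it flags an unexpected row rather than silently producing a bad key.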

In [9]:
# Reorder and sort dataframe
party_df = party_df[['Congress', 'Chamber', 'Members', 'Vacant', 'Democrats', 'Republicans', 'Other Parties']].copy()
party_df.sort_values(by=['Congress', 'Chamber'], inplace=True)
party_df.set_index(['Congress', 'Chamber'], drop=True, inplace=True)
party_df.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Members,Vacant,Democrats,Republicans,Other Parties
Congress,Chamber,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
34,House,234,0,83,108,43
34,Senate,62,0,42,15,5
35,House,237,0,131,92,14
35,Senate,64,0,39,20,5
36,House,237,0,101,113,23


***  
# Read and Tidy Seniority Data
***

In [10]:
# Read seniority data for the House
file_name = '../Data/Vital Statistics on Congress.xlsx'
house_seniority_df = pd.read_excel(file_name, sheet_name='1-6', skiprows=3, usecols='A:D,F:H', skipfooter=3)
house_seniority_df.head()

Unnamed: 0,Congress,1 term,2 terms,3 terms,4 - 6 terms,7 - 9 terms,10 + terms
0,83rd (1953),,,,,,
1,Percent,18.706697,16.859122,14.7806,27.020785,13.394919,9.237875
2,Seats,81.0,73.0,64.0,117.0,58.0,40.0
3,,,,,,,
4,84th (1955),,,,,,


In [11]:
# This dataset uses four rows per observation. We'll keep only the 3rd row (raw counts), but we need to carry down the Congress data from the
# first row of each observation
for row in range(len(house_seniority_df)):
    if row % 4 == 0:
        congress = house_seniority_df.at[row, 'Congress']
    elif row % 4 == 2:
        house_seniority_df.at[row, 'Congress'] = congress
rows_to_drop = [x for x in range(len(house_seniority_df)) if x % 4 != 2]
house_seniority_df.drop(index=rows_to_drop, inplace=True)
house_seniority_df.reset_index(drop=True, inplace=True)
house_seniority_df.head()

Unnamed: 0,Congress,1 term,2 terms,3 terms,4 - 6 terms,7 - 9 terms,10 + terms
0,83rd (1953),81.0,73.0,64.0,117.0,58.0,40.0
1,84th (1955),57.0,73.0,63.0,119.0,73.0,50.0
2,85th (1957),46.0,50.0,66.0,142.0,66.0,63.0
3,86th (1959),82.0,45.0,49.0,136.0,64.0,57.0
4,87th (1961),62.0,65.0,36.0,131.0,76.0,67.0
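
The loop above relies on every observation occupying exactly four rows. A possible alternative, sketched here on a toy frame rather than the real sheet, keys off the row labels instead: blank out everything that is not a Congress label, forward-fill, and keep the `Seats` rows. The toy values are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the four-rows-per-observation layout
raw = pd.DataFrame({
    'Congress': ['83rd (1953)', 'Percent', 'Seats', np.nan,
                 '84th (1955)', 'Percent', 'Seats', np.nan],
    '1 term': [np.nan, 18.7, 81.0, np.nan, np.nan, 13.1, 57.0, np.nan],
})

# Remember which rows hold raw counts before the label column is overwritten
is_seats = raw['Congress'].eq('Seats')

# Keep only the Congress labels, then carry each one down its block
labels = raw['Congress'].where(~raw['Congress'].isin(['Percent', 'Seats']))
raw['Congress'] = labels.ffill()

seats = raw[is_seats].reset_index(drop=True)
print(seats)
```

This version does not depend on the block size, so it would tolerate a missing spacer row, at the cost of assuming the `Percent` and `Seats` labels are spelled consistently.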


In [12]:
# Add Chamber column and set to House for all rows
house_seniority_df.insert(loc=1, column='Chamber', value='House')

In [13]:
# Read seniority data for the Senate
file_name = '../Data/Vital Statistics on Congress.xlsx'
senate_seniority_df = pd.read_excel(file_name, sheet_name='1-7', skiprows=3, usecols='A,C:F', skipfooter=3)
senate_seniority_df.head()

Unnamed: 0,Congress,6 years or less,7 - 12 years,13 - 18 years,19 years or more
0,83rd,46 (16),29,14,7
1,84th,42 (14),37,8,9
2,85th,37 (10),36,13,10
3,86th,42 (20),30,14,12
4,87th,42 (7),25,22,11


In [14]:
# Add Chamber column and set to Senate for all rows
senate_seniority_df.insert(loc=1, column='Chamber', value='Senate')

In [15]:
# For our purposes, we will remove the parenthetical number under the column "6 years or less". This value indicates freshman senators
senate_seniority_df['6 years or less'] = senate_seniority_df['6 years or less'].str.extract(r'(\d*)').astype(int)

***
**Note:**  
For data consistency, seniority is measured in terms, with the understanding that term length differs between the House (two years) and the Senate (six years).
The Senate seniority data is grouped into bands that correspond to 1, 2, 3, and 4 or more terms, so we will adjust the House seniority data to use the same columns.
***

In [16]:
# Replace all columns representing 4 or more terms with a single column in the House dataset
house_seniority_df['4+ terms'] = house_seniority_df[['4 - 6 terms', '7 - 9 terms', '10 + terms']].sum(axis=1)
house_seniority_df.drop(['4 - 6 terms', '7 - 9 terms', '10 + terms'], axis=1, inplace=True)

In [17]:
# Rename columns in the Senate dataset
senate_seniority_df.columns = ['Congress', 'Chamber', '1 term', '2 terms', '3 terms', '4+ terms']

In [18]:
# Consolidate Senate and House seniority data
seniority_df = pd.concat([senate_seniority_df, house_seniority_df]).reset_index(drop=True)

In [19]:
# Infer datatypes and review
seniority_df = seniority_df.convert_dtypes()
seniority_df.dtypes

Congress    string[python]
Chamber     string[python]
1 term               Int64
2 terms              Int64
3 terms              Int64
4+ terms             Int64
dtype: object

In [20]:
# All columns, except Chamber, should be int
# For string columns, remove any non-numeric characters and convert to int
seniority_df['Congress'] = seniority_df['Congress'].str.extract(r'(\d*)').astype(int)

# For Int64 columns, replace NA with zero and convert to int
int64_cols = seniority_df.select_dtypes(include=['Int64']).columns
seniority_df[int64_cols] = seniority_df[int64_cols].fillna(0).astype(int)

In [21]:
# Reorder and sort dataframe
seniority_df.sort_values(by=['Congress', 'Chamber'], inplace=True)
seniority_df.set_index(['Congress', 'Chamber'], drop=True, inplace=True)
seniority_df.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,1 term,2 terms,3 terms,4+ terms
Congress,Chamber,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
83,House,81,73,64,215
83,Senate,46,29,14,7
84,House,57,73,63,242
84,Senate,42,37,8,9
85,House,46,50,66,271


***  
# Read and Tidy Sex Data
***

In [22]:
# Read sex data. This dataset wraps, causing each row to contain two separate observations. We'll read the data in 
# two groups and merge
file_name = '../Data/Vital Statistics on Congress.xlsx'
left_df = pd.read_excel(file_name, sheet_name='1-19', skiprows=4, header=None, usecols='A,C:D,F:G', skipfooter=3)
left_df.columns = ['Congress', 'House: Dems', 'House: Repubs', 'Senate: Dems', 'Senate: Repubs']
right_df = pd.read_excel(file_name, sheet_name='1-19', skiprows=4, header=None, usecols='I,K:L,N:O', skipfooter=3)
right_df.columns = ['Congress', 'House: Dems', 'House: Repubs', 'Senate: Dems', 'Senate: Repubs']
sex_df = pd.concat([left_df, right_df]).dropna(axis=0, how='all')
sex_df.reset_index(drop=True, inplace=True)
sex_df.head()

Unnamed: 0,Congress,House: Dems,House: Repubs,Senate: Dems,Senate: Repubs
0,65th,,1.0,,
1,66th,,,,
2,67th,,2.0,,1.0
3,68th,,1.0,,
4,69th,1.0,2.0,,
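
The same "wrapped" layout appears in the race sheets below, so the unwrap step could be factored out. This is a rough sketch under the assumption that both halves have the same number of columns in the same order; `unwrap_halves` is a hypothetical helper, and it works on an in-memory frame so it can be tried without the workbook.

```python
import numpy as np
import pandas as pd

def unwrap_halves(wide: pd.DataFrame, names: list) -> pd.DataFrame:
    """Stack the left and right halves of a wrapped table into one long table."""
    half = len(names)
    left = wide.iloc[:, :half].copy()
    right = wide.iloc[:, half:2 * half].copy()
    left.columns = names
    right.columns = names
    stacked = pd.concat([left, right], ignore_index=True)
    # The shorter half leaves all-NaN padding rows behind; drop them
    return stacked.dropna(axis=0, how='all').reset_index(drop=True)

# Toy wrapped table: two observations on the first row, one on the second
wide = pd.DataFrame([
    ['65th', np.nan, 1.0, '67th', np.nan, 2.0],
    ['66th', np.nan, np.nan, np.nan, np.nan, np.nan],
])
names = ['Congress', 'House: Dems', 'House: Repubs']
long_df = unwrap_halves(wide, names)
print(long_df)
```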


In [23]:
# Split each row by Chamber, then concat
house_sex_df = sex_df[['Congress', 'House: Dems', 'House: Repubs']].copy()
house_sex_df.insert(loc=1, column='Chamber', value='House')
house_sex_df.columns = ['Congress', 'Chamber', 'Dems', 'Repubs']
senate_sex_df = sex_df[['Congress', 'Senate: Dems', 'Senate: Repubs']].copy()
senate_sex_df.insert(loc=1, column='Chamber', value='Senate')
senate_sex_df.columns = ['Congress', 'Chamber', 'Dems', 'Repubs']
sex_df = pd.concat([house_sex_df, senate_sex_df])
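
Splitting by chamber and concatenating is one way to reshape these columns. For comparison, the sketch below does the same reshape with `melt` and `pivot_table`, assuming every value column is named `'<Chamber>: <Party>'`. The frame and its values are toy data, with zeros standing in for the blanks in the sheet.

```python
import pandas as pd

sex_wide = pd.DataFrame({
    'Congress': ['65th', '67th'],
    'House: Dems': [0, 0],
    'House: Repubs': [1, 2],
    'Senate: Dems': [0, 0],
    'Senate: Repubs': [0, 1],
})

# One row per (Congress, column), then split the column name into its parts
long_df = sex_wide.melt(id_vars='Congress', var_name='group', value_name='count')
long_df[['Chamber', 'Party']] = long_df['group'].str.split(': ', expand=True)

# Back to one row per (Congress, Chamber) with a column per party
tidy = (
    long_df.pivot_table(index=['Congress', 'Chamber'], columns='Party',
                        values='count', aggfunc='sum')
    .reset_index()
    .rename_axis(None, axis=1)
)
print(tidy)
```

The explicit split-and-concat in the cell above is arguably easier to read for two chambers; the reshape version mainly pays off if more groupings were added.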

In [24]:
# Infer datatypes and review
sex_df = sex_df.convert_dtypes()
sex_df.dtypes

Congress    string[python]
Chamber     string[python]
Dems                 Int64
Repubs               Int64
dtype: object

In [25]:
# All columns, except Chamber, should be int
# For string columns, remove any non-numeric characters and convert to int
sex_df['Congress'] = sex_df['Congress'].str.extract(r'(\d*)').astype(int)

# For Int64 columns, replace NA with zero and convert to int
int64_cols = sex_df.select_dtypes(include=['Int64']).columns
sex_df[int64_cols] = sex_df[int64_cols].fillna(0).astype(int)

In [26]:
# Combine the Dems and Repubs values to find a total number of women, then delete the Dems and Repubs columns
sex_df['Women'] = sex_df['Dems'] + sex_df['Repubs']
sex_df.drop(columns=['Dems', 'Repubs'], inplace=True)

In [27]:
# Reorder and sort dataframe
sex_df.sort_values(by=['Congress', 'Chamber'], inplace=True)
sex_df.set_index(['Congress', 'Chamber'], drop=True, inplace=True)
sex_df.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Women
Congress,Chamber,Unnamed: 2_level_1
65,House,1
65,Senate,0
66,House,0
66,Senate,0
67,House,2


***
# Read and Tidy Race Data
***

## African American
***

In [28]:
# Read African American data. This dataset wraps, causing each row to contain two separate observations. We'll read the data 
# in two groups and merge
file_name = '../Data/Vital Statistics on Congress.xlsx'
left_df = pd.read_excel(file_name, sheet_name='1-16', skiprows=4, header=None, usecols='A,C:D,F:G')
left_df.columns = ['Congress', 'House: Dems', 'House: Repubs', 'Senate: Dems', 'Senate: Repubs']
right_df = pd.read_excel(file_name, sheet_name='1-16', skiprows=4, header=None, usecols='I,K:L,N:O')
right_df.columns = ['Congress', 'House: Dems', 'House: Repubs', 'Senate: Dems', 'Senate: Repubs']
african_americans_df = pd.concat([left_df, right_df]).dropna(axis=0, how='all')
african_americans_df.reset_index(drop=True, inplace=True)
african_americans_df.head()

Unnamed: 0,Congress,House: Dems,House: Repubs,Senate: Dems,Senate: Repubs
0,41st,,2.0,,1.0
1,42nd,,5.0,,
2,43rd,,7.0,,
3,44th,,7.0,,1.0
4,45th,,3.0,,1.0


In [29]:
# Split each row by Chamber, then concat
house_african_americans_df = african_americans_df[['Congress', 'House: Dems', 'House: Repubs']].copy()
house_african_americans_df.insert(loc=1, column='Chamber', value='House')
house_african_americans_df.columns = ['Congress', 'Chamber', 'Dems', 'Repubs']
senate_african_americans_df = african_americans_df[['Congress', 'Senate: Dems', 'Senate: Repubs']].copy()
senate_african_americans_df.insert(loc=1, column='Chamber', value='Senate')
senate_african_americans_df.columns = ['Congress', 'Chamber', 'Dems', 'Repubs']
african_americans_df = pd.concat([house_african_americans_df, senate_african_americans_df])

In [30]:
# Infer datatypes and review
african_americans_df = african_americans_df.convert_dtypes()
african_americans_df.dtypes

Congress    string[python]
Chamber     string[python]
Dems                 Int64
Repubs               Int64
dtype: object

In [31]:
# All columns, except Chamber, should be int
# For string columns, remove any non-numeric characters and convert to int
african_americans_df['Congress'] = african_americans_df['Congress'].str.extract(r'(\d*)').astype(int)

# For Int64 columns, replace NA with zero and convert to int
int64_cols = african_americans_df.select_dtypes(include=['Int64']).columns
african_americans_df[int64_cols] = african_americans_df[int64_cols].fillna(0).astype(int)

In [32]:
# Combine the Dems and Repubs values to find a total number of African Americans, then delete the Dems and Repubs columns
african_americans_df['African Americans'] = african_americans_df['Dems'] + african_americans_df['Repubs']
african_americans_df.drop(columns=['Dems', 'Repubs'], inplace=True)

In [33]:
# Reorder and sort dataframe
african_americans_df.sort_values(by=['Congress', 'Chamber'], inplace=True)
african_americans_df.set_index(['Congress', 'Chamber'], drop=True, inplace=True)
african_americans_df.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,African Americans
Congress,Chamber,Unnamed: 2_level_1
41,House,2
41,Senate,1
42,House,5
42,Senate,0
43,House,7


***
## Asian American
***

In [34]:
# Read Asian American data. This dataset wraps, causing each row to contain two separate observations. We'll read the data 
# in two groups and merge
file_name = '../Data/Vital Statistics on Congress.xlsx'
left_df = pd.read_excel(file_name, sheet_name='1-17', skiprows=4, header=None, usecols='A,C:D,F:G')
left_df.columns = ['Congress', 'House: Dems', 'House: Repubs', 'Senate: Dems', 'Senate: Repubs']
right_df = pd.read_excel(file_name, sheet_name='1-17', skiprows=4, header=None, usecols='I,K:L,N:O')
right_df.columns = ['Congress', 'House: Dems', 'House: Repubs', 'Senate: Dems', 'Senate: Repubs']
asian_americans_df = pd.concat([left_df, right_df]).dropna(axis=0, how='all')
asian_americans_df.reset_index(drop=True, inplace=True)
asian_americans_df.head()

Unnamed: 0,Congress,House: Dems,House: Repubs,Senate: Dems,Senate: Repubs
0,58th,,,,
1,59th,,,,
2,60th,,,,
3,61st,,,,
4,62nd,,,,


In [35]:
# Split each row by Chamber, then concat
house_asian_americans_df = asian_americans_df[['Congress', 'House: Dems', 'House: Repubs']].copy()
house_asian_americans_df.insert(loc=1, column='Chamber', value='House')
house_asian_americans_df.columns = ['Congress', 'Chamber', 'Dems', 'Repubs']
senate_asian_americans_df = asian_americans_df[['Congress', 'Senate: Dems', 'Senate: Repubs']].copy()
senate_asian_americans_df.insert(loc=1, column='Chamber', value='Senate')
senate_asian_americans_df.columns = ['Congress', 'Chamber', 'Dems', 'Repubs']
asian_americans_df = pd.concat([house_asian_americans_df, senate_asian_americans_df])

In [36]:
# Infer datatypes and review
asian_americans_df = asian_americans_df.convert_dtypes()
asian_americans_df.dtypes

Congress    string[python]
Chamber     string[python]
Dems                 Int64
Repubs               Int64
dtype: object

In [37]:
# All columns, except Chamber, should be int
# For string columns, remove any non-numeric characters and convert to int
asian_americans_df['Congress'] = asian_americans_df['Congress'].str.extract(r'(\d*)').astype(int)

# For Int64 columns, replace NA with zero and convert to int
int64_cols = asian_americans_df.select_dtypes(include=['Int64']).columns
asian_americans_df[int64_cols] = asian_americans_df[int64_cols].fillna(0).astype(int)

In [38]:
# Combine the Dems and Repubs values to find a total number of Asian Americans, then delete the Dems and Repubs columns
asian_americans_df['Asian Americans'] = asian_americans_df['Dems'] + asian_americans_df['Repubs']
asian_americans_df.drop(columns=['Dems', 'Repubs'], inplace=True)

In [39]:
# Reorder and sort dataframe
asian_americans_df.sort_values(by=['Congress', 'Chamber'], inplace=True)
asian_americans_df.set_index(['Congress', 'Chamber'], drop=True, inplace=True)
asian_americans_df.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Asian Americans
Congress,Chamber,Unnamed: 2_level_1
58,House,0
58,Senate,0
59,House,0
59,Senate,0
60,House,0


***
## Hispanic
***

In [40]:
# Read Hispanic American data. This dataset wraps, causing each row to contain two separate observations. We'll read the data 
# in two groups and merge
file_name = '../Data/Vital Statistics on Congress.xlsx'
left_df = pd.read_excel(file_name, sheet_name='1-18', skiprows=5, header=None, usecols='A,C:D,F:G')
left_df.columns = ['Congress', 'House: Dems', 'House: Repubs', 'Senate: Dems', 'Senate: Repubs']
right_df = pd.read_excel(file_name, sheet_name='1-18', skiprows=5, header=None, usecols='I,K:L,N:O')
right_df.columns = ['Congress', 'House: Dems', 'House: Repubs', 'Senate: Dems', 'Senate: Repubs']
hispanic_americans_df = pd.concat([left_df, right_df]).dropna(axis=0, how='all')
hispanic_americans_df.reset_index(drop=True, inplace=True)
hispanic_americans_df.head()

Unnamed: 0,Congress,House: Dems,House: Repubs,Senate: Dems,Senate: Repubs
0,41st,,,,
1,42nd,,,,
2,43rd,,,,
3,63rd,1.0,,,
4,64th,1.0,1.0,,


In [41]:
# Split each row by Chamber, then concat
house_hispanic_americans_df = hispanic_americans_df[['Congress', 'House: Dems', 'House: Repubs']].copy()
house_hispanic_americans_df.insert(loc=1, column='Chamber', value='House')
house_hispanic_americans_df.columns = ['Congress', 'Chamber', 'Dems', 'Repubs']
senate_hispanic_americans_df = hispanic_americans_df[['Congress', 'Senate: Dems', 'Senate: Repubs']].copy()
senate_hispanic_americans_df.insert(loc=1, column='Chamber', value='Senate')
senate_hispanic_americans_df.columns = ['Congress', 'Chamber', 'Dems', 'Repubs']
hispanic_americans_df = pd.concat([house_hispanic_americans_df, senate_hispanic_americans_df])

In [42]:
# Infer datatypes and review
hispanic_americans_df = hispanic_americans_df.convert_dtypes()
hispanic_americans_df.dtypes

Congress    string[python]
Chamber     string[python]
Dems                 Int64
Repubs               Int64
dtype: object

In [43]:
# All columns, except Chamber, should be int
# For string columns, remove any non-numeric characters and convert to int
hispanic_americans_df['Congress'] = hispanic_americans_df['Congress'].str.extract(r'(\d*)').astype(int)

# For Int64 columns, replace NA with zero and convert to int
int64_cols = hispanic_americans_df.select_dtypes(include=['Int64']).columns
hispanic_americans_df[int64_cols] = hispanic_americans_df[int64_cols].fillna(0).astype(int)

In [44]:
# Combine the Dems and Repubs values to find a total number of Hispanic Americans, then delete the Dems and Repubs columns
hispanic_americans_df['Hispanic Americans'] = hispanic_americans_df['Dems'] + hispanic_americans_df['Repubs']
hispanic_americans_df.drop(columns=['Dems', 'Repubs'], inplace=True)

In [45]:
# Reorder and sort dataframe
hispanic_americans_df.sort_values(by=['Congress', 'Chamber'], inplace=True)
hispanic_americans_df.set_index(['Congress', 'Chamber'], drop=True, inplace=True)
hispanic_americans_df.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Hispanic Americans
Congress,Chamber,Unnamed: 2_level_1
41,House,0
41,Senate,0
42,House,0
42,Senate,0
43,House,0


***
## Consolidate Race Data
***

In [46]:
# Concatenate race-related dataframes
race_df = pd.concat([african_americans_df, asian_americans_df, hispanic_americans_df], axis=1).fillna(0)
race_df = race_df.astype(int)
race_df.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,African Americans,Asian Americans,Hispanic Americans
Congress,Chamber,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
41,House,2,0,0
41,Senate,1,0,0
42,House,5,0,0
42,Senate,0,0,0
43,House,7,0,0
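
Because the three race dataframes start at different Congresses, `pd.concat(..., axis=1)` aligns them on the `(Congress, Chamber)` index and leaves NaN wherever a source has no row. The toy example below illustrates that alignment; the column names and counts are placeholders, not values from the workbook.

```python
import pandas as pd

names = ['Congress', 'Chamber']
idx_a = pd.MultiIndex.from_tuples([(41, 'House'), (41, 'Senate')], names=names)
idx_b = pd.MultiIndex.from_tuples([(41, 'House'), (58, 'House')], names=names)

a = pd.DataFrame({'Group A': [2, 1]}, index=idx_a)
b = pd.DataFrame({'Group B': [0, 3]}, index=idx_b)

# Outer alignment on the index: rows missing from one frame become NaN,
# which fillna(0) then turns into zero
combined = pd.concat([a, b], axis=1).fillna(0).astype(int).sort_index()
print(combined)
```

One consequence worth keeping in mind: after `fillna(0)`, a zero can mean either "no members" or "Congress not covered by that sheet", so early Congresses should be read with that in mind.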


***
# Read and Tidy Occupation Data
***

In [47]:
# Read occupation data for the House
file_name = '../Data/Vital Statistics on Congress.xlsx'
house_occupations_df = pd.read_excel(file_name, sheet_name='1-8', skiprows=2, skipfooter=4)
house_occupations_df.head()

Unnamed: 0,Occupation,83rd 1953,84tha 1955,86th 1959,89th 1965,90th 1967,91st 1969,92nda 1971,93rd 1973,94th 1975,...,108th 2003,109th 2005,110th 2007,111th 2009,112th 2011,113th 2013,114th 2015,115th\n2017,116th\n2019,117th\n2021
0,Acting/entertainer,,,,,,,,,,...,2.0,3.0,3.0,3.0,2.0,1.0,1.0,1.0,1.0,0.0
1,Aeronautics,,,,,,,,,,...,2.0,2.0,2.0,,2.0,1.0,2.0,3.0,2.0,3.0
2,Agriculture,63.0,60.0,45.0,44.0,39.0,35.0,36.0,39.0,30.0,...,26.0,29.0,23.0,27.0,25.0,27.0,27.0,23.0,22.0,24.0
3,Business or banking,149.0,141.0,130.0,156.0,161.0,158.0,149.0,153.0,141.0,...,165.0,205.0,166.0,225.0,187.0,184.0,173.0,168.0,165.0,168.0
4,Clergy,,,,3.0,3.0,2.0,2.0,4.0,5.0,...,2.0,3.0,3.0,1.0,3.0,6.0,7.0,7.0,7.0,6.0


In [48]:
# Transpose Data
house_occupations_df = house_occupations_df.set_index('Occupation', drop=True).transpose()
house_occupations_df.reset_index(inplace=True, names='Congress')
house_occupations_df = house_occupations_df.rename_axis(None, axis=1)
house_occupations_df.head()

Unnamed: 0,Congress,Acting/entertainer,Aeronautics,Agriculture,Business or banking,Clergy,Congressional aide,Education,Engineering,Journalism,...,Real estate,Veteran,New Occupations,Artistic/Creative,Healthcare,Homemaker/Domestic,Science,Secreterial/clerical,Technical/Trade,Miscellaneous
0,83rd 1953,,,63.0,149.0,,23.0,65.0,,39.0,...,,268.0,,,,,,,,
1,84tha 1955,,,60.0,141.0,,22.0,60.0,,36.0,...,,277.0,,,,,,,,
2,86th 1959,,,45.0,130.0,,25.0,41.0,3.0,35.0,...,,281.0,,,,,,,,
3,89th 1965,,,44.0,156.0,3.0,28.0,68.0,9.0,43.0,...,,316.0,,,,,,,,
4,90th 1967,,,39.0,161.0,3.0,26.0,57.0,6.0,39.0,...,,327.0,,,,,,,,


In [49]:
# Add Chamber column and set to House
house_occupations_df.insert(loc=1, column='Chamber', value='House')

In [50]:
# Read occupation data for the Senate
file_name = '../Data/Vital Statistics on Congress.xlsx'
senate_occupations_df = pd.read_excel(file_name, sheet_name='1-11', skiprows=2, skipfooter=4)
senate_occupations_df.head()

Unnamed: 0,Occupation,83rd 1953,84tha 1955,86th 1959,89th 1965,90th 1967,91st 1969,92nda 1971,93rd 1973,94th 1975,...,108th 2003,109th 2005,110th 2007,111th 2009,112th 2011,113th 2013,114th 2015,115th\n2017,116th\n2019,117thb\n2021
0,Acting/entertainer,,,,,,,,,,...,0.0,0.0,1.0,1.0,3.0,2.0,2.0,2.0,1.0,1.0
1,Aeronautics,,,,,,,,,,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
2,Agriculture,21.0,19.0,17.0,18.0,18.0,16.0,11.0,11.0,10.0,...,5.0,5.0,6.0,4.0,5.0,5.0,5.0,5.0,6.0,7.0
3,Business or banking,29.0,27.0,28.0,25.0,23.0,25.0,24.0,22.0,22.0,...,25.0,40.0,27.0,36.0,29.0,25.0,31.0,29.0,30.0,31.0
4,Clergy,,,,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,2.0


In [51]:
# Transpose Data
senate_occupations_df = senate_occupations_df.set_index('Occupation', drop=True).transpose()
senate_occupations_df.reset_index(inplace=True, names='Congress')
senate_occupations_df = senate_occupations_df.rename_axis(None, axis=1)
senate_occupations_df.head()

Unnamed: 0,Congress,Acting/entertainer,Aeronautics,Agriculture,Business or banking,Clergy,Congressional aide,Education,Engineering,Journalism,...,Real estate,Veteran,New Occupations,Artistic/Creative,Healthcare,Homemaker/Domestic,Science,Secretarial/Clerical,Technical/Trade,Miscellaneous
0,83rd 1953,,,21.0,29.0,,5.0,14.0,,10.0,...,,71.0,,,,,,,,
1,84tha 1955,,,19.0,27.0,,5.0,14.0,,11.0,...,,71.0,,,,,,,,
2,86th 1959,,,17.0,28.0,,6.0,16.0,2.0,13.0,...,,70.0,,,,,,,,
3,89th 1965,,,18.0,25.0,0.0,4.0,16.0,2.0,10.0,...,,71.0,,,,,,,,
4,90th 1967,,,18.0,23.0,0.0,5.0,15.0,2.0,10.0,...,,72.0,,,,,,,,


In [52]:
# Add Chamber column and set to Senate
senate_occupations_df.insert(loc=1, column='Chamber', value='Senate')

In [53]:
# Merge House and Senate data
occupations_df = pd.concat([house_occupations_df, senate_occupations_df])
occupations_df

Unnamed: 0,Congress,Chamber,Acting/entertainer,Aeronautics,Agriculture,Business or banking,Clergy,Congressional aide,Education,Engineering,...,Veteran,New Occupations,Artistic/Creative,Healthcare,Homemaker/Domestic,Science,Secreterial/clerical,Technical/Trade,Miscellaneous,Secretarial/Clerical
0,83rd 1953,House,,,63.0,149.0,,23.0,65.0,,...,268.0,,,,,,,,,
1,84tha 1955,House,,,60.0,141.0,,22.0,60.0,,...,277.0,,,,,,,,,
2,86th 1959,House,,,45.0,130.0,,25.0,41.0,3.0,...,281.0,,,,,,,,,
3,89th 1965,House,,,44.0,156.0,3.0,28.0,68.0,9.0,...,316.0,,,,,,,,,
4,90th 1967,House,,,39.0,161.0,3.0,26.0,57.0,6.0,...,327.0,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
27,113th 2013,Senate,2.0,0.0,5.0,25.0,0.0,9.0,17.0,0.0,...,18.0,,2.0,0.0,5.0,0.0,,1.0,1.0,1.0
28,114th 2015,Senate,2.0,0.0,5.0,31.0,1.0,14.0,18.0,1.0,...,20.0,,3.0,0.0,5.0,0.0,,3.0,1.0,2.0
29,115th\n2017,Senate,2.0,0.0,5.0,29.0,1.0,13.0,18.0,1.0,...,20.0,,3.0,0.0,5.0,0.0,,3.0,1.0,2.0
30,116th\n2019,Senate,1.0,0.0,6.0,30.0,1.0,13.0,21.0,1.0,...,18.0,,2.0,1.0,4.0,0.0,,4.0,1.0,2.0


In [54]:
# Infer datatypes and review
occupations_df = occupations_df.convert_dtypes()
occupations_df.dtypes

Congress                   string[python]
Chamber                    string[python]
Acting/entertainer                  Int64
Aeronautics                         Int64
Agriculture                         Int64
Business or banking                 Int64
Clergy                              Int64
Congressional aide                  Int64
Education                           Int64
Engineering                         Int64
Journalism                          Int64
Labor leader                        Int64
Law                                 Int64
Law enforcement                     Int64
Medicine                            Int64
Military                            Int64
Professional sports                 Int64
Public service/politics             Int64
Real estate                         Int64
Veteran                             Int64
New Occupations                     Int64
Artistic/Creative                   Int64
Healthcare                          Int64
Homemaker/Domestic                

In [55]:
# All columns, except Chamber, should be int
# For string columns, remove any non-numeric characters and convert to int
occupations_df['Congress'] = occupations_df['Congress'].str.extract(r'(\d*)').astype(int)

# For Int64 columns, replace NA with zero and convert to int
int64_cols = occupations_df.select_dtypes(include=['Int64']).columns
occupations_df[int64_cols] = occupations_df[int64_cols].fillna(0).astype(int)

In [56]:
# Reorder and sort dataframe
occupations_df.sort_values(by=['Congress', 'Chamber'], inplace=True)
occupations_df.set_index(['Congress', 'Chamber'], drop=True, inplace=True)
occupations_df.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Acting/entertainer,Aeronautics,Agriculture,Business or banking,Clergy,Congressional aide,Education,Engineering,Journalism,Labor leader,...,Veteran,New Occupations,Artistic/Creative,Healthcare,Homemaker/Domestic,Science,Secreterial/clerical,Technical/Trade,Miscellaneous,Secretarial/Clerical
Congress,Chamber,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
83,House,0,0,63,149,0,23,65,0,39,0,...,268,0,0,0,0,0,0,0,0,0
83,Senate,0,0,21,29,0,5,14,0,10,0,...,71,0,0,0,0,0,0,0,0,0
84,House,0,0,60,141,0,22,60,0,36,0,...,277,0,0,0,0,0,0,0,0,0
84,Senate,0,0,19,27,0,5,14,0,11,0,...,71,0,0,0,0,0,0,0,0,0
86,House,0,0,45,130,0,25,41,3,35,0,...,281,0,0,0,0,0,0,0,0,0


In [57]:
# Review column names
occupations_df.columns

Index(['Acting/entertainer', 'Aeronautics', 'Agriculture',
       'Business or banking', 'Clergy', 'Congressional aide', 'Education',
       'Engineering', 'Journalism', 'Labor leader', 'Law', 'Law enforcement',
       'Medicine', 'Military', 'Professional sports',
       'Public service/politics', 'Real estate', 'Veteran', 'New Occupations',
       'Artistic/Creative', 'Healthcare', 'Homemaker/Domestic', 'Science',
       'Secreterial/clerical', 'Technical/Trade', 'Miscellaneous',
       'Secretarial/Clerical'],
      dtype='object')

In [58]:
# Drop New Occupations - this is a header from the datafile
occupations_df.drop(columns='New Occupations', inplace=True)

In [59]:
# Merge the two Secretarial columns
occupations_df['Secretarial or Clerical'] = occupations_df['Secreterial/clerical'] + occupations_df['Secretarial/Clerical']
occupations_df.drop(columns=['Secreterial/clerical', 'Secretarial/Clerical'], inplace=True)
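
Adding the two spellings together is only safe if no row carries a value in both columns, which should hold here because each spelling comes from a different chamber's sheet. A small sketch of the same merge with that assumption checked explicitly, using toy values:

```python
import pandas as pd

df = pd.DataFrame({
    'Secreterial/clerical': [0, 0, 3],   # spelling used in one sheet
    'Secretarial/Clerical': [1, 2, 0],   # spelling used in the other
})
variants = ['Secreterial/clerical', 'Secretarial/Clerical']

# Guard: fail loudly if any row is populated under both spellings
overlap = (df[variants] != 0).all(axis=1)
assert not overlap.any(), 'both spellings populated on the same row'

df['Secretarial or Clerical'] = df[variants].sum(axis=1)
df = df.drop(columns=variants)
print(df)
```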

In [60]:
# Clean up column names for consistency
occupations_df.columns = ['Actor or Entertainer', 'Aeronautics', 'Agriculture', 'Business or Banking', 'Clergy', 'Congressional Aide', 
                          'Education', 'Engineering', 'Journalism', 'Labor Leader', 'Law', 'Law Enforcement', 'Medicine', 'Military', 
                          'Professional Sports', 'Public Service or Politics', 'Real Estate', 'Veteran', 'Artistic or Creative', 
                          'Healthcare', 'Homemaker or Domestic', 'Science', 'Technical or Trade', 'Miscellaneous', 'Secretarial or Clerical']

In [61]:
occupations_df.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Actor or Entertainer,Aeronautics,Agriculture,Business or Banking,Clergy,Congressional Aide,Education,Engineering,Journalism,Labor Leader,...,Public Service or Politics,Real Estate,Veteran,Artistic or Creative,Healthcare,Homemaker or Domestic,Science,Technical or Trade,Miscellaneous,Secretarial or Clerical
Congress,Chamber,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
83,House,0,0,63,149,0,23,65,0,39,0,...,361,0,268,0,0,0,0,0,0,0
83,Senate,0,0,21,29,0,5,14,0,10,0,...,89,0,71,0,0,0,0,0,0,0
84,House,0,0,60,141,0,22,60,0,36,0,...,349,0,277,0,0,0,0,0,0,0
84,Senate,0,0,19,27,0,5,14,0,11,0,...,87,0,71,0,0,0,0,0,0,0
86,House,0,0,45,130,0,25,41,3,35,0,...,384,0,281,0,0,0,0,0,0,0


***
# Read and Tidy Religion Data
***

In [62]:
# Read religion data for the House
file_name = '../Data/Vital Statistics on Congress.xlsx'
house_religion_df = pd.read_excel(file_name, sheet_name='1-14', skiprows=2, header=[0,1], skipfooter=4)
house_religion_df.head()

Unnamed: 0_level_0,Unnamed: 0_level_0,89th (1965),89th (1965),89th (1965),89th (1965),90th (1967),90th (1967),90th (1967),90th (1967),91st (1969),...,115th (2017),115th (2017),115th (2017),116th (2019),116th (2019),116th (2019),116th (2019),117th (2021),117th (2021),117th (2021)
Unnamed: 0_level_1,Unnamed: 0_level_1.1,D,R,Total,Total.1,D,R,Total,Total.1,D,...,R,Total,Total.1,D,R,Total,Total.1,D,R,Total
0,Catholic,81.0,13.0,94.0,,73.0,22.0,95.0,,75.0,...,71.0,145.0,,87.0,53.0,140.0,,77.0,55.0,132.0
1,,,,,,,,,,,...,,,,,,,,,,
2,Jewish,14.0,1.0,15.0,,14.0,2.0,16.0,,14.0,...,2.0,23.0,,25.0,2.0,27.0,,24.0,2.0,26.0
3,,,,,,,,,,,...,,,,,,,,,,
4,Protestant,,,,,,,,,,...,,,,,,,,,,


In [63]:
# Delete any rows or columns that contain only NaN
house_religion_df.dropna(axis=0, how='all', inplace=True)
house_religion_df.dropna(axis=1, how='all', inplace=True)
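
The two `dropna` calls rely on `how='all'`, which removes a row or column only when every value in it is NaN. A minimal sketch on a made-up toy frame (the name `toy_df` and its values are illustrative, not the spreadsheet's) shows that a partially empty row such as the `Protestant` group header survives:

```python
import numpy as np
import pandas as pd

# Toy frame: row 1 is entirely NaN, and so is the "Total.1" column
toy_df = pd.DataFrame(
    {
        "Religion": ["Catholic", np.nan, "Protestant"],
        "D": [81.0, np.nan, np.nan],
        "Total.1": [np.nan, np.nan, np.nan],
    }
)

# how='all' drops a row/column only if every value in it is NaN
toy_df = toy_df.dropna(axis=0, how="all").dropna(axis=1, how="all")

print(toy_df)
```

The `Protestant` row is kept because its label is not NaN, even though it has no counts.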

In [64]:
# Transpose dataframe
house_religion_df = house_religion_df.transpose().reset_index()
house_religion_df.head()

Unnamed: 0,level_0,level_1,0,2,4,5,6,7,8,9,10,11,13,15
0,Unnamed: 0_level_0,Unnamed: 0_level_1,Catholic,Jewish,Protestant,Baptist,Episcopalian,Methodist,Presbyterian,Mormon,Lutheran,Protestant- other,All otherh,Total
1,89th (1965),D,81.0,14.0,,33.0,29.0,46.0,30.0,,,,62.0,295.0
2,89th (1965),R,13.0,1.0,,9.0,25.0,23.0,26.0,,,,43.0,140.0
3,89th (1965),Total,94.0,15.0,,42.0,54.0,69.0,56.0,,,,105.0,435.0
4,90th (1967),D,73.0,14.0,,30.0,25.0,36.0,26.0,,,,42.0,249.0
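
The `level_0` and `level_1` columns above come from the two header rows: after `transpose()` they form an unnamed two-level row index, and `reset_index()` turns each level into a regular column. A small sketch, assuming a toy frame (`toy_df` is hypothetical) with the same header layout:

```python
import pandas as pd

# Two-level column header, laid out like the religion sheets
columns = pd.MultiIndex.from_tuples(
    [
        ("Unnamed: 0_level_0", "Unnamed: 0_level_1"),
        ("89th (1965)", "D"),
        ("89th (1965)", "R"),
        ("89th (1965)", "Total"),
    ]
)
toy_df = pd.DataFrame(
    [["Catholic", 81.0, 13.0, 94.0], ["Jewish", 14.0, 1.0, 15.0]],
    columns=columns,
)

# Header levels become the row index, then ordinary columns
toy_t = toy_df.transpose().reset_index()

print(toy_t.columns.tolist())
print(toy_t)
```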


In [65]:
# Delete unneeded columns and rows
rows_to_del = house_religion_df[house_religion_df['level_1'].isin(['D', 'R'])].index
house_religion_df.drop(index=rows_to_del, inplace=True)
house_religion_df.drop(columns=[15, 'level_1'], inplace=True)

In [66]:
# Set the column headings to the first row of data
house_religion_df.columns = house_religion_df.iloc[0]
house_religion_df.drop(index=0, inplace=True)
house_religion_df.head()

Unnamed: 0,Unnamed: 0_level_0,Catholic,Jewish,Protestant,Baptist,Episcopalian,Methodist,Presbyterian,Mormon,Lutheran,Protestant- other,All otherh
3,89th (1965),94.0,15.0,,42.0,54.0,69.0,56.0,,,,105.0
6,90th (1967),95.0,16.0,,42.0,50.0,68.0,63.0,,,,99.0
9,91st (1969),97.0,16.0,,43.0,50.0,67.0,62.0,,,,97.0
12,92nd (1971),101.0,12.0,,42.0,49.0,65.0,67.0,,,,98.0
15,93rd (1973),99.0,12.0,,45.0,50.0,63.0,60.0,,13.0,15.0,76.0
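
Assigning `iloc[0]` to `columns` also carries the row's label over as the name of the column index, which is why later outputs show `name=0`. The sketch below reproduces this on a toy frame (`toy_df` is hypothetical) and shows one optional way to clear the name; the notebook itself leaves it in place.

```python
import pandas as pd

# Toy frame shaped like the transposed data after the D/R rows are dropped
toy_df = pd.DataFrame(
    [
        ["Unnamed: 0_level_0", "Catholic", "Jewish"],
        ["89th (1965)", 94.0, 15.0],
        ["90th (1967)", 95.0, 16.0],
    ],
    index=[0, 3, 6],
    columns=["level_0", 0, 2],
)

# Promote the first row to column headings, then remove that row
toy_df.columns = toy_df.iloc[0]
toy_df = toy_df.drop(index=0)

# The column index inherits the promoted row's label as its name
header_name = toy_df.columns.name

# Optional cleanup: clear the leftover name
toy_df.columns.name = None

print(toy_df)
```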


In [67]:
# Add Chamber column and set to House
house_religion_df.insert(loc=1, column='Chamber', value='House')

In [68]:
# Read religion data for the Senate
file_name = '../Data/Vital Statistics on Congress.xlsx'
senate_religion_df = pd.read_excel(file_name, sheet_name='1-15', skiprows=2, header=[0,1], skipfooter=4)
senate_religion_df.head()

Unnamed: 0_level_0,Unnamed: 0_level_0,89th (1965),89th (1965),89th (1965),89th (1965),90th (1967),90th (1967),90th (1967),90th (1967),91st (1969),...,115th (2017),115th (2017),115th (2017),116th (2019),116th (2019),116th (2019),116th (2019),117th (2021)h,117th (2021)h,117th (2021)h
Unnamed: 0_level_1,Unnamed: 0_level_1.1,D,R,Total,Total.1,D,R,Total,Total.1,D,...,R,Total,Total.1,D,R,Total,Total.1,D,R,Total
0,Catholic,12.0,2.0,14.0,,11.0,2.0,13.0,,10.0,...,9.0,24.0,,12,10.0,22.0,,14,10.0,24.0
1,,,,,,,,,,,...,,,,,,,,,,
2,Jewish,1.0,1.0,2.0,,1.0,1.0,2.0,,1.0,...,0.0,8.0,,8f,0.0,8.0,,9f,0.0,9.0
3,,,,,,,,,,,...,,,,,,,,,,
4,Protestant,,,,,,,,,,...,,,,,,,,,,


In [69]:
# Delete any rows or columns that contain only NaN
senate_religion_df.dropna(axis=0, how='all', inplace=True)
senate_religion_df.dropna(axis=1, how='all', inplace=True)

In [70]:
# Transpose dataframe
senate_religion_df = senate_religion_df.transpose().reset_index()
senate_religion_df.head()

Unnamed: 0,level_0,level_1,0,2,4,5,6,7,8,9,10,11,13,15
0,Unnamed: 0_level_0,Unnamed: 0_level_1,Catholic,Jewish,Protestant,Baptist,Episcopalian,Methodist,Presbyterian,Mormon,Lutheran,Protestant- other,All other,Total
1,89th (1965),D,12.0,1.0,,9.0,8.0,15.0,8.0,,,,15.0,68.0
2,89th (1965),R,2.0,1.0,,3.0,7.0,7.0,3.0,,,,9.0,32.0
3,89th (1965),Total,14.0,2.0,,12.0,15.0,22.0,11.0,,,,24.0,100.0
4,90th (1967),D,11.0,1.0,,7.0,8.0,15.0,8.0,,,,14.0,64.0


In [71]:
# Delete unneeded columns and rows
rows_to_del = senate_religion_df[senate_religion_df['level_1'].isin(['D', 'R'])].index
senate_religion_df.drop(index=rows_to_del, inplace=True)
senate_religion_df.drop(columns=[15, 'level_1'], inplace=True)

In [72]:
# Preview the dataframe before promoting the first row to column headings
senate_religion_df.head()

Unnamed: 0,level_0,0,2,4,5,6,7,8,9,10,11,13
0,Unnamed: 0_level_0,Catholic,Jewish,Protestant,Baptist,Episcopalian,Methodist,Presbyterian,Mormon,Lutheran,Protestant- other,All other
3,89th (1965),14.0,2.0,,12.0,15.0,22.0,11.0,,,,24.0
6,90th (1967),13.0,2.0,,11.0,15.0,23.0,12.0,,,,24.0
9,91st (1969),13.0,2.0,,9.0,15.0,22.0,14.0,,,,25.0
12,92nd (1971),12.0,2.0,,8.0,17.0,20.0,16.0,,,,25.0


In [73]:
# Set the column headings to the first row of data
senate_religion_df.columns = senate_religion_df.iloc[0]
senate_religion_df.drop(index=0, inplace=True)
senate_religion_df.head()

Unnamed: 0,Unnamed: 0_level_0,Catholic,Jewish,Protestant,Baptist,Episcopalian,Methodist,Presbyterian,Mormon,Lutheran,Protestant- other,All other
3,89th (1965),14.0,2.0,,12.0,15.0,22.0,11.0,,,,24.0
6,90th (1967),13.0,2.0,,11.0,15.0,23.0,12.0,,,,24.0
9,91st (1969),13.0,2.0,,9.0,15.0,22.0,14.0,,,,25.0
12,92nd (1971),12.0,2.0,,8.0,17.0,20.0,16.0,,,,25.0
15,93rd (1973),14.0,2.0,,7.0,17.0,17.0,15.0,,3.0,3.0,22.0


In [74]:
# Add Chamber column and set to Senate
senate_religion_df.insert(loc=1, column='Chamber', value='Senate')
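
The House and Senate cells apply the same sequence of steps, so they could be folded into one function. The sketch below is one possible consolidation, not the notebook's code: `tidy_religion` is a hypothetical helper, it drops the `Total` row by label rather than by position `15`, and the toy input uses values partly copied from the previews above and otherwise illustrative.

```python
import numpy as np
import pandas as pd


def tidy_religion(raw_df: pd.DataFrame, chamber: str) -> pd.DataFrame:
    """Hypothetical helper mirroring the per-chamber tidy steps."""
    # Drop rows and columns that contain only NaN
    df = raw_df.dropna(axis=0, how="all").dropna(axis=1, how="all")
    # Move the two header levels into columns level_0 and level_1
    df = df.transpose().reset_index()
    # Keep the label row and the per-Congress Total rows; drop D and R
    df = df[~df["level_1"].isin(["D", "R"])].drop(columns="level_1")
    # Promote the first row (religion labels) to column headings
    df.columns = df.iloc[0]
    df = df.drop(index=df.index[0])
    df.columns.name = None
    # Drop the grand-total column if the sheet has one
    df = df.drop(columns="Total", errors="ignore")
    df.insert(loc=1, column="Chamber", value=chamber)
    return df


# Toy input with the same layout as the raw sheets
columns = pd.MultiIndex.from_tuples(
    [
        ("Unnamed: 0_level_0", "Unnamed: 0_level_1"),
        ("89th (1965)", "D"), ("89th (1965)", "R"),
        ("89th (1965)", "Total"), ("89th (1965)", "Total.1"),
        ("90th (1967)", "D"), ("90th (1967)", "R"),
        ("90th (1967)", "Total"),
    ]
)
raw_toy = pd.DataFrame(
    [
        ["Catholic", 81.0, 13.0, 94.0, np.nan, 73.0, 22.0, 95.0],
        [np.nan] * 8,
        ["Jewish", 14.0, 1.0, 15.0, np.nan, 14.0, 2.0, 16.0],
        ["Total", 295.0, 140.0, 435.0, np.nan, 249.0, 186.0, 435.0],
    ],
    columns=columns,
)

tidy = tidy_religion(raw_toy, "House")
print(tidy)
```

With a helper like this, each chamber would need only a `pd.read_excel` call followed by `tidy_religion(raw_df, 'House')` or `tidy_religion(raw_df, 'Senate')`.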

In [75]:
# Compare column headings and clear up any discrepancies
print(house_religion_df.columns)
print(senate_religion_df.columns)

Index(['Unnamed: 0_level_0', 'Chamber', 'Catholic', 'Jewish', 'Protestant',
       '   Baptist', '   Episcopalian', '   Methodist', '   Presbyterian',
       '   Mormon', '   Lutheran', '   Protestant- other', 'All otherh'],
      dtype='object', name=0)
Index(['Unnamed: 0_level_0', 'Chamber', 'Catholic', 'Jewish', 'Protestant',
       '   Baptist', '   Episcopalian', '   Methodist', '   Presbyterian',
       '   Mormon', '   Lutheran', '   Protestant- other', 'All other'],
      dtype='object', name=0)
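
Comparing the two printed lists by eye works for a dozen headings, but the mismatch can also be found programmatically. A minimal sketch using `Index.difference` on shortened, hand-typed copies of the headings above (`house_cols` and `senate_cols` are stand-ins, not the real dataframes):

```python
import pandas as pd

# Shortened copies of the headings printed above
house_cols = pd.Index(["Unnamed: 0_level_0", "Chamber", "   Baptist", "All otherh"])
senate_cols = pd.Index(["Unnamed: 0_level_0", "Chamber", "   Baptist", "All other"])

# Headings present in one dataframe but not the other
only_in_house = house_cols.difference(senate_cols)
only_in_senate = senate_cols.difference(house_cols)

print(only_in_house.tolist())
print(only_in_senate.tolist())
```

Both results being empty would indicate that the headings match and the dataframes can be concatenated without creating extra columns.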


In [76]:
# Remove the footnote marker ('h') from the House dataframe's 'All other' column heading
house_religion_df.rename(columns={'All otherh': 'All other'}, inplace=True)

In [77]:
# Strip leading spaces and rename the Congress column
house_religion_df.columns = house_religion_df.columns.str.strip()
house_religion_df.rename(columns={'Unnamed: 0_level_0': 'Congress'}, inplace=True)
senate_religion_df.columns = senate_religion_df.columns.str.strip()
senate_religion_df.rename(columns={'Unnamed: 0_level_0': 'Congress'}, inplace=True)

In [78]:
# Merge House and Senate data
religion_df = pd.concat([house_religion_df, senate_religion_df]).reset_index(drop=True)
religion_df.head()

Unnamed: 0,Congress,Chamber,Catholic,Jewish,Protestant,Baptist,Episcopalian,Methodist,Presbyterian,Mormon,Lutheran,Protestant- other,All other
0,89th (1965),House,94.0,15.0,,42.0,54.0,69.0,56.0,,,,105.0
1,90th (1967),House,95.0,16.0,,42.0,50.0,68.0,63.0,,,,99.0
2,91st (1969),House,97.0,16.0,,43.0,50.0,67.0,62.0,,,,97.0
3,92nd (1971),House,101.0,12.0,,42.0,49.0,65.0,67.0,,,,98.0
4,93rd (1973),House,99.0,12.0,,45.0,50.0,63.0,60.0,,13.0,15.0,76.0


In [79]:
# Infer datatypes and review
religion_df = religion_df.convert_dtypes()
religion_df.dtypes

0
Congress             string[python]
Chamber              string[python]
Catholic                      Int64
Jewish                        Int64
Protestant                    Int64
Baptist                      object
Episcopalian                  Int64
Methodist                     Int64
Presbyterian                  Int64
Mormon                        Int64
Lutheran                      Int64
Protestant- other             Int64
All other                     Int64
dtype: object

In [80]:
# All columns, except Chamber, should be int
# Congress values look like '89th (1965)'; keep only the leading number and convert to int
religion_df['Congress'] = religion_df['Congress'].str.extract(r'(\d*)').astype(int)

# For object columns, remove any non-numeric characters and convert to int
religion_df['Baptist'] = religion_df['Baptist'].fillna(0).astype('str').str.extract(r'(\d*)').astype(int)

# For Int64 columns, replace NA with zero and convert to int
int64_cols = religion_df.select_dtypes(include=['Int64']).columns
religion_df[int64_cols] = religion_df[int64_cols].fillna(0).astype(int)
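
The regular-expression cleanup can be tried on a few hand-made values before running it on the full dataframe. The sketch below is a variation on the cell above, not a copy of it: it uses `\d+` (at least one digit) and `expand=False` so that `str.extract` returns a Series, and the sample values are made up to resemble the Congress labels and footnoted counts seen in the raw previews.

```python
import pandas as pd

# Congress labels, including one with a trailing footnote letter
congress = pd.Series(["89th (1965)", "117th (2021)h"])
congress_num = congress.str.extract(r"(\d+)", expand=False).astype(int)

# An object column mixing floats, a footnoted count, and a missing value
counts = pd.Series([42.0, "9f", None], dtype="object")
counts_num = (
    counts.fillna(0)                       # missing -> 0
    .astype(str)                           # 42.0 -> '42.0', 0 -> '0'
    .str.extract(r"(\d+)", expand=False)   # first run of digits
    .astype(int)
)

print(congress_num.tolist())
print(counts_num.tolist())
```

Only the first run of digits is kept, so `'42.0'` becomes `42` and the year in parentheses is ignored.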

In [81]:
# Sort by Congress and Chamber, then use them as the index
religion_df.sort_values(by=['Congress', 'Chamber'], inplace=True)
religion_df.set_index(['Congress', 'Chamber'], drop=True, inplace=True)
religion_df.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Catholic,Jewish,Protestant,Baptist,Episcopalian,Methodist,Presbyterian,Mormon,Lutheran,Protestant- other,All other
Congress,Chamber,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
89,House,94,15,0,42,54,69,56,0,0,0,105
89,Senate,14,2,0,12,15,22,11,0,0,0,24
90,House,95,16,0,42,50,68,63,0,0,0,99
90,Senate,13,2,0,11,15,23,12,0,0,0,24
91,House,97,16,0,43,50,67,62,0,0,0,97


***
# Consolidate Data
***

In [82]:
# The party dataframe spans the widest range of Congresses, so use it as the primary (left) dataframe for the joins
demographics_df = party_df.join([seniority_df, sex_df, race_df, occupations_df, religion_df])

In [83]:
# Congresses missing from a joined dataframe are NaN after the left join; fill them with 0 and convert to int
demographics_df = demographics_df.fillna(0).astype('int')
demographics_df.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Members,Vacant,Democrats,Republicans,Other Parties,1 term,2 terms,3 terms,4+ terms,Women,...,Jewish,Protestant,Baptist,Episcopalian,Methodist,Presbyterian,Mormon,Lutheran,Protestant- other,All other
Congress,Chamber,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
34,House,234,0,83,108,43,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
34,Senate,62,0,42,15,5,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
35,House,237,0,131,92,14,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
35,Senate,64,0,39,20,5,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
36,House,237,0,101,113,23,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
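
The zeros in the early rows above mean "no data", not a count of zero: `join` keeps every row of the left dataframe, and `fillna(0)` then fills whatever the other dataframes did not cover. A minimal sketch with two toy dataframes (`party` and `religion` here are small stand-ins, and the `has_religion` flag is a hypothetical addition, not part of the notebook) shows the effect and one way to record coverage before filling:

```python
import pandas as pd

idx_names = ["Congress", "Chamber"]

# Left dataframe: covers an early and a later Congress
party = pd.DataFrame(
    {"Democrats": [42, 68]},
    index=pd.MultiIndex.from_tuples(
        [(34, "Senate"), (89, "Senate")], names=idx_names
    ),
)

# Right dataframe: covers only the later Congress
religion = pd.DataFrame(
    {"Catholic": [14]},
    index=pd.MultiIndex.from_tuples([(89, "Senate")], names=idx_names),
)

# join defaults to how='left', so every party row is kept
joined = party.join([religion])

# Record which rows actually had religion data before filling
has_religion = joined["Catholic"].notna()

filled = joined.fillna(0).astype(int)
print(filled)
print(has_religion.tolist())
```

Downstream analysis of religion or occupation trends would likely need to restrict itself to the Congresses that each source table actually covers.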


***
# Write to Excel
***

In [84]:
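# Move Congress and Chamber from the index back to regular columns, then write without the row index
demographics_df.reset_index(inplace=True)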
demographics_df.to_excel('../Data/Demographics Data - Scrubbed.xlsx', index=False)

***
**End**
***