# Accessing ProPublica's Congressional API

ProPublica provides API access to legislative data from the House, Senate and Library of Congress. This notebook explains the major functionality needed for our group project; for more detail, see __[ProPublica's API documentation](https://projects.propublica.org/api-docs/congress-api/)__. You need an API key to access the data, and each key is limited to 5,000 requests per day. You can sign up at __[ProPublica's Data Store](https://www.propublica.org/datastore/api/propublica-congress-api)__.
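A minimal sketch of attaching the key to a request, assuming the key is stored in an environment variable named `PROPUBLICA_API_KEY` (a name chosen here for illustration, not one ProPublica prescribes). The `X-API-Key` header is the same one the member-request code in this notebook sends.

```python
import os


def propublica_headers(env_var: str = "PROPUBLICA_API_KEY") -> dict:
    """Build request headers for the ProPublica Congress API.

    The key goes in an X-API-Key header. Reading it from an environment
    variable (the variable name is our own assumption) keeps the key out of
    the notebook and out of version control.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your ProPublica API key")
    return {"X-API-Key": api_key}
```

Passing `headers=propublica_headers()` to `requests.get` would then replace a hard-coded key.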

The Congressional API includes:
- Roll-call vote data (1991 onward for House; 1989 onward for Senate)
- Member data
- Bill data (since 1995)
- Floor actions
- Committee data (including committee membership)
- Personal explanation (related to missed votes)
- Nomination data (2001 onward)
- Other information

For our purposes we are most interested in the roll-call vote, member, bill and committee data. This document will include a section explaining how to access each of these API endpoints and a description of the underlying data structure returned.

## Base URL and Error Overview

The ProPublica API uses a common base URL for its different API endpoints: <font color = blue>https://api.propublica.org/congress/v1/</font>. Each section below describes the additional path segments needed to query a given endpoint.
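A small helper, hypothetical and written only for illustration, that appends endpoint path segments to the common base URL:

```python
BASE_URL = "https://api.propublica.org/congress/v1/"


def endpoint_url(*parts) -> str:
    """Join path segments onto the shared ProPublica base URL.

    Each part is converted to a string and stripped of surrounding slashes,
    so endpoint_url(115, "senate", "members.json") and
    endpoint_url("115/", "/senate", "members.json") produce the same URL.
    """
    path = "/".join(str(p).strip("/") for p in parts)
    return BASE_URL + path
```

For example, `endpoint_url(115, "senate", "members.json")` yields `https://api.propublica.org/congress/v1/115/senate/members.json`.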

The API also returns the following error codes, which are shared by all endpoints and indicate whether a request failed and why.

- 400 : Bad request (improperly formed)
- 403 : Forbidden (request doesn't have authorization header)
- 404 : Not found (specified record can't be found)
- 406 : Not acceptable (you requested a format that is not JSON or XML)
- 500 : Internal server error (ProPublica had server issue; try again later)
- 503 : Service unavailable (service is currently down; try again later)
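A sketch of how these codes might drive error handling. The retry policy (retry only 500 and 503, with exponential backoff) is our own assumption rather than something ProPublica specifies, and `fetch` stands in for any function that returns a `(status_code, body)` pair.

```python
import time

ERROR_CODES = {
    400: "Bad request (improperly formed)",
    403: "Forbidden (request doesn't have authorization header)",
    404: "Not found (specified record can't be found)",
    406: "Not acceptable (requested format is not JSON or XML)",
    500: "Internal server error (try again later)",
    503: "Service unavailable (try again later)",
}

# Assumption: only the two server-side errors are worth retrying;
# the 4xx codes mean the request itself has to change.
RETRYABLE = {500, 503}


def check_status(status_code: int) -> str:
    """Classify an HTTP status code as 'ok', 'retry', or 'fail'."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code in RETRYABLE:
        return "retry"
    return "fail"


def describe(status_code: int) -> str:
    """Human-readable explanation of a status code."""
    return ERROR_CODES.get(status_code, f"Unexpected status {status_code}")


def fetch_with_retry(fetch, max_attempts: int = 3, base_delay: float = 1.0):
    """Call fetch() until it succeeds, retrying 500/503 with exponential backoff."""
    for attempt in range(max_attempts):
        status, body = fetch()
        outcome = check_status(status)
        if outcome == "ok":
            return body
        if outcome == "fail" or attempt == max_attempts - 1:
            raise RuntimeError(f"{status}: {describe(status)}")
        time.sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ... by default
```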


## Congressional Members 

To start acquiring data on Congressional members, it is useful to first retrieve a list of all the members. To do this, append the Congress number and chamber to the base URL and request members.json.

**The format is: Base URL + {congress}/{chamber}/members.json**

For example, the base URL with <font color=green>115/senate/members.json</font> would request the list of senators in the 115th Congress, while <font color=green>115/house/members.json</font> would request members of the House of Representatives from the 115th Congress. 
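The same URL pattern can be sketched with only the standard library. `members_urls` and `fetch_json` are hypothetical helper names; the request uses `urllib.request` rather than `requests`, and `urlopen` raises `urllib.error.HTTPError` for the error codes listed above.

```python
import itertools
import json
import urllib.request

BASE_URL = "https://api.propublica.org/congress/v1/"


def members_urls(first_congress: int, last_congress: int, chambers=("house", "senate")):
    """Yield (congress, chamber, url) for every congress/chamber combination."""
    for congress, chamber in itertools.product(range(first_congress, last_congress + 1), chambers):
        yield congress, chamber, f"{BASE_URL}{congress}/{chamber}/members.json"


def fetch_json(url: str, api_key: str) -> dict:
    """GET a ProPublica endpoint with the X-API-Key header and decode the JSON body."""
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `fetch_json(url, key)["results"][0]["members"]` would give the member list for one Congress and chamber, the same structure the notebook's own loop reads. Congresses 102 through 115 in both chambers come to 28 requests.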

The code below retrieves this information with Python, looping over multiple Congresses and both the House and Senate. Since legislators can serve in multiple Congresses and hold different roles in each, we'll need to look for duplicates in the returned data before using it in our project.
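Because the combined results contain one row per member per Congress, here is a minimal pandas sketch of two ways to handle the duplicates. The rows and `X00000n` IDs are hypothetical; `id`, `congress` and `chamber` match column names in the member data, where `congress` arrives as a string.

```python
import pandas as pd

# Hypothetical rows standing in for the combined members.json results:
# one row per member per congress, so long-serving members repeat.
rows = pd.DataFrame(
    {
        "id": ["X000001", "X000001", "X000002", "X000003", "X000003"],
        "congress": ["114", "115", "115", "114", "115"],
        "chamber": ["House", "Senate", "House", "Senate", "Senate"],
    }
)

# Congress numbers arrive as strings; convert so they sort numerically.
rows["congress"] = rows["congress"].astype(int)

# Option 1: one row per person, keeping their most recent congress.
latest = rows.sort_values("congress").drop_duplicates(subset="id", keep="last")

# Option 2: keep every role, but flag people who served in both chambers.
chambers_per_member = rows.groupby("id")["chamber"].nunique()
both_chambers = chambers_per_member[chambers_per_member > 1].index.tolist()
```

Which option fits depends on the analysis: vote-level work needs every member-congress row, while joins to campaign finance data usually want one row per person.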

There is also an API endpoint that returns more detail for a specific member, including the member's committee roles. That committee information may be easier to pull via the committee API, though, and apart from it we probably won't need the individual member query much.
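If we do use the specific-member endpoint, a sketch of pulling committee assignments out of its response. It assumes the endpoint path `members/{member-id}.json` and a response whose `results[0]['roles']` entries each carry a `committees` list, as in ProPublica's documentation examples; `member_url` and `committee_assignments` are hypothetical helpers, and the layout should be checked against a live response.

```python
BASE_URL = "https://api.propublica.org/congress/v1/"


def member_url(member_id: str) -> str:
    """URL for the specific-member endpoint (assumed path: members/{member-id}.json)."""
    return f"{BASE_URL}members/{member_id}.json"


def committee_assignments(member_json: dict) -> list:
    """Flatten a specific-member response into (congress, chamber, code, name) tuples.

    Assumes results[0]['roles'] is a list of roles and each role has a
    'committees' list of dicts with 'code' and 'name' keys.
    """
    assignments = []
    for role in member_json["results"][0].get("roles", []):
        for committee in role.get("committees", []):
            assignments.append(
                (role.get("congress"), role.get("chamber"), committee.get("code"), committee.get("name"))
            )
    return assignments
```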

```python
import itertools
import os

import pandas as pd
import requests
from IPython.display import display

outPath = "C:/Users/rkuhn/Documents/Courses/DataandVisualAnalytics/Project/allMemberData.csv"
# Read the key from an environment variable (name is our choice) instead of hard-coding it
apiKey = os.environ["PROPUBLICA_API_KEY"]
baseUrl = "https://api.propublica.org/congress/v1/"

# Populate required API options
firstCongress = 102
lastCongress = 115
congresses = [str(c) for c in range(firstCongress, lastCongress + 1)]
chambers = ["house", "senate"]
endPoint = "members.json"

# One request per congress/chamber combination (28 requests for 102-115),
# well within the 5,000-requests-per-day limit. This gets all the member data,
# although it still needs some cleanup.
apiRequestList = [
    requests.get(baseUrl + c + "/" + ch + "/" + endPoint, headers={"X-API-Key": apiKey})
    for c, ch in itertools.product(congresses, chambers)
]

members = []

# Check each response, then tag every member with the congress and chamber it came from.
# (New names here avoid overwriting the congresses/chambers lists above.)
for resp in apiRequestList:
    if resp.status_code != 200:
        print("Request failed:", resp.url, resp.status_code)
        continue
    respJSON = resp.json()
    if respJSON.get("status") == "OK":
        result = respJSON["results"][0]
        members.extend(
            dict(member, congress=result["congress"], chamber=result["chamber"])
            for member in result["members"]
        )

allMemberDF = pd.DataFrame(members)

print(allMemberDF.dtypes)
display(allMemberDF.head())
allMemberDF.to_csv(outPath)
## The at_large, district, and geoid columns are present only for House members
## Senate records add lis_id, senate_class and state_rank to the columns shared with the House
```

```
api_uri                  object
at_large                 object
chamber                  object
congress                 object
contact_form             object
crp_id                   object
cspan_id                 object
date_of_birth            object
district                 object
dw_nominate             float64
facebook_account         object
fax                      object
fec_candidate_id         object
first_name               object
gender                   object
geoid                    object
google_entity_id         object
govtrack_id              object
icpsr_id                 object
id                       object
ideal_point             float64
in_office                  bool
last_name                object
last_updated             object
leadership_role          object
lis_id                   object
middle_name              object
missed_votes            float64
missed_votes_pct        float64
next_election            object
ocd_id                   object
office
```

| | api_uri | at_large | chamber | congress | contact_form | crp_id | cspan_id | date_of_birth | district | dw_nominate | ... | state_rank | suffix | title | total_present | total_votes | twitter_account | url | votes_with_party_pct | votesmart_id | youtube_account |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://api.propublica.org/congress/v1/members... | False | House | 102 | | | | 1938-06-26 | 1 | | ... | | | Representative | 0.0 | 932.0 | neilabercrombie | | 87.03 | | hawaiirep1 |
| 1 | https://api.propublica.org/congress/v1/members... | False | House | 102 | | | 1002061.0 | 1942-11-19 | 7 | | ... | | | Representative | 0.0 | 932.0 | repgaryackerman | | 89.86 | | RepAckerman |
| 2 | https://api.propublica.org/congress/v1/members... | False | House | 102 | | | | 1934-01-16 | 1 | | ... | | | Representative | 0.0 | 932.0 | | | 86.29 | | |
| 3 | https://api.propublica.org/congress/v1/members... | False | House | 102 | | | | 1943-12-02 | 4 | | ... | | | Representative | 0.0 | 932.0 | | | 82.21 | | |
| 4 | https://api.propublica.org/congress/v1/members... | False | House | 102 | | | | 1952-03-08 | 7 | | ... | | | Representative | 1.0 | 545.0 | | | 83.55 | | |
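Which columns belong to only one chamber can also be checked directly from the member records before they are combined; once they are in one DataFrame, the other chamber's rows simply show NaN in those columns. A standard-library sketch, where `chamber_specific_fields` is a hypothetical helper and the input is assumed to be the list of member dicts tagged with `chamber` values `"House"` and `"Senate"`:

```python
def chamber_specific_fields(members):
    """Return (house_only, senate_only): field names seen in only one chamber.

    `members` is a list of dicts shaped like the per-member records in
    members.json, each tagged with a 'chamber' key of "House" or "Senate".
    """
    fields = {"House": set(), "Senate": set()}
    for member in members:
        fields[member["chamber"]].update(member.keys())
    return fields["House"] - fields["Senate"], fields["Senate"] - fields["House"]
```

Run on the notebook's `members` list, this should report the House-only and Senate-only columns noted in the code comments.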


Some of the data still needs cleaning. In particular, we need to connect the member data with the OpenSecrets campaign finance data. The ProPublica Congressional API provides fec_candidate_id, which should make it possible to link to the OpenSecrets field that contains the same identifier.
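One way to make that link, sketched under assumptions: the unitedstates/congress-legislators files on theunitedstates.io store, for each legislator, an `id` dict with an `fec` list (a legislator can accumulate several FEC candidate IDs, e.g. after running for a different office) and an `opensecrets` string. `fec_to_opensecrets` is a hypothetical helper that inverts that into a lookup table.

```python
def fec_to_opensecrets(legislators):
    """Map every FEC candidate ID to an OpenSecrets ID.

    Assumes records shaped like the congress-legislators JSON, where
    record['id'] may hold an 'fec' list and an 'opensecrets' string.
    Records without an OpenSecrets ID are skipped.
    """
    lookup = {}
    for record in legislators:
        ids = record.get("id", {})
        opensecrets = ids.get("opensecrets")
        if not opensecrets:
            continue
        for fec_id in ids.get("fec", []):
            lookup[fec_id] = opensecrets
    return lookup
```

Each ProPublica `fec_candidate_id` could then be looked up with `lookup.get(fec_id)`, which returns `None` when no match exists.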

```python
# fec_candidate_id is an empty string when ProPublica has no FEC ID for a record
print('There are', allMemberDF[allMemberDF.fec_candidate_id == ''].shape[0], 'members without FEC Candidate Ids!')
print('There are', allMemberDF[allMemberDF.fec_candidate_id != ''].shape[0], 'members with FEC Candidate Ids!')
## 6,636 records (one row per member per congress) are missing FEC candidate IDs,
## and only 1,103 records have one. We'll have to find a better way to match.

# GovTrack IDs are far more complete: the records missing one
# (Jeffrey Chiesa and Brenda Jones) are filled in by hand
print('==========================================================')
# Records with no GovTrack ID (wrap in display() to view them)
allMemberDF.loc[allMemberDF.govtrack_id.isna(), "govtrack_id"].head()
allMemberDF.loc[(allMemberDF.last_name == 'Chiesa') & (allMemberDF.first_name == 'Jeffrey'), "govtrack_id"] = "412597"
allMemberDF.loc[(allMemberDF.last_name == 'Jones') & (allMemberDF.first_name == 'Brenda'), "govtrack_id"] = "412752"
# Count both NaN and empty strings as missing
missingGovtrack = allMemberDF.govtrack_id.isna() | (allMemberDF.govtrack_id == '')
print('There are', missingGovtrack.sum(), 'members without GovTrack Ids!')
print('There are', (~missingGovtrack).sum(), 'members with GovTrack Ids!')

# Luckily we can join in another dataset to look up each legislator's OpenSecrets ID from their GovTrack ID:
# https://theunitedstates.io/congress-legislators/legislators-current.json (and legislators-historical.json)
usAPIBaseUrl = "https://theunitedstates.io/congress-legislators/"
usAPIEndpoints = ["legislators-current.json", "legislators-historical.json"]
usAPIMembers = []
for endpoint in usAPIEndpoints:
    resp = requests.get(usAPIBaseUrl + endpoint)
    resp.raise_for_status()
    usAPIMembers.extend(resp.json())

usAPIMemberDF = pd.DataFrame(usAPIMembers)

# Expand the nested 'id' dict into one column per ID system; keep GovTrack and OpenSecrets
MemberIdDF = pd.DataFrame(usAPIMemberDF["id"].tolist())
MemberIdDF = MemberIdDF.loc[:, ['govtrack', 'opensecrets']].rename(columns={'govtrack': 'govtrack_id'})
```

```
There are 6636 members without FEC Candidate Ids!
There are 1103 members with FEC Candidate Ids!
There are 0 members without GovTrack Ids!
There are 7739 members with GovTrack Ids!
```


```python
# The merge keys need matching dtypes: ProPublica returns govtrack_id as a string,
# while the congress-legislators data stores it as a number
allMemberDF.loc[:, 'govtrack_id2'] = allMemberDF.govtrack_id.astype(int)
MemberIdDF.loc[:, 'govtrack_id2'] = MemberIdDF.govtrack_id.astype(int)
print(allMemberDF.govtrack_id2.dtype)
print(MemberIdDF.govtrack_id2.dtype)
# Drop MemberIdDF's own govtrack_id before merging; otherwise pandas renames the
# overlapping columns to govtrack_id_x / govtrack_id_y and test.govtrack_id no longer exists
test = pd.merge(allMemberDF, MemberIdDF.drop(columns='govtrack_id'), on='govtrack_id2', how='left')
display(test[test.govtrack_id2 == 400895].head())
# Unmatched rows have NaN (not '') in opensecrets after a left merge
print('There are', test.opensecrets.isna().sum(), 'members without open secret Ids!')
print('There are', test.opensecrets.notna().sum(), 'members with open secret Ids!')
```

```
int32
int32
```
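A toy-data sketch of the two details that matter when joining ProPublica member data to the congress-legislators IDs. First, the key columns must share a dtype: recent pandas versions raise a `ValueError` when merging an object (string) column against an int64 one. Second, if both frames carry a non-key column with the same name, pandas suffixes it `_x`/`_y`, so a later lookup such as `test.govtrack_id` fails with an `AttributeError`. The frames and IDs below are hypothetical; `validate` and `indicator` are standard `pd.merge` parameters used here as extra checks.

```python
import pandas as pd

# Hypothetical stand-ins: ProPublica stores govtrack_id as strings,
# while the congress-legislators data parses it as integers.
propublica = pd.DataFrame({"govtrack_id": ["400895", "412597", "999999"], "last_name": ["A", "B", "C"]})
legislators = pd.DataFrame({"govtrack_id": [400895, 412597], "opensecrets": ["N00000001", "N00000002"]})

# Cast the string keys so both sides are int64 before merging.
propublica["govtrack_id"] = propublica["govtrack_id"].astype("int64")

merged = pd.merge(
    propublica,
    legislators,
    on="govtrack_id",        # joining on the shared name avoids _x/_y suffixes
    how="left",
    validate="many_to_one",  # raises if a govtrack_id repeats in legislators
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

# GovTrack IDs with no OpenSecrets match.
unmatched = merged.loc[merged["_merge"] == "left_only", "govtrack_id"].tolist()
```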

## Bill Information

This section of the notebook describes the legislative bill data.