A parser for the Congressional Record.


congressional-record

This tool converts HTML files containing the text of the Congressional Record into structured text data. It is particularly useful for identifying speeches by members of Congress.

From the repository root, run `python -m congressionalrecord.cli -h` for usage instructions.

  • Output is JSON.
  • Each instance of speech is tagged with the speaker's bioguideid wherever possible.
  • Instances of speech are recorded as "turns": each time a Member speaks again, that speech counts as a new "turn."
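
To illustrate the "turn" idea, here is a minimal sketch that is not taken from the parser's own code. It assumes one reading of the description above: a running counter over the document, where every instance of speech gets the next turn number even if the same Member has spoken before. The function name `assign_turns`, the field names `turn`, `speaker_bioguide`, and `text`, and the IDs in the sample data are all hypothetical placeholders; the tool's actual JSON schema may differ.

```python
import json


def assign_turns(speeches):
    """Number each instance of speech in document order.

    `speeches` is a list of (bioguide_id, text) pairs. A bioguide_id of
    None stands for a speaker who could not be matched to an ID.
    All field names below are hypothetical, not the parser's real schema.
    """
    turns = []
    for turn, (bioguide_id, text) in enumerate(speeches):
        turns.append({
            "turn": turn,                     # running index of this speech
            "speaker_bioguide": bioguide_id,  # None when no ID was found
            "text": text,
        })
    return turns


# Made-up sample data: placeholder IDs, not real Members.
speeches = [
    ("A000001", "Mr. President, I rise today to speak on the bill."),
    ("B000002", "Will the Senator yield?"),
    ("A000001", "I yield."),
    (None, "Without objection, it is so ordered."),
]

turns = assign_turns(speeches)
print(json.dumps(turns, indent=2))
```

In this sketch the first speaker's second remark receives its own turn number rather than being merged into the earlier one, which is the behavior the last bullet describes.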

This software is released as-is under the BSD 3-Clause License, with no warranty of any kind.

Recommended citation:

Judd, Nicholas, Dan Drinkard, Jeremy Carbaugh, and Lindsay Young. congressional-record: A parser for the Congressional Record. Chicago, IL: 2017.