
valvoda/holjplus

HOLJ Plus

This project extracts UK House of Lords judgements from 1996 to 2009: https://publications.parliament.uk/pa/ld/ldjudgmt.htm
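The first step, collecting links to the individual judgments from the index page above, can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the repository's "scrape.py". The function name "judgment_links" is hypothetical, and so is the default filter: it assumes judgment pages are linked with hrefs containing "ldjudgmt".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags on an index page."""

    def __init__(self, base_url: str, contains: str):
        super().__init__()
        self.base_url = base_url
        self.contains = contains  # hypothetical substring marking judgment links
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")  # None when the tag has no href
        if href and self.contains in href:
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.links:  # keep the first occurrence only
                self.links.append(url)


def judgment_links(index_html: str, base_url: str, contains: str = "ldjudgmt") -> list[str]:
    """Return the de-duplicated absolute URLs of links whose href contains `contains`."""
    parser = LinkCollector(base_url, contains)
    parser.feed(index_html)
    parser.close()
    return parser.links
```

To fetch the index page itself you could pass the result of urllib.request.urlopen(url).read().decode(...) to "judgment_links"; the page encoding and the link pattern are assumptions about the site, not details taken from this repository.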

The HTML pages are scraped for the text of each case, which is then cleaned up so that the majority judgement can be annotated. We select 231 of these cases and merge them with the HOLJ corpus to create the 300-case HOLJ+ corpus.
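A generic version of the "scrape and clean up" step might look like the sketch below. It turns a judgment's HTML into plain text by dropping script and style contents, starting a new line at common block-level tags, collapsing repeated whitespace (including non-breaking spaces) and removing blank lines. The function "html_to_text" and its choice of block-level tags are hypothetical; the repository's own cleanup may differ.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Accumulate visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}
    # Tags after which the text should start on a new line (an assumption).
    BLOCK = {"p", "br", "div", "h1", "h2", "h3", "h4", "li", "tr"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &nbsp; etc. into characters
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of `html`, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of spaces, tabs and non-breaking spaces within each line.
    lines = (re.sub(r"[ \t\xa0]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Working line by line keeps paragraph boundaries intact, which matters when the output is later annotated for the majority judgement.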

Getting Started

To get the full corpus used in our research, run "holjplus.py"; this should give you ~750 House of Lords judgements in plain-text format (HOLJ+). To get the 300-case corpus we use for majority-opinion research, then merge the existing HOLJ corpus with the HOLJ+ corpus using "merge.py".

"merge.py" can also be used to extend and combine our corpus, or any other corpus of .txt files. See "merge.py" for details.
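As a rough illustration of what combining two .txt corpora involves, the sketch below copies every .txt file from several source directories into one output directory. When two corpora contain the same file name, it adds a numeric suffix so that no case silently overwrites another. This is not the interface of "merge.py": the function "merge_corpora" and its collision rule are hypothetical, and "merge.py" itself documents how the real merge works.

```python
import shutil
from pathlib import Path


def merge_corpora(source_dirs: list[Path], out_dir: Path) -> list[Path]:
    """Copy every .txt file from source_dirs into out_dir.

    A file whose name already exists in out_dir is written under a
    numeric suffix (case1_1.txt, case1_2.txt, ...) instead of replacing
    the earlier file. Returns the paths written, in copy order.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for src in source_dirs:
        # Sorted so the merge order is deterministic across platforms.
        for path in sorted(Path(src).glob("*.txt")):
            target = out_dir / path.name
            n = 1
            while target.exists():
                target = out_dir / f"{path.stem}_{n}{path.suffix}"
                n += 1
            shutil.copyfile(path, target)
            written.append(target)
    return written
```

For example, merge_corpora([Path("holj"), Path("holjplus")], Path("merged")) would combine two folders of plain-text judgements; the folder names here are placeholders.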

Prerequisites

Running the tests

To run the built-in tests, run format.py, extract.py and scrape.py.
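If you want to run the three scripts in one go and see which of them fail, a small helper along these lines could work. It assumes each script reports failure through a non-zero exit status, for example an uncaught AssertionError. The helper name "run_checks" is hypothetical, and running scrape.py may need network access.

```python
import subprocess
import sys


def run_checks(scripts: list[str], cwd: str = ".") -> dict[str, int]:
    """Run each script with the current Python interpreter.

    Returns a mapping from script name to its exit code; 0 means the
    script finished without raising an error.
    """
    results: dict[str, int] = {}
    for script in scripts:
        proc = subprocess.run([sys.executable, script], cwd=cwd)
        results[script] = proc.returncode
    return results
```

From the repository root, run_checks(["format.py", "extract.py", "scrape.py"]) returns one exit code per script, so any non-zero entry points at the failing step.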

Contributing

scrape.py: functions adapted from a realpython.com tutorial.

Authors

  • Josef Valvoda

License

This project is licensed under the MIT License; see LICENSE.md for details.