Exploring Hansard Transcripts
The script so far:
- Fetches and cleans all Hansard transcripts from 2017.
- Computes inverse document frequency (IDF) weights for prominent words, excluding stopwords, with TfidfVectorizer.
- Visualises the distribution and shows a few documents sorted by the words that define them most.
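
To make the IDF step concrete, here is a minimal, dependency-free sketch of the smoothed formula that scikit-learn documents for TfidfVectorizer's default settings, `idf = ln((1 + n) / (1 + df)) + 1`. It is not the script's actual code: the three sample sentences, the tiny stopword set, the simplified tokeniser, and the helper names `tokenize`, `smoothed_idf`, and `top_terms` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the script's real list is not shown here.
STOPWORDS = {"the", "a", "of", "to", "and", "is", "on", "in"}


def tokenize(text):
    """Lowercase, keep words of 2+ letters, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]{2,}", text.lower()) if w not in STOPWORDS]


def smoothed_idf(docs):
    """Return {term: idf} with idf = ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        # Count each term once per document (document frequency).
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def top_terms(doc, idf, k=3):
    """Rank a document's terms by raw count times IDF, highest first."""
    counts = Counter(tokenize(doc))
    scores = {term: count * idf[term] for term, count in counts.items()}
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Made-up sample sentences standing in for cleaned transcripts.
docs = [
    "The minister spoke on housing and housing supply.",
    "The member asked the minister about transport funding.",
    "Debate on transport and rail funding continued.",
]
idf = smoothed_idf(docs)
print(top_terms(docs[0], idf))
```

Words that appear in fewer documents receive a higher IDF, so they rise to the top of a document's ranking. TfidfVectorizer additionally L2-normalises each document vector by default, which rescales the scores but does not change the ordering of terms within a single document.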