That's cool. Computational sociolinguistic methods for investigating individual lexico-grammatical variation
Hans-Jörg Schmid (1),
Quirin Würschinger (1),
Sebastian Fischer (2),
Helmut Küchenhoff (2)
(1) Department of English and American Studies, LMU Munich, Germany
(2) Department of Statistics, LMU Munich, Germany
- This notebook parses the XML version of BNC2014,
- calculates total counts for texts, speakers and words in the corpus,
- performs queries for the target pattern `that's ADJ` and stores all hits,
- merges hits with semantic category descriptions from the USAS tagset,
- merges hits with metadata for speakers and conversations from the spreadsheets provided by BNC2014.
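
The steps listed above can be illustrated with a minimal sketch. It is not the code from the notebook: it uses only the Python standard library and a tiny inline sample instead of the corpus files. The XML layout (`text`, `u` and `w` elements with `who`, `class` and `usas` attributes), the speaker IDs, the metadata fields and the single USAS label are assumptions made for illustration, and the function names `count_totals`, `find_hits` and `merge_hits` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature text; element and attribute names are assumptions
# about the corpus XML, not a verified description of BNC2014.
SAMPLE_XML = """
<text id="T001">
  <u n="1" who="S0001">
    <w lemma="that" class="PRON">that</w>
    <w lemma="be" class="VERB">'s</w>
    <w lemma="cool" class="ADJ" usas="A5.1">cool</w>
  </u>
  <u n="2" who="S0002">
    <w lemma="yeah" class="INTERJ">yeah</w>
    <w lemma="that" class="PRON">that</w>
    <w lemma="be" class="VERB">'s</w>
    <w lemma="nice" class="ADJ" usas="A5.1">nice</w>
  </u>
</text>
"""

# Illustrative lookup tables standing in for the USAS tagset descriptions
# and the speaker metadata spreadsheet (all values are made up).
USAS_LABELS = {"A5.1": "Evaluation: Good/bad"}
SPEAKERS = {"S0001": {"gender": "F"}, "S0002": {"gender": "M"}}


def count_totals(texts):
    """Count texts, distinct speakers and word tokens."""
    speakers = {u.get("who") for t in texts for u in t.iter("u")}
    words = sum(1 for t in texts for _ in t.iter("w"))
    return {"texts": len(texts), "speakers": len(speakers), "words": words}


def find_hits(text):
    """Find 'that' + "'s" + adjective sequences within each utterance."""
    hits = []
    for u in text.iter("u"):
        words = u.findall("w")
        for first, second, third in zip(words, words[1:], words[2:]):
            if (
                (first.text or "").lower() == "that"
                and (second.text or "").lower() == "'s"
                and third.get("class") == "ADJ"
            ):
                hits.append(
                    {
                        "text": text.get("id"),
                        "speaker": u.get("who"),
                        "adjective": third.get("lemma"),
                        "usas": third.get("usas"),
                    }
                )
    return hits


def merge_hits(hits, usas_labels, speakers):
    """Attach the USAS description and speaker metadata to every hit."""
    return [
        {
            **hit,
            "usas_label": usas_labels.get(hit["usas"], "unknown"),
            **speakers.get(hit["speaker"], {}),
        }
        for hit in hits
    ]


texts = [ET.fromstring(SAMPLE_XML.strip())]
totals = count_totals(texts)
hits = [hit for text in texts for hit in find_hits(text)]
merged = merge_hits(hits, USAS_LABELS, SPEAKERS)

print(totals)
for row in merged:
    print(row)
```

Matching is restricted to tokens inside one utterance, so a pattern cannot span a change of speaker. Whether the notebook makes the same choice is not stated here.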
- The code is provided as a notebook with comments in `IndVarBNC.ipynb`.
- Exported versions of the notebook for viewing can be found in `IndVarBNC.html` and `IndVarBNC.pdf`.
- Output files are stored in the directory `out/`.
- regarding the paper: hans-joerg.schmid@lmu.de
- regarding this repository: q.wuerschinger@lmu.de
If you would like to adapt and use the script, please contact us by email.