That’s cool. Computational sociolinguistic methods for investigating individual lexico-grammatical variation

wuqui/IndVarBNC

Script for data retrieval and processing

Hans-Jörg Schmid (1),
Quirin Würschinger (1),
Sebastian Fischer (2),
Helmut Küchenhoff (2)

(1) Department of English and American Studies, LMU Munich, Germany
(2) Department of Statistics, LMU Munich, Germany

Functionality

  • This notebook parses the XML version of BNC2014,
  • calculates total counts for texts, speakers and words in the corpus,
  • queries the corpus for the target pattern that's + ADJ (as in "that's cool") and stores all hits,
  • merges the hits with semantic category descriptions from the USAS (UCREL Semantic Analysis System) tagset,
  • merges the hits with speaker and conversation metadata from the spreadsheets distributed with BNC2014.
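The parsing, counting and query steps listed above can be sketched with Python's standard library. This is a minimal illustration, not the notebook's own code. Several things are assumptions: the XML layout (<text id>, <u n who> utterances, <w pos lemma usas> tokens), the invented sample text and its tags, the idea that CLAWS splits that's into the two tokens that and 's, and the convention that adjective tags start with JJ. The query only matches strictly adjacent tokens, so that's so weird is not counted.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text. The element and attribute names (<text id>, <u n who>,
# <w pos lemma usas>) are an assumption about the BNC2014 XML layout.
SAMPLE_XML = """<text id="SXXX">
  <u n="1" who="S0001">
    <w pos="DD1" lemma="that" usas="Z5">that</w>
    <w pos="VBZ" lemma="be" usas="A3+">'s</w>
    <w pos="JJ" lemma="cool" usas="A5.1+">cool</w>
  </u>
  <u n="2" who="S0002">
    <w pos="PPIS1" lemma="i" usas="Z8">I</w>
    <w pos="VV0" lemma="know" usas="X2.2+">know</w>
  </u>
  <u n="3" who="S0001">
    <w pos="DD1" lemma="that" usas="Z5">that</w>
    <w pos="VBZ" lemma="be" usas="A3+">'s</w>
    <w pos="RG" lemma="so" usas="A13.3">so</w>
    <w pos="JJ" lemma="weird" usas="A6.2-">weird</w>
  </u>
</text>"""


def count_totals(texts):
    """Count texts, distinct speakers and <w> tokens across parsed <text> elements."""
    speakers = set()
    n_words = 0
    for text in texts:
        for u in text.iter("u"):
            speakers.add(u.get("who"))
            n_words += sum(1 for _ in u.iter("w"))
    return {"texts": len(texts), "speakers": len(speakers), "words": n_words}


def find_thats_adj(text):
    """Return one hit per adjacent 'that' + "'s" + JJ* token sequence in a <text>."""
    hits = []
    for u in text.iter("u"):
        words = list(u.iter("w"))
        # Slide a three-token window over the utterance.
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            if ((w1.text or "").lower() == "that"
                    and (w2.text or "").lower() == "'s"
                    and w3.get("pos", "").startswith("JJ")):
                hits.append({
                    "text_id": text.get("id"),
                    "utterance": u.get("n"),
                    "speaker": u.get("who"),
                    "adj": w3.text,
                    "adj_lemma": w3.get("lemma"),
                    "usas": w3.get("usas"),
                })
    return hits


texts = [ET.fromstring(SAMPLE_XML)]
print(count_totals(texts))  # {'texts': 1, 'speakers': 2, 'words': 9}
for text in texts:
    for hit in find_thats_adj(text):
        print(hit["speaker"], hit["adj"])  # S0001 cool
```

Whether intervening adverbs should be allowed is a design choice. To admit cases like that's so weird, the window could skip tokens with adverb tags between 's and the adjective.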

Contents

  • The code is provided as a commented Jupyter notebook, IndVarBNC.ipynb.
  • Exported versions of the notebook for viewing can be found in IndVarBNC.html and IndVarBNC.pdf.
  • Output files are stored in the directory out/.
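The two merge steps under Functionality, followed by an export to out/, might look roughly like the sketch below. Everything here is hypothetical: the sample hits, the two-entry USAS description table, the speaker CSV columns (speaker_id, gender, agerange) and the output file name are invented for illustration. The sketch uses only the csv module so that it stays self-contained. In a notebook, a pandas left merge would be the more usual tool.

```python
import csv
import io
import os

# Hypothetical hits, shaped like the output of a that's + ADJ query.
HITS = [
    {"text_id": "SXXX", "speaker": "S0001", "adj": "cool", "usas": "A5.1+"},
    {"text_id": "SXXX", "speaker": "S0002", "adj": "weird", "usas": "A6.2-"},
]

# Hypothetical excerpt of a USAS tag -> description table.
USAS_DESCRIPTIONS = {
    "A5.1+": "Evaluation: Good",
    "A6.2-": "Comparing: Unusual",
}

# Hypothetical speaker metadata, as if the BNC2014 spreadsheet were exported to CSV.
SPEAKER_CSV = """speaker_id,gender,agerange
S0001,F,19_29
S0002,M,60_69
"""


def load_speakers(csv_text):
    """Index speaker metadata rows by their speaker ID."""
    return {row["speaker_id"]: row for row in csv.DictReader(io.StringIO(csv_text))}


def merge_hits(hits, usas_descriptions, speakers):
    """Left-join each hit with its USAS description and speaker metadata.

    Hits without a matching tag or speaker get empty strings instead of being dropped.
    """
    merged = []
    for hit in hits:
        meta = speakers.get(hit["speaker"], {})
        merged.append({
            **hit,
            "usas_desc": usas_descriptions.get(hit["usas"], ""),
            "gender": meta.get("gender", ""),
            "agerange": meta.get("agerange", ""),
        })
    return merged


def write_csv(rows, path):
    """Write merged rows to a CSV file, creating the parent directory if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


merged = merge_hits(HITS, USAS_DESCRIPTIONS, load_speakers(SPEAKER_CSV))
for row in merged:
    print(row["adj"], row["usas_desc"], row["gender"], row["agerange"])
# write_csv(merged, "out/hits_merged.csv")  # hypothetical output file name
```

A left join keeps every hit even when its tag or speaker is missing from the lookup tables. This makes gaps in the metadata visible rather than silently shrinking the data set.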

Correspondence

If you want to adapt and use the script, please contact us via email.