Linguistic and stylistic complexity measures for (literary) texts

Linguistic and Stylistic Complexity

This project is a collection of measures that assess the linguistic and stylistic complexity of (literary) texts.

Vocabulary-based complexity measures

Measures that use sample size and vocabulary size

  • Type-token ratio
  • Guiraud's R
  • Herdan's C
  • Dugast's k
  • Maas' a²
  • Dugast's U
  • Tuldava's LN
  • Brunet's W
  • Carroll's CTTR
  • Summer's S
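
As an illustration of how measures in this group depend only on the sample size N (number of tokens) and the vocabulary size V (number of types), here is a minimal sketch of four of them using their commonly cited definitions. The function name `vocabulary_measures` and the whitespace tokenization are assumptions made for this example and are not taken from the project's code.

```python
import math


def vocabulary_measures(tokens):
    """Compute a few measures that need only N and V (hypothetical helper)."""
    n = len(tokens)       # sample size N
    v = len(set(tokens))  # vocabulary size V
    if n < 2:
        raise ValueError("need at least two tokens (log N must be non-zero)")
    return {
        "type_token_ratio": v / n,              # V / N
        "guiraud_r": v / math.sqrt(n),          # V / sqrt(N)
        "herdan_c": math.log(v) / math.log(n),  # log V / log N
        "carroll_cttr": v / math.sqrt(2 * n),   # V / sqrt(2N)
    }


# Naive whitespace tokenization, purely for illustration.
tokens = "the cat sat on the mat and the dog sat too".split()
measures = vocabulary_measures(tokens)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

All four values shrink or grow with text length, which is why such measures are usually compared only across samples of similar size.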

Measures that use part of the frequency spectrum

  • Honoré's H
  • Sichel's S
  • Michéa's M
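
The measures in this group look at specific frequency classes: the number of types that occur exactly once (V1, hapax legomena) or exactly twice (V2, dis legomena). The sketch below uses the commonly cited definitions; the function name `spectrum_measures` and the choice to return `None` for undefined cases are assumptions made for this example.

```python
import math
from collections import Counter


def spectrum_measures(tokens):
    """Measures based on hapax (V1) and dis legomena (V2); hypothetical helper."""
    n = len(tokens)                      # sample size N
    type_freqs = Counter(tokens)         # type -> frequency
    v = len(type_freqs)                  # vocabulary size V
    spectrum = Counter(type_freqs.values())  # frequency i -> number of types V_i
    v1 = spectrum[1]
    v2 = spectrum[2]
    return {
        # Honoré's H = 100 * log N / (1 - V1 / V); undefined if every type is a hapax
        "honore_h": 100 * math.log(n) / (1 - v1 / v) if v1 < v else None,
        # Sichel's S = V2 / V
        "sichel_s": v2 / v,
        # Michéa's M = V / V2; undefined if no type occurs exactly twice
        "michea_m": v / v2 if v2 > 0 else None,
    }


sample = "the cat sat on the mat and the dog sat too".split()
result = spectrum_measures(sample)
print(result)
```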

Measures that use the whole frequency spectrum

  • Entropy
  • Yule's K
  • Simpson's D
  • Herdan's Vm
  • McCarthy and Jarvis' HD-D
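
Three of the measures above (entropy, Yule's K and Simpson's D) can be computed directly from the frequency of every type. The following is a minimal sketch under the usual textbook definitions; the function name `whole_spectrum_measures` is hypothetical, base-2 logarithms are an assumption, and Herdan's Vm and HD-D are left out.

```python
import math
from collections import Counter


def whole_spectrum_measures(tokens):
    """Entropy, Yule's K and Simpson's D from type frequencies (hypothetical helper)."""
    n = len(tokens)
    if n < 2:
        raise ValueError("need at least two tokens")
    freqs = Counter(tokens).values()  # frequency of each type
    # Shannon entropy of the relative type frequencies (base 2 is an assumption).
    entropy = -sum((f / n) * math.log2(f / n) for f in freqs)
    # Yule's K = 10^4 * (sum_i i^2 * V_i - N) / N^2, where the sum over the
    # spectrum equals the sum of squared type frequencies.
    yule_k = 1e4 * (sum(f * f for f in freqs) - n) / (n * n)
    # Simpson's D = sum over types of f * (f - 1) / (N * (N - 1)): the probability
    # that two tokens drawn without replacement belong to the same type.
    simpson_d = sum(f * (f - 1) for f in freqs) / (n * (n - 1))
    return {"entropy": entropy, "yule_k": yule_k, "simpson_d": simpson_d}


sample = "the cat sat on the mat and the dog sat too".split()
result = whole_spectrum_measures(sample)
print(result)
```

Note that higher values of K and D indicate more repetition, so they run in the opposite direction to entropy.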

Parameters of probabilistic models

  • Orlov's Z

Measures that use the whole text

  • Covington and McFall's MATTR
  • MTLD
  • Kubát and Milička's STTR
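
Measures in this group process the running text instead of a frequency list. As one example, MATTR (moving-average type-token ratio) averages the type-token ratio over every window of a fixed number of consecutive tokens. The sketch below is a deliberately simple version: the function name `mattr`, the default window size of 500 and the fallback to the plain type-token ratio for short texts are assumptions made for this example.

```python
def mattr(tokens, window_size=500):
    """Moving-average type-token ratio (hypothetical helper, unoptimized)."""
    if not tokens:
        raise ValueError("need at least one token")
    if window_size < 1:
        raise ValueError("window_size must be positive")
    # Assumption: texts no longer than one window fall back to the plain TTR.
    if len(tokens) <= window_size:
        return len(set(tokens)) / len(tokens)
    # TTR of every window of `window_size` consecutive tokens.
    ttrs = [
        len(set(tokens[start:start + window_size])) / window_size
        for start in range(len(tokens) - window_size + 1)
    ]
    return sum(ttrs) / len(ttrs)


# Windows of size 2: (a, a) -> 0.5, (a, b) -> 1.0, (b, b) -> 0.5
score = mattr(["a", "a", "b", "b"], window_size=2)
print(score)
```

Rebuilding a set for every window costs O(N × window size); an incremental frequency counter would be the usual optimization for long texts.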

Syntactic complexity measures

  • Average sentence length
  • Average dependency distance
  • Average closeness centrality
  • Average outdegree centralization
  • Average closeness centralization
  • Average longest shortest path
  • Average dependents per token
  • Average punctuation per sentence
  • Average punctuation per token
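
To make a few of the syntactic measures concrete, the sketch below computes average sentence length, average dependency distance and the two punctuation averages from already parsed sentences. Everything about the input format is an assumption for this example: each sentence is a list of `(form, head)` pairs with 1-based head indices and 0 for the root (as in CoNLL-U), punctuation is detected via Unicode categories, and dependency distance is averaged over all non-root tokens in the corpus. The project itself may aggregate differently, for instance per sentence.

```python
import unicodedata


def is_punctuation(form):
    """True if every character is in a Unicode punctuation category."""
    return bool(form) and all(unicodedata.category(ch).startswith("P") for ch in form)


def syntactic_measures(sentences):
    """sentences: list of sentences, each a list of (form, head) pairs (hypothetical format)."""
    n_sentences = len(sentences)
    n_tokens = sum(len(sentence) for sentence in sentences)
    n_punct = sum(is_punctuation(form) for sentence in sentences for form, _ in sentence)
    # Dependency distance: |position of token - position of its head|, root excluded.
    distances = [
        abs(position - head)
        for sentence in sentences
        for position, (_, head) in enumerate(sentence, start=1)
        if head != 0
    ]
    return {
        "average_sentence_length": n_tokens / n_sentences,
        "average_dependency_distance": sum(distances) / len(distances),
        "average_punctuation_per_sentence": n_punct / n_sentences,
        "average_punctuation_per_token": n_punct / n_tokens,
    }


parsed = [
    [("The", 2), ("cat", 3), ("sleeps", 0), (".", 3)],
    [("Dogs", 2), ("bark", 0), ("loudly", 2), (",", 2), ("often", 2), (".", 2)],
]
stats = syntactic_measures(parsed)
print(stats)
```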