tommasoc80/DNT

DNT

Diachronic News and Travel (DNT) corpus.

This corpus contains 279 documents, for a total of 183,517 tokens, distributed across three genres (news, travel reports, and travel guides) and two temporal periods (1862-1939 and 1998-2017).

This repository contains the raw text data, divided by genre and temporal period (e.g. guide-hist is the folder with travel guides from the 1862-1939 period).
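
As a minimal sketch of how this folder layout could be used, the Python snippet below walks a local copy of the repository and reports, for each genre/period folder, the number of documents and a rough token count. Several things here are assumptions rather than facts about the repository: the function name `count_tokens_per_folder` is hypothetical, the files are assumed to be UTF-8 plain text with a `.txt` extension, and tokens are approximated by splitting on whitespace, so the totals will probably not match the official token count.

```python
from pathlib import Path


def count_tokens_per_folder(corpus_root, pattern="*.txt"):
    """Return {folder_name: (n_documents, n_whitespace_tokens)}.

    Assumptions (not confirmed by the README): each genre/period folder
    (e.g. guide-hist) sits directly under corpus_root, and documents are
    UTF-8 plain-text files matching `pattern`.
    """
    stats = {}
    for folder in sorted(Path(corpus_root).iterdir()):
        # Skip loose files (README, license) and hidden folders such as .git
        if not folder.is_dir() or folder.name.startswith("."):
            continue
        n_docs = 0
        n_tokens = 0
        for path in sorted(folder.rglob(pattern)):
            text = path.read_text(encoding="utf-8", errors="replace")
            n_docs += 1
            # Whitespace splitting is only a rough approximation of tokenization
            n_tokens += len(text.split())
        stats[folder.name] = (n_docs, n_tokens)
    return stats
```

Calling `count_tokens_per_folder("DNT")` on a local clone (the path `DNT` is an assumed clone location) returns one entry per folder, which gives a quick sanity check that every genre and period was downloaded.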

The data have been enriched with manual annotation and are accompanied by NLP processing tools. We aim to make DNT a large, multi-layer corpus annotated for different linguistic phenomena. Feel free to contribute!

Below we link the dedicated repositories for each task:

DNT will be presented at the 10th AIUCD conference, "DH for society: e-quality, participation, rights and values in the Digital Age".

References

@inproceedings{caselli_sprugnoli_dnt2021,
    title={{DNT: un Corpus Diacronico e Multigenere di Testi in Lingua Inglese}},
    author={Tommaso Caselli and Rachele Sprugnoli},
    booktitle={AIUCD2021 - Book of Abstracts, Quaderni di Umanistica Digitale},
    year={2021}
}

COMING SOON: pre-tokenized version of the data with offsets.

Creative Commons License
This work is licensed under a Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.
