Travelogues Corpus

A corpus of German-language travelogues from the period 1500-1876, drawn from the Austrian Books Online project of the Austrian National Library. The corpus was compiled by the domain experts of the Travelogues Project using the library's administration system (ALMA). Full texts and manifests with metadata were retrieved using the SACHA infrastructure. The texts are the output of Optical Character Recognition (OCR) and have not been manually corrected. Travelogues is funded through grant I 3795 of the Austrian Science Fund (FWF) and grant 398697847 of the German Research Foundation (DFG).


Repository Contents

- 16th_century
  |- 16c-books.zip (14 MB, 66 files)
  |- 16c-metadata.zip (68 KB, 66 files)
- 17th_century
  |- 17c-books.zip (49 MB, 204 files)
  |- 17c-metadata.zip (202 KB, 204 files)
- 18th_century
  |- 18c-books.zip (214 MB, 949 files)
  |- 18c-metadata.zip (814 KB, 949 files)

IMPORTANT! Git LFS must be installed on your system in order to clone this repository correctly.
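
As a quick sanity check after cloning, the following minimal Python sketch lists the barcode identifiers contained in one of the archives. It assumes that each archive member is a single file named after its barcode plus a file extension; the helper name `list_barcodes` and the example path are illustrative and not part of the repository. If the repository was cloned without Git LFS, the `.zip` files are small text pointers rather than real archives, which the sketch reports as an error.

```python
import zipfile
from pathlib import Path


def list_barcodes(archive_path):
    """Return the sorted barcode identifiers found in a zip archive.

    Assumption: every member file is named '<barcode>.<extension>'.
    """
    # A Git LFS pointer file is plain text, so it fails this check.
    if not zipfile.is_zipfile(archive_path):
        raise ValueError(
            f"{archive_path} is not a zip archive; "
            "was the repository cloned with Git LFS installed?"
        )
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(
            Path(info.filename).stem
            for info in archive.infolist()
            if not info.is_dir()
        )


if __name__ == "__main__":
    # Hypothetical path relative to the repository root.
    path = Path("16th_century") / "16c-books.zip"
    if path.exists():
        barcodes = list_barcodes(path)
        print(len(barcodes), "files, first barcode:", barcodes[0])
```

Comparing the reported file count against the numbers listed above is an easy way to confirm that the download is complete.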


Accessing Digital Objects Online

Book and metadata files are named according to their barcode identifiers in the Austrian National Library. The permanent URLs to the digital objects can be constructed by prefixing the barcode with http://data.onb.ac.at/ABO/+, e.g. for barcode Z180627808: http://data.onb.ac.at/ABO/+Z180627808.
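
The URL construction described above can be sketched in a few lines of Python. The function names `permanent_url` and `url_for_file` are illustrative, and the `.txt` extension in the example is an assumption; the only facts taken from this README are the URL prefix and the rule that files are named after their barcodes.

```python
from pathlib import Path

# Prefix given in this README for permanent URLs of digital objects.
ABO_PREFIX = "http://data.onb.ac.at/ABO/+"


def permanent_url(barcode):
    """Build the permanent URL for a barcode identifier."""
    barcode = barcode.strip()
    if not barcode:
        raise ValueError("barcode must not be empty")
    return ABO_PREFIX + barcode


def url_for_file(file_name):
    """Build the permanent URL from a corpus file name.

    Assumption: the file name without its extension is the barcode.
    """
    return permanent_url(Path(file_name).stem)


if __name__ == "__main__":
    print(permanent_url("Z180627808"))
    # The extension below is hypothetical.
    print(url_for_file("Z180627808.txt"))
```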


Use of the Corpus for Machine Learning

This corpus was used to train an automatic classifier in the following publication:

Jan Rörden, Doris Gruber, Martin Krickl, Bernhard Haslhofer (2019). Identifying Historical Travelogues in Large Text Corpora Using Machine Learning (accepted for publication). arXiv:2001.01673 [cs.DL]

More information and the source code are available in this repository: Travelogues/identifying-travelogues.


License
