tupian-language-resources/kahd

Katukinan-Arawan-Harakmbut Database (KAAHD)

The Katukinan-Arawan-Apirunã-Harakmbut Database comprises lexical data from 18 doculects belonging to language families whose genetic relationships are uncertain. It includes manually assigned simple and partial cognates, colexifications, and notes, all organized in the standardized CLDF format for easy sharing and access. The concepts covered include the Swadesh list, culturally significant items, and various species of fauna and flora.
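
Because the data follow CLDF, the lexemes can be read with any CSV reader. The following is a minimal sketch, not part of the dataset's own tooling. It assumes the form table is stored as `cldf/forms.csv` and uses the default CLDF column name `Language_ID`; both are assumptions, so check the metadata file for the actual names. The helper `count_forms_per_language` is hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def count_forms_per_language(forms_csv):
    """Count the rows of a CLDF form table per doculect.

    Assumes the table has a `Language_ID` column (the CLDF default name).
    """
    counts = Counter()
    with Path(forms_csv).open(encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            counts[row["Language_ID"]] += 1
    return counts
```

If the path assumption holds, calling `count_forms_per_language("cldf/forms.csv")` from the repository root would return one count per doculect.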

How to cite

If you use these data, please cite

DOI

Description

CLDF dataset accompanying Gerardi et al.'s "Lexical Database of the Arawan language family" (2022).

This dataset is licensed under a CC-BY-4.0 license.

Conceptlists in Concepticon:

Statistics

Glottolog: 92%, Concepticon: 89%, Source: 93%, BIPA: 100%, CLTS SoundClass: 100%

  • Varieties: 18
  • Concepts: 678
  • Lexemes: 5,935
  • Sources: 36
  • Synonymy: 1.11
  • Cognacy: 6,909 cognates in 2,470 cognate sets (1,319 singletons)
  • Cognate Diversity: 0.34
  • Invalid lexemes: 0
  • Tokens: 30,445
  • Segments: 145 (0 BIPA errors, 0 CLTS sound class errors, 145 CLTS modified)
  • Inventory size (avg): 39.83
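
The cognacy figures in the list above come from grouping cognate judgements by cognate set. The sketch below illustrates that grouping on toy data. The cognate-set IDs are invented for illustration, and `cognate_summary` is a hypothetical helper rather than a function from any CLDF tool.

```python
from collections import Counter


def cognate_summary(cogset_ids):
    """Summarise cognate judgements, given one cognate-set ID per judgement."""
    sizes = Counter(cogset_ids)  # number of members in each cognate set
    singletons = sum(1 for size in sizes.values() if size == 1)
    return {
        "cognates": len(cogset_ids),
        "cognate_sets": len(sizes),
        "singletons": singletons,
    }


# Toy judgements (invented IDs): three sets, one of which has a single member.
toy = ["hand-1", "hand-1", "hand-2", "foot-1", "foot-1", "foot-1"]
print(cognate_summary(toy))
# {'cognates': 6, 'cognate_sets': 3, 'singletons': 1}
```

By the same logic, the reported 2,470 cognate sets with 1,319 singletons leave 1,151 sets with two or more members.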

Possible Improvements:

  • Entries missing sources: 407/5935 (6.86%)
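
As a quick arithmetic check, the share of entries without a source and the "Source" coverage figure reported under Statistics can be derived from the two counts above. The variable names are illustrative only.

```python
missing, lexemes = 407, 5935  # counts reported in this README

missing_share = missing / lexemes
print(f"{missing_share:.2%}")      # 6.86%
print(f"{1 - missing_share:.0%}")  # 93%
```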

Contributors

| Name | GitHub user | Description | Role |
| --- | --- | --- | --- |
| Fabrício Ferraz Gerardi | @LanguageStructure | Data collector, cognacy assignment, co-lexifications, notes | Author |
| Carolina Aragon | @carolinaaragon | Data collector, cognacy assignment, co-lexifications, notes | Author |
| Fernando Orphão de Carvalho | @fernaoorphao | Data collector, cognacy assignment, co-lexifications, notes | Author |
| Stanislav Reichert | @StasReichert | Data collector | Author |
| Alan Vogel | | Data collector, cognacy assignment, co-lexifications, notes | Author |
| An Van linden | | Data collector, cognacy assignment, co-lexifications, notes | Author |
| Johann-Mattis List | @lingulist | EDICTOR setup and final checks | Other |

CLDF Datasets

The following CLDF datasets are available in cldf: