The Katukinan-Arawan-Apirunã-Harakmbut Database comprises lexical data from 18 doculects belonging to several language families whose genetic relationships are uncertain. It includes manually assigned simple and partial cognates, colexifications, and notes, all stored in the standardized CLDF format for sharing and reuse. It covers a broad range of concepts, including the Swadesh list, culturally significant items, and various species of fauna and flora.
If you use these data please cite
CLDF dataset accompanying Gerardi et al.'s "Lexical Database of the Arawan language family" from 2022
This dataset is licensed under a CC-BY-4.0 license
Conceptlists in Concepticon:
- Varieties: 18
- Concepts: 678
- Lexemes: 5,935
- Sources: 36
- Synonymy: 1.11
- Cognacy: 6,909 cognates in 2,470 cognate sets (1,319 singletons)
- Cognate Diversity: 0.34
- Invalid lexemes: 0
- Tokens: 30,445
- Segments: 145 (0 BIPA errors, 0 CLTS sound class errors, 145 CLTS modified)
- Inventory size (avg): 39.83
- Entries missing sources: 407/5935 (6.86%)
Name | GitHub user | Description | Role |
---|---|---|---|
Fabrício Ferraz Gerardi | @LanguageStructure | Data Collector, cognacy assignment, co-lexifications, notes | Author |
Carolina Aragon | @carolinaaragon | Data Collector, cognacy assignment, co-lexifications, notes | Author |
Fernando Orphão de Carvalho | @fernaoorphao | Data Collector, cognacy assignment, co-lexifications, notes | Author |
Stanislav Reichert | @StasReichert | Data Collector | Author |
Alan Vogel | | Data Collector, cognacy assignment, co-lexifications, notes | Author |
An Van linden | | Data Collector, cognacy assignment, co-lexifications, notes | Author |
Johann-Mattis List | @lingulist | EDICTOR set up and final checks | Other |
The following CLDF datasets are available in cldf:
- CLDF Wordlist at cldf/cldf-metadata.json
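
As a rough illustration of how the wordlist could be read, the sketch below uses only the Python standard library. It looks up the table that the metadata file declares as a CLDF FormTable and counts forms per variety. The helper names `read_forms` and `forms_per_language` are hypothetical, and the sketch assumes that the forms file is plain comma-separated UTF-8 and that the language column carries the CLDF default name `Language_ID`; neither assumption has been checked against this dataset.

```python
import csv
import json
from collections import Counter
from pathlib import Path

# Component URI that CLDF metadata uses to mark the table of word forms.
FORM_TABLE = "http://cldf.clld.org/v1.0/terms.rdf#FormTable"


def read_forms(metadata_path):
    """Return the FormTable rows described by a CLDF metadata file.

    Hypothetical helper: assumes a comma-delimited, UTF-8 encoded CSV.
    """
    metadata_path = Path(metadata_path)
    metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    for table in metadata.get("tables", []):
        if table.get("dc:conformsTo") == FORM_TABLE:
            # Table URLs are resolved relative to the metadata file.
            csv_path = metadata_path.parent / table["url"]
            with csv_path.open(encoding="utf-8", newline="") as handle:
                return list(csv.DictReader(handle))
    raise ValueError(f"no FormTable declared in {metadata_path}")


def forms_per_language(rows, language_column="Language_ID"):
    """Count forms per variety; the column name is an assumed default."""
    return Counter(row[language_column] for row in rows)


if __name__ == "__main__":
    rows = read_forms("cldf/cldf-metadata.json")
    for language, count in sorted(forms_per_language(rows).items()):
        print(language, count)
```

For anything beyond a quick look, a dedicated CLDF reader such as the `pycldf` package is likely the better choice, since it interprets the column datatypes, separators, and foreign keys declared in the metadata.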