Cognate Sets Need Review #37

LinguList · 2021-07-11T16:13:05Z

I just inspected a couple of cognate sets in the data out of interest and found that there are often cases where I think one should make a more fine-grained distinction rather than lumping entries into one cognate set.

This becomes particularly clear when looking at the alignments, even if one does not actually align the data.

LinguList · 2021-07-11T16:14:20Z

In this case, it seems very implausible to assume that k corresponds to m, a correspondence that almost never occurs. Moreover, the form in Karo suffers from a mis-tokenization, as mʔ is obviously one sound.
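A mis-tokenization like this is usually best prevented upstream, for instance by listing mʔ as a single grapheme in an orthography profile. As a post-hoc illustration only, the hypothetical helper `merge_segments` below joins adjacent tokens that should form one sound. The merge pairs and the toy tokens are assumptions, not actual Karo data.

```python
def merge_segments(tokens, merge_pairs):
    """Join adjacent tokens listed in merge_pairs into a single segment.

    Hypothetical helper: merge_pairs is a set of (first, second) tuples,
    e.g. {("m", "ʔ")}, whose members should be treated as one sound.
    """
    merged = []
    i = 0
    while i < len(tokens):
        # Merge the current token with the next one if the pair is listed.
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in merge_pairs:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Toy tokens (not an actual Karo form).
print(merge_segments(["t", "a", "m", "ʔ"], {("m", "ʔ")}))  # ['t', 'a', 'mʔ']
```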

LinguList · 2021-07-11T16:16:18Z

We can pull out implausible sound correspondences if you want; it seems useful to do so, as this would probably also improve the quality of the data. So far, quick eyeballing reveals numerous cases where it is hard to defend that the words are cognate, unless you can provide me with some really deep insights into morphological patterns.

However, morphological patterns should not be aligned anyway, so if you know that kam is similar to nam through some morphological process, they would still be two different cognate sets, and you would add a root identifier (ROOTID) that unifies them.
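Pulling out implausible correspondences could start from something as simple as the sketch below. It assumes, hypothetically, that each cognate set is stored as a mapping from language to an aligned list of segments of equal length; it counts in how many cognate sets each sound pair between two languages occurs and flags pairs supported by fewer than `min_sets` sets. The languages A and B are placeholders and the forms reuse the kam/nam toy example; a flagged pair is only a prompt for review, not a verdict.

```python
from collections import defaultdict


def correspondence_support(cognate_sets, lang_a, lang_b, gap="-"):
    """Map each (sound_a, sound_b) pair to the cognate IDs in which it occurs."""
    support = defaultdict(set)
    for cogid, alignments in cognate_sets.items():
        # Skip cognate sets in which one of the two languages has no reflex.
        if lang_a not in alignments or lang_b not in alignments:
            continue
        for sound_a, sound_b in zip(alignments[lang_a], alignments[lang_b]):
            # Gapped columns say nothing about a sound-to-sound correspondence.
            if sound_a != gap and sound_b != gap:
                support[(sound_a, sound_b)].add(cogid)
    return support


def flag_rare(support, min_sets=2):
    """Return sound pairs attested in fewer than min_sets cognate sets."""
    return sorted(pair for pair, cogids in support.items() if len(cogids) < min_sets)


# Toy alignments for two hypothetical languages A and B.
cognate_sets = {
    1: {"A": ["k", "a", "m"], "B": ["n", "a", "m"]},
    2: {"A": ["t", "a"], "B": ["t", "a"]},
    3: {"A": ["t", "a", "m"], "B": ["t", "a", "m"]},
}
support = correspondence_support(cognate_sets, "A", "B")
print(flag_rare(support))  # [('k', 'n')]
```

Counting cognate sets rather than raw column occurrences keeps a single long word from inflating the support for one correspondence.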

LanguageStructure · 2021-07-11T16:19:54Z

Hi Mattis, thanks for the remark. It would certainly be useful. I am currently re-checking some Mawe and Aweti words that should be cognate with TG but are assigned to different classes. My list is already quite long. As for your point that "morphological patterns should not be aligned anyway, so if you know that kam is similar to nam for some morphological process, they would still be two different cognate sets, and you'd add one root identifier (ROOTID) that unifies them": this is not the case!

LanguageStructure · 2021-07-11T16:31:45Z

This is a difficult case. There is indeed no k : n correspondence, but in this case, since the k is found in the TG languages and the n in the non-TG languages, the shared vowel and labial consonant would be too big a coincidence. This is why they are assigned to the same cognate set. I have just corrected the two languages that were clearly wrong.

LanguageStructure · 2021-07-11T16:45:30Z

@LinguList The cognacy judgments in EDICTOR have been improved compared to the cognates you see in the published version of TuLeD. Besides improved cognacy, we also have more concepts, more languages, and more partial cognates.

LinguList · 2021-07-11T18:04:43Z

If there is no regular correspondence between k and n, I would not label them cognate. What IS possible is a story in which you say: the original form is kam, but sound symbolism then turned it into nam, as we know that words denoting "breast" can be involved in sound symbolism. But that would need this additional explanation, and ideally a two-level cognate set distinction: a root identifier that unifies these forms, and the normal COGID that reflects whether the words exhibit regular sound correspondences.
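A minimal sketch of that two-level annotation, assuming a wordlist with one row per form and the column names used in this thread (COGID for sets with regular correspondences, ROOTID for the looser unit). The languages are hypothetical and the forms reuse the kam/nam toy example; a ROOTID that spans more than one COGID marks exactly the cases that need the additional explanation (e.g. sound symbolism).

```python
from collections import defaultdict

# Toy wordlist: hypothetical languages, forms from the kam/nam example.
rows = [
    {"ID": 1, "DOCULECT": "LangA", "CONCEPT": "breast", "FORM": "kam", "COGID": 1, "ROOTID": 1},
    {"ID": 2, "DOCULECT": "LangB", "CONCEPT": "breast", "FORM": "kam", "COGID": 1, "ROOTID": 1},
    {"ID": 3, "DOCULECT": "LangC", "CONCEPT": "breast", "FORM": "nam", "COGID": 2, "ROOTID": 1},
]


def cogids_by_root(rows):
    """Collect the regular cognate sets (COGIDs) grouped under each ROOTID."""
    roots = defaultdict(set)
    for row in rows:
        roots[row["ROOTID"]].add(row["COGID"])
    return dict(roots)


def roots_needing_explanation(rows):
    """ROOTIDs that unify several COGIDs, i.e. links not backed by regular correspondences."""
    return sorted(root for root, cogids in cogids_by_root(rows).items() if len(cogids) > 1)


print(cogids_by_root(rows))             # {1: {1, 2}}
print(roots_needing_explanation(rows))  # [1]
```

Keeping COGID strict means any correspondence analysis run on COGID alone is not polluted by the looser ROOTID links.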

LinguList · 2021-07-11T18:05:38Z

The best idea is probably to run the sound correspondence pattern algorithm to check how many good correspondence patterns we find. I can try this experiment next week.
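In much simplified form, the core idea of correspondence pattern inference can be illustrated as follows. Each alignment site (one column of a cognate set's alignment) is a tuple with one sound per language, with `None` where a language has no reflex in that cognate set; two sites are treated as compatible if they never disagree where both have data, and compatible sites are merged into one pattern. This greedy version is only a sketch under those assumptions, not the lingrex implementation, and its result depends on input order; the sites are toy data for three hypothetical languages.

```python
def compatible(p, q):
    """Sites/patterns are compatible if they never disagree where both have data."""
    return all(a is None or b is None or a == b for a, b in zip(p, q))


def merge(p, q):
    """Fill the missing slots of p with the values from q."""
    return tuple(a if a is not None else b for a, b in zip(p, q))


def cluster_sites(sites):
    """Greedily assign each site to the first compatible pattern (order-dependent)."""
    patterns = []  # list of (pattern, indices of the sites it covers)
    for idx, site in enumerate(sites):
        for k, (pattern, members) in enumerate(patterns):
            if compatible(pattern, site):
                patterns[k] = (merge(pattern, site), members + [idx])
                break
        else:
            patterns.append((site, [idx]))
    return patterns


# Toy sites for three hypothetical languages (A, B, C).
sites = [
    ("k", "k", None),
    ("k", None, "k"),
    ("t", "t", "t"),
    ("k", "n", None),
]
for pattern, members in cluster_sites(sites):
    print(pattern, members)
```

In real data, patterns covering only one or two sites, such as the k : n pattern here, are natural candidates for review.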

LanguageStructure · 2021-07-11T18:26:41Z

Is this also in LingPy? If yes, I could do it. If not and you could do it, then please use the data in EDICTOR, since the cognacy there is better than in the published version.

LinguList · 2021-07-11T18:28:46Z

Please check this paper (http://doi.org/10.1162/coli_a_00344). We recently rewrote the code in lingpy/lingrex. Applying it is not difficult in principle, but it requires good error-checking beforehand, so it is best that I integrate it into the workflow; since lingrex now has good test coverage, we can add it to the tuled/tular workflow.
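The exact error-checking would depend on the workflow. As a hypothetical illustration only, a pre-flight check could look for empty alignments, alignments of unequal length within a cognate set, and languages that occur more than once in the same cognate set (assumed here to matter because each language gets one slot per pattern). The column names are modelled on LingPy/EDICTOR wordlists, alignments are stored as lists for simplicity, and the rows are toy data.

```python
from collections import defaultdict


def check_alignments(rows):
    """Return human-readable problems found in aligned cognate sets."""
    problems = []
    by_cogid = defaultdict(list)
    for row in rows:
        if not row["ALIGNMENT"]:
            problems.append(f"row {row['ID']}: empty alignment")
            continue
        by_cogid[row["COGID"]].append(row)
    for cogid, members in sorted(by_cogid.items()):
        # All alignments in one cognate set must have the same number of columns.
        lengths = {len(r["ALIGNMENT"]) for r in members}
        if len(lengths) > 1:
            problems.append(f"cogid {cogid}: alignments have unequal lengths {sorted(lengths)}")
        # Assumed requirement: at most one form per language per cognate set.
        doculects = [r["DOCULECT"] for r in members]
        for d in sorted({d for d in doculects if doculects.count(d) > 1}):
            problems.append(f"cogid {cogid}: {d} occurs more than once")
    return problems


# Toy rows for hypothetical languages.
rows = [
    {"ID": 1, "DOCULECT": "LangA", "COGID": 1, "ALIGNMENT": ["k", "a", "m"]},
    {"ID": 2, "DOCULECT": "LangB", "COGID": 1, "ALIGNMENT": ["n", "a", "m", "ʔ"]},
    {"ID": 3, "DOCULECT": "LangB", "COGID": 1, "ALIGNMENT": ["n", "a", "m", "-"]},
    {"ID": 4, "DOCULECT": "LangC", "COGID": 2, "ALIGNMENT": []},
]
for problem in check_alignments(rows):
    print(problem)
```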

LanguageStructure · 2021-07-11T18:31:57Z

I am going to read the paper tomorrow. That would be great!