Derived causatives #9
Comments
Typically when two feature values apply to the same feature key, UD uses a comma to separate them, so since passive and causative diathesis can exist concurrently in many languages (e.g. a passive causative verb meaning "I was made to eat it" or similar), I would have expected:
This is similar to combined values for gender as described here for Fem,Masc. I think layered features are mainly used when there are two different underlying things being annotated, e.g. a possessor gender and a possessed gender. Here if they differ, something like Fem,Masc would be wrong, because the word being annotated doesn't really have both genders simultaneously. But for a passive causative verb, I think it is truly simultaneously passive and causative, so the comma notation should apply. |
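To make the difference between the two notations concrete, here is a minimal Python sketch of how a tool might read a FEATS string that contains both comma-joined values and a layered feature. The function name `parse_feats` is hypothetical, and the example strings are illustrations only (Gender[psor] is the usual possessor-gender layer; nothing here is quoted from a treebank).

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS string into {(name, layer): [values]}.

    Hypothetical helper for illustration: 'layer' is None for a plain
    feature such as Voice and a string for a layered one such as
    Gender[psor].
    """
    if feats == "_":  # CoNLL-U uses "_" for an empty FEATS column
        return {}
    parsed = {}
    for pair in feats.split("|"):
        key, _, value = pair.partition("=")
        name, layer = key, None
        if key.endswith("]") and "[" in key:
            # "Gender[psor]" -> name "Gender", layer "psor"
            name, layer = key[:-1].split("[", 1)
        # Comma-joined values all belong to the same feature key
        parsed[(name, layer)] = value.split(",")
    return parsed


# Comma notation: several values under one key
combined = parse_feats("Gender=Fem,Masc|Voice=Pass")
# Layered notation: two separate keys for two different things
layered = parse_feats("Gender=Masc|Gender[psor]=Fem")
```

The parser itself does not decide whether a comma means "both at once" or "either one"; that is an interpretive question the annotation guidelines have to settle.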
Many thanks for the clarification! I have overlooked this option, because I erroneously thought that such notation implies a range of values, associated with a morphological category, only one of which is valid in a given context, e.g. |
The comma annotation is however (at least officially) intended as "either one". So I would understand So, cases of application for this notation would be, for Latin (not implemented yet):
(PS: this link seems to be broken) This case in Classical Armenian seems to vouch for the separation of Active/Passive from Causative/other: if they can happen at the same time, then they refer to different features. In fact, I can think of a combination of both in Italian, too, though periphrastically:
Using layers, a solution like
Sorry, can I ask you how exactly active/passive and causative interact in this example? |
It seems that the comma annotation does have an interpretation which I originally had in mind. Thank you for this addition.
I think that the very definition of the 'Cau' value of the universal Voice feature points to its different status compared to the values 'Act' and 'Pass' ("Causative forms of verbs are classified as a voice category because, when compared to the basic active form, they change the number of participants and their mapping on semantic roles."). CArm. seems to fully conform to this universal definition of the 'Cau' value.
Because Or maybe something like This may indeed be a better option, which would allow tagging, for example, anticausative derivational markers with the same feature. One might probably even think of introducing a universal feature
owsan-im has a mediopassive ending -im added to the present stem of the base verb, while owsowc'an-em has an active ending -em added to the causative stem, derived from the base verb with the help of -owc'. |
Adding @dan-zeman - any thoughts? |
Using layered features for this seems to be off, as Amir has noted. (Plus, I am puzzled by the layer name "valency" - in my opinion, the feature name Also, Flavio rightly noted that the comma notation is for something else. There is a semi-standard way to do this (meaning it is not promoted enough in the universal guidelines but it has emerged as a de-facto standard in agglutinating languages and nothing better has been invented since then, so it should probably be officially mentioned in the guidelines, too):
In Turkish, as I understand it, the sequence of values simply reflects the sequence of passive and causative morphemes suffixed to the word. But I think the combinations can be defined even if they are reflected in the morphology less straightforwardly. |
Thank you very much for the clarifications! In the case of Armenian, a minor complication would be that some inflectional forms in the paradigm of each verb, including the derived causatives, are labile, e.g. the imperfect tense forms: ows-owc'-an-ei 'I taught/I was taught'. I guess these should then be tagged using a comma like |
Whether voice-specific morphology is observable, and what to do if it isn't, is a different question. I think that in most languages |
This was just the first name that came to my mind 😬 I am a little confused about the exact definition of "voice". More or less any operation with verbs has or can have to do with valency, but it seems that passive and causative act at different levels. But probably the feature is always the same.
This looks like a convenient temporary solution, but at the same time it highlights the problem that many valency-changing operations can take place at once on the same verb (looking at "monsters" like It is not really the same thing, but I can see something similar happening with other markings. For example, Latin verbs have a so-called frequentative (a diminutive degree, actually) form, and sometimes it can be repeated:
Currently, I see no way to annotate this, but it could be something like
Is it not almost mandatory not to annotate negatively defined features (proprietates ad absentiam 😬 )? |
I aim at maintaining a strictly morphological principle in the FEAT field for Classical Armenian, which can be useful for tracing how syntax is mapped in morphology. For that reason I care which forms are marked by the active or passive voice, or unmarked, and whether they are causatives or not at the same time. So I would certainly maintain both
The valency-coding morphology has changed a lot in Modern Armenian. Eastern Modern Armenian uses a transitivizing suffix -c'n- and an intransitivizing one -v- while the endings as such do not express the oppositional voice. Moreover, the "ArmTDP" treebank does not follow the morphological principle in assigning the Voice values. For example, one finds both forms with -v- and without it tagged as |
If "," in values can only mean "either/or but not both at once", then maybe we need a canonical way to mean "multiple simultaneous values"? How about canonizing "+" for this:
I could imagine this might come in handy in other scenarios as well. |
That would be a UDv3 type of change for me because it would negate a very low-level assumption required by the guidelines/validator (and thus potentially built into other UD-related tools) – the "+" character cannot occur in feature values. But I guess there are other reasons why I don't like it:
|
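The low-level assumption mentioned above can be sketched as a small check. This is only an approximation written for illustration: the pattern `VALUE_RE` and the function `is_valid_value_field` are my own assumptions about what such a check might look like, not the actual code of the UD validator.

```python
import re

# Assumed approximation of the allowed shape of a single feature value:
# an uppercase letter or digit followed by letters or digits.
# The real validator may use a different pattern.
VALUE_RE = re.compile(r"[A-Z0-9][A-Za-z0-9]*")


def is_valid_value_field(value_field):
    """Return True if every comma-separated value matches VALUE_RE."""
    return all(VALUE_RE.fullmatch(v) for v in value_field.split(","))


comma_ok = is_valid_value_field("Cau,Pass")  # comma-joined values pass
plus_ok = is_valid_value_field("Cau+Pass")   # "+" is rejected by the pattern
```

Under this assumption, any tool that tokenizes values the same way would need to be changed before "+" could carry meaning, which is why the proposal reads as a major-version change.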
Thank you once again for your comments. I will stick to the |
I would like to address the issue of annotating derived causatives, which seem to correspond to the Cau value of the Voice feature in UD. I will use Classical Armenian as an example, but I think it may be relevant for other treebanks as well, e.g. Sanskrit.
Classical Armenian has an oppositional voice (Act and Pass, where Pass may be conventionally used to tag the whole range of non-active meanings) and causatives, which are formed with the help of a dedicated causative suffix -owcՙ-, and can be additionally characterized for the oppositional voice, e.g. base verb pass. owsan-im 'I learn' > caus. act. ows-owc'-an-em 'I teach'. The question is how to map this pattern in UD.
In my view, it is reasonable to use a layered feature Voice[caus] next to Voice (the latter being reserved for the oppositional voice). Ideally one would want to keep the value Cau for the Voice[caus] feature to be consistent with the annotation of causatives across treebanks. However, if I understand the feedback of the validator correctly, features with a single value seem to be allowed only if the value is 'Yes' (e.g. Reflex=Yes). What would be a preferable solution here: to use Voice[caus]=Yes (excluding it from the universal values of the Voice feature), or to introduce a dummy value of the layered feature for base verbs, which would allow for Voice[caus]=Cau?
Are there yet other options? I will be grateful for any feedback.
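For concreteness, the two candidate annotations for the causative ows-owc'-an-em can be written out as FEATS strings. This is a minimal Python sketch; the helper `format_feats` and the variable names are hypothetical, and only the Voice-related features from the question are shown.

```python
def format_feats(features):
    """Serialize a {feature: value} dict as a CoNLL-U FEATS string.

    Hypothetical helper: features are sorted case-insensitively by
    name and joined with "|"; an empty dict becomes "_".
    """
    if not features:
        return "_"
    ordered = sorted(features.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)


# Base verb owsan-im 'I learn': oppositional voice only
base_verb = format_feats({"Voice": "Pass"})

# Causative ows-owc'-an-em 'I teach', option 1: boolean-style layer
option_yes = format_feats({"Voice": "Act", "Voice[caus]": "Yes"})

# Same form, option 2: reuse the universal value Cau in the layer
option_cau = format_feats({"Voice": "Act", "Voice[caus]": "Cau"})
```

Both options keep the oppositional voice in the plain Voice feature; they differ only in which value the layered feature carries.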