Companion data necessary for training? #6

danielhers · 2021-02-03T14:33:30Z

To train PERIN on a new dataset (not from MRP 2020), a companion file currently needs to be specified for the new text. Is this a real requirement, or is it just a result of the implementation? Does PERIN actually use any of the information from the companion data? If so, what is the easiest way to generate that data for new text?

foxik · 2021-02-03T22:00:12Z

David knows better, but my feeling is that we use the lemmas from the companion data. I.e., when constructing the rules for labels, we allow copying/modifying a corresponding lemma instead of a token (or other sources).

So either you lemmatize the data (for example with the UDPipe service; the new BERT version trained on UD 2.6 is running at https://lindat.mff.cuni.cz/services/udpipe/), or you disable the lemma rules (and perform the lemmatization during the syntactic parsing).
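For the first option, here is a minimal sketch of pulling lemmas out of the UDPipe web service. It assumes the service's REST endpoint at `/api/process`, which returns JSON whose `result` field holds CoNLL-U text. The model name `english` and the helper names are placeholders, not part of PERIN, and converting the output into the companion format PERIN expects is not shown.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

# Assumed REST endpoint of the UDPipe service linked above.
UDPIPE_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def lemmas_from_conllu(conllu: str) -> List[List[Tuple[str, str]]]:
    """Return one list of (form, lemma) pairs per sentence of CoNLL-U text."""
    sentences, current = [], []
    for line in conllu.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if len(cols) < 3 or not cols[0].isdigit():
            continue
        current.append((cols[1], cols[2]))  # FORM, LEMMA
    if current:
        sentences.append(current)
    return sentences


def lemmatize(text: str, model: str = "english") -> List[List[Tuple[str, str]]]:
    """Send raw text to the service and return (form, lemma) pairs.

    The model name is an assumption; pick one listed by the service.
    """
    data = urllib.parse.urlencode(
        {"data": text, "model": model, "tokenizer": "", "tagger": ""}
    ).encode("utf-8")
    with urllib.request.urlopen(UDPIPE_URL, data=data, timeout=30) as response:
        payload = json.load(response)
    return lemmas_from_conllu(payload["result"])
```

Calling `lemmatize("Dogs barked.")` needs network access; `lemmas_from_conllu` can be used on its own with CoNLL-U produced by any other lemmatizer.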

davda54 · 2021-02-03T22:54:13Z

Yeah, the only problem is the lemmatized tokens, which are used to create a more efficient set of relative label rules. So specifically, the absence of the companion data shouldn't impact UCCA parsing, but it will most likely hurt the accuracy of label prediction for the other frameworks.
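To illustrate why lemmas help here, a toy sketch of relative label rules follows. This is not PERIN's actual rule encoding: the `(cut, suffix)` representation and both function names are made up for illustration. A rule says "drop `cut` trailing characters from the source, then append `suffix`", and the source is either the surface token or its lemma.

```python
from typing import List, Tuple

Rule = Tuple[int, str]  # (number of trailing characters to drop, suffix to append)


def relative_rule(source: str, label: str) -> Rule:
    """Encode `label` as an edit of the lowercased `source` (hypothetical scheme)."""
    source = source.lower()
    prefix = 0
    # Length of the longest common prefix of source and label.
    while prefix < min(len(source), len(label)) and source[prefix] == label[prefix]:
        prefix += 1
    return (len(source) - prefix, label[prefix:])


def apply_rule(source: str, rule: Rule) -> str:
    """Invert `relative_rule`: rebuild the label from the source and the rule."""
    cut, suffix = rule
    source = source.lower()
    return source[: len(source) - cut] + suffix


# Toy data: (token, lemma, target label). Invented for this example.
examples: List[Tuple[str, str, str]] = [
    ("barked", "bark", "bark"),
    ("barking", "bark", "bark"),
    ("barks", "bark", "bark"),
    ("ran", "run", "run"),
]

token_rules = {relative_rule(token, label) for token, _, label in examples}
lemma_rules = {relative_rule(lemma, label) for _, lemma, label in examples}
```

On this toy data the token-based set needs a separate rule for each inflection, while the lemma-based set collapses to a single "copy the lemma" rule, which is the kind of saving that is lost without companion data.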

I've quickly hacked a workaround to preprocess the data without a companion file into the branch no_lemmas.

davda54 · 2021-02-03T23:00:39Z

As for generating the companion data (i.e. lemmas), you can use the code from UDPipeWrapper.

danielhers · 2021-02-04T08:38:14Z

This makes a lot of sense. Thank you both for the quick solution!
Besides, it's good to know the new UDPipe is already usable, even if not yet offline.
I'll be happy to close the issue unless you want to keep it, e.g. for adding documentation about this option.

davda54 · 2021-03-21T22:14:19Z

Merged into the main branch [#9], closing.

davda54 closed this as completed Mar 21, 2021

davda54 mentioned this issue Jun 28, 2021

training the UCCA model with other languages #12

Closed
