need help in fixing text and brackets to its (correct parents) #122

Closed
Shasetty opened this issue Apr 10, 2024 · 5 comments

@Shasetty

Text used:
On November 29, 2022, Twin Ridge, Carbon Revolution Public Limited Company (formerly known as Poppetell Limited), a public limited company incorporated in Ireland with registered number 607450 (“MergeCo”), Carbon Revolution and Poppettell Merger Sub, a Cayman Islands exempted company and wholly-owned subsidiary of MergeCo (“Merger Sub”), entered into a Business Combination Agreement (as it may be amended or supplemented from time to time, the “Business Combination Agreement”), pursuant to which, among other things, Twin Ridge will be merged with and into Merger Sub, with Merger Sub surviving as a wholly-owned subsidiary of MergeCo (the “Merger”), with shareholders of Twin Ridge receiving ordinary shares of MergeCo, par value $0.0001 (the “MergeCo Ordinary Shares”), in exchange for their existing Twin Ridge Ordinary Shares (as defined below) and existing Twin Ridge warrant holders having their warrants automatically exchanged by assumption by MergeCo of the obligations under such warrants, including to become exercisable in respect of MergeCo Ordinary Shares instead of Twin Ridge Ordinary Shares, subject to, among other things, the approval of Twin Ridge’s shareholders.

Parser used:
version: UD 2.10
model: english-ewt-ud-2.10-220711

URL:
https://lindat.mff.cuni.cz/services/udpipe/

The text and brackets listed below are not attached to their correct parents:
Bracket issue, in this part of the text above: of MergeCo (“Merger Sub”)
Word attachment issue, in this part of the text above: “Business Combination Agreement”)

image

Please suggest how I can fix this issue.

@dan-zeman
Collaborator

This issue probably does not belong here because it is about UDPipe rather than Udapi. Anyway, I think you should try a newer model. With english-ewt-ud-2.12-230717, I got better results on this terrible sentence.

@Shasetty
Author

Thank you for answering.

The text given above is a single paragraph.
The model english-ewt-ud-2.10-220711 treats it as a single paragraph;
english-ewt-ud-2.12-230717 treats it as two paragraphs.

Even with english-ewt-ud-2.12-230717, the text and brackets are not attached to their correct parents.

As I am using english-ewt-ud-2.10-220711, I would like a solution for that model. Can you help me?

@dan-zeman
Collaborator

A model is always just a model. It depends on the data it was trained on, and it will rarely give you 100% correct output. If you are not satisfied with the results, you can either search for a better model, or write a program that will postprocess the parser's output and fix the errors, perhaps based on some heuristics.
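As a rough illustration of the postprocessing idea, here is a minimal Python sketch that only detects suspicious punctuation in CoNLL-U output: it flags every PUNCT token whose attachment edge crosses another edge (a common sign of non-projective attachment). The function names `read_tokens` and `crossing_punct` and the six-token tree are hypothetical, made up for this example; the tree is not actual UDPipe output, and this is not the algorithm used by ud.FixPunct.

```python
def read_tokens(conllu_sentence):
    """Parse one CoNLL-U sentence into (id, form, upos, head) tuples."""
    tokens = []
    for line in conllu_sentence.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges ("1-2") and empty nodes ("1.1")
        # CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        tokens.append((int(cols[0]), cols[1], cols[3], int(cols[6])))
    return tokens


def crossing_punct(tokens):
    """Return (id, form, head) for PUNCT tokens whose edge crosses another edge."""
    # Each edge is stored as (left end, right end); head 0 is the artificial root.
    edges = {tid: tuple(sorted((tid, head))) for tid, _, _, head in tokens}
    flagged = []
    for tid, form, upos, head in tokens:
        if upos != "PUNCT":
            continue
        a, b = edges[tid]
        for other, (c, d) in edges.items():
            # Two edges cross when exactly one end of one lies strictly inside the other.
            if other != tid and (a < c < b < d or c < a < d < b):
                flagged.append((tid, form, head))
                break
    return flagged


# Hypothetical toy tree: the opening bracket is attached to the verb
# instead of to "Sub", the head of the bracketed phrase.
ROWS = [
    ("1", "MergeCo", "MergeCo", "PROPN", "_", "_", "6", "nsubj", "_", "_"),
    ("2", "(", "(", "PUNCT", "_", "_", "6", "punct", "_", "_"),
    ("3", "Merger", "Merger", "PROPN", "_", "_", "4", "compound", "_", "_"),
    ("4", "Sub", "Sub", "PROPN", "_", "_", "1", "appos", "_", "_"),
    ("5", ")", ")", "PUNCT", "_", "_", "4", "punct", "_", "_"),
    ("6", "entered", "enter", "VERB", "_", "_", "0", "root", "_", "_"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

print(crossing_punct(read_tokens(SAMPLE)))  # [(2, '(', 6)]
```

A detector like this only tells you where to look. Deciding the correct new parent for each flagged token is the harder part, and that is where the heuristics would come in.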

@Shasetty
Author

Thank you for replying.

english-ewt-ud-2.10-220711 is the best of the models available online.
Many of the grammatical relations described in the book A Comprehensive Grammar of the English Language match the output of the 2.10 version.

I am not as skilled in grammar as you are.
I also lack knowledge of Python.

When I raised the issues with the english-ewt-ud-2.12-230717 model in ufal/udpipe#175, the suggested solution was: udapy -s ud.FixPunct < in.conllu > out.conllu

Comparing english-ewt-ud-2.10-220711 with english-ewt-ud-2.12-230717, I found the 2.10 version to be better.

I request that you provide a solution for english-ewt-ud-2.10-220711.

@martinpopel
Contributor

As I explained in ufal/udpipe#189 (comment), udapy -s ud.FixPunct < in.conllu > out.conllu does indeed correct the wrongly (non-projectively) attached punctuation tokens (even in this ridiculously long sentence), so I don't see any Udapi-related bug here and I am closing this issue.

Note that based on your original issue, I have fixed wrongly attached punctuation in EWT (yes, using ud.FixPunct), and this was released in UD_English-EWT 2.13 in November 2023. So we just need to wait until a UDPipe model trained on UD_English-EWT 2.13 or newer is published, and then I hope there will be fewer non-projective punctuation problems in its outputs (although I cannot guarantee zero problems, so maybe we will still need to use ud.FixPunct).
