wrong parent & child attachment [udapy -s ud.FixPunct < in.conllu > out.conllu] #189 #190

Closed
Shasetty opened this issue Jul 31, 2024 · 2 comments

Comments

@Shasetty

Continuing the earlier ticket #189.

Hi Grammaticians,

As you know, my earlier ticket #189 was closed with a note about the limitations of the current software.

I now request a workaround for text in which parent and child are wrongly connected.

Points to be considered:

  1. Grammar rules should always be followed.
  2. Break up the single sentence into multiple sentences.
  3. Run each sentence individually on version 2.10.
  4. Merge all the files.
  5. While merging, change the parent and child dependency relations as needed.

Text considered:

On November 29, 2022, Twin Ridge, Carbon Revolution Public Limited Company (formerly known as Poppetell Limited), a public limited company incorporated in Ireland with registered number 607450 (“MergeCo”), Carbon Revolution and Poppettell Merger Sub, a Cayman Islands exempted company and wholly-owned subsidiary of MergeCo (“Merger Sub”), entered into a Business Combination Agreement (as it may be amended or supplemented from time to time, the “Business Combination Agreement”), pursuant to which, among other things, Twin Ridge will be merged with and into Merger Sub, with Merger Sub surviving as a wholly-owned subsidiary of MergeCo (the “Merger”), with shareholders of Twin Ridge receiving ordinary shares of MergeCo, par value $0.0001 (the “MergeCo Ordinary Shares”), in exchange for their existing Twin Ridge Ordinary Shares (as defined below) and existing Twin Ridge warrant holders having their warrants automatically exchanged by assumption by MergeCo of the obligations under such warrants, including to become exercisable in respect of MergeCo Ordinary Shares instead of Twin Ridge Ordinary Shares, subject to, among other things, the approval of Twin Ridge’s shareholders.

@Shasetty
Author

Shasetty commented Aug 1, 2024

After looking at the output of version 2.10, I manually split the text, and most of the dependency relations are now correct.

One issue stands out: which dependency relation should I use to link parent and child between part 1 and part 2, shown below?

Text linking part 2 to part 1:
[ , pursuant to which, among other things, Twin Ridge will be merged with and into Merger Sub,]

----------------part1------------------
On November 29, 2022, Twin Ridge, Carbon Revolution Public Limited Company (formerly known as Poppetell Limited), a public limited company incorporated in Ireland with registered number 607450 (“MergeCo”), Carbon Revolution and Poppettell Merger Sub, a Cayman Islands exempted company and wholly-owned subsidiary of MergeCo (“Merger Sub”), entered into a Business Combination Agreement (as it may be amended or supplemented from time to time, the “Business Combination Agreement”)

----------------part2------------------
, pursuant to which, among other things, Twin Ridge will be merged with and into Merger Sub, with Merger Sub surviving as a wholly-owned subsidiary of MergeCo (the “Merger”), with shareholders of Twin Ridge receiving ordinary shares of MergeCo, par value $0.0001 (the “MergeCo Ordinary Shares”), in exchange for their existing Twin Ridge Ordinary Shares (as defined below) and existing Twin Ridge warrant holders having their warrants automatically exchanged by assumption by MergeCo of the obligations under such warrants, including to become exercisable in respect of MergeCo Ordinary Shares instead of Twin Ridge Ordinary Shares, subject to, among other things, the approval of Twin Ridge’s shareholders.

@foxik
Member

foxik commented Aug 8, 2024

Hi,

as mentioned earlier (#175 (comment), #189 (comment)), we cannot really guarantee that the result will be 100% correct, so the first point

  1. grammar rules should always be followed.

cannot really be guaranteed.

In the question, you also mention one specific issue – that if the sentence is very long, UDPipe will make more mistakes than if you process individual parts separately. That is probably true (because the training sentences are quite short, and the longer the sentence is, the more candidates for heads you have to consider for every word). To improve it:

  • one could train models on longer sentences (maybe synthetically generated from the smaller ones, by connecting them according to some simple rules),

  • one could, as proposed, split the long sentences into shorter segments, parse the shorter segments independently, and then finally connect the resulting segment trees. However:

    • one would have to choose where to split the sentences (so that as few edges as possible lead between segments), and
    • one would have to decide how to merge the resulting trees.

    There are of course several possible ways to approach these tasks (rule-based or ML-based).

However, we are currently focusing our efforts elsewhere (finishing LinPipe and providing "UDPipe 3" models there), so we are not planning to work on this specific problem (very long sentences) ourselves in the next year or so. Therefore, I am closing this issue (as it is a feature request we do not currently plan to implement).

@foxik closed this as completed on Aug 8, 2024