meet operator behaving strangely #164
Comments
Same issue with simplified queries, although the output is not as easy to survey:
Again, I expected the second query to match exactly the same nodes as the first one, but with the roles swapped, so after the negative filter is applied there should be nothing left. So why are there 22 lines in the output?
I tested this on an instance of NoSketchEngine; the results are the same, so the problem must be in Manatee. I will report it to the SketchEngine people.
Answer from SketchEngine:

Dear Anna, unfortunately I don't have any good news for you -- after much deliberation, the gurus told me that when label positions are ambiguous, the result is unspecified. Currently, only one of the possibilities is propagated through the evaluation tree. Only the position of the KWIC is what differentiates between different result rows. Therefore, queries like this are not well-formed and should be avoided. The query can possibly be formulated in a different way or perhaps emulated using the filtering functionality on concordances.

Best Regards, Sketch Engine Team

Previous communication:

I do not understand why this query has empty output, while http://ske.li/e6x has 18 results. My expectation was that this query matches exactly the same sentences, but with the first of the two words being the KWIC (instead of the second which is the KWIC in http://ske.li/e6). I have described another example of a similar problem with the meet operator and global conditions at #164. The same queries as mentioned there were tested in a NoSke instance on http://corpora.phil.hhu.de/bonito/parseme.cgi/first?corpname=parseme_de_a&reload=1&iquery=&queryselector=cqlrow&lemma=&phrase=&word=&char=&cql=&default_attr=word&fc_lemword_window_type=both&fc_lemword_wsize=5&fc_lemword=&fc_lemword_type=all, so I believe the unexpected behaviour is due to Manatee and not due to the front-end.
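To make the statement "only one of the possibilities is propagated" more concrete, here is a toy model in Python. It is not Manatee's actual algorithm: the function names, the sample mwe_id values, and in particular the choice of the leftmost candidate as the one that survives are all assumptions made for illustration. It only shows how keeping a single candidate for label 2 per KWIC, and checking the global condition afterwards, can drop rows that an exhaustive evaluation would keep.

```python
# Toy model only: NOT Manatee's implementation. The sample ids and the
# "leftmost candidate wins" rule are assumptions made for illustration.

def exhaustive(ids, lo, hi):
    """For each KWIC whose id contains ';', collect every partner in the
    window [lo, hi] that has the same id."""
    rows = []
    for k, kid in enumerate(ids):
        if ";" not in kid:
            continue
        partners = [
            k + off
            for off in range(lo, hi + 1)
            if 0 <= k + off < len(ids) and ids[k + off] == kid
        ]
        if partners:
            rows.append((k, partners))
    return rows


def single_label(ids, lo, hi):
    """Keep only ONE candidate position for label 2 per KWIC (here: the
    leftmost token inside the window), then test the equality condition."""
    rows = []
    for k, kid in enumerate(ids):
        if ";" not in kid:
            continue
        candidates = [k + off for off in range(lo, hi + 1) if 0 <= k + off < len(ids)]
        if not candidates:
            continue
        chosen = candidates[0]  # assumption: the leftmost candidate survives
        if ids[chosen] == kid:
            rows.append((k, chosen))
    return rows


# Hypothetical mwe_id values for a four-token sentence.
ids = ["0", "1;2", "0", "1;2"]
all_rows = exhaustive(ids, -5, -1)
kept_rows = single_label(ids, -5, -1)
print(all_rows)   # the exhaustive evaluation finds the pair (3, 1)
print(kept_rows)  # the single-candidate model loses it
```

Under this model the result depends on which candidate happens to be kept, which would be consistent with the result being described as unspecified. It makes no claim to reproduce the hit counts reported in this thread.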
At least the following two queries, in which the conditions on node 2 are specified more fully, return the same number of results (30):
But this version returns 122, and in some of them only one node is highlighted:
Further correspondence with Ondrej Herman has clarified the issue further.

From my message:

Could you please be more specific about what you mean by ambiguous label positions? Is this the case that any query of the form

I tried to emulate such queries (with the condition on some parameters being equal between the two words, so that I really need the labels) through the use of filters, but I found no way to do it: the labels of the positions (such as 1 and 2) are not carried over from the original query to the application of the filter. Of course, one could go back to

I would be grateful if you have any further ideas for reformulating or emulating this type of query.

From Ondrej Herman's reply:

The operation

Your first example can give valid results, but only if the corpus contains exactly one B for each A. The second query is fine, but the result is again all the A's, and the label for B is merely informative for the individual occurrences.

The query

The query with "meet" differs from this query in one more respect: with meet, A and B can be at the same position.

I am afraid we cannot handle these restrictions very well in CQL; you are approaching the limits of the language. I cannot think of any other solution that could be put together by clicking in the interface either, but I will try asking around.

Personally, I would proceed by modifying the corpquery script distributed with Manatee: I would feed it your query without the global conditions, which I would evaluate outside CQL
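The last suggestion, evaluating the global condition outside CQL, could look roughly like the Python sketch below. It does not call corpquery or any Manatee API. It assumes the KWIC positions returned by the query without the global condition have already been loaded, together with a per-token table of attributes. The function post_filter, the token layout, and the "sent" field (standing in for within <s/>) are hypothetical names chosen for this example.

```python
from typing import Dict, List, Tuple

Token = Dict[str, str]


def post_filter(
    tokens: List[Token],
    hits: List[int],
    lo: int,
    hi: int,
    attr: str = "mwe_id",
    allow_same: bool = False,
) -> List[Tuple[int, int]]:
    """Return one (kwic, partner) row for EVERY partner in the window
    [lo, hi] that is in the same sentence and has an equal `attr` value."""
    rows = []
    for k in hits:
        first = max(0, k + lo)
        last = min(len(tokens) - 1, k + hi)
        for c in range(first, last + 1):
            if c == k and not allow_same:
                continue  # skip the KWIC itself unless explicitly allowed
            if tokens[c]["sent"] != tokens[k]["sent"]:
                continue  # emulate "within <s/>"
            if tokens[c][attr] == tokens[k][attr]:
                rows.append((k, c))
    return rows


# Hypothetical data: three tokens in sentence s1, one in s2.
tokens = [
    {"word": "a", "mwe_id": "1;2", "sent": "s1"},
    {"word": "b", "mwe_id": "0", "sent": "s1"},
    {"word": "c", "mwe_id": "1;2", "sent": "s1"},
    {"word": "d", "mwe_id": "1;2", "sent": "s2"},
]
# Stand-in for the hits of the query without the global condition.
hits = [i for i, t in enumerate(tokens) if ";" in t["mwe_id"]]

rows = post_filter(tokens, hits, -5, 5)
rows_with_same = post_filter(tokens, hits, -5, 5, allow_same=True)
print(rows)
print(rows_with_same)
```

The allow_same switch is there because, as noted in the reply, A and B may coincide when meet is used. With a window such as -5 5 that includes offset 0, every KWIC trivially matches itself unless that case is excluded.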
This issue is very confusing to me; any explanation would be welcome.
(meet 1:[mwe_id="(.*;.*)"] 2:[] -5 -1) & 1.mwe_id=2.mwe_id within <s/>
has 3 results, and as expected, all three are to the left of the main word because meet has parameters -5 -1.
(meet 1:[mwe_id="(.*;.*)"] 2:[] 1 5) & 1.mwe_id=2.mwe_id within <s/>
has 4 results, and as expected, all 4 are to the right of the KWIC word because meet has parameters 1 5.
Question 1:
The only condition on nodes 1 and 2 is one of equality; in other words, the second query should match the same sentences as the first, but the two words should swap roles (in the second query, the left one should be the KWIC and the right one should be in the context). Why is this not so?
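The symmetry expected in Question 1 can be checked on a small example. The Python sketch below enumerates all pairs by brute force. It is an illustration of the expectation, not of how Manatee evaluates meet, and the function name pairs and the sample mwe_id values are made up. Because the only link between the two nodes is equality of mwe_id, every pair found with the window -5 -1 should reappear with the roles swapped for the window 1 5.

```python
def pairs(mwe_ids, lo, hi):
    """All (kwic, context) index pairs such that the KWIC id contains ';',
    the offset context - kwic lies in [lo, hi], and both ids are equal."""
    found = set()
    for k, kid in enumerate(mwe_ids):
        if ";" not in kid:
            continue
        for off in range(lo, hi + 1):
            c = k + off
            if 0 <= c < len(mwe_ids) and mwe_ids[c] == kid:
                found.add((k, c))
    return found


# Hypothetical mwe_id values for the tokens of a single sentence.
sentence = ["0", "1;2", "0", "1;2", "0", "0", "3;1", "3;1"]

left = pairs(sentence, -5, -1)   # context word to the left of the KWIC
right = pairs(sentence, 1, 5)    # context word to the right of the KWIC
mirrored = {(c, k) for (k, c) in left}

print(sorted(left))
print(sorted(right))
print(mirrored == right)
```

With exhaustive enumeration the two result sets are mirror images of each other, so equal hit counts would be expected for the two queries.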
(meet 1:[mwe_id="(.*;.*)"] 2:[] -5 5) & 1.mwe_id=2.mwe_id within <s/>
should match both to the left and to the right of the KWIC word because of the parameters -5 5; however, it gives the same result as setting the parameters to -5 -1. Why?