meet operator behaving strangely #164

Ansa211 · 2018-01-12T11:24:28Z

This issue is very confusing for me, any explanation would be welcome.

(meet 1:[mwe_id="(.*;.*)"] 2:[] -5 -1) & 1.mwe_id=2.mwe_id within <s/> has 3 results, and as expected, all three are to the left of the main word because meet has parameters -5 -1

(meet 1:[mwe_id="(.*;.*)"] 2:[] 1 5) & 1.mwe_id=2.mwe_id within <s/> has 4 results and as expected, all 4 are to the right of the KWIC word because meet has parameters 1 5

Question 1:
The only condition on nodes 1 and 2 is one of equality. In other words, the second query should match the same sentences as the first, but with the roles of the two words swapped: in the second query, the left word should be the KWIC and the right one should be in the context. Why is this not the case?

Question 2:
(meet 1:[mwe_id="(.*;.*)"] 2:[] -5 5) & 1.mwe_id=2.mwe_id within <s/>
should match both to the left and to the right of the KWIC word because of the parameters -5 5. However, it gives the same result as setting the parameters to -5 -1. Why?


Ansa211 · 2018-01-12T11:41:36Z

Same issue with simplified queries, although the output is harder to survey:

(meet 1:[mwe_id!="_"] 2:[mwe_id!="_"] 1 5) & 1.mwe_id=2.mwe_id & 1.word!=2.word within <s/>
---> apply negative filter (meet 1:[mwe_id!="_"] 2:[mwe_id!="_"] -5 -1) & 1.mwe_id=2.mwe_id & 1.word!=2.word within <s/>

Again, I expected that the second query matches exactly the same nodes as the first one but with roles swapped, so after the application of the negative filter, there should be nothing left; so why are there 22 lines in the output?

Ansa211 · 2018-04-03T10:55:58Z

I tested this on an instance of NoSketchEngine; the results are the same, so the problem must be in Manatee. I will report it to SketchEngine people.

Ansa211 · 2018-04-11T14:49:42Z

Answer from SketchEngine:

Dear Anna,

unfortunately I don't have any good news for you -- after much deliberation, the gurus told me that when label positions are ambiguous, the result is unspecified. Currently, only one of the possibilities is propagated through the evaluation tree. Only the position of the KWIC is what differentiates between different result rows.

Therefore, queries like this are not well-formed and should be avoided. The query can possibly be formulated in a different way or perhaps emulated using the filtering functionality on concordances.

Best Regards,
Ondrej Herman

Sketch Engine Team

Previous communication

URL: https://the.sketchengine.co.uk/corpus/first?corpname=preloaded%2Fsusanne&reload=&iquery=&queryselector=cqlrow&lemma=&lpos=&phrase=&word=&wpos=&char=&cql=%28meet+1%3A%22his%22+2%3A%5B%5D+1+5%29+%261.word%3D2.word&default_attr=word&fc_lemword_window_type=both&fc_lemword_wsize=5&fc_lemword=&fc_lemword_type=all&fc_pos_window_type=both&fc_pos_wsize=5&fc_pos_type=all

I do not understand why this query has empty output, while http://ske.li/e6x has 18 results. My expectation was that this query matches exactly the same sentences, but with the first of the two words being the KWIC (instead of the second which is the KWIC in http://ske.li/e6).

I have described another example of a similar problem with the meet operator and global conditions at #164. The same queries as mentioned there were tested in a NoSke instance on http://corpora.phil.hhu.de/bonito/parseme.cgi/first?corpname=parseme_de_a&reload=1&iquery=&queryselector=cqlrow&lemma=&phrase=&word=&char=&cql=&default_attr=word&fc_lemword_window_type=both&fc_lemword_wsize=5&fc_lemword=&fc_lemword_type=all, so I believe the unexpected behaviour is due to Manatee and not due to the front-end.

Ansa211 · 2018-04-11T15:07:46Z

At least the following two queries, in which the conditions on node 2 have been more fully specified, have the same number of results (30):
(meet 1:[mwe_id="(.*;.*)"] 2:[mwe_id="(.*;.*)"] -5 -1) & 1.mwe_id=2.mwe_id within <s/>
(meet 1:[mwe_id="(.*;.*)"] 2:[mwe_id="(.*;.*)"] 1 5) & 1.mwe_id=2.mwe_id within <s/>

But this version has 122 results, and in some of them only one node is highlighted:
(meet 1:[mwe_id="(.*;.*)"] 2:[mwe_id="(.*;.*)"] -5 5) & 1.mwe_id=2.mwe_id within <s/>

Ansa211 · 2018-04-23T16:36:44Z

Further correspondence with Ondrej Herman has clarified the issue even further.

From my message:

Could you please be more specific about what you mean by ambiguous label positions? Is this the case that any query of the form
(meet 1:[conditions1] 2:[conditions2] -num1 num2) & 1.attribute1 = 2.attribute2
is malformed? Or even any
(meet 1:[conditions1] 2:[conditions2] -num1 num2)
? (From a tiny bit of experimentation, I suspect the latter.)
Also, does the same issue concern any other query types that you can think of?

I tried to emulate such queries (with the condition that some attributes are equal between the two words, so that I really need the labels) through the use of filters, but I found no way to do it: the labels of the positions (such as 1 and 2) are not carried over from the original query to the application of the filter.

Of course, one could go back to
(1:[conditions1] []{0,num2} 2:[conditions2] | 2:[conditions2] []{0,num1} 1:[conditions1]) & 1.attribute1 = 2.attribute2
which should work (is that correct?), but that means losing the functionality of meet (the fact that only the two relevant words are highlighted and only one of them is the KWIC).

I would be grateful if you have any further ideas for reformulating/emulating this type of query.
But more importantly, I would like to understand better which queries I should avoid.

From Ondrej Herman's reply:

The operation (meet A B x y) tries to find all A that have some occurrence of B within the window given by the parameters x and y. The global condition then filters out the rows of this result that do not satisfy the required condition. This means that the result will never contain more than one occurrence of A at the same position. In general, meet is not even commutative.

Your first example can give valid results, but only if there is exactly one B for every A in the corpus. The second query is fine, but the result is again all A, and the label for B is rather of an informative nature for the individual occurrences.

The query
(1:[conditions1] []{0,num2} 2:[conditions2] | 2:[conditions2] []{0,num1} 1:[conditions1]) & 1.attribute1 = 2.attribute2
has a similar problem. The partial results of the left and right parts around the vertical bar can be identical and differ only in their labels. Only one of them then passes through the vertical bar operator to the evaluation of the global condition.

The query with "meet" differs from this query in one more respect: with meet, A and B can be located at the same position.
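
To make the description above concrete, here is a minimal Python sketch of a toy model of these semantics. It is not Manatee's implementation: the function names `meet` and `global_condition`, the three-token sample sentence, and the rule "keep the first B found when scanning the window from left to right" are all assumptions made for illustration (the correspondence only says that a single possibility is propagated, not which one).

```python
from typing import Callable, Dict, List, Tuple

Token = Dict[str, str]
Row = Tuple[int, int]  # (position of A, i.e. the KWIC; position of the single B label)


def meet(tokens: List[Token],
         is_a: Callable[[Token], bool],
         is_b: Callable[[Token], bool],
         lo: int, hi: int) -> List[Row]:
    """Toy model: one row per A position, carrying at most one B label."""
    rows: List[Row] = []
    for i, tok in enumerate(tokens):
        if not is_a(tok):
            continue
        for off in range(lo, hi + 1):
            j = i + off
            if 0 <= j < len(tokens) and is_b(tokens[j]):
                rows.append((i, j))
                break  # assumption: only the first B in the window is kept
    return rows


def global_condition(tokens: List[Token], rows: List[Row], attr: str) -> List[Row]:
    """Keep only the rows whose A and B tokens agree on the given attribute."""
    return [(i, j) for i, j in rows if tokens[i][attr] == tokens[j][attr]]


def in_mwe(tok: Token) -> bool:
    return tok["mwe_id"] != "_"


def anything(tok: Token) -> bool:
    return True


# Invented sample sentence; the mwe_id values are placeholders.
sentence = [
    {"word": "gave", "mwe_id": "1"},
    {"word": "a", "mwe_id": "_"},
    {"word": "talk", "mwe_id": "1"},
]

left = global_condition(sentence, meet(sentence, in_mwe, anything, -5, -1), "mwe_id")
right = global_condition(sentence, meet(sentence, in_mwe, anything, 1, 5), "mwe_id")
both = global_condition(sentence, meet(sentence, in_mwe, anything, -5, 5), "mwe_id")

print(left)   # [(2, 0)]
print(right)  # []
print(both)   # [(0, 0), (2, 0)]
```

In this model the leftward query keeps "talk" paired with "gave", while the rightward query loses the mirrored pair, because "gave" was already paired with "a" before the global condition was checked. With the window -5 5, "gave" ends up paired with itself, which would be consistent with rows in which only one node is highlighted.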

I am afraid we cannot deal with these limitations in CQL very well; you are approaching the limits of the language. I cannot think of any other solution that could be set up by clicking in the interface either, but I will try to ask around.

Personally, I would proceed by modifying the corpquery script distributed with Manatee: I would feed it your query without the global conditions, and evaluate those conditions outside CQL.
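
A rough Python sketch of that idea follows. It assumes the tokens of each sentence have already been exported from the corpus as dictionaries of attributes; reading them from Manatee or corpquery is deliberately left out, and `pairs_in_window`, `same_attr` and the sample data are hypothetical names invented for this illustration. Every (A, B) pair inside the window is enumerated first, and the equality condition is applied afterwards, outside CQL, so no candidate pair is discarded early.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Token = Dict[str, str]
Pair = Tuple[int, int]  # (position of A, position of B)


def pairs_in_window(tokens: List[Token],
                    is_a: Callable[[Token], bool],
                    is_b: Callable[[Token], bool],
                    lo: int, hi: int) -> Iterator[Pair]:
    """Yield every (A, B) position pair with lo <= B - A <= hi."""
    for i, tok in enumerate(tokens):
        if not is_a(tok):
            continue
        start = max(0, i + lo)
        stop = min(len(tokens) - 1, i + hi)
        for j in range(start, stop + 1):
            if is_b(tokens[j]):
                yield i, j


def same_attr(tokens: List[Token], attr: str, pairs: Iterable[Pair]) -> List[Pair]:
    """Global condition evaluated outside CQL: A and B agree on attr."""
    return [(i, j) for i, j in pairs if tokens[i][attr] == tokens[j][attr]]


def in_mwe(tok: Token) -> bool:
    return tok["mwe_id"] != "_"


# Invented sample sentence; the mwe_id values are placeholders.
sentence = [
    {"word": "gave", "mwe_id": "1"},
    {"word": "a", "mwe_id": "_"},
    {"word": "talk", "mwe_id": "1"},
]

right = same_attr(sentence, "mwe_id", pairs_in_window(sentence, in_mwe, in_mwe, 1, 5))
left = same_attr(sentence, "mwe_id", pairs_in_window(sentence, in_mwe, in_mwe, -5, -1))

print(right)  # [(0, 2)]
print(left)   # [(2, 0)]
# The two directions are now mirror images of each other.
print({(j, i) for i, j in left} == set(right))  # True
```

Because the pairs are kept as explicit tuples, further conditions such as 1.word != 2.word can be added as ordinary Python filters on the same list.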

Ansa211 added the help wanted label Jan 14, 2018

Ansa211 closed this as completed Apr 3, 2018

Ansa211 added wontfix and removed help wanted labels Apr 23, 2018
