Permalink
Fetching contributors…
Cannot retrieve contributors at this time
136 lines (120 sloc) 5.11 KB
---
# Copyright 2017 Yahoo Holdings. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root.
title: "The EQUIV Operator"
---
<p>
EQUIV is a query operator that can be used to
add synonyms for words where the various synonyms should be
equivalent.
</p>
<h2 id="use-case">Use case</h2>
<p>
The typical use case is something like this:
<ul>
<li>The user's query is something like <code>(used AND
automobile)</code></li>
<li>We somehow - probably by using a dictionary - have
knowledge that <em>automobile</em> is a synonym for
<em>car</em></li>
<li>We rewrite the query to <code>(used AND (automobile EQUIV
car))</code></li>
<li>We do not care if a document contains <em>automobile</em>
or <em>car</em>, they are equivalent.</li>
</ul>
The last point deserves some elaboration: In this case, we want the
query to behave as if all occurrences of <em>car</em> in the
document corpus had been replaced by <em>automobile</em> and we
were running the original query <code>(used AND automobile)</code>.
</p>
<h2 id="difference-from-or-operator">Difference from OR operator</h2>
<p>
In many circumstances, the OR operator will give the same results as an EQUIV.
The matching logic is exactly the same, and an OR does not
have <a href="#limitations">the limitations that EQUIV does</a>.
The difference is in how matches are visible to ranking functions.
All words that are children of an OR count for ranking.
When using an EQUIV however, it looks like a single word:
<ul>
<li>Counts as only +1 for queryTermCount</li>
<li>Counts as 1 word for completeness measures</li>
<li>Proximity will not discriminate different words inside the
EQUIV</li>
<li>Connectivity can be set between the entire EQUIV and the word
before and after</li>
<li>Items inside the EQUIV are not directly visible to ranking
features, so weight and connectivity on those will have no
effect.</li>
</ul>
In many cases it may still be more appropriate to use OR instead of EQUIV.
When looking for an entity like Sean <em>Diddy</em> Combs this might look appropriate:
<pre>
"Diddy" EQUIV "Sean Combs" EQUIV "Sean John" EQUIV "Puff Daddy" EQUIV "P. Diddy"
</pre>
But <em>Diddy</em> is used by other people - even other pop
artists - so matching that alone is not a sure hit for the entity
we are looking for, and finding more than one of the synonyms in the
same text would give better confidence.
This is exactly what OR does, so something like:
<pre>
"Diddy"!20 OR "Sean Combs"!75 OR "Sean John"!75 OR "Puff Daddy"!80 OR "P. Diddy"!60 OR "Sean John Combs"!100
</pre>
might be better, with lower weights on the alternatives giving less confidence.
If it looks like the many words and phrases inside the OR
overwhelms other words in the query, giving even lower weights may be useful,
for example making the sum of weights 100 - the default weight for just one alternative.
</p>
<h2 id="how-to-use-equiv">How to use EQUIV</h2>
<p>
The decision to use EQUIV must be taken by application-specific dictionary or linguistics use.
This can be done from <a href="../query-language.html#equiv">YQL</a>
or from a container plugin where the query object can be manipulated as follows:
<ul>
<li>Find a word item in the query</li>
<li>Check that an EQUIV can be used in that place
(see <a href="#limitations">limitations</a> below).</li>
<li>Find the synonyms you want to insert in your dictionary</li>
<li>Make word items for your synonyms</li>
<li>Make an <code>EquivItem</code> with the synonyms (and the
original word of course) as children</li>
<li>Replace the original <code>WordItem</code> with the
new <code>EquivItem</code></li>
</ul>
For ideas on how the code might look there is
<a href="http://javadoc.io/page/com.yahoo.vespa/container-search/latest/com/yahoo/prelude/query/EquivItem.html">javadoc</a>
available, with a typical insertion of <code>EquivItem</code> looking like this:
</p>
<pre>
private Item equivize(Item item) {
if (item instanceof TermItem) {
String word = ((TermItem)item).stringValue();
// lookup word in dictionary:
DictEntry entry = dict.get(word);
// if synonyms found, make equiv and replace this word:
if (entry != null) {
EquivItem eq = new EquivItem(item, entry.synonyms);
return eq;
}
} else if (item instanceof PhraseItem ||
item instanceof PhraseSegmentItem) {
// cannot put EQUIV inside PHRASE
return item;
} else if (item instanceof CompositeItem) {
CompositeItem cmp = (CompositeItem)item;
for (int i = 0; i &lt; cmp.getItemCount(); ++i) {
cmp.setItem(i, equivize(cmp.getItem(i)));
}
return cmp;
}
return item;
}
</pre>
<h2 id="limitations">Limitations</h2>
<p>
There are several limitations on how EQUIV can be used in a query:
<ul>
<li>EQUIV may not appear inside a phrase.</li>
<li>It may only contain <code>TermItem</code> and <code>PhraseItem</code> instances.
You cannot place operators like AND inside EQUIV.</li>
<li><code>PhraseItems</code> inside EQUIV will rank like as if they have size 1.</li>
</ul>
</p>