## Full text ranking - MS MARCO

Query terms with AND operator

In [None]:
{
	"yql":"select * from sources * where (userInput(@userQuery));",
	"userQuery":"what types of plate boundaries cause deep sea trenches",
	"ranking":{
		"profile":"bm25",
		"listFeatures":"true"
	},
	...
}

Query terms with OR operator

In [None]:
{
	"yql":"select * from sources * where ([{\"grammar\": \"any\"}]userInput(@userQuery));",
	...
}

ANN with query vector

In [None]:
{
	"yql":"select * from sources * where ([{\"targetHits\": 1000, \"label\": \"nns\"}]nearestNeighbor(title_bert, tensor_bert));",
	"userQuery":"what types of plate boundaries cause deep sea trenches",
	"ranking":{
		"profile":"bert_title_body_all",
		"listFeatures":"true"
	},
	"ranking.features.query(tensor_bert)":"[0.05121087115032622, -0.0035218095295999675, ..., 0.05303904445092506]",
	...
}
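The annotations in the bodies above contain double quotes, which must be escaped when the YQL string is embedded in JSON. Generating the body from Python avoids hand-escaping. A minimal sketch, assuming a hypothetical helper `build_body` that is not part of any Vespa client:

```python
import json

def build_body(user_query, grammar=None, profile="bm25"):
    """Build a query body; grammar="any" switches term matching from AND to OR."""
    # Hypothetical helper for illustration only.
    annotation = "" if grammar is None else "[" + json.dumps({"grammar": grammar}) + "]"
    return {
        "yql": f"select * from sources * where ({annotation}userInput(@userQuery));",
        "userQuery": user_query,
        "ranking": {"profile": profile, "listFeatures": "true"},
    }

body = build_body("what types of plate boundaries cause deep sea trenches", grammar="any")
payload = json.dumps(body)  # inner quotes in the YQL string are escaped here
```

Calling `build_body` without `grammar` yields the plain `userInput(@userQuery)` form used for the AND query.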

## Cord19 app

Query terms + filters

In [None]:
body = {
  'yql'    : 'select title, abstract from sources * where userQuery() and has_full_text=true and timestamp > 1577836800;',
  'hits'   : 5,
  'query'  : 'coronavirus temperature sensitivity',
  'type'   : 'any',
  'ranking': 'bm25'
}
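The value `1577836800` in the filter is midnight UTC on 1 January 2020 expressed in Unix epoch seconds. A small sketch for deriving such cutoffs, assuming the `timestamp` field stores epoch seconds (the helper name `epoch_seconds` is illustrative):

```python
from datetime import datetime, timezone

def epoch_seconds(year, month, day):
    """Return the Unix timestamp for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

cutoff = epoch_seconds(2020, 1, 1)
yql = (
    "select title, abstract from sources * where userQuery() "
    f"and has_full_text=true and timestamp > {cutoff};"
)
```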

ANN with query vector

In [None]:
body = {
    'yql': 'select * from sources * where ([{"targetHits":100}]nearestNeighbor(title_embedding, vector));',
    'hits': 5,
    'ranking.features.query(vector)': embedding.tolist(),
    'ranking.profile': 'semantic-search-title',
}

## Amazon product app

Query terms

In [None]:
query = {
    'yql': 'select documentid, asin,title,imUrl,price from sources * where userQuery();',
    'query': 'mens wrist watch',
    'ranking': 'bm25',
    'type': 'any',
    'presentation.timing': True,
    'hits': 2
}
display_hits(app.query(body=query).json, "bm25")

Retrieve a query vector by product id, then run exact (brute-force) nearest-neighbor search with it

In [None]:
query = {
    'yql': 'select documentid, asin,title,imUrl,description,price from sources * where \
    ([{"targetHits":3,"approximate":false}]nearestNeighbor(image_vector,query_image_vector));',
    'ranking': 'vector_similarity',
    'hits': 3, 
    'presentation.timing': True,
    'ranking.features.query(query_image_vector)': get_vector('B00GLP1GTW')
}
display_hits(app.query(body=query).json, "vector_similarity")
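Setting `"approximate":false` requests an exact search, so every candidate vector is compared with the query vector. Conceptually this is a full scan, sketched here in plain Python over an in-memory dict. Euclidean distance and the toy data are assumptions for illustration; the metric actually used depends on how `image_vector` is defined in the schema.

```python
import math

def exact_nearest(query, vectors, k):
    """Return the k ids closest to `query` by Euclidean distance (full scan)."""
    scored = sorted(vectors.items(), key=lambda item: math.dist(query, item[1]))
    return [doc_id for doc_id, _ in scored[:k]]

# Toy data standing in for stored image vectors.
vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}
nearest = exact_nearest([0.9, 0.1], vectors, k=2)
```

The cost grows linearly with the number of vectors, which is why the approximate variant is usually preferred on large collections.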

Retrieve a query vector by product id, then run approximate nearest-neighbor (ANN) search with it

In [None]:
query = {
    'yql': 'select documentid, asin,title,imUrl,description,price from sources * where \
    ([{"targetHits":3}]nearestNeighbor(image_vector,query_image_vector));',
    'ranking': 'vector_similarity',
    'hits': 3, 
    'presentation.timing': True,
    'ranking.features.query(query_image_vector)': get_vector('B00GLP1GTW')
}
display_hits(app.query(body=query).json, "vector_similarity")

ANN + filter to exclude the product used as the query

In [None]:
query = {
    'yql': 'select documentid, asin,title,imUrl,description,price from sources * where \
    ([{"targetHits":3}]nearestNeighbor(image_vector,query_image_vector)) and \
    !(asin contains "B00GLP1GTW");',
    'ranking': 'vector_similarity',
    'hits': 3, 
    'presentation.timing': True,
    'ranking.features.query(query_image_vector)': get_vector('B00GLP1GTW')
}
display_hits(app.query(body=query).json, "vector_similarity")

ANN + multiple filters

In [None]:
query = {
    'yql': 'select documentid, asin,title,imUrl,description,price from sources * where \
    ([{"targetHits":3}]nearestNeighbor(image_vector,query_image_vector)) and \
    !(asin contains "B00GLP1GTW") and \
    price > 100;',
    'ranking': 'vector_similarity',
    'hits': 3, 
    'presentation.timing': True,
    'ranking.features.query(query_image_vector)': get_vector('B00GLP1GTW')
}
display_hits(app.query(body=query).json, "vector_similarity")
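The ANN queries above differ only in the filters appended to the `nearestNeighbor` clause, so the YQL string can be composed programmatically. A minimal sketch, assuming a hypothetical helper `ann_yql` (not part of pyvespa):

```python
def ann_yql(fields, target_hits, doc_field, query_tensor, filters=()):
    """Compose a nearestNeighbor YQL string with optional AND-ed filters."""
    # Hypothetical helper for illustration only.
    where = f'([{{"targetHits":{target_hits}}}]nearestNeighbor({doc_field},{query_tensor}))'
    for clause in filters:
        where += f" and {clause}"
    return f"select {', '.join(fields)} from sources * where {where};"

yql = ann_yql(
    fields=["documentid", "asin", "title", "price"],
    target_hits=3,
    doc_field="image_vector",
    query_tensor="query_image_vector",
    filters=['!(asin contains "B00GLP1GTW")', "price > 100"],
)
```

The resulting string can be passed as the `'yql'` entry of the query dict in place of the hand-written literal.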

## News search

Search over indexed fields using keywords

In [None]:
res = app.query(body={"yql" : "select * from sources * where default contains 'music';"})

In [None]:
res = app.query(body = {"yql" : "select title, abstract from sources * where title contains 'music' AND default contains 'festival';"})

Search by document type

In [None]:
res = app.query(body = {"yql" : "select title from sources * where sddocname contains 'news';"})

Search over attribute fields such as date

In [None]:
# linear scan since fast-search not enabled
res = app.query(body={"yql" : "select title, date from sources * where date contains '20191110';"})

In [None]:
# optimized to filter first over default fields that are indexed
res = app.query(body={"yql" : "select title, abstract, date from sources * where default contains 'weather' AND date contains '20191110';"})

In [None]:
# range search
res = app.query(body={"yql" : "select date from sources * where date <= 20191110 AND date >= 20191108;"})

Sorting

In [None]:
# ascending by default
res = app.query(body={"yql" : "select title, date from sources * where default contains 'music' order by date;"})

In [None]:
# descending
res = app.query(body={"yql" : "select title, date from sources * where default contains 'music' order by date desc;"})

Grouping

In [None]:
res = app.query(body={"yql" : "select * from sources * where sddocname contains 'news' limit 0 | all(group(category) max(3) order(-count())each(output(count())));"})
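The grouping expression groups matching documents by `category`, orders the groups by descending count, keeps at most three, and outputs the count for each. The same computation over an in-memory list looks like this; the sample categories are made up for illustration:

```python
from collections import Counter

# Toy documents standing in for the matched news articles.
categories = ["sports"] * 4 + ["news"] * 3 + ["music"] * 2 + ["travel"]
docs = [{"category": c} for c in categories]

# group(category) ... order(-count()) ... max(3) ... output(count())
top_groups = Counter(d["category"] for d in docs).most_common(3)
```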

## News recommendation

Retrieve a query vector by user id, then run ANN search with it

In [None]:
yql = "select title, category from sources news where ([{'targetHits': 10}]nearestNeighbor(embedding, user_embedding));"
result = app.query(
    body={
        "yql": yql,        
        "hits": 10,
        "ranking.features.query(user_embedding)": str(query_user_embedding(user_id="U63195")),
        "ranking.profile": "recommendation"
    }
)

Retrieve a query vector by user id, then run ANN search with it, restricted by a filter

In [None]:
yql = "select title, category from sources news where " \
      "([{'targetHits': 10}]nearestNeighbor(embedding, user_embedding)) AND " \
      "category contains 'sports';" 
result = app.query(
    body={
        "yql": yql,        
        "hits": 10,
        "ranking.features.query(user_embedding)": str(query_user_embedding(user_id="U63195")),
        "ranking.profile": "recommendation"
    }
)

## QA application

Sentence level retrieval

In [None]:
result = app.query(body={
  'yql': 'select * from sources sentence where ([{"targetHits":100}]nearestNeighbor(sentence_embedding,query_embedding));',
  'hits': 100,
  'ranking.features.query(query_embedding)': questions.loc[0, "embedding"],
  'ranking.profile': 'semantic-similarity' 
})

Sentence level hybrid retrieval

In [None]:
result = app.query(body={
  'yql': 'select * from sources sentence where ([{"targetHits":100}]nearestNeighbor(sentence_embedding,query_embedding)) or userQuery();',
  'query': questions.loc[0, "question"],
  'type': 'any',
  'hits': 100,
  'ranking.features.query(query_embedding)': questions.loc[0, "embedding"],
  'ranking.profile': 'bm25-semantic-similarity' 
})

Paragraph level retrieval

In [None]:
result = app.query(body={
  'yql': ('select * from sources sentence where ([{"targetHits":10000}]nearestNeighbor(sentence_embedding,query_embedding)) |' 
          'all(group(context_id) max(3) order(-max(relevance())) each( max(2) each(output(summary())) as(sentences)) as(paragraphs));'),
  'hits': 0,
  'ranking.features.query(query_embedding)': questions.loc[0, "embedding"],
  'ranking.profile': 'sentence-semantic-similarity' 
})
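The grouping clause in the paragraph-level query turns sentence hits into paragraphs: sentences are grouped by `context_id`, groups are ordered by their best sentence relevance, the top three groups are kept, and up to two sentences are returned per group. A rough in-memory equivalent, assuming hits are dicts with `context_id` and `relevance` keys and that sentences within a group are ordered by relevance (the function name is illustrative):

```python
from collections import defaultdict

def top_paragraphs(hits, max_groups=3, max_per_group=2):
    """Group sentence hits by context_id and rank groups by their best hit."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["context_id"]].append(hit)
    # order(-max(relevance())): best sentence in the group decides group rank
    ranked = sorted(
        groups.items(),
        key=lambda kv: max(h["relevance"] for h in kv[1]),
        reverse=True,
    )
    return [
        (cid, sorted(members, key=lambda h: h["relevance"], reverse=True)[:max_per_group])
        for cid, members in ranked[:max_groups]
    ]

# Toy sentence hits for illustration.
hits = [
    {"context_id": 1, "relevance": 0.9}, {"context_id": 1, "relevance": 0.2},
    {"context_id": 1, "relevance": 0.5}, {"context_id": 2, "relevance": 0.8},
    {"context_id": 3, "relevance": 0.7}, {"context_id": 4, "relevance": 0.1},
]
paragraphs = top_paragraphs(hits)
```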