# [ Chapter 6 - Using Content to Learn Domain-specific Language ]
# Query Classification and Disambiguation with Semantic Knowledge Graphs

NOTE: This notebook depends upon the Stack Exchange datasets. If you have any issues, please rerun the [Setting up the Stack Exchange Dataset](../ch05/2.index-datasets.ipynb) notebook.

In [1]:
import sys
sys.path.append('..')
sys.path.append("webserver")
from aips import get_engine, get_semantic_knowledge_graph
import json
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("AIPS").getOrCreate()

skg = get_semantic_knowledge_graph()
engine = get_engine()

## Query Classification

In [2]:
stackexchange_collection = engine.get_collection("stackexchange")

## Listing 6.1

In [3]:
def print_query_classification(query, classification_field="category",
      classification_limit=5, keywords_field="body", min_occurrences=5):
    # Node 1: the query, matched against the keywords field.
    # Node 2: candidate classification values, scored by relatedness to the query.
    nodes_to_traverse = [{"field": keywords_field, "values": [query]},
                         {"field": classification_field,
                          "min_occurrences": min_occurrences,
                          "limit": classification_limit}]

    traversals = skg.traverse(stackexchange_collection, *nodes_to_traverse)
    # Drill into the query's entry in the first node, then into its nested traversal.
    classifications = traversals["graph"][0]["values"][query]["traversals"][0]["values"]
    
    print(f"Query: {query}") 
    print("  Classifications:")
    for term, data in classifications.items():
        print(f'    {term}  {data["relatedness"]}')
    print()

In [5]:
print_query_classification("docker", classification_limit=3)
print_query_classification("airplane", classification_limit=1)
print_query_classification("airplane AND crash", classification_limit=2)
print_query_classification("vitamins", classification_limit=2)
print_query_classification("alien", classification_limit=1)
print_query_classification("passport", classification_limit=1)
print_query_classification("driver", classification_limit=2)
print_query_classification("driver AND taxi", classification_limit=2)
print_query_classification("driver AND install", classification_limit=2)

Query: docker
  Classifications:
    devops  0.87978

Query: airplane
  Classifications:
    travel  0.33334

Query: airplane AND crash
  Classifications:
    scifi  0.02149
    travel  0.00475

Query: vitamins
  Classifications:
    health  0.48681
    cooking  0.09441

Query: alien
  Classifications:
    scifi  0.62541

Query: passport
  Classifications:
    travel  0.82883

Query: driver
  Classifications:
    travel  0.38996
    devops  0.08917

Query: driver AND taxi
  Classifications:
    travel  0.24184
    scifi  -0.13757

Query: driver AND install
  Classifications:
    devops  0.22277
    travel  -0.00675
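
The output above shows that a low or negative relatedness score is a weak classification signal (for example, `travel` for "driver AND install"). As a minimal sketch of how an application might act on these scores, the snippet below keeps only categories at or above a cutoff. The `select_classifications` helper, the `min_relatedness=0.1` cutoff, and the hand-built `sample_traversals` dictionary are all assumptions made for illustration; the dictionary only mirrors the nesting that Listing 6.1 indexes into and is not a real engine response.

```python
# Hypothetical response fragment mirroring the nesting that Listing 6.1 reads.
# Scores are copied from the "driver" output above; the dict is hand-built.
sample_traversals = {
    "graph": [{
        "values": {
            "driver": {
                "traversals": [{
                    "values": {
                        "travel": {"relatedness": 0.38996},
                        "devops": {"relatedness": 0.08917},
                    }
                }]
            }
        }
    }]
}

def select_classifications(traversals, query, min_relatedness=0.1):
    """Return (category, relatedness) pairs at or above the cutoff, highest first.

    The cutoff value is an illustrative assumption, not a recommended setting.
    """
    values = traversals["graph"][0]["values"][query]["traversals"][0]["values"]
    kept = [(term, data["relatedness"]) for term, data in values.items()
            if data["relatedness"] >= min_relatedness]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

selected = select_classifications(sample_traversals, "driver")
print(selected)
```

With the sample scores, only `travel` passes the assumed cutoff of 0.1; lowering `min_relatedness` to 0.0 would also keep `devops`. A suitable cutoff depends on the collection and would need to be tuned against real queries.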



## Disambiguation

## Listing 6.2

In [6]:
def print_disambiguation_query(query, context_field="category", context_limit=5,
      keywords_field="body", keywords_limit=10, min_occurrences=5, print_request=False):
    
    # Three-level traversal: query -> related contexts -> keywords related to
    # the query within each context.
    nodes_to_traverse = [{"field": keywords_field, "values": [query]},
                         {"field": context_field,
                          "min_occurrences": min_occurrences,
                          "limit": context_limit},
                         {"field": keywords_field,
                          "min_occurrences": min_occurrences,
                          "limit": keywords_limit}]

    traversals = skg.traverse(stackexchange_collection, *nodes_to_traverse)
    # Contexts for the query; each context carries its own nested keyword traversal.
    classifications = traversals["graph"][0]["values"][query]["traversals"][0]["values"]
    
    print(f"Query: {query}") 
    for context, data in classifications.items():
        print(f'  Context: {context}  {data["relatedness"]}')
        print("    Keywords: ")
        for keyword, keyword_data in data["traversals"][0]["values"].items():
            print(f'      {keyword}  {keyword_data["relatedness"]}')
        print()
    if print_request:
        print(json.dumps(skg.generate_skg_request(*nodes_to_traverse), indent="  "))

## Listing 6.3

In [7]:
print_disambiguation_query("server")
print_disambiguation_query("driver", context_limit=2)
print_disambiguation_query("chef", context_limit=2, print_request=True)

Query: server
  Context: devops  0.83796
    Keywords: 
      server  0.93698
      servers  0.76818
      docker  0.75955
      code  0.72832
      configuration  0.70686
      deploy  0.70634
      nginx  0.70366
      jenkins  0.69934
      git  0.68932
      ssh  0.6836

  Context: cooking  -0.1574
    Keywords: 
      server  0.66363
      restaurant  0.16482
      pie  0.12882
      served  0.12098
      restaurants  0.11679
      knife  0.10788
      pieces  0.10135
      serve  0.08934
      staff  0.0886
      dish  0.08553

  Context: travel  -0.15959
    Keywords: 
      server  0.81226
      tipping  0.54391
      vpn  0.45352
      tip  0.41117
      servers  0.39053
      firewall  0.33092
      restaurant  0.21698
      tips  0.19524
      bill  0.18951
      cash  0.18485

  Context: scifi  -0.28208
    Keywords: 
      server  0.78173
      flynn's  0.53341
      computer  0.28075
      computers  0.2593
      flynn  0.24963
      servers  0.24778
      grid  0.23889
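
In the output above, the query term itself ("server") tops every context's keyword list, so the terms that actually distinguish one sense from another are the ones that follow it. The sketch below shows one way to summarize each sense by its top keywords with the query term removed. The `summarize_senses` helper and the flattened `sample_contexts` layout are assumptions made for illustration (Listing 6.2 reads the same information from a more deeply nested response), and the scores are copied from a few lines of the "server" output.

```python
# Hand-built, flattened sample of the per-context data printed by Listing 6.2.
# Scores are copied from the "server" output above, truncated to three keywords.
sample_contexts = {
    "devops": {"relatedness": 0.83796,
               "keywords": {"server": 0.93698, "servers": 0.76818, "docker": 0.75955}},
    "cooking": {"relatedness": -0.1574,
                "keywords": {"server": 0.66363, "restaurant": 0.16482, "pie": 0.12882}},
    "travel": {"relatedness": -0.15959,
               "keywords": {"server": 0.81226, "tipping": 0.54391, "vpn": 0.45352}},
}

def summarize_senses(query, contexts, keywords_per_context=2):
    """Map each context to its highest-scoring keywords, excluding the query term."""
    summary = {}
    for context, data in contexts.items():
        # Sort keywords by relatedness, highest first, and skip the query itself.
        ranked = sorted(data["keywords"].items(), key=lambda kv: kv[1], reverse=True)
        summary[context] = [keyword for keyword, _ in ranked
                            if keyword != query][:keywords_per_context]
    return summary

senses = summarize_senses("server", sample_contexts)
for context, keywords in senses.items():
    print(f"{context}: {', '.join(keywords)}")
```

For the sample data this yields `servers, docker` for devops, `restaurant, pie` for cooking, and `tipping, vpn` for travel, which makes the different meanings of "server" easier to compare at a glance.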
 

## Success!

You've leveraged a semantic knowledge graph to find related terms for a query, performed query expansion based upon semantically similar terms, explored multiple ways to affect the precision and recall of queries by integrating semantically augmented queries, generated content-based recommendations, explored arbitrary relationship types by traversing the graph, and performed both query classification and query disambiguation.

Semantic knowledge graphs can be a powerful tool for understanding user intent and for interpreting both queries and content based upon meaning instead of just text keywords.

Up next: [Related Keyword Detection from Signals](../ch06/2.related-keywords-from-signals.ipynb)