In [1]:
# GraphHandler (project module graph_handler) exposes run_query() and get_sparql()
from graph_handler import GraphHandler

AMPhion = GraphHandler()



# Overview
To demonstrate AMPhionQA, we provide a number of different questions that are answered by querying Wikidata. These examples highlight capabilities of the system that would otherwise be hidden from the user, such as the handling of aliases.

We include two kinds of examples: natural language (NL) questions with their results, and NL questions with the SPARQL queries generated for them. For the SPARQL examples, we provide the Wikidata IDs and their corresponding NL labels for easier reading.
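The SPARQL printed in this notebook can also be run directly against the public Wikidata Query Service. The sketch below is not part of AMPhionQA; it assumes the standard endpoint `https://query.wikidata.org/sparql` and the W3C SPARQL 1.1 JSON results format, and the helper names `run_sparql` and `extract_labels` are hypothetical. Calling `run_sparql` needs network access, so the example at the end only exercises the parsing step on a small hand-written response.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"  # public Wikidata Query Service


def run_sparql(query: str) -> dict:
    """Send a SPARQL query to Wikidata and return the decoded JSON response."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Wikidata asks clients to identify themselves with a descriptive User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "AMPhionQA-demo-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def extract_labels(response: dict, var: str = "itemLabel") -> list[str]:
    """Collect the values bound to `var` from a SPARQL JSON results document."""
    return [
        binding[var]["value"]
        for binding in response["results"]["bindings"]
        if var in binding
    ]


# Offline example: a hand-written response in the SPARQL JSON results format.
sample = {
    "head": {"vars": ["item", "itemLabel"]},
    "results": {"bindings": [
        {"itemLabel": {"type": "literal", "xml:lang": "en", "value": "fever"}},
        {"itemLabel": {"type": "literal", "xml:lang": "en", "value": "cough"}},
    ]},
}
print(extract_labels(sample))  # ['fever', 'cough']
```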

# Basic Query Examples 

These examples all resemble questions from the 13 predicate groups we used for baseline testing.


In [2]:
results = AMPhion.run_query("What are the symptoms of influenza?")
print(results)

['headache', 'cough', 'fatigue', 'fever', 'rhinitis', 'myalgia', 'chest pain', 'chills', 'nasal congestion']


In [3]:
results = AMPhion.run_query("What are the symptoms of avian influenza?")
print(results)

['pneumonia', 'diarrhea', 'vomiting', 'nausea']


In [4]:
results = AMPhion.run_query("lactose intolerance risk factors?")
print(results)

['ethnicity', 'age of a person']


In [5]:
results = AMPhion.run_query("What are the targets of erlotinib?")
print(results)

['epidermal growth factor receptor', 'Solute carrier organic anion transporter family member 2B1']


In [6]:
results = AMPhion.run_query("What kind of disease is influenza?")
print(results)

['infectious disease', 'disease', 'viral infectious disease', 'respiratory disease', 'Orthomyxoviridae infectious disease', 'Virus diseases of plants', 'acute viral respiratory tract infection', 'symptom or sign', 'class of disease']


In [7]:
results = AMPhion.run_query("What are the different types of multiple sclerosis?")
print(results)

['neuromyelitis optica', 'clinically isolated syndrome', 'multiple sclerosis', 'secondary progressive multiple sclerosis', 'relapsing-remitting multiple sclerosis', 'AntiMOG associated encephalomyelitis', 'chronic progressive multiple sclerosis', 'Balo concentric sclerosis', 'Idiopathic inflammatory demyelinating diseases', 'tumefactive multiple sclerosis', 'primary progressive multiple sclerosis', 'MS3', 'progressive relapsing multiple sclerosis', 'MS4', 'multiple sclerosis, susceptibility to', 'Marburg multiple sclerosis', 'disseminated sclerosis with narcolepsy', 'MS2', 'multiple sclerosis, susceptibility to 1', 'multiple sclerosis, susceptibility to, 5', 'pediatric multiple sclerosis']


# Alternate Labels Examples
These examples highlight AMPhionQA's ability to handle different labels that refer to the same entity.

The Wikidata ID for avian influenza is Q43987, and the property ID for disease transmission is P1060. Both phrasings of the question below resolve to the same entity and produce identical SPARQL.
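A minimal sketch of why two phrasings can yield the same query: if every label and alias is normalized and mapped to a single Wikidata ID before the query is built, the surface form of the entity no longer matters. The alias table and the functions `normalize`, `resolve_entity`, and `single_hop_query` are hypothetical illustrations, not AMPhionQA's actual implementation; only the IDs Q43987 and P1060 come from this notebook. In Wikidata itself, aliases are stored as `skos:altLabel` values alongside the main `rdfs:label`.

```python
# Hypothetical alias table: normalized surface forms mapped to Wikidata IDs.
ALIASES = {
    "avian influenza": "Q43987",
    "bird flu": "Q43987",
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore case and spacing."""
    return " ".join(text.lower().split())


def resolve_entity(mention: str, aliases: dict[str, str]) -> str:
    """Map an entity mention to its Wikidata ID (raises KeyError if unknown)."""
    return aliases[normalize(mention)]


def single_hop_query(subject_qid: str, property_pid: str) -> str:
    """Build the one-triple query shape printed throughout this notebook."""
    return (
        "SELECT DISTINCT ?item ?itemLabel \n"
        "WHERE {\n"
        f"wd:{subject_qid} wdt:{property_pid} ?item .\n"
        "?item rdfs:label ?itemLabel . \n"
        "FILTER( LANG (?itemLabel) = 'en')\n"
        "}"
    )


q1 = single_hop_query(resolve_entity("avian influenza", ALIASES), "P1060")
q2 = single_hop_query(resolve_entity("Bird  Flu", ALIASES), "P1060")
print(q1 == q2)  # True
```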

In [8]:
sparql = AMPhion.get_sparql("how is avian influenza transmitted?")
print(sparql)

SELECT DISTINCT ?item ?itemLabel 
WHERE {
wd:Q43987 wdt:P1060 ?item .
?item rdfs:label ?itemLabel . 
FILTER( LANG (?itemLabel) = 'en')
}


In [9]:
sparql = AMPhion.get_sparql("how is bird flu transmitted?")
print(sparql)

SELECT DISTINCT ?item ?itemLabel 
WHERE {
wd:Q43987 wdt:P1060 ?item .
?item rdfs:label ?itemLabel . 
FILTER( LANG (?itemLabel) = 'en')
}


# Compound Query Examples
These examples highlight AMPhionQA's ability to construct compound queries using simpler queries as components. The first two examples are similar to those in the experimentation data, while the latter two are more complex queries that we have yet to incorporate into our AMP example set.

The Wikidata IDs used are: \
type-1 diabetes - Q124407 \
type-2 diabetes - Q3025883 \
genetic association - P2293

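As the examples show, AMPhionQA constructs correct SPARQL to find the genes related to both type-1 and type-2 diabetes: the compound query keeps the single-disease triple pattern and adds a second pattern requiring the same ?item to be genetically associated with the other disease. Rephrasing the question as "both types of diabetes" produces the same query and results.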
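The compound query differs from the single-disease queries only by a second triple pattern that reuses ?item, so both constraints must hold for the same gene (a join). The hypothetical helper below, `shared_neighbor_query`, is our own sketch that reproduces the query structure printed in cells 12 and 13; it is not AMPhionQA's code. The property path `wdt:P2293|^wdt:P2293` matches the association in either direction, whether the disease points to the gene or the gene points to the disease.

```python
def shared_neighbor_query(qid_a: str, qid_b: str, property_pid: str) -> str:
    """Build a query for items linked to BOTH entities via property_pid.

    `wdt:P|^wdt:P` is a SPARQL property path: the property in either direction.
    """
    path = f"wdt:{property_pid}|^wdt:{property_pid}"
    return (
        "SELECT DISTINCT ?item ?itemLabel \n"
        "WHERE {\n"
        f"wd:{qid_a} {path} ?item .\n"      # linked to the first entity
        f"?item {path} wd:{qid_b} .\n"      # same ?item, linked to the second entity
        "?item rdfs:label ?itemLabel . \n"
        "FILTER( LANG (?itemLabel) = 'en')\n"
        "}"
    )


# Type-1 diabetes (Q124407), type-2 diabetes (Q3025883), genetic association (P2293)
query = shared_neighbor_query("Q124407", "Q3025883", "P2293")
print(query)
```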

In [10]:
results = AMPhion.run_query("List the genes related to type-1 diabetes.")
print(results, "\n")

sparql = AMPhion.get_sparql("List the genes related to type-1 diabetes.")
print(sparql)

['CTSH', 'CTLA4', 'PTPN22', 'IL2', 'TYK2', 'LMO7', 'PGM1', 'PTPN2', 'PAX4', 'AFF3', 'PRKCQ', 'ERBB4', 'IL2RA', 'IL7R', 'CD69', 'ERBB3', 'GLIS3', 'FCRL3', 'CAPSL', 'ADAD1', 'IKZF4', 'BACH2', 'IFIH1', 'SH2B3', 'RASGRP1', 'ANGPTL8', 'UBASH3A', 'CLEC16A', 'CD226', 'CUX2'] 

SELECT DISTINCT ?item ?itemLabel 
WHERE {
wd:Q124407 wdt:P2293|^wdt:P2293 ?item .
?item rdfs:label ?itemLabel . 
FILTER( LANG (?itemLabel) = 'en')
}


In [11]:
results = AMPhion.run_query("List the genes related to type-2 diabetes.")
print(results, "\n")

sparql = AMPhion.get_sparql("List the genes related to type-2 diabetes.")
print(sparql)

['GCK', 'CRHR2', 'ACHE', 'LAMA1', 'KCNQ1', 'FTO', 'TGFBR3', 'SLC30A8', 'SYK', 'CR2', 'KCNJ11', 'TCF7L2', 'SRR', 'LIMK2', 'HNF4A', 'GRK5', 'ADCY5', 'DGKB', 'IGF2BP2', 'GPSM1', 'PEX5L', 'SASH1', 'HMG20A', 'FAF1', 'ARL15', 'PLS1', 'PEPD', 'MTNR1B', 'PPARD', 'CDKAL1', 'ZMIZ1', 'RHOU', 'CMIP', 'THADA', 'ZFAND3', 'MARCHF1', 'RASGRP1', 'WFS1', 'ELMO1', 'MPHOSPH9', 'UBE2E2', 'VPS26A', 'MAEA', 'PTPRD', 'CCDC102A', 'CCNQ', 'PCBD2', 'DNER', 'LINGO2', 'GLIS3', 'HNF1B', 'ST6GAL1', 'RNF6', 'HNF1A', 'JAZF1', 'TCERG1L', 'SLC16A13', 'PPARG'] 

SELECT DISTINCT ?item ?itemLabel 
WHERE {
wd:Q3025883 wdt:P2293|^wdt:P2293 ?item .
?item rdfs:label ?itemLabel . 
FILTER( LANG (?itemLabel) = 'en')
}


In [12]:
results = AMPhion.run_query("List the genes related to both type-1 and type-2 diabetes.")
print(results, "\n")

sparql = AMPhion.get_sparql("List the genes related to both type-1 and type-2 diabetes.")
print(sparql)

['RASGRP1', 'GLIS3'] 

SELECT DISTINCT ?item ?itemLabel 
WHERE {
wd:Q124407 wdt:P2293|^wdt:P2293 ?item .
?item wdt:P2293|^wdt:P2293 wd:Q3025883 .
?item rdfs:label ?itemLabel . 
FILTER( LANG (?itemLabel) = 'en')
}


In [13]:
results = AMPhion.run_query("List the genes related to both types of diabetes.")
print(results, "\n")

sparql = AMPhion.get_sparql("List the genes related to both types of diabetes.")
print(sparql)

['RASGRP1', 'GLIS3'] 

SELECT DISTINCT ?item ?itemLabel 
WHERE {
wd:Q124407 wdt:P2293|^wdt:P2293 ?item .
?item wdt:P2293|^wdt:P2293 wd:Q3025883 .
?item rdfs:label ?itemLabel . 
FILTER( LANG (?itemLabel) = 'en')
}
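As a cross-check, intersecting the two single-disease result lists on the client side should give the same genes as the compound query. This sketch simply copies the labels printed by cells 10 and 11. Comparing labels is adequate for this small check, but in general it is safer to compare Wikidata IDs, since distinct entities can share a label.

```python
# Gene labels copied from the outputs of In [10] (type-1) and In [11] (type-2).
type1_genes = [
    'CTSH', 'CTLA4', 'PTPN22', 'IL2', 'TYK2', 'LMO7', 'PGM1', 'PTPN2', 'PAX4', 'AFF3',
    'PRKCQ', 'ERBB4', 'IL2RA', 'IL7R', 'CD69', 'ERBB3', 'GLIS3', 'FCRL3', 'CAPSL', 'ADAD1',
    'IKZF4', 'BACH2', 'IFIH1', 'SH2B3', 'RASGRP1', 'ANGPTL8', 'UBASH3A', 'CLEC16A', 'CD226', 'CUX2',
]
type2_genes = [
    'GCK', 'CRHR2', 'ACHE', 'LAMA1', 'KCNQ1', 'FTO', 'TGFBR3', 'SLC30A8', 'SYK', 'CR2',
    'KCNJ11', 'TCF7L2', 'SRR', 'LIMK2', 'HNF4A', 'GRK5', 'ADCY5', 'DGKB', 'IGF2BP2', 'GPSM1',
    'PEX5L', 'SASH1', 'HMG20A', 'FAF1', 'ARL15', 'PLS1', 'PEPD', 'MTNR1B', 'PPARD', 'CDKAL1',
    'ZMIZ1', 'RHOU', 'CMIP', 'THADA', 'ZFAND3', 'MARCHF1', 'RASGRP1', 'WFS1', 'ELMO1', 'MPHOSPH9',
    'UBE2E2', 'VPS26A', 'MAEA', 'PTPRD', 'CCDC102A', 'CCNQ', 'PCBD2', 'DNER', 'LINGO2', 'GLIS3',
    'HNF1B', 'ST6GAL1', 'RNF6', 'HNF1A', 'JAZF1', 'TCERG1L', 'SLC16A13', 'PPARG',
]

# Genes associated with both diseases, sorted for a stable comparison.
shared = sorted(set(type1_genes) & set(type2_genes))
print(shared)  # ['GLIS3', 'RASGRP1']
```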


# Examples Outside of the Training Set

AMPhionQA correctly answers the following questions even though it has not been given any similar examples that could have helped.

In [14]:
results = AMPhion.run_query("What is BRAF V600E the positive therapeutic predictors for?")
print(results)

['vemurafenib', 'trametinib', 'Irinotecan / Vemurafenib / Cetuximab combination therapy', 'Panitumumab / Dabrafenib / Trametinib combination therapy', 'Pertuzumab / vemurafenib combination therapy', 'trametinib / vemurafenib / dabrafenib combination therapy', 'cobimetinib fumarate', 'pictilisib', 'Dabrafenib / Trametinib combination therapy', 'Panitumumab / Trametinib combination therapy', 'Sorafenib / Panitumumab combination therapy', 'dabrafenib', 'Capecitabine / Vemurafenib / Bevacizumab combination therapy', 'vemurafenib / cobimetinib fumarate combination therapy', 'Cetuximab / encorafenib combination therapy', 'cetuximab / encorafenib / binimetinib combination therapy', 'irinotecan / Panitumumab / vemurafenib combination therapy', 'erlotinib / vemurafenib combination therapy', 'Vemurafenib / Panitumumab combination therapy', 'Vemurafenib / Gefitinib / Cetuximab combination therapy']


In [15]:
results = AMPhion.run_query("What is BRAF V600E a variant of?")
print(results)

['BRAF']


The following two examples show AMPhionQA's potential for growth as we add examples covering a wider range of topics. The first question correctly retrieves the significant drug interactions of paracetamol (acetaminophen), the active ingredient in Tylenol. The second question attempts the same task by first retrieving the active ingredient of Tylenol and then finding its significant drug interactions, but it fails at the last step: it places an irrelevant entity (wd:Q70551253) where a variable is needed. This is still promising for our system: neither example has been incorporated into our training data, yet the system comes close to handling both.

Wikidata IDs: \
significant drug interaction - P769 \
has active ingredient - P3781 \
paracetamol (acetaminophen) - Q57055 \
Tylenol - Q47521665


In [16]:
results = AMPhion.run_query("What are the significant drug interactions of paracetamol?")
print(results, "\n")

sparql = AMPhion.get_sparql("What are the significant drug interactions of paracetamol?")
print(sparql)

['imatinib', 'phenytoin', 'acenocoumarol', 'propranolol', 'isoniazid', 'rifampicin', 'carbamazepin', 'rifabutin', 'rac-warfarin'] 

SELECT DISTINCT ?item ?itemLabel 
WHERE {
wd:Q57055 wdt:P769 ?item .
?item rdfs:label ?itemLabel . 
FILTER( LANG (?itemLabel) = 'en')
}


In [17]:
sparql = AMPhion.get_sparql("What are the significant drug interactions of Tylenol's active ingredient?")
print(sparql)

SELECT DISTINCT ?item ?itemLabel 
WHERE {
wd:Q47521665 wdt:P3781 ?item .
?item wdt:P769 wd:Q70551253 .
?item rdfs:label ?itemLabel . 
FILTER( LANG (?itemLabel) = 'en')
}
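For comparison, a query that chains the two steps needs an intermediate variable for the active ingredient, so that the ingredient's drug interactions, rather than a fixed entity, become the answer. The hypothetical helper `chained_query` below sketches that shape using P3781, which the generated query uses for the active-ingredient step, and P769. Whether Tylenol's Wikidata entry actually links to paracetamol through P3781 is an assumption we have not verified here.

```python
def chained_query(subject_qid: str, first_pid: str, second_pid: str) -> str:
    """Follow first_pid from the subject, then second_pid from each intermediate result."""
    return (
        "SELECT DISTINCT ?item ?itemLabel \n"
        "WHERE {\n"
        f"wd:{subject_qid} wdt:{first_pid} ?mid .\n"   # ?mid: intermediate entity (the ingredient)
        f"?mid wdt:{second_pid} ?item .\n"             # ?item: what the ingredient interacts with
        "?item rdfs:label ?itemLabel . \n"
        "FILTER( LANG (?itemLabel) = 'en')\n"
        "}"
    )


# Tylenol (Q47521665) -> active ingredient (P3781) -> significant drug interaction (P769)
fixed = chained_query("Q47521665", "P3781", "P769")
print(fixed)
```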
