<a href="https://colab.research.google.com/github/zseebrz/colab/blob/main/DbPedia_QA_system.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**DBPedia Question Answering System**
Copyright 2021 Mark Watson. All rights reserved. License: Apache 2
https://markwatson.com

In [None]:
!pip install transformers
!pip install SPARQLWrapper
!pip freeze

from transformers import pipeline

qa = pipeline(
    "question-answering",
    #model="NeuML/bert-small-cord19qa",
    model="NeuML/bert-small-cord19-squad2",
    tokenizer="NeuML/bert-small-cord19qa"
)

In [None]:
!pip install spacy
!python -m spacy download en_core_web_sm

import spacy

nlp_model = spacy.load('en_core_web_sm')


In [None]:
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")

def query(query_text):
  # Run a SPARQL query against DBpedia and return the list of result bindings.
  sparql.setQuery(query_text)
  sparql.setReturnFormat(JSON)
  return sparql.query().convert()['results']['bindings']
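
The `query` helper returns the `bindings` list of the SPARQL JSON results: one dict per result row, mapping each selected variable name to a dict that carries the bound `value`. The following is a minimal sketch of pulling plain strings out of that structure. The helper name `binding_values` and the sample rows are hypothetical, written by hand for illustration, and are not real DBpedia output.

```python
def binding_values(bindings, var):
    """Return the plain string bound to `var` in each row that binds it."""
    return [row[var]["value"] for row in bindings if var in row]


# Hand-written sample in the shape of SPARQL JSON result bindings (an assumption,
# not data fetched from DBpedia).
sample_bindings = [
    {
        "s": {"type": "uri", "value": "http://example.org/resource/A"},
        "comment": {"type": "literal", "xml:lang": "en", "value": "First comment."},
    },
    {"s": {"type": "uri", "value": "http://example.org/resource/B"}},
]

print(binding_values(sample_bindings, "comment"))
```

Rows that leave a variable unbound simply omit its key, which is why the sketch checks `var in row` before indexing.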


In [None]:
def entities_in_text(s):
    # Group the named entities spaCy finds in `s` by label, e.g. {'PERSON': [...]}.
    doc = nlp_model(s)
    ret = {}
    for entity in doc.ents:
        ret.setdefault(entity.label_, []).append(entity.text)
    return ret

def dbpedia_get_entities_by_name(name, dbpedia_type):
  # Look up resources whose English label equals `name` and whose rdf:type is
  # `dbpedia_type`, returning each resource URI with its English rdfs:comment.
  query_text = "select distinct ?s ?comment where {{ ?s <http://www.w3.org/2000/01/rdf-schema#label>  \"{}\"@en . ?s <http://www.w3.org/2000/01/rdf-schema#comment>  ?comment  . FILTER  (lang(?comment) = 'en') . ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> {} . }} limit 15".format(name, dbpedia_type)
  return query(query_text)

entity_type_to_type_uri = {'PERSON': '<http://dbpedia.org/ontology/Person>',
    'GPE': '<http://dbpedia.org/ontology/Place>', 'ORG':
    '<http://dbpedia.org/ontology/Organisation>'}
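
`dbpedia_get_entities_by_name` pastes the entity name directly into a quoted SPARQL literal, so a name containing a double quote or a backslash would produce a malformed query. The following is a minimal sketch of building the same query with the name escaped first. The names `escape_sparql_literal` and `build_entity_query` are hypothetical and not part of the notebook, and the sketch only builds the string without contacting DBpedia.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_sparql_literal(text):
    """Escape backslashes and double quotes for a double-quoted SPARQL literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_entity_query(name, dbpedia_type, limit=15):
    """Build the label/comment/type lookup query as a string."""
    return (
        "select distinct ?s ?comment where {{ "
        '?s <{rdfs}label> "{name}"@en . '
        "?s <{rdfs}comment> ?comment . "
        "FILTER (lang(?comment) = 'en') . "
        "?s <{rdf_type}> {dbpedia_type} . "
        "}} limit {limit}"
    ).format(
        rdfs=RDFS,
        name=escape_sparql_literal(name),
        rdf_type=RDF_TYPE,
        dbpedia_type=dbpedia_type,
        limit=limit,
    )


print(build_entity_query('Dwayne "The Rock" Johnson',
                         "<http://dbpedia.org/ontology/Person>"))
```

The backslash is escaped before the double quote so that the backslashes introduced by the second replacement are not escaped again.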


In [None]:
def QA(query_text):
  entities = entities_in_text(query_text)

  def helper(entity_type):
    ret = ""
    if entity_type in entities:
      for hname in entities[entity_type]:
        results = dbpedia_get_entities_by_name(hname, entity_type_to_type_uri[entity_type])
        for result in results:
          ret += result['comment']['value'] + " . "
    return ret

  context_text = helper('PERSON') + helper('ORG') + helper('GPE')
  print("\ncontext text:\n", context_text, "\n")

  print("Answer from transformer model:")
  print("Original query: ", query_text)
  print("Answer:")

  answer = qa({
                "question": query_text,
                "context": context_text
               })
  print(answer)
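
`QA` builds its context by concatenating the DBpedia comments for every PERSON, ORG, and GPE entity found in the question. When spaCy recognizes no such entity, or DBpedia returns no rows, the context is an empty string, which the question-answering pipeline is expected to reject. The following is a minimal sketch of assembling the context as a list joined once and checking for the empty case before calling the model. `build_context` and `fake_lookup` are hypothetical names; `fake_lookup` stands in for `dbpedia_get_entities_by_name` so the sketch runs without network access.

```python
def build_context(entities, lookup, type_uris):
    """Collect the comment text for each recognized entity into one string."""
    parts = []
    for entity_type in ("PERSON", "ORG", "GPE"):
        for name in entities.get(entity_type, []):
            for row in lookup(name, type_uris[entity_type]):
                parts.append(row["comment"]["value"])
    return " . ".join(parts)


def fake_lookup(name, type_uri):
    """Stand-in for the DBpedia lookup: knows one made-up entity only."""
    if name == "Ada":
        return [{"comment": {"value": "Ada is an example."}}]
    return []


type_uris = {"PERSON": "<p>", "ORG": "<o>", "GPE": "<g>"}  # placeholder URIs
context = build_context({"PERSON": ["Ada", "Nobody"]}, fake_lookup, type_uris)

if context:
    print("context:", context)
else:
    print("No DBpedia context found; skipping the model call.")
```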


In [None]:

QA("where does Bill Gates work?")
QA("where is IBM headquartered?")
QA("who is Bill Clinton married to?")
QA("what is the population of Paris?")



context text:
 William Henry Gates III (born October 28, 1955) is an American business magnate, software developer, investor, author, and philanthropist. He is a co-founder of Microsoft Corporation, along with his late childhood friend Paul Allen. During his career at Microsoft, Gates held the positions of chairman, chief executive officer (CEO), president and chief software architect, while also being the largest individual shareholder until May 2014. He is considered one of the best known entrepreneurs of the microcomputer revolution of the 1970s and 1980s. .  

Answer from transformer model:
Original query:  where does Bill Gates work?
Answer:
{'score': 0.23351848125457764, 'start': 254, 'end': 263, 'answer': 'Microsoft'}

context text:
 International Business Machines Corporation (IBM) is an American multinational technology company headquartered in Armonk, New York, with operations in over 170 countries. The company began in 1911, founded in Endicott, New York, as the Computing-T