## Build an Open-Domain Question-Answering System With BERT and `ktrain`

We first install `ktrain` and load a dataset into a Python list. We use the [20 Newsgroups dataset](https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html) in this example.

In [31]:
!pip3 install -q ktrain

In [None]:
# Load the 20 Newsgroups dataset into a Python list
from sklearn.datasets import fetch_20newsgroups
remove = ('headers', 'footers', 'quotes')
newsgroups_train = fetch_20newsgroups(subset='train', remove=remove)
newsgroups_test = fetch_20newsgroups(subset='test', remove=remove)
docs = newsgroups_train.data + newsgroups_test.data
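
Stripping headers, footers, and quotes can leave some posts with little or no text, so it may be worth checking the list before indexing. The helper below is a minimal sketch and not part of `ktrain`; `summarize_docs` is a hypothetical name, and the small sample list stands in for the real `docs`.

```python
def summarize_docs(docs):
    """Return basic counts for a list of document strings."""
    non_empty = [d for d in docs if d.strip()]
    total_words = sum(len(d.split()) for d in non_empty)
    return {
        "total": len(docs),
        "empty": len(docs) - len(non_empty),
        # Average word count over non-empty documents only
        "avg_words": total_words / len(non_empty) if non_empty else 0.0,
    }

# Stand-in sample; in the notebook you would pass the real `docs` list.
sample_docs = ["Cassini is scheduled for launch in 1997.", "   ", ""]
summary = summarize_docs(sample_docs)
print(summary)
```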

Next, we will import `ktrain` modules and set the location of the search index.

In [33]:
import ktrain
from ktrain import text

In [34]:
INDEXDIR = '/tmp/myindex'

### STEP 1: Create a Search Index

In [None]:
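# Create an empty search index at INDEXDIR
text.SimpleQA.initialize_index(INDEXDIR)
# Add every document to the index; commit_every=len(docs) commits once, after the whole list
text.SimpleQA.index_from_list(docs, INDEXDIR, commit_every=len(docs))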
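
Indexing the full corpus can take some time, so when re-running the notebook you may want to skip this step if the index is already on disk. The check below is a rough sketch using only the standard library. `index_exists` is a hypothetical helper, and it assumes that a non-empty directory at the index location means the index was already built.

```python
import os
import tempfile

def index_exists(index_dir):
    """Heuristic: treat a non-empty directory as an already-built index."""
    return os.path.isdir(index_dir) and len(os.listdir(index_dir)) > 0

# Demonstration on a temporary directory rather than the real INDEXDIR.
with tempfile.TemporaryDirectory() as tmp:
    empty_result = index_exists(tmp)   # directory exists but holds no files
    open(os.path.join(tmp, "segment"), "w").close()
    built_result = index_exists(tmp)   # directory now contains a file
# The temporary directory has been removed at this point.
missing_result = index_exists(os.path.join(tmp, "nope"))

print(empty_result, built_result, missing_result)
```

In the notebook, the two indexing calls could then be wrapped in `if not index_exists(INDEXDIR):`.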

### STEP 2: Create a QA Instance

In [27]:
qa = text.SimpleQA(INDEXDIR)

### Ask Questions!

##### Space Question

In [28]:
answers = qa.ask('When did the Cassini probe launch?')
qa.display_answers(answers[:5])

| | Candidate Answer | Context | Confidence | Document Reference |
|---|---|---|---|---|
| 0 | in october of 1997 | cassini is scheduled for launch aboard a titan iv / centaur in october of 1997 . | 0.819034 | 59 |
| 1 | on january 26,1962 | ranger 3, launched on january 26,1962 , was intended to land an instrument capsule on the surface of the moon, but problems during the launch caused the probe to miss the moon and head into solar orbit. | 0.151228 | 8525 |
| 2 | - 10 / 06 / 97 | key scheduled dates for the cassini mission (vvejga trajectory)-------------------------------------------------------------10 / 06 / 97-titan iv / centaur launch 04 / 21 / 98-venus 1 gravity assist 06 / 20 / 99-venus 2 gravity assist 08 / 16 / 99-earth gravity assist 12 / 30 / 00-jupiter gravity assist 06 / 25 / 04-saturn arrival 01 / 09 / 05-titan probe release 01 / 30 / 05-titan probe entry 06 / 25 / 08-end of primary mission (schedule last updated 7 / 22 / 92) - 10 / 06 / 97 | 0.029694 | 59 |
| 3 | * 98 | cassini * * * * * * * * * * * * * * * * * * 98 ,115 * * * * | 2.6e-05 | 5356 |
| 4 | the latter part of the 1990s | scheduled for launch in the latter part of the 1990s , the craf and cassini missions are a collaborative project of nasa, the european space agency and the federal space agencies of germany and italy, as well as the united states air force and the department of energy. | 1.7e-05 | 18684 |


As shown above, the top candidate answer, **October 1997**, is the correct one. (The top-ranked candidate won't always be correct, but it is here.)
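
To read a candidate in its full context, the Document Reference can be used to look up the source post. The sketch below assumes that, because no explicit references were passed at indexing time, each reference is the document's position in `docs`. `show_source` is a hypothetical helper, and the sample list stands in for the real corpus.

```python
def show_source(docs, reference, max_chars=300):
    """Return the start of the document a Document Reference points to."""
    # Assumption: the reference is the document's index in the list.
    text = docs[int(reference)]
    return text[:max_chars]

# Stand-in sample; in the notebook you would pass the real `docs` list.
sample_docs = ["first post", "second post about Cassini"]
snippet = show_source(sample_docs, 1)
print(snippet)
```

Under the same assumption, `show_source(docs, 59)` would show the start of the post behind the top answer.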

##### Technical Support Question

In [29]:
answers = qa.ask('What causes computer images to be too dark?')
qa.display_answers(answers[:5])

| | Candidate Answer | Context | Confidence | Document Reference |
|---|---|---|---|---|
| 0 | if your viewer does not do gamma correction | if your viewer does not do gamma correction , then linear images will look too dark, and gamma corrected images will ok. | 0.93799 | 13873 |
| 1 | is gamma correction | this, is gamma correction (or the lack of it). | 0.045166 | 13873 |
| 2 | so if you just dump your nice linear image out to a crt | so if you just dump your nice linear image out to a crt , the image will look much too dark. | 0.010337 | 13873 |
| 3 | that small color details | the algorithm achieves much of its compression by exploiting known limitations of the human eye, notably the fact that small color details are not perceived as well as small details of light and dark. | 0.002114 | 6987 |
| 4 | that small color details | the algorithm achieves much of its compression by exploiting known limitations of the human eye, notably the fact that small color details are not perceived as well as small details of light and dark. | 0.002114 | 12344 |


It looks like a **lack of gamma correction** is a cause of this technical problem.

##### Religious Question

In [36]:
answers = qa.ask('Who was Mohammed Prophet?')
qa.display_answers(answers[:5])

| | Candidate Answer | Context | Confidence | Document Reference |
|---|---|---|---|---|
| 0 | anwar mohammed | unfortunately not all think like this, we have cases like : anas omran, hamza saleh, jle, mohammed reza, mehmed abu abed, anwar mohammed and others who think that jihad is the only solution. | 0.834524 | 18764 |
| 1 | prophet isaiah | 17 ] this was to fulfil what was spoken by the prophet isaiah , " he took our infirmities and bore our diseases. | 0.094471 | 913 |
| 2 | accept brigham young | " the rest " were apostates and excommunicated members of the church, while the great majority of the membership, the twelve, and the various auxiliary organizations, chose to accept brigham young as the new prophet and leader of the church. | 0.027686 | 7242 |
| 3 | " rushdie | [ this was in response to the claim that " rushdie made false statements about the life of mohammed ", with the disclaimer " (fiction, i know, but where is the line between fact and fiction ?)-i stand by this distinction between fiction and " false statements " ] | 0.021582 | 8475 |
| 4 | barnabas | barnabas was a prophet, acts says, before he was even sent out as an apostle. | 0.016268 | 8118 |


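Here, none of the top candidates describes the prophet Mohammed. The model instead returns names that appear near "mohammed" or "prophet" in the retrieved passages, and the top candidate scores 0.83 despite being wrong, so a high confidence does not guarantee a correct answer.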
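
One way to trim weak candidates is to keep only those above a confidence threshold. The sketch below assumes each entry returned by `qa.ask` is a dictionary with `answer` and `confidence` keys. `confident_answers` is a hypothetical helper, and the sample values are copied from the Cassini table above. A threshold removes low-scoring noise, but it would not catch a wrong answer that scores highly, such as the `anwar mohammed` row (0.83).

```python
def confident_answers(answers, threshold=0.5):
    """Keep only candidates whose confidence is at or above the threshold."""
    return [a for a in answers if a["confidence"] >= threshold]

# Stand-in data copied from the Cassini table; real entries come from qa.ask().
sample = [
    {"answer": "in october of 1997", "confidence": 0.819034},
    {"answer": "on january 26,1962", "confidence": 0.151228},
    {"answer": "- 10 / 06 / 97", "confidence": 0.029694},
]
kept = confident_answers(sample)
print([a["answer"] for a in kept])
```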