# Jack the Reader: A Machine Reading Framework

## Prerequisites

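**Note:** run these commands in a terminal from the root of the Jack repository. They download the GloVe embeddings and the pretrained FastQA, DAM and ESIM (MultiNLI) models used below.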
```
sh data/GloVe/download.sh
wget -O fastqa.zip https://www.dropbox.com/s/lftgh01zi60r9jv/fastqa.zip?dl=1
wget -O dam.zip https://www.dropbox.com/s/vnsd5cfg3i3bv8f/dam.zip?dl=1
wget -O esim_mnli.zip http://data.neuralnoise.com/jack/natural_language_inference/esim_mnli.zip
unzip fastqa.zip
unzip dam.zip
unzip esim_mnli.zip
```
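Before loading anything, it can help to confirm that the archives unpacked where the loading cells below expect them. This is a minimal sketch that assumes each `unzip` creates a directory named after its archive (`./fastqa`, `./dam`, `./esim_mnli`), which matches the paths later passed to `readers.reader_from_file`. The helper name `missing_model_dirs` is hypothetical and not part of Jack.

```python
import os
from typing import Iterable, List

# Directory names assumed to be created by the unzip commands above.
EXPECTED_MODEL_DIRS = ("fastqa", "dam", "esim_mnli")


def missing_model_dirs(root: str = ".", names: Iterable[str] = EXPECTED_MODEL_DIRS) -> List[str]:
    """Return the expected model directories that do not exist under `root`."""
    return [name for name in names if not os.path.isdir(os.path.join(root, name))]


if __name__ == "__main__":
    missing = missing_model_dirs()
    if missing:
        print("Missing model directories:", ", ".join(missing))
    else:
        print("All pretrained model directories are present.")
```

Run it from the Jack root, the same place the download commands are run.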

```python
%load_ext autoreload
%autoreload 2
import os
os.chdir('..')    # change dir to Jack root
from jack import readers
from jack.core import QASetting
from jack.io.load import load_jack
from notebooks.prettyprint import QAPrettyPrint, print_nli
```



## Question Answering (QA)

Loading ready-to-use pretrained extractive QA models:

```python
qa_reader = readers.reader_from_file("./fastqa")
#bidaf_reader = readers.reader_from_file("./bidaf") # XXX TODO
```

```
INFO:tensorflow:Restoring parameters from ./fastqa/model_module
```


##### Example 1

```python
paragraph = """It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. 
At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary."""
question = "To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?"
```

##### Example 2

```python
paragraph = """Tom was visiting his family in the Rocky Mountains with his two little dogs. They had a welcome BBQ and ate all of the chocolates he brought. Tom has a friend whose family lives in Vermont."""
question = "Where do Tom's parents live?"
#question = "What did the family eat for dessert?"
#question = "How many dogs does Tom have?"
```

##### Example 3

```python
paragraph = "France lost to Croatia 0 : 5 in yesterday's World Cup final. "
question = "What was the final score of the World Cup final match?"
```

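##### Input Format: the `QASetting` Data Structure

Each example cell above overwrites `paragraph` and `question`, so the cells below use whichever example ran last (Example 3 in the outputs shown here). A `QASetting` takes the question and a list of support documents:
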
```python
qa_setting = QASetting(question=question, support=[paragraph])
```
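Conceptually, a `QASetting` pairs one question with a list of support documents, and the reader consumes a list of such settings. The stand-in below is a hypothetical `MiniQASetting` dataclass, not Jack's class: it mirrors only the two fields used in this notebook, to show how a single paragraph becomes a one-element `support` list and how several questions over the same paragraph form one batch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MiniQASetting:
    """Hypothetical stand-in for jack.core.QASetting (only the fields used here)."""
    question: str
    support: List[str] = field(default_factory=list)


def batch_for_paragraph(paragraph: str, questions: List[str]) -> List[MiniQASetting]:
    """Build one setting per question, each with the paragraph as its only support document."""
    return [MiniQASetting(question=q, support=[paragraph]) for q in questions]


batch = batch_for_paragraph(
    "Tom was visiting his family in the Rocky Mountains with his two little dogs.",
    ["Where do Tom's parents live?", "How many dogs does Tom have?"],
)
for setting in batch:
    print(setting.question, "->", len(setting.support), "support document(s)")
```

Since `qa_reader` is called with a list, the same pattern presumably carries over to Jack's own class, `[QASetting(question=q, support=[paragraph]) for q in questions]`, passed to the reader in one call; treat that as an assumption to verify.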

##### Calling the Reader:

```python
answers = qa_reader([qa_setting])
print(question)
QAPrettyPrint(paragraph, answers[0][0].span)
```

```
What was the final score of the World Cup final match?
```
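`QAPrettyPrint` renders the predicted span inside the paragraph. A plain-text stand-in is sketched below. It assumes `answers[0][0].span` holds `(start, end)` character offsets into the support paragraph with an exclusive end; check Jack's answer class before relying on that. `highlight_span` is a hypothetical helper, not part of Jack, and the example uses its own `example_paragraph` so it does not overwrite the notebook's `paragraph`.

```python
from typing import Tuple


def highlight_span(paragraph: str, span: Tuple[int, int], left: str = "[[", right: str = "]]") -> str:
    """Wrap paragraph[start:end] in markers, assuming character offsets with an exclusive end."""
    start, end = span
    if not 0 <= start <= end <= len(paragraph):
        raise ValueError(f"span {span} is outside a paragraph of length {len(paragraph)}")
    return paragraph[:start] + left + paragraph[start:end] + right + paragraph[end:]


example_paragraph = "France lost to Croatia 0 : 5 in yesterday's World Cup final. "
start = example_paragraph.index("0 : 5")
print(highlight_span(example_paragraph, (start, start + len("0 : 5"))))
# prints: France lost to Croatia [[0 : 5]] in yesterday's World Cup final.
```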


##### Top-k Predictions with Scores:

```python
qa_reader.model_module.set_topk(k=5)
answers = qa_reader([qa_setting])
for i, a in enumerate(answers[0]):
    print("Top %d Answer score: %.5f \t %s" % (i+1, a.score, a.text))
```

```
Top 1 Answer score: 0.47482 	 0 : 5
Top 2 Answer score: 0.47482 	 0 : 5
Top 3 Answer score: 0.26900 	 Croatia 0 : 5
Top 4 Answer score: 0.13176 	 France lost to Croatia 0 : 5
Top 5 Answer score: 0.07239 	 5
```
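In the output above, the first two predictions have the same text and score, so the second adds nothing for a reader of the results. One way to collapse such duplicates is sketched below. It relies only on the `text` and `score` attributes used in the loop above; `dedup_by_text` is a hypothetical helper, and the `Ans` namedtuple merely stands in for Jack's answer objects.

```python
from collections import namedtuple
from typing import List, Sequence

# Stand-in for Jack's answer objects; only .text and .score are used.
Ans = namedtuple("Ans", ["text", "score"])


def dedup_by_text(answers: Sequence) -> List:
    """Keep the highest-scoring answer for each distinct (whitespace-stripped) text, best first."""
    best = {}
    for a in answers:
        key = a.text.strip()
        if key not in best or a.score > best[key].score:
            best[key] = a
    return sorted(best.values(), key=lambda a: a.score, reverse=True)


# Scores copied from the top-5 output above.
topk = [Ans("0 : 5", 0.47482), Ans("0 : 5", 0.47482), Ans("Croatia 0 : 5", 0.26900),
        Ans("France lost to Croatia 0 : 5", 0.13176), Ans("5", 0.07239)]
for i, a in enumerate(dedup_by_text(topk)):
    print("Top %d Answer score: %.5f \t %s" % (i + 1, a.score, a.text))
```

In the notebook this would be applied as `dedup_by_text(answers[0])`; requesting a slightly larger `k` via `set_topk` leaves room for the duplicates that get dropped.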


## Natural Language Inference (NLI)

Loading ready-to-use pretrained models:

```python
dam_reader = readers.reader_from_file("./dam")
#esim_reader = readers.reader_from_file("./esim_mnli") # XXX TODO
```

```
INFO:tensorflow:Restoring parameters from ./dam/model_module
```




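Next, two Natural Language Inference examples. The first is from the SNLI corpus [[5]](#ref5); the second is adapted from the Tom paragraph in QA Example 2. As in the QA section, run one of the two cells; the cells below use whichever ran last.

##### Example 1

```python
premise = "A wedding party is taking pictures."
hypothesis1 = "A group of people is celebrating."
hypothesis2 = "A rock band is giving a concert."
```

##### Example 2

```python
premise = "Tom was visiting his family in the Rocky Mountains."
hypothesis1 = "Tom's family lives in the Rocky Mountains."
hypothesis2 = "Tom's family lives in Vermont."
```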


In the NLI case, the answer is a label among {_"entailment"_, _"neutral"_, _"contradiction"_}.
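Classifiers such as DAM typically end in one score (logit) per label, and the predicted label is the highest-scoring one. The sketch below shows that final step with a softmax. The label order, the logits, and the helper name `predict_label` are illustrative assumptions, not values or API from Jack, and the `score` Jack reports may be computed differently.

```python
import math
from typing import Sequence, Tuple

# Assumed label order; a real model defines its own.
LABELS = ("entailment", "neutral", "contradiction")


def predict_label(logits: Sequence[float], labels: Sequence[str] = LABELS) -> Tuple[str, float]:
    """Return the argmax label and its softmax probability."""
    if len(logits) != len(labels):
        raise ValueError("need exactly one logit per label")
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


print(predict_label([2.0, 0.5, -1.0]))  # made-up logits
```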

We can again use the same `QASetting` input data structure for entailment data, passing the hypothesis as the `question` and the premise as the single `support` document:

```python
snli_setting1 = QASetting(question=hypothesis1, support=[premise])
snli_setting2 = QASetting(question=hypothesis2, support=[premise])
```
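The mapping is simple enough to write down once: the hypothesis goes into `question` and the premise becomes the only `support` document. A small hypothetical helper, `nli_kwargs`, that builds these keyword arguments for several hypotheses at once is sketched below; it uses only the two `QASetting` keywords shown in this notebook.

```python
from typing import Dict, List


def nli_kwargs(premise: str, hypotheses: List[str]) -> List[Dict[str, object]]:
    """One QASetting-style keyword dict per hypothesis: hypothesis -> question, premise -> support."""
    return [{"question": h, "support": [premise]} for h in hypotheses]


pairs = nli_kwargs("A wedding party is taking pictures.",
                   ["A group of people is celebrating.", "A rock band is giving a concert."])
print(pairs[0])
```

With Jack loaded, `[QASetting(**kw) for kw in nli_kwargs(premise, [hypothesis1, hypothesis2])]` should yield the same two settings as above, and since readers take a list, both could presumably be passed to `dam_reader` in a single call.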

We generate predictions by calling the reader with these inputs:

```python
prediction = dam_reader([snli_setting1])
print_nli(premise, hypothesis1, prediction[0][0].text)

prediction = dam_reader([snli_setting2])
print_nli(premise, hypothesis2, prediction[0][0].text)
```

We can also inspect the prediction score (here, for the last prediction):

```python
print(prediction[0][0].score)
```

## Use Case: Knowledge Base Link Prediction

We load pretrained DistMult, ConvE and ComplEx models:

```python
distmult_reader = readers.reader_from_file("./distmult") # XXX TODO
conve_reader = readers.reader_from_file("./conve") # XXX TODO
```
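For orientation, DistMult scores a triple (subject, relation, object) as the sum over embedding dimensions of the element-wise product of the subject, relation and object embeddings, which makes it symmetric in subject and object. ComplEx uses complex-valued embeddings and keeps the real part of the same sum with the object embedding conjugated, which lets it model asymmetric relations. The sketch below implements these two standard scoring functions in plain Python with made-up toy embeddings; it is not Jack's implementation, and ConvE, which applies a 2D convolution to reshaped embeddings, is left out.

```python
from typing import Sequence


def distmult_score(e_s: Sequence[float], w_r: Sequence[float], e_o: Sequence[float]) -> float:
    """DistMult: sum_i e_s[i] * w_r[i] * e_o[i] (symmetric in subject and object)."""
    return sum(s * r * o for s, r, o in zip(e_s, w_r, e_o))


def complex_score(e_s: Sequence[complex], w_r: Sequence[complex], e_o: Sequence[complex]) -> float:
    """ComplEx: Re(sum_i e_s[i] * w_r[i] * conj(e_o[i])), which can model asymmetric relations."""
    return sum(s * r * o.conjugate() for s, r, o in zip(e_s, w_r, e_o)).real


# Toy 2-dimensional embeddings, made up for illustration.
print(distmult_score([1.0, 2.0], [0.5, -1.0], [2.0, 1.0]))          # -1.0
print(complex_score([1 + 1j, 0.5j], [1j, 1.0], [1 - 1j, 2.0]))       # -2.0
print(complex_score([1 - 1j, 2.0], [1j, 1.0], [1 + 1j, 0.5j]))       # 2.0 (subject and object swapped)
```

Swapping subject and object leaves the DistMult score unchanged but flips the ComplEx score in this toy case, which is the property that motivates ComplEx.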