<a href="https://colab.research.google.com/github/thousandoaks/DigitalRents/blob/main/Watson_QA_Pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# WATSON: All you ever wanted to know about BigTechs

<img style="float: right;" src="https://www.ucl.ac.uk/bartlett/public-purpose/sites/bartlett_public_purpose/files/styles/small_image/public/brochure-cover-1019_0.png?itok=kYkCqJq4">


Watson can be used in a variety of ways. A very common one is navigating complex knowledge bases about Big Tech companies.




### Prepare environment

#### Colab: Enable the GPU runtime
Make sure you enable the GPU runtime to experience decent speed in this tutorial.
**Runtime -> Change Runtime type -> Hardware accelerator -> GPU**

<img src="https://raw.githubusercontent.com/deepset-ai/haystack/master/docs/_src/img/colab_gpu_runtime.jpg">

In [1]:
# Make sure you have a GPU running
!nvidia-smi

Wed Sep  8 08:48:32 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P8    28W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
# Install the latest release of Haystack in your own environment 
! pip install farm-haystack

# Install the latest master of Haystack
#!pip install grpcio-tools==1.34.1
#!pip install git+https://github.com/deepset-ai/haystack.git


Collecting farm-haystack
  Downloading farm_haystack-0.9.0-py3-none-any.whl (180 kB)
[?25l[K     |█▉                              | 10 kB 19.6 MB/s eta 0:00:01[K     |███▋                            | 20 kB 23.5 MB/s eta 0:00:01[K     |█████▌                          | 30 kB 12.5 MB/s eta 0:00:01[K     |███████▎                        | 40 kB 9.6 MB/s eta 0:00:01[K     |█████████                       | 51 kB 5.5 MB/s eta 0:00:01[K     |███████████                     | 61 kB 6.0 MB/s eta 0:00:01[K     |████████████▊                   | 71 kB 5.8 MB/s eta 0:00:01[K     |██████████████▌                 | 81 kB 6.5 MB/s eta 0:00:01[K     |████████████████▍               | 92 kB 4.9 MB/s eta 0:00:01[K     |██████████████████▏             | 102 kB 5.3 MB/s eta 0:00:01[K     |████████████████████            | 112 kB 5.3 MB/s eta 0:00:01[K     |█████████████████████▉          | 122 kB 5.3 MB/s eta 0:00:01[K     |███████████████████████▋        | 133 kB 5.3 MB/s eta

In [4]:
from haystack.preprocessor.cleaning import clean_wiki_text
from haystack.preprocessor.utils import convert_files_to_dicts, fetch_archive_from_http
from haystack.reader.farm import FARMReader
from haystack.reader.transformers import TransformersReader
from haystack.utils import print_answers

## Document Store



### Start an Elasticsearch server
You can start Elasticsearch on your local machine using Docker. If Docker is not readily available in your environment (e.g., in Colab notebooks), you can instead download the Elasticsearch archive and run it manually.

In [23]:
# Recommended: Start Elasticsearch using Docker via the Haystack utility function
from haystack.utils import launch_es

launch_es()

09/08/2021 08:59:40 - INFO - haystack.utils -   Starting Elasticsearch ...


In [24]:
# In Colab / environments without Docker: start Elasticsearch from the downloaded archive
! wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
! tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
! chown -R daemon:daemon elasticsearch-7.9.2

import os
from subprocess import Popen, PIPE, STDOUT
es_server = Popen(['elasticsearch-7.9.2/bin/elasticsearch'],
                   stdout=PIPE, stderr=STDOUT,
                   preexec_fn=lambda: os.setuid(1)  # as daemon
                  )
# wait until ES has started
! sleep 30
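
The fixed 30-second sleep above is a guess: on a slow machine the server may need longer, and on a fast one the wait is wasted. A minimal sketch of an alternative is to poll the HTTP endpoint until it answers. It assumes Elasticsearch listens on `http://localhost:9200`, as the connection log in this notebook suggests; `wait_for_elasticsearch` is a hypothetical helper, not part of Haystack.

```python
import time
import urllib.error
import urllib.request


def wait_for_elasticsearch(url="http://localhost:9200", timeout=60.0, interval=2.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server not reachable yet; retry after a short pause
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_elasticsearch()` before connecting the document store, and stopping if it returns `False`, fails early with a clear signal instead of a connection error further down.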

In [25]:
# Connect to Elasticsearch

from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")

09/08/2021 09:00:32 - INFO - elasticsearch -   HEAD http://localhost:9200/ [status:200 request:0.088s]
09/08/2021 09:00:32 - INFO - elasticsearch -   HEAD http://localhost:9200/document [status:200 request:0.017s]
09/08/2021 09:00:32 - INFO - elasticsearch -   GET http://localhost:9200/document [status:200 request:0.006s]
09/08/2021 09:00:32 - INFO - elasticsearch -   PUT http://localhost:9200/document/_mapping [status:200 request:0.033s]
09/08/2021 09:00:32 - INFO - elasticsearch -   HEAD http://localhost:9200/label [status:200 request:0.004s]


## Preprocessing of documents

Haystack provides a customizable pipeline for:
 - converting files into texts
 - cleaning texts
 - splitting texts
 - writing them to a Document Store

In this tutorial, we download articles about Big Tech companies.
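
The cleaning, splitting, and dict-building steps listed above can be sketched in plain Python. This is only an illustration of the idea, not what Haystack's own utilities do internally: `clean_text` and `text_to_dicts` are hypothetical names, and the only thing taken from this notebook is the target format, a dict with a `text` key and a `meta` dict.

```python
import re


def clean_text(text: str) -> str:
    """Hypothetical cleaning function: takes a str and returns a str."""
    # Collapse runs of spaces/tabs and strip whitespace around each line
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(lines).strip()


def text_to_dicts(text: str, name: str) -> list:
    """Clean a text, split it into paragraphs, and wrap each one in a dict."""
    cleaned = clean_text(text)
    # Paragraphs are separated by one or more blank lines
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", cleaned) if p.strip()]
    return [{"text": p, "meta": {"name": name}} for p in paragraphs]


raw = "Facebook  is a social network.   \n\n\nApple\tsells devices.\n"
docs = text_to_dicts(raw, name="example.txt")
print(docs)
```

Each paragraph becomes its own document, which keeps the units handed to the Reader small.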

In [43]:
# Let's first fetch some documents that we want to query
# Here: a collection of text articles about Big Tech companies
doc_dir = "data/article_txt_watson"
s3_url = "https://digitalrents.s3.eu-west-2.amazonaws.com/documentstoindex2.zip"
fetch_archive_from_http(url=s3_url, output_dir=doc_dir)

# Convert files to dicts
# You can optionally supply a cleaning function that is applied to each doc (e.g. to remove footers)
# It must take a str as input, and return a str.
dicts = convert_files_to_dicts(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True)

# We now have a list of dictionaries that we can write to our document store.
# If your texts come from a different source (e.g. a DB), you can of course skip convert_files_to_dicts() and create the dictionaries yourself.
# The default format here is:
# {
#    'text': "<DOCUMENT_TEXT_HERE>",
#    'meta': {'name': "<DOCUMENT_NAME_HERE>", ...}
#}
# (Optionally: you can also add more key-value-pairs here, that will be indexed as fields in Elasticsearch and
# can be accessed later for filtering or shown in the responses of the Pipeline)

# Let's have a look at the first 3 entries:
#print(dicts[:3])

# Now, let's write the dicts containing documents to our DB.
document_store.write_documents(dicts)

09/08/2021 09:31:37 - INFO - haystack.preprocessor.utils -   Found data stored in `data/article_txt_watson`. Delete this first if you really want to fetch new data.
09/08/2021 09:31:37 - INFO - haystack.preprocessor.utils -   Converting data/article_txt_watson/documentstoindex2/antitrustcaseagainstFacebook.txt
09/08/2021 09:31:37 - INFO - haystack.preprocessor.utils -   Converting data/article_txt_watson/documentstoindex2/moreappleproduct.txt
09/08/2021 09:31:37 - INFO - haystack.preprocessor.utils -   Converting data/article_txt_watson/documentstoindex2/corporatepower.txt
09/08/2021 09:31:37 - INFO - haystack.preprocessor.utils -   Converting data/article_txt_watson/documentstoindex2/applesecretmonopoly.txt
09/08/2021 09:31:37 - INFO - haystack.preprocessor.utils -   Converting data/article_txt_watson/documentstoindex2/usingantitrustlaw.txt
09/08/2021 09:31:37 - INFO - haystack.preprocessor.utils -   Converting data/article_txt_watson/documentstoindex2/revivingatitrust.txt
09/08/2021 

## Initialize Retriever, Reader, & Pipeline

### Retriever

Retrievers help narrow down the scope for the Reader to smaller units of text where a given question could be answered.
They use simple but fast algorithms.

**Here:** We use Elasticsearch's default BM25 algorithm
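
To give a feel for what BM25 rewards, here is a minimal sketch of the textbook scoring formula in plain Python. It is not Elasticsearch's implementation (that lives in Lucene and may differ in details); the whitespace tokenizer, the toy documents, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with the BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Term frequency saturates and is normalized by document length
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "google is a dominant internet search company",
    "apple sells phones and runs the app store",
    "facebook is a social network",
]
scores = bm25_scores("google search", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Documents that share no term with the query score zero, which is why a sparse retriever like this is fast but blind to synonyms.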



In [26]:
from haystack.retriever.sparse import ElasticsearchRetriever
retriever = ElasticsearchRetriever(document_store=document_store)

In [None]:
# Alternative: an in-memory TfidfRetriever based on Pandas dataframes, for building quick prototypes with a SQLite document store.

# from haystack.retriever.sparse import TfidfRetriever
# retriever = TfidfRetriever(document_store=document_store)

### Reader

A Reader scans the texts returned by the retriever in detail and extracts the k best answers. Readers are based
on powerful but slower deep learning models.
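
The "k best answers" step can be illustrated with a simplified sketch. Extractive QA models typically produce a start score and an end score per token, and an answer is a span whose combined score is high. The code below assumes that setup with made-up scores; `top_k_spans` is a hypothetical helper, not the FARMReader implementation, which also has to handle subword tokens, "no answer" predictions, and several passages.

```python
def top_k_spans(tokens, start_scores, end_scores, k=3, max_answer_len=10):
    """Return the k highest-scoring (score, answer_text) spans."""
    candidates = []
    for start, s_score in enumerate(start_scores):
        # An answer must end at or after its start and stay reasonably short
        last = min(start + max_answer_len, len(tokens))
        for end in range(start, last):
            score = s_score + end_scores[end]
            candidates.append((score, " ".join(tokens[start:end + 1])))
    candidates.sort(key=lambda item: item[0], reverse=True)
    return candidates[:k]


tokens = ["Google", "is", "a", "dominant", "internet", "search", "company"]
# Made-up scores standing in for the model's per-token start/end outputs
start_scores = [4.0, 0.1, 0.2, 2.0, 0.3, 0.2, 0.1]
end_scores = [0.5, 0.1, 0.1, 0.3, 0.2, 1.0, 3.5]

spans = top_k_spans(tokens, start_scores, end_scores, k=2)
for score, answer in spans:
    print(score, answer)
```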


In [10]:
# Load a local model or any of the QA models on
# Hugging Face's model hub (https://huggingface.co/models)

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

09/08/2021 08:55:20 - INFO - farm.utils -   Using device: CUDA 
09/08/2021 08:55:20 - INFO - farm.utils -   Number of GPUs: 1
09/08/2021 08:55:20 - INFO - farm.utils -   Distributed Training: False
09/08/2021 08:55:20 - INFO - farm.utils -   Automatic Mixed Precision: None
09/08/2021 08:55:20 - INFO - filelock -   Lock 140553581156944 acquired on /root/.cache/huggingface/transformers/c40d0abb589629c48763f271020d0b1f602f5208c432c0874d420491ed37e28b.122ed338b3591c07dba452777c59ff52330edb340d3d56d67aa9117ad9905673.lock


Downloading:   0%|          | 0.00/571 [00:00<?, ?B/s]

09/08/2021 08:55:20 - INFO - filelock -   Lock 140553581156944 released on /root/.cache/huggingface/transformers/c40d0abb589629c48763f271020d0b1f602f5208c432c0874d420491ed37e28b.122ed338b3591c07dba452777c59ff52330edb340d3d56d67aa9117ad9905673.lock
09/08/2021 08:55:21 - INFO - filelock -   Lock 140553594424528 acquired on /root/.cache/huggingface/transformers/eac3273a8097dda671e3bea1db32c616e74f36a306c65b4858171c98d6db83e9.084aa7284f3a51fa1c8f0641aa04c47d366fbd18711f29d0a995693cfdbc9c9e.lock


Downloading:   0%|          | 0.00/496M [00:00<?, ?B/s]

09/08/2021 08:55:38 - INFO - filelock -   Lock 140553594424528 released on /root/.cache/huggingface/transformers/eac3273a8097dda671e3bea1db32c616e74f36a306c65b4858171c98d6db83e9.084aa7284f3a51fa1c8f0641aa04c47d366fbd18711f29d0a995693cfdbc9c9e.lock
Some weights of the model checkpoint at deepset/roberta-base-squad2 were not used when initializing RobertaModel: ['qa_outputs.bias', 'qa_outputs.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at deepset/roberta-base-squad2 and are newly initialized: ['ro

Downloading:   0%|          | 0.00/899k [00:00<?, ?B/s]

09/08/2021 08:55:46 - INFO - filelock -   Lock 140553548121616 released on /root/.cache/huggingface/transformers/81c80edb4c6cefa5cae64ccfdb34b3b309ecaf60da99da7cd1c17e24a5d36eb5.647b4548b6d9ea817e82e7a9231a320231a1c9ea24053cc9e758f3fe68216f05.lock
09/08/2021 08:55:46 - INFO - filelock -   Lock 140553547320272 acquired on /root/.cache/huggingface/transformers/b87d46371731376b11768b7839b1a5938a4f77d6bd2d9b683f167df0026af432.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b.lock


Downloading:   0%|          | 0.00/456k [00:00<?, ?B/s]

09/08/2021 08:55:46 - INFO - filelock -   Lock 140553547320272 released on /root/.cache/huggingface/transformers/b87d46371731376b11768b7839b1a5938a4f77d6bd2d9b683f167df0026af432.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b.lock
09/08/2021 08:55:47 - INFO - filelock -   Lock 140553548121360 acquired on /root/.cache/huggingface/transformers/c9d2c178fac8d40234baa1833a3b1903d393729bf93ea34da247c07db24900d0.cb2244924ab24d706b02fd7fcedaea4531566537687a539ebb94db511fd122a0.lock


Downloading:   0%|          | 0.00/772 [00:00<?, ?B/s]

09/08/2021 08:55:47 - INFO - filelock -   Lock 140553548121360 released on /root/.cache/huggingface/transformers/c9d2c178fac8d40234baa1833a3b1903d393729bf93ea34da247c07db24900d0.cb2244924ab24d706b02fd7fcedaea4531566537687a539ebb94db511fd122a0.lock
09/08/2021 08:55:47 - INFO - filelock -   Lock 140553537772496 acquired on /root/.cache/huggingface/transformers/e8a600814b69e3ee74bb4a7398cc6fef9812475010f16a6c9f151b2c2772b089.451739a2f3b82c3375da0dfc6af295bedc4567373b171f514dd09a4cc4b31513.lock


Downloading:   0%|          | 0.00/79.0 [00:00<?, ?B/s]

09/08/2021 08:55:47 - INFO - filelock -   Lock 140553537772496 released on /root/.cache/huggingface/transformers/e8a600814b69e3ee74bb4a7398cc6fef9812475010f16a6c9f151b2c2772b089.451739a2f3b82c3375da0dfc6af295bedc4567373b171f514dd09a4cc4b31513.lock
09/08/2021 08:55:47 - INFO - farm.utils -   Using device: CUDA 
09/08/2021 08:55:47 - INFO - farm.utils -   Number of GPUs: 1
09/08/2021 08:55:47 - INFO - farm.utils -   Distributed Training: False
09/08/2021 08:55:47 - INFO - farm.utils -   Automatic Mixed Precision: None
09/08/2021 08:55:47 - INFO - farm.infer -   Got ya 2 parallel workers to do inference ...
09/08/2021 08:55:47 - INFO - farm.infer -    0    0 
09/08/2021 08:55:47 - INFO - farm.infer -   /w\  /w\
09/08/2021 08:55:47 - INFO - farm.infer -   /'\  / \
09/08/2021 08:55:47 - INFO - farm.infer -     


#### TransformersReader

In [None]:
# Alternative:
# reader = TransformersReader(model_name_or_path="distilbert-base-uncased-distilled-squad", tokenizer="distilbert-base-uncased", use_gpu=-1)

### Pipeline

With a Haystack `Pipeline` you can combine your building blocks into a search pipeline.
Under the hood, `Pipelines` are Directed Acyclic Graphs (DAGs) that you can easily customize for your own use cases.
To speed things up, Haystack also comes with a few predefined Pipelines. One of them is the `ExtractiveQAPipeline` that combines a retriever and a reader to answer our questions.
You can learn more about `Pipelines` in the [docs](https://haystack.deepset.ai/docs/latest/pipelinesmd).
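
To make the graph idea concrete, here is a toy sketch in plain Python of a retrieve-then-read pipeline built from named nodes. `ToyPipeline`, `toy_retriever`, and `toy_reader` are hypothetical stand-ins written for this illustration; they are not Haystack classes and only mimic the principle that each node consumes the output of the nodes it lists as inputs.

```python
class ToyPipeline:
    """Runs named nodes in insertion order; each node reads its inputs' outputs."""

    def __init__(self):
        self.nodes = {}  # name -> (function, list of input node names)

    def add_node(self, name, func, inputs):
        # Requiring inputs to exist already keeps the graph acyclic
        for parent in inputs:
            if parent != "Query" and parent not in self.nodes:
                raise ValueError(f"Unknown input node: {parent}")
        self.nodes[name] = (func, inputs)

    def run(self, query):
        outputs = {"Query": query}
        result = None
        for name, (func, inputs) in self.nodes.items():
            upstream = [outputs[p] for p in inputs if p != "Query"]
            result = outputs[name] = func(query, *upstream)
        return result


DOCS = [
    "Google is a dominant internet search company.",
    "Apple has built its empire on customer lock-in.",
]


def toy_retriever(query):
    # Keep documents sharing at least one lowercase word with the query
    words = set(query.lower().split())
    return [d for d in DOCS if words & set(d.lower().rstrip(".").split())]


def toy_reader(query, documents):
    # Stand-in for a QA model: return the first sentence of each document
    return [d.split(".")[0] for d in documents]


toy_pipe = ToyPipeline()
toy_pipe.add_node("Retriever", toy_retriever, inputs=["Query"])
toy_pipe.add_node("Reader", toy_reader, inputs=["Retriever"])
answers = toy_pipe.run("what is google")
print(answers)
```

The cheap retriever filters first, so the expensive reader only sees a handful of candidate texts.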

In [27]:
from haystack.pipeline import ExtractiveQAPipeline
pipe = ExtractiveQAPipeline(reader, retriever)

## Voilà! Ask a question!

In [12]:
# You can configure how many candidates the reader and retriever shall return
# A higher top_k_retriever can improve answer quality, but it also makes the query slower.
prediction = pipe.run(query="is Google a monopolist?", top_k_retriever=10, top_k_reader=5)

09/08/2021 08:55:48 - INFO - elasticsearch -   POST http://localhost:9200/document/_search [status:200 request:0.203s]
Inferencing Samples: 100%|██████████| 16/16 [00:22<00:00,  1.40s/ Batches]
Inferencing Samples: 100%|██████████| 9/9 [00:11<00:00,  1.33s/ Batches]
Inferencing Samples: 100%|██████████| 3/3 [00:03<00:00,  1.01s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.33s/ Batches]
Inferencing Samples: 100%|██████████| 2/2 [00:01<00:00,  1.23 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  5.21 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  5.51 Batches/s]


In [13]:
print_answers(prediction, details="minimal")

[   {   'answer': 'Google is a dominant internet search company',
        'context': 'tion in search. 1. Google Search/Google Verticals. — '
                   'Google is a dominant internet search company, capturing '
                   'around 88% of the U.S. search engine mark'},
    {   'answer': 'No firm with a market share of less than 50% is a '
                  'monopolist',
        'context': 'tic share of the relevant app market.314 314 No firm with '
                   'a market share of less than 50% is a monopolist. Compare '
                   'United States v. Aluminum Co. of Am'},
    {   'answer': 'supracompetitive',
        'context': 'JKR8] (last visited Apr. 4, 2019)....margins that would '
                   'qualify as supracompetitive and that derive from a market '
                   'that Google dominates. Since 2004, A'},
    {   'answer': 'the search market',
        'context': 'er market—for Facebook, the social network market, and for '
                   

In [37]:
# You can configure how many candidates the reader and retriever shall return
# A higher top_k_retriever can improve answer quality, but it also makes the query slower.
prediction = pipe.run(query="is Facebook a monopolist?", top_k_retriever=10, top_k_reader=5)

09/08/2021 09:23:23 - INFO - elasticsearch -   POST http://localhost:9200/document/_search [status:200 request:0.024s]
Inferencing Samples: 100%|██████████| 16/16 [00:22<00:00,  1.40s/ Batches]
Inferencing Samples: 100%|██████████| 9/9 [00:11<00:00,  1.33s/ Batches]
Inferencing Samples: 100%|██████████| 3/3 [00:03<00:00,  1.01s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.35s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  5.07 Batches/s]
Inferencing Samples: 100%|██████████| 2/2 [00:01<00:00,  1.25 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  5.21 Batches/s]


In [38]:
print_answers(prediction, details="minimal")

[   {   'answer': 'Facebook is a monopoly',
        'context': 'eciating and correcting for the foreclosure of consumer '
                   'choice. Facebook is a monopoly that tipped the early '
                   'market with promises of data privacy and '},
    {   'answer': 'Facebook is a monopoly',
        'context': 'media was enabled by a policy of deceiving users about '
                   'privacy:\n'
                   'Facebook is a monopoly that tipped the early market with '
                   'promises of data privacy and '},
    {   'answer': 'Facebook may be a monopoly',
        'context': 'htened Scrutiny in Markets with Direct Network Effects '
                   'Though Facebook may be a monopoly, antitrust law, and the '
                   'Sherman Act specifically, only condem'},
    {   'answer': 'Facebook is a monopolist',
        'context': 'lity inasmuch as they do price. As I will argue in this '
                   'Paper, Facebook is a monopolist, and what F
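
The output above contains the same answer string twice, because the same sentence appears in more than one indexed passage. A minimal sketch of removing such repeats follows. It assumes the prediction is a dict whose `answers` key holds a list of dicts with `answer` and `context` fields, which matches the printed output but is not verified here against the Haystack API; `unique_answers` is a hypothetical helper.

```python
def unique_answers(prediction):
    """Return the answers of a prediction, skipping repeated answer strings."""
    seen = set()
    unique = []
    for item in prediction.get("answers", []):
        text = item.get("answer")
        if text is None or text in seen:
            continue
        seen.add(text)
        unique.append(item)
    return unique


# Hand-built sample mirroring the answers printed above (contexts shortened)
sample_prediction = {
    "answers": [
        {"answer": "Facebook is a monopoly",
         "context": "Facebook is a monopoly that tipped the early market"},
        {"answer": "Facebook is a monopoly",
         "context": "Facebook is a monopoly that tipped the early market"},
        {"answer": "Facebook may be a monopoly",
         "context": "Though Facebook may be a monopoly, antitrust law"},
        {"answer": "Facebook is a monopolist",
         "context": "As I will argue in this Paper, Facebook is a monopolist"},
    ]
}

for item in unique_answers(sample_prediction):
    print(item["answer"])
```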

In [29]:
# You can configure how many candidates the reader and retriever shall return
# A higher top_k_retriever can improve answer quality, but it also makes the query slower.
prediction = pipe.run(query="Who is Lina Khan", top_k_retriever=5, top_k_reader=5)
print_answers(prediction, details="minimal")

09/08/2021 09:01:40 - INFO - elasticsearch -   POST http://localhost:9200/document/_search [status:200 request:0.040s]
Inferencing Samples: 100%|██████████| 2/2 [00:01<00:00,  1.13 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.35s/ Batches]
Inferencing Samples: 100%|██████████| 16/16 [00:22<00:00,  1.40s/ Batches]
Inferencing Samples: 100%|██████████| 9/9 [00:12<00:00,  1.34s/ Batches]
Inferencing Samples: 100%|██████████| 3/3 [00:03<00:00,  1.02s/ Batches]

[   {   'answer': 'wunderkind lawyer',
        'context': 'l. Sitting just behind Cicilline at the hearing was Lina '
                   'Khan, the wunderkind lawyer who has reshaped the national '
                   'conversation about tech and antitru'},
    {   'answer': 'technology companies',
        'context': 'E: WHAT ARE THEY?\n'
                   'Facebook and Google are commonly understood as technology '
                   'companies, but they are more accurately described as '
                   'communications networ'},
    {   'answer': 'Amazon',
        'context': 'ly through examining the structure of markets.255 255 See '
                   'Lina M. Khan, Amazon’s Antitrust Paradox, 126 Yale L.J. '
                   '710, 717– 22 (2017) [hereinafter Kha'},
    {   'answer': 'the face of a new, more aggressive school of thought on '
                  'antitrust policy',
        'context': 'ifically to help lead this inquiry. As the face of a new, '
                   




In [30]:
# You can configure how many candidates the reader and retriever shall return
# A higher top_k_retriever can improve answer quality, but it also makes the query slower.
prediction = pipe.run(query="Apple abuse of power", top_k_retriever=4, top_k_reader=5)
print_answers(prediction, details="minimal")

09/08/2021 09:02:21 - INFO - elasticsearch -   POST http://localhost:9200/document/_search [status:200 request:0.040s]
Inferencing Samples: 100%|██████████| 2/2 [00:01<00:00,  1.22 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.34s/ Batches]
Inferencing Samples: 100%|██████████| 16/16 [00:22<00:00,  1.41s/ Batches]
Inferencing Samples: 100%|██████████| 3/3 [00:03<00:00,  1.02s/ Batches]

[   {   'answer': 'Apple Music',
        'context': 'ould market through Apple, as Apple ramped up its '
                   'competitor service, Apple Music.177177 Id. This is not the '
                   'first time that developers have alleged d'},
    {   'answer': 'technology companies',
        'context': 'E: WHAT ARE THEY?\n'
                   'Facebook and Google are commonly understood as technology '
                   'companies, but they are more accurately described as '
                   'communications networ'},
    {   'answer': 'BlueMail',
        'context': ' after the feature was announced in June 2019, Blix says, '
                   'Apple kicked BlueMail off of its Mac App Store. Apple '
                   'maintains that was for security reason'},
    {   'answer': '3.9%)',
        'context': 'ompetitor, eBay, enjoys 6.6% of the ecommerce market, '
                   'followed by Apple (3.9%) and Walmart (3.7%). Lunden, '
                   'Amazon’s Share of th




In [31]:
# You can configure how many candidates the reader and retriever shall return
# A higher top_k_retriever can improve answer quality, but it also makes the query slower.
prediction = pipe.run(query="is Apple a monopolist ?", top_k_retriever=4, top_k_reader=5)
print_answers(prediction, details="minimal")

09/08/2021 09:02:51 - INFO - elasticsearch -   POST http://localhost:9200/document/_search [status:200 request:0.023s]
Inferencing Samples: 100%|██████████| 16/16 [00:22<00:00,  1.42s/ Batches]
Inferencing Samples: 100%|██████████| 9/9 [00:12<00:00,  1.34s/ Batches]
Inferencing Samples: 100%|██████████| 2/2 [00:01<00:00,  1.24 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.35s/ Batches]

[   {   'answer': 'Apple isn’t a monopoly',
        'context': 'ber, Cook seemed to be hedging his bets. While maintaining '
                   'that Apple isn’t a monopoly, he mused in an interview that '
                   '“a monopoly by itself isn’t bad '},
    {   'answer': 'Apple, they say, may not look like a classic monopoly',
        'context': 'make the case for regulating the company anyway. Apple, '
                   'they say, may not look like a classic monopoly. But when '
                   'it comes to how it runs the iOS App S'},
    {   'answer': 'Whether Apple can be a monopoly with a market share of less '
                  'than 50% is thus an open question',
        'context': 'er to be found guilty of it. Whether Apple can be a '
                   'monopoly with a market share of less than 50% is thus an '
                   'open question. “Monopoly is not a binary,'},
    {   'answer': 'Apple’s total control of the App Store',
        'context': 'd compromis




In [39]:
# You can configure how many candidates the reader and retriever shall return
# A higher top_k_retriever can improve answer quality, but it also makes the query slower.
prediction = pipe.run(query="apple business model", top_k_retriever=4, top_k_reader=5)
print_answers(prediction, details="minimal")

09/08/2021 09:24:10 - INFO - elasticsearch -   POST http://localhost:9200/document/_search [status:200 request:0.023s]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.50s/ Batches]
Inferencing Samples: 100%|██████████| 16/16 [00:22<00:00,  1.40s/ Batches]
Inferencing Samples: 100%|██████████| 2/2 [00:01<00:00,  1.24 Batches/s]
Inferencing Samples: 100%|██████████| 3/3 [00:03<00:00,  1.02s/ Batches]

[   {   'answer': 'technology companies, but they are more accurately '
                  'described as communications networks that sell digital '
                  'advertising',
        'context': 'ly understood as technology companies, but they are more '
                   'accurately described as communications networks that sell '
                   'digital advertising. Advertising ma'},
    {   'answer': 'technology companies',
        'context': 'E: WHAT ARE THEY?\n'
                   'Facebook and Google are commonly understood as technology '
                   'companies, but they are more accurately described as '
                   'communications networ'},
    {   'answer': 'customer lock-in',
        'context': 'y iOS features and apps. In reality, Apple has built its '
                   'empire on customer lock-in: making its own gadgets and '
                   'services work seamlessly with one anot'},
    {   'answer': '3.9%)',
        'context': 'ompetitor, eBay




In [40]:
# You can configure how many candidates the reader and retriever shall return
# A higher top_k_retriever can improve answer quality, but it also makes the query slower.
prediction = pipe.run(query="Privacy and competition", top_k_retriever=10, top_k_reader=5)
print_answers(prediction, details="minimal")

09/08/2021 09:26:21 - INFO - elasticsearch -   POST http://localhost:9200/document/_search [status:200 request:0.023s]
Inferencing Samples: 100%|██████████| 9/9 [00:12<00:00,  1.36s/ Batches]
Inferencing Samples: 100%|██████████| 3/3 [00:03<00:00,  1.01s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.34s/ Batches]
Inferencing Samples: 100%|██████████| 2/2 [00:01<00:00,  1.24 Batches/s]
Inferencing Samples: 100%|██████████| 16/16 [00:22<00:00,  1.40s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  5.31 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  5.65 Batches/s]

[   {   'answer': 'technology companies',
        'context': 'E: WHAT ARE THEY?\n'
                   'Facebook and Google are commonly understood as technology '
                   'companies, but they are more accurately described as '
                   'communications networ'},
    {   'answer': 'Apple Music',
        'context': 'ould market through Apple, as Apple ramped up its '
                   'competitor service, Apple Music.177177 Id. This is not the '
                   'first time that developers have alleged d'},
    {   'answer': 'Apple Maps vs. Google Maps: Which Is Better',
        'context': 'tor a distant second.632 632 See, e.g., The Manifest, '
                   'Apple Maps vs. Google Maps: Which Is Better?, Medium '
                   '(Sept. 12, 2018), https://medium.com/@the_m'},
    {   'answer': 'Teen Compliment',
        'context': 'purchased it.147 147 See Josh Constine, Facebook Acquires '
                   'Anonymous Teen Compliment App tbh, Will Let It




In [42]:
# You can configure how many candidates the reader and retriever shall return
# A higher top_k_retriever can improve answer quality, but it also makes the query slower.
prediction = pipe.run(query="Online commerce", top_k_retriever=10, top_k_reader=5)
print_answers(prediction, details="minimal")

09/08/2021 09:28:18 - INFO - elasticsearch -   POST http://localhost:9200/document/_search [status:200 request:0.021s]
Inferencing Samples: 100%|██████████| 16/16 [00:22<00:00,  1.41s/ Batches]
Inferencing Samples: 100%|██████████| 1/1 [00:01<00:00,  1.33s/ Batches]
Inferencing Samples: 100%|██████████| 2/2 [00:01<00:00,  1.23 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00,  5.39 Batches/s]
Inferencing Samples: 100%|██████████| 9/9 [00:12<00:00,  1.34s/ Batches]
Inferencing Samples: 100%|██████████| 3/3 [00:03<00:00,  1.01s/ Batches]

[   {   'answer': 'technology companies, but they are more accurately '
                  'described as communications networks that sell digital '
                  'advertising',
        'context': 'ly understood as technology companies, but they are more '
                   'accurately described as communications networks that sell '
                   'digital advertising. Advertising ma'},
    {   'answer': 'technology companies',
        'context': 'E: WHAT ARE THEY?\n'
                   'Facebook and Google are commonly understood as technology '
                   'companies, but they are more accurately described as '
                   'communications networ'},
    {   'answer': 'Apple Music',
        'context': 'ould market through Apple, as Apple ramped up its '
                   'competitor service, Apple Music.177177 Id. This is not the '
                   'first time that developers have alleged d'},
    {   'answer': 'this conduct could deter entry and chill innovat




## About us

This notebook was made with love by [IIPP](https://www.ucl.ac.uk/bartlett/public-purpose/) in London, UK.

The UCL Institute for Innovation and Public Purpose (IIPP) is changing how public value is imagined, practised and evaluated to tackle societal challenges. 
  
Some of our other work: 
- [Publications](https://www.ucl.ac.uk/bartlett/public-purpose/publications)
- [Research](https://www.ucl.ac.uk/bartlett/public-purpose/research-0)
- [News](https://www.ucl.ac.uk/bartlett/public-purpose/news-0?collection=drupal-bartlett-news&meta_UclOrgUnit=%22ucl+institute+for+innovation+and+public+purpose%22&)


 
