## Performing inference with the Inference API

### Building a query function

In [1]:
import requests

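# Defining a query function
def query(payload, model_id, api_token):
    """Send a JSON payload to the hosted Inference API and return the parsed JSON response."""
    # The API token is sent as a bearer token in the Authorization header
    headers = {"Authorization": f"Bearer {api_token}"}
    API_URL = f"https://api-inference.huggingface.co/models/{model_id}"
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()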

### Single-question requests with the Inference API

In [2]:
# Example of a single-question query
model_id = "deepset/roberta-base-squad2"
data = {"question": "What day is it?",
        "context": "Today is Monday."}
api_token = "YOUR_API_TOKEN"  # placeholder: use your own token and never publish a real one

print(query(data, model_id, api_token))

{'score': 0.9129266142845154, 'start': 9, 'end': 15, 'answer': 'Monday'}


### Batch inference with the Inference API

In [3]:
# Example of a batch query
data = [{"question": "What day is it?",
         "context": "Today is Monday."},
        {"question": "When was I born?",
         "context": "I was born in Detroit, USA, in 1980."}
       ]

print(query(data, model_id, api_token))

[{'score': 0.9129266142845154, 'start': 9, 'end': 15, 'answer': 'Monday'}, {'score': 0.9501503705978394, 'start': 31, 'end': 35, 'answer': '1980'}]
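
The `start` and `end` fields in these responses appear to be character offsets into the `context` string, with `end` exclusive. A minimal sketch that checks this reading by slicing each context with the values printed above (the variable names `api_contexts` and `api_responses` are illustrative only):

```python
# Contexts and offsets copied from the batch request and its output above
api_contexts = ["Today is Monday.", "I was born in Detroit, USA, in 1980."]
api_responses = [
    {"start": 9, "end": 15, "answer": "Monday"},
    {"start": 31, "end": 35, "answer": "1980"},
]

# Slicing each context with the returned offsets should reproduce the answer
spans = [ctx[r["start"]:r["end"]] for ctx, r in zip(api_contexts, api_responses)]
print(spans)  # ['Monday', '1980']
```

Note that these are character positions, whereas the direct model use later in this notebook works with token positions.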


### Performing invalid requests with the Inference API

In [4]:
# Example of an invalid query
data = [{"question": "What day is it?",
         "text": "Today is Monday."},
        {"question": "When was I born?",
         "text": "I was born in Detroit, USA, in 1980."}
       ]

print(query(data, model_id, api_token))
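
Compared with the valid requests, the payload above uses a `text` key where the earlier ones used `context`. A minimal sketch of catching this locally before sending a request, assuming `question` and `context` are the only required keys; `find_invalid_items` is a hypothetical helper, not part of the Inference API:

```python
REQUIRED_KEYS = {"question", "context"}  # keys used by the valid requests above

def find_invalid_items(payload):
    """Return (index, missing_keys) pairs for items lacking a required key."""
    # Accept both a single dict and a list of dicts, like the query function
    items = payload if isinstance(payload, list) else [payload]
    problems = []
    for i, item in enumerate(items):
        missing = REQUIRED_KEYS - set(item)
        if missing:
            problems.append((i, sorted(missing)))
    return problems

invalid_payload = [{"question": "What day is it?", "text": "Today is Monday."},
                   {"question": "When was I born?",
                    "text": "I was born in Detroit, USA, in 1980."}]
valid_payload = {"question": "What day is it?", "context": "Today is Monday."}

print(find_invalid_items(invalid_payload))  # [(0, ['context']), (1, ['context'])]
print(find_invalid_items(valid_payload))    # []
```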



## Pipeline vs direct model use

### Using Transformers pipelines

In [5]:
from transformers import pipeline

# Instantiating a Transformers pipeline
qa_pipeline = pipeline(task="question-answering")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


In [6]:
# Performing inference with a pipeline
data = [{"question": "What day is it?",
         "context": "Today is Monday."},
        {"question": "When was I born?",
         "context": "I was born in Detroit, USA, in 1980."}
       ]

result = qa_pipeline(data)

In [7]:
print(result)

[{'score': 0.9574882984161377, 'start': 9, 'end': 15, 'answer': 'Monday'}, {'score': 0.9639834761619568, 'start': 31, 'end': 35, 'answer': '1980'}]


### Using pretrained Transformers models directly

#### Getting started with direct model use

In [8]:
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering

# Instantiating our tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
model = TFAutoModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2", from_pt=True)

Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFRobertaForQuestionAnswering: ['roberta.embeddings.position_ids']
- This IS expected if you are initializing TFRobertaForQuestionAnswering from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFRobertaForQuestionAnswering from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFRobertaForQuestionAnswering were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForQuestionAnswering for predictions without further training.


#### Tokenizing question-context pairs

In [9]:
# Tokenizing the input question and context as a TF tensor
inputs = tokenizer(text="When was I born?", 
                   text_pair="I was born in Detroit, USA, in 1980.", 
                   add_special_tokens=True, return_tensors="tf")

# Extracting the token IDs of the input sequence as a NumPy array
input_ids = inputs["input_ids"].numpy()[0]

In [10]:
# Examining the model inputs
print(inputs)

{'input_ids': <tf.Tensor: shape=(1, 20), dtype=int32, numpy=
array([[   0, 1779,   21,   38, 2421,  116,    2,    2,  100,   21, 2421,
          11, 2921,    6, 2805,    6,   11, 5114,    4,    2]])>, 'attention_mask': <tf.Tensor: shape=(1, 20), dtype=int32, numpy=array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])>}


In [11]:
# Examining input IDs
print(input_ids)

[   0 1779   21   38 2421  116    2    2  100   21 2421   11 2921    6
 2805    6   11 5114    4    2]


In [12]:
# Converting the encoded sequence back into a string
print(tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids)))

<s>When was I born?</s></s>I was born in Detroit, USA, in 1980.</s>
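
The decoded string shows the layout `<s> question </s></s> context </s>`. A minimal sketch of locating the context tokens inside such a sequence, assuming that layout and that ID 2 is `</s>` (as the decoded output suggests); `context_span` is a hypothetical helper for illustration:

```python
# Input IDs copied from the output above
example_ids = [0, 1779, 21, 38, 2421, 116, 2, 2, 100, 21, 2421,
               11, 2921, 6, 2805, 6, 11, 5114, 4, 2]
SEP_ID = 2  # decodes to </s> in the output above

def context_span(ids, sep_id=SEP_ID):
    """Return (start, end) token indices of the context, with end exclusive."""
    first_sep = ids.index(sep_id)                  # the </s> that closes the question
    start = first_sep + 2                          # skip the doubled </s></s>
    end = len(ids) - 1 - ids[::-1].index(sep_id)   # index of the final </s>
    return start, end

ctx_start, ctx_end = context_span(example_ids)
print(ctx_start, ctx_end, ctx_end - ctx_start)  # 8 19 11
```

An extracted answer should fall inside this span, since the answer is taken from the context rather than from the question.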


#### Performing single-sample inference with direct model use

In [13]:
# Passing the tokenized inputs to the model
answer = model(inputs)

# Examining the model output
print(answer)

TFQuestionAnsweringModelOutput(loss=None, start_logits=<tf.Tensor: shape=(1, 20), dtype=float32, numpy=
array([[ 1.2259467 , -7.791437  , -8.260778  , -8.249651  , -7.5693817 ,
        -7.936594  , -8.363062  , -7.1329803 , -0.18496554, -2.9790633 ,
        -1.8434091 , -1.3693225 ,  0.71017456, -5.1979237 , -1.2315122 ,
        -4.5270185 ,  0.62987506,  6.598527  , -1.2783077 , -7.465865  ]],
      dtype=float32)>, end_logits=<tf.Tensor: shape=(1, 20), dtype=float32, numpy=
array([[ 1.6835405 , -6.918748  , -7.306207  , -7.5368285 , -6.0764575 ,
        -3.6980824 , -5.6374154 , -4.4059825 , -4.577457  , -5.964948  ,
        -2.6135447 , -5.8583727 , -0.7394024 , -5.670526  ,  0.16733687,
        -2.0341313 , -2.189781  ,  7.048407  ,  3.6379344 , -4.67701   ]],
      dtype=float32)>, hidden_states=None, attentions=None)


In [14]:
# Extracting the start and end logits from the output
answer_start_scores = answer.start_logits
answer_end_scores = answer.end_logits

In [15]:
import tensorflow as tf

# Finding the start and end indices with the highest score
# (1 is added to the end index so it can serve as an exclusive slice bound)
answer_start = tf.argmax(answer_start_scores, axis=1).numpy()[0]
answer_end = tf.argmax(answer_end_scores, axis=1).numpy()[0] + 1

In [16]:
print(f"Answer start index: {answer_start}")
print(f"Answer end index: {answer_end}")

Answer start index: 17
Answer end index: 18


In [17]:
# Converting the answer token IDs back to a string
final_answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

# Viewing the answer
print(f"Answer to the question: {final_answer}")

Answer to the question:  1980
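
Unlike the Inference API and the pipeline, direct model use returns raw logits rather than a `score`. A minimal sketch of deriving a comparable confidence value in plain Python, assuming the score is the product of the softmax probabilities at the chosen start and end tokens. The logits are copied (rounded) from the model output above, and `softmax` and `argmax` are small illustrative helpers:

```python
import math

start_logits = [1.2259, -7.7914, -8.2608, -8.2497, -7.5694, -7.9366, -8.3631,
                -7.1330, -0.1850, -2.9791, -1.8434, -1.3693, 0.7102, -5.1979,
                -1.2315, -4.5270, 0.6299, 6.5985, -1.2783, -7.4659]
end_logits = [1.6835, -6.9187, -7.3062, -7.5368, -6.0765, -3.6981, -5.6374,
              -4.4060, -4.5775, -5.9649, -2.6135, -5.8584, -0.7394, -5.6705,
              0.1673, -2.0341, -2.1898, 7.0484, 3.6379, -4.6770]

def softmax(logits):
    """Convert logits to probabilities (max subtracted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

start_idx = argmax(start_logits)   # first token of the answer
end_idx = argmax(end_logits)       # last token of the answer (inclusive)
score = softmax(start_logits)[start_idx] * softmax(end_logits)[end_idx]

print(start_idx, end_idx + 1)      # 17 18
print(round(score, 2))             # 0.95
```

The result is close to the score the Inference API reported for the same question and context, which is consistent with the assumption but does not prove that the hosted service computes it this way.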


In [18]:
# Tokenizing the input question and context as a TF tensor
inputs = tokenizer("Where was I born?", 
                   "I was born in Detroit, USA, in 1980.", 
                   add_special_tokens=True, return_tensors="tf")

answer = model(inputs)

# Extracting the token IDs of the input sequence as a NumPy array
input_ids = inputs["input_ids"].numpy()[0]

answer_start_scores = answer.start_logits
answer_end_scores = answer.end_logits

# Finding the start and end indices with the highest score
answer_start = tf.argmax(answer_start_scores, axis=1).numpy()[0]
answer_end = tf.argmax(answer_end_scores, axis=1).numpy()[0] + 1

print(f"Answer start index: {answer_start}")
print(f"Answer end index: {answer_end}")

# Converting the answer token IDs back to a string
final_answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

# Viewing the answer
print(f"Answer to the question: {final_answer}")

Answer start index: 12
Answer end index: 15
Answer to the question:  Detroit, USA


#### Performing batch inference with direct model use

In [19]:
questions = ["Where was I born?", "What day is it?"]
contexts = ["I was born in Detroit, USA, in 1980.", "Today is Monday."]

# Iterating over the question-context pairs
for question, context in zip(questions, contexts):
    # Viewing the current question and context
    print(f"Question: {question}")
    print(f"Context: {context}")
    
    # Tokenizing the input question and context as a TF tensor
    inputs = tokenizer(question, context,
                       add_special_tokens=True, return_tensors="tf")

    answer = model(inputs)

    # Extracting the token IDs of the input sequence as a NumPy array
    input_ids = inputs["input_ids"].numpy()[0]

    answer_start_scores = answer.start_logits
    answer_end_scores = answer.end_logits

    # Finding the start and end indices with the highest score
    answer_start = tf.argmax(answer_start_scores, axis=1).numpy()[0]
    answer_end = tf.argmax(answer_end_scores, axis=1).numpy()[0] + 1

    print(f"Answer start index: {answer_start}")
    print(f"Answer end index: {answer_end}")

    # Converting the answer token IDs back to a string
    final_answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

    # Viewing the answer
    print(f"Answer to the question: {final_answer}\n")

Question: Where was I born?
Context: I was born in Detroit, USA, in 1980.
Answer start index: 12
Answer end index: 15
Answer to the question:  Detroit, USA

Question: What day is it?
Context: Today is Monday.
Answer start index: 10
Answer end index: 11
Answer to the question:  Monday



In [20]:
# Tokenizing and padding input sequences
inputs = tokenizer(questions, contexts, 
                   add_special_tokens=True, return_tensors="tf", 
                   padding=True)

In [21]:
# Passing the inputs to our model
answers = model(inputs)

# Extracting the token IDs of the input sequences as a NumPy array
input_ids = inputs["input_ids"].numpy()

In [22]:
# Viewing our input IDs
print(input_ids)

[[    0 13841    21    38  2421   116     2     2   100    21  2421    11
   2921     6  2805     6    11  5114     4     2]
 [    0  2264   183    16    24   116     2     2  5625    16   302     4
      2     1     1     1     1     1     1     1]]


In [23]:
# Viewing one of the input ID arrays
print(tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[1])))

<s>What day is it?</s></s>Today is Monday.</s><pad><pad><pad><pad><pad><pad><pad>


In [24]:
# Viewing the model output
print(answers)

TFQuestionAnsweringModelOutput(loss=None, start_logits=<tf.Tensor: shape=(2, 20), dtype=float32, numpy=
array([[ 0.78155375, -8.073274  , -8.34409   , -8.468487  , -8.099401  ,
        -8.646169  , -8.6963625 , -7.6574035 , -0.64569664, -3.0957458 ,
        -2.2456067 ,  0.14127505,  6.086786  , -3.696553  , -0.37048122,
        -5.0749235 , -2.5687819 ,  0.3333048 , -3.133765  , -8.028114  ],
       [ 1.7058821 , -8.217684  , -8.914576  , -8.764219  , -8.871216  ,
        -9.367665  , -9.4340925 , -8.613971  , -0.7869898 , -5.4473844 ,
         5.1352215 , -4.5842705 , -8.896799  , -9.427734  , -9.427734  ,
        -9.427734  , -9.427734  , -9.427734  , -9.427734  , -9.427734  ]],
      dtype=float32)>, end_logits=<tf.Tensor: shape=(2, 20), dtype=float32, numpy=
array([[ 1.0022359 , -7.5837913 , -7.652511  , -7.7337627 , -6.577329  ,
        -4.721192  , -6.1436515 , -4.644155  , -5.339603  , -6.4877644 ,
        -3.6098654 , -3.8038702 ,  5.0979486 , -1.4663633 ,  5.515522  ,
       

In [25]:
# Extracting the start and end logits from the output
answer_start_scores = answers.start_logits
answer_end_scores = answers.end_logits

# Finding the start and end indices with the highest score
answer_starts = tf.argmax(answer_start_scores, axis=1).numpy()
answer_ends = tf.argmax(answer_end_scores, axis=1).numpy() + 1

In [26]:
# Viewing answer start and end indices
print(f"Answer start indices: {answer_starts}")
print(f"Answer end indices: {answer_ends}")

Answer start indices: [12 10]
Answer end indices: [15 11]
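
With padding, the argmax runs over every position, including `<pad>` tokens. In this batch the logits at padded positions are very low, so the result is unaffected, but the code above does not enforce that. A minimal sketch of excluding padded positions before the argmax, assuming ID 1 is `<pad>` (as the decoded output suggests); `masked_argmax` is a hypothetical helper, and the values are copied (rounded) from the second row of the outputs above:

```python
PAD_ID = 1  # decodes to <pad> in the output above

# Second sequence of the padded batch and its start logits
row_ids = [0, 2264, 183, 16, 24, 116, 2, 2, 5625, 16, 302, 4, 2,
           1, 1, 1, 1, 1, 1, 1]
row_start_logits = [1.7059, -8.2177, -8.9146, -8.7642, -8.8712, -9.3677,
                    -9.4341, -8.6140, -0.7870, -5.4474, 5.1352, -4.5843,
                    -8.8968, -9.4277, -9.4277, -9.4277, -9.4277, -9.4277,
                    -9.4277, -9.4277]

def masked_argmax(logits, ids, pad_id=PAD_ID):
    """Return the index of the largest logit among non-padding positions."""
    masked = [x if tok != pad_id else float("-inf") for x, tok in zip(logits, ids)]
    return max(range(len(masked)), key=masked.__getitem__)

real_tokens = sum(tok != PAD_ID for tok in row_ids)
print(real_tokens)                                   # 13
print(masked_argmax(row_start_logits, row_ids))      # 10
```

The `attention_mask` returned by the tokenizer carries the same information (1 for real tokens, 0 for padding) and could be used in place of the comparison against the pad ID.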


In [27]:
# Converting the IDs back to string form
final_answers = []

for i, (start, end) in enumerate(zip(answer_starts, answer_ends)):
    # Slicing out the answer tokens of sequence i and decoding them
    answer_tokens = tokenizer.convert_ids_to_tokens(input_ids[i, start:end])
    final_answers.append(tokenizer.convert_tokens_to_string(answer_tokens))

In [28]:
# Viewing our answers
print(final_answers)

[' Detroit, USA', ' Monday']


In [29]:
# Viewing our questions and contexts with their respective answers
for i, answer in enumerate(final_answers):
    print(f"Question {i + 1}: {questions[i]}")
    print(f"Context {i + 1}: {contexts[i]}")
    print(f"Answer start index: {answer_starts[i]}")
    print(f"Answer end index: {answer_ends[i]}")
    print(f"Answer {i + 1}: {answer}\n")

Question 1: Where was I born?
Context 1: I was born in Detroit, USA, in 1980.
Answer start index: 12
Answer end index: 15
Answer 1:  Detroit, USA

Question 2: What day is it?
Context 2: Today is Monday.
Answer start index: 10
Answer end index: 11
Answer 2:  Monday

