# Demo Notebook to trace Sentence Transformers model

#### [Download notebook](https://github.com/opensearch-project/opensearch-py-ml/blob/main/docs/source/examples/demo_tracing_model_torchscript_onnx.ipynb)

This notebook walks through tracing Sentence Transformers models in TorchScript and ONNX format. After tracing a model, you can register it in OpenSearch and use it to generate embeddings.

Remember that TorchScript and ONNX are two alternative formats; you do not need to trace a model in both. This notebook shows both only for demonstration.

Step 0: Import packages and set up client

Step 1: Save model in TorchScript format

Step 2: Register the saved TorchScript model in OpenSearch

[The following steps are optional. They show registering the model in the other format and comparing the two embedding outputs.]

Step 3: Save model in ONNX format

Step 4: Register the saved ONNX model in OpenSearch

Step 5: Generate sentence embeddings with the registered models




## Step 0: Import packages and set up client
Install the packages required by the `SentenceTransformerModel` class in `opensearch_py_ml`: `opensearch-py` and `opensearch-py-ml`, both available from PyPI.


In [1]:
#!pip install opensearch-py opensearch-py-ml

import os
import sys
sys.path.append(os.path.abspath(os.path.join('../../..')))

In [2]:
import warnings
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings("ignore", message="Unverified HTTPS request")
warnings.filterwarnings("ignore", message="TracerWarning: torch.tensor")
warnings.filterwarnings("ignore", message="using SSL with verify_certs=False is insecure.")

import opensearch_py_ml as oml
from opensearchpy import OpenSearch
from opensearch_py_ml.ml_models import SentenceTransformerModel
# import mlcommon to later register the model to OpenSearch Cluster
from opensearch_py_ml.ml_commons import MLCommonClient



In [3]:
CLUSTER_URL = 'https://localhost:9200'

In [4]:
def get_os_client(cluster_url = CLUSTER_URL,
                  username='admin',
                  password='admin'):
    '''
    Get OpenSearch client
    :param cluster_url: cluster URL like https://ml-te-netwo-1s12ba42br23v-ff1736fa7db98ff2.elb.us-west-2.amazonaws.com:443
    :return: OpenSearch client
    '''
    client = OpenSearch(
        hosts=[cluster_url],
        http_auth=(username, password),
        verify_certs=False
    )
    return client 

In [5]:
client = get_os_client()

# Connect to ml_common client with OpenSearch client
ml_client = MLCommonClient(client)
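
The hard-coded `admin`/`admin` credentials above are fine for a local demo cluster. As a minimal sketch, assuming hypothetical environment variable names (`OPENSEARCH_URL`, `OPENSEARCH_USER`, and `OPENSEARCH_PASSWORD` are invented for this example and are not defined by `opensearch-py`), the connection settings could be read from the environment instead:

```python
import os


def read_cluster_settings(env=None):
    """Collect connection settings, falling back to the demo defaults above.

    The variable names are hypothetical and chosen only for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "cluster_url": env.get("OPENSEARCH_URL", "https://localhost:9200"),
        "username": env.get("OPENSEARCH_USER", "admin"),
        "password": env.get("OPENSEARCH_PASSWORD", "admin"),
    }


# Pass an explicit mapping here so the example does not depend on the real environment.
settings = read_cluster_settings({"OPENSEARCH_URL": "https://search.example.internal:9200"})
print(settings["cluster_url"])
```

The keys of the returned dictionary match the keyword arguments of `get_os_client`, so `get_os_client(**read_cluster_settings())` would build the client.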



## Step 1: Save model in TorchScript format

The `opensearch-py-ml` library provides the `save_as_pt` method, which traces a model in TorchScript format and saves it as a zip file on your filesystem.

Detailed documentation: https://opensearch-project.github.io/opensearch-py-ml/reference/api/sentence_transformer.save_as_pt.html#opensearch_py_ml.ml_models.SentenceTransformerModel.save_as_pt


You need to provide a Sentence Transformers model ID (for example, `sentence-transformers/msmarco-distilbert-base-tas-b`). This is a Hugging Face model ID. Example: https://huggingface.co/sentence-transformers/msmarco-distilbert-base-tas-b

`save_as_pt` downloads the model to the filesystem and then traces it with the given input strings.

For more guidance on the dummy input strings, see https://huggingface.co/docs/transformers/torchscript#dummy-inputs-and-standard-lengths

After tracing the model (which generates a `.pt` file), `save_as_pt` zips `tokenizer.json` together with the TorchScript (`.pt`) file and saves the archive to the filesystem.

You can then register that model in OpenSearch to generate embeddings.
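
Before registering, it can be useful to confirm that the archive really contains a model file and a tokenizer file. The sketch below uses only the standard library and builds a tiny stand-in archive so it runs without downloading anything; the file names inside it are placeholders, and `list_archive_members` is a helper name made up for this example.

```python
import os
import tempfile
import zipfile


def list_archive_members(zip_path):
    """Return the sorted file names stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


# Build a tiny stand-in archive so the sketch runs without downloading a model.
# The file names below are placeholders, not real model artifacts.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_zip = os.path.join(tmp_dir, "demo-model.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("demo-model.pt", b"placeholder weights")
        archive.writestr("tokenizer.json", "{}")
    members = list_archive_members(demo_zip)

print(members)
```

Pointing `list_archive_members` at the path returned by `save_as_pt` would list the files in the real archive.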

In [6]:
model_id="sentence-transformers/distiluse-base-multilingual-cased-v1"
pre_trained_model = SentenceTransformerModel(model_id=model_id, folder_path='sentence-transformers-torchscript/distiluse-base-multilingual-cased-v1', overwrite=True)
model_path = pre_trained_model.save_as_pt(model_id=model_id, sentences=["for example providing a small sentence", "we can add multiple sentences"])



model file is saved to  sentence-transformers-torchscript/distiluse-base-multilingual-cased-v1/distiluse-base-multilingual-cased-v1.pt
zip file is saved to  sentence-transformers-torchscript/distiluse-base-multilingual-cased-v1/distiluse-base-multilingual-cased-v1.zip 



In [7]:
pre_trained_model.make_model_config_json(model_format="TORCH_SCRIPT")

ml-commons_model_config.json file is saved at :  sentence-transformers-torchscript/distiluse-base-multilingual-cased-v1/ml-commons_model_config.json


'sentence-transformers-torchscript/distiluse-base-multilingual-cased-v1/ml-commons_model_config.json'

## Step 2: Register the saved TorchScript model in OpenSearch

In the last step we saved a Sentence Transformers model in TorchScript format. Now we will register that model in the OpenSearch cluster using the `register_model` method of `opensearch-py-ml`.

To register the model, we need the zip file saved in the last step and a model config file. An example of the model config file content:

{
  "name": "sentence-transformers/msmarco-distilbert-base-tas-b",
  "version": "1.0.0",
  "description": "This is a port of the DistilBert TAS-B Model to sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and is optimized for the task of semantic search.",
  "model_format": "TORCH_SCRIPT",
  "model_config": {
    "model_type": "distilbert",
    "embedding_dimension": 768,
    "framework_type": "sentence_transformers"
  }
}


`model_format` must be `TORCH_SCRIPT` so that the system looks for the corresponding `.pt` file in the zip archive.
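
A quick local check of the config file can catch a wrong `model_format` before the upload starts. The sketch below is an illustration under assumptions: the lists of required keys are taken from the example config above, not from the ml-commons schema, and `check_model_config` is a hypothetical helper.

```python
import json

# Keys taken from the example config above; the real schema may differ.
REQUIRED_TOP_LEVEL = ("name", "version", "model_format", "model_config")
REQUIRED_MODEL_CONFIG = ("model_type", "embedding_dimension", "framework_type")


def check_model_config(config_text, expected_format):
    """Return a list of problems found in a model config; empty means it looks fine."""
    config = json.loads(config_text)
    problems = [f"missing key: {key}" for key in REQUIRED_TOP_LEVEL if key not in config]
    model_config = config.get("model_config", {})
    problems += [
        f"missing model_config key: {key}"
        for key in REQUIRED_MODEL_CONFIG
        if key not in model_config
    ]
    actual_format = config.get("model_format")
    if actual_format != expected_format:
        problems.append(f"model_format is {actual_format!r}, expected {expected_format!r}")
    return problems


example = """
{
  "name": "sentence-transformers/msmarco-distilbert-base-tas-b",
  "version": "1.0.0",
  "model_format": "TORCH_SCRIPT",
  "model_config": {
    "model_type": "distilbert",
    "embedding_dimension": 768,
    "framework_type": "sentence_transformers"
  }
}
"""

print(check_model_config(example, "TORCH_SCRIPT"))  # no problems for the matching format
print(check_model_config(example, "ONNX"))          # one problem: format mismatch
```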

Please refer to this doc: https://github.com/opensearch-project/ml-commons/blob/2.x/docs/model_serving_framework/text_embedding_model_examples.md


Documentation for the method: https://opensearch-project.github.io/opensearch-py-ml/reference/api/ml_commons_register_api.html#opensearch_py_ml.ml_commons.MLCommonClient.register_model

Related demo notebook about ml-commons plugin integration: https://opensearch-project.github.io/opensearch-py-ml/examples/demo_ml_commons_integration.html



In [8]:
model_config_path_torch = 'sentence-transformers-torchscript/distiluse-base-multilingual-cased-v1/ml-commons_model_config.json'
ml_client.register_model(model_path, model_config_path_torch, isVerbose=True)

Total number of chunks 55
Sha1 value of the model file:  2d97ca61bda17fdb20d10da1d59ccf730d91a721d1baabe6b6c5dd033da57778
Model meta data was created successfully. Model Id:  5dgH44kB2Ly7dmqc3rWn
uploading chunk 1 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 2 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 3 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 4 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 5 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 6 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 7 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 8 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 9 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 10 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 11 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 12 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 13 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 14 of 55
Model id: {'status': 'Uploaded'}
u

'5dgH44kB2Ly7dmqc3rWn'
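
The log above reports a chunk count and a 64-character hexadecimal digest. A digest of that length is consistent with SHA-256, although the log labels it "Sha1". As a rough sketch, assuming a chunk size of 10,000,000 bytes (an assumption, though it does reproduce 55 chunks for a file of 544,326,391 bytes), both values could be computed locally before uploading:

```python
import hashlib
import math
import os
import tempfile

CHUNK_SIZE_BYTES = 10_000_000  # assumed chunk size; not confirmed by this notebook


def sha256_of_file(path, block_size=1 << 20):
    """Hash a file incrementally so large model archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def count_chunks(size_in_bytes, chunk_size=CHUNK_SIZE_BYTES):
    """Number of chunks needed to upload a file of the given size."""
    return math.ceil(size_in_bytes / chunk_size)


# Demonstrate on a small temporary file instead of a real model archive.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.bin")
    with open(demo_path, "wb") as handle:
        handle.write(b"abc")
    demo_hash = sha256_of_file(demo_path)

print(len(demo_hash))           # 64 hex characters, like the digest in the log
print(count_chunks(544326391))  # 55
```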

## Step 3: Save model in ONNX format

The `opensearch-py-ml` library provides the `save_as_onnx` method, which converts a model to ONNX format and saves it as a zip file on your filesystem.

Detailed documentation: https://opensearch-project.github.io/opensearch-py-ml/reference/api/sentence_transformer.save_as_onnx.html#opensearch_py_ml.ml_models.SentenceTransformerModel.save_as_onnx


You need to provide a Sentence Transformers model ID (for example, `sentence-transformers/msmarco-distilbert-base-tas-b`). `save_as_onnx` downloads the model to the filesystem and then converts it.

After conversion (which generates a `.onnx` file), `save_as_onnx` zips `tokenizer.json` together with the ONNX (`.onnx`) file and saves the archive to the filesystem.

You can then register that model in OpenSearch to generate embeddings.


In [9]:
pre_trained_model = SentenceTransformerModel(model_id=model_id, folder_path='sentence-transformers-onxx/distiluse-base-multilingual-cased-v1', overwrite=True)
model_path_onnx = pre_trained_model.save_as_onnx(model_id=model_id)

ONNX opset version set to: 15
Loading pipeline (model: sentence-transformers/distiluse-base-multilingual-cased-v1, tokenizer: sentence-transformers/distiluse-base-multilingual-cased-v1)


Downloading (…)lve/main/config.json: 100%|█████████████████████████████████████████████████████████████████████████████| 556/556 [00:00<00:00, 326kB/s]
Downloading pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████| 539M/539M [00:02<00:00, 261MB/s]
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████████████████████████████████████████| 452/452 [00:00<00:00, 258kB/s]
Downloading (…)solve/main/vocab.txt: 100%|██████████████████████████████████████████████████████████████████████████| 996k/996k [00:00<00:00, 42.3MB/s]
Downloading (…)/main/tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████| 1.96M/1.96M [00:00<00:00, 26.1MB/s]
Downloading (…)cial_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████| 112/112 [00:00<00:00, 66.5kB/s]


Creating folder sentence-transformers-onxx/distiluse-base-multilingual-cased-v1/onnx
Using framework PyTorch: 1.13.1+cu117
Found input input_ids with shape: {0: 'batch', 1: 'sequence'}
Found input attention_mask with shape: {0: 'batch', 1: 'sequence'}
Found output output_0 with shape: {0: 'batch', 1: 'sequence'}
Ensuring inputs are in correct order
head_mask is not present in the generated input list.
Generated inputs order: ['input_ids', 'attention_mask']
zip file is saved to  sentence-transformers-onxx/distiluse-base-multilingual-cased-v1/distiluse-base-multilingual-cased-v1.zip 



In [10]:
pre_trained_model.make_model_config_json(model_format="ONNX")

ml-commons_model_config.json file is saved at :  sentence-transformers-onxx/distiluse-base-multilingual-cased-v1/ml-commons_model_config.json


'sentence-transformers-onxx/distiluse-base-multilingual-cased-v1/ml-commons_model_config.json'

## Step 4: Register the saved ONNX model in OpenSearch

In the last step we saved a Sentence Transformers model in ONNX format. Now we will register that model in the OpenSearch cluster using the `register_model` method of `opensearch-py-ml`.

To register the model, we need the zip file saved in the last step and a model config file. An example of the model config file content:

{
  "name": "sentence-transformers/msmarco-distilbert-base-tas-b",
  "version": "1.0.0",
  "description": "This is a port of the DistilBert TAS-B Model to sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and is optimized for the task of semantic search.",
  "model_format": "ONNX",
  "model_config": {
    "model_type": "distilbert",
    "embedding_dimension": 768,
    "framework_type": "sentence_transformers",
    "pooling_mode":"cls",
    "normalize_result":"false"
  }
}

`model_format` must be `ONNX` so that the system looks for the corresponding `.onnx` file in the zip archive.
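
The `pooling_mode` and `normalize_result` fields in the ONNX config describe post-processing applied to the model's token-level output. As an illustration only, with toy numbers and helper names invented for this example (this is not the ml-commons implementation), CLS pooling and L2 normalization can be sketched as:

```python
import math


def cls_pool(token_embeddings):
    """CLS pooling: use the vector of the first token as the sentence embedding."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(value * value for value in vector))
    return list(vector) if norm == 0 else [value / norm for value in vector]


# Toy token-level output for one sentence: 3 tokens, 2 dimensions.
tokens = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]

sentence_vector = cls_pool(tokens)
print(sentence_vector)                # [3.0, 4.0]
print(l2_normalize(sentence_vector))  # [0.6, 0.8]
```

With `"normalize_result": "false"`, as in the example config, the normalization step would be skipped.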

Please refer to this doc: https://github.com/opensearch-project/ml-commons/blob/2.x/docs/model_serving_framework/text_embedding_model_examples.md


Documentation for the method: https://opensearch-project.github.io/opensearch-py-ml/reference/api/ml_commons_register_api.html#opensearch_py_ml.ml_commons.MLCommonClient.register_model

Related demo notebook about ml-commons plugin integration: https://opensearch-project.github.io/opensearch-py-ml/examples/demo_ml_commons_integration.html

In [11]:
model_config_path_onnx = 'sentence-transformers-onxx/distiluse-base-multilingual-cased-v1/ml-commons_model_config.json'
ml_client.register_model(model_path_onnx, model_config_path_onnx, isVerbose=True)

Total number of chunks 55
Sha1 value of the model file:  9514c1e77338317f91d4a5ee38fbc5ed9dd4dfea612ed71cfd3a4beb6e809199
Model meta data was created successfully. Model Id:  ua_94IkBjE9Uflh5fQlz
uploading chunk 1 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 2 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 3 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 4 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 5 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 6 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 7 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 8 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 9 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 10 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 11 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 12 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 13 of 55
Model id: {'status': 'Uploaded'}
uploading chunk 14 of 55
Model id: {'status': 'Uploaded'}
u

'ua_94IkBjE9Uflh5fQlz'

## Step 5: Generate Sentence Embedding

Now that these models are loaded in memory, we can generate embeddings for sentences. We provide a list of sentences and get back a list of embeddings, one per sentence.

In [12]:
# Now using this model we can generate sentence embedding.

import numpy as np

input_sentences = ["first sentence", "second sentence"]

# Generated embedding from torchScript

embedding_output_torch = ml_client.generate_embedding("t6_84IkBjE9Uflh5bwk4", input_sentences)

#just taking embedding for the first sentence
data_torch = embedding_output_torch["inference_results"][0]["output"][0]["data"]

# Generated embedding from onnx

embedding_output_onnx = ml_client.generate_embedding("ua_94IkBjE9Uflh5fQlz", input_sentences)

# Just taking embedding for the first sentence
data_onnx = embedding_output_onnx["inference_results"][0]["output"][0]["data"]

# Now we can check if there's any significant difference between two outputs

print(np.testing.assert_allclose(data_torch, data_onnx, rtol=1e-03, atol=1e-05))

AssertionError: 
Not equal to tolerance rtol=0.001, atol=1e-05

(shapes (512,), (768,) mismatch)
 x: array([ 1.097566e-02,  6.483249e-02, -4.571174e-02,  9.350105e-02,
       -2.485733e-02, -3.051355e-02,  8.830577e-03,  1.258768e-02,
        8.662864e-03, -4.904142e-02,  5.009840e-04, -6.247667e-03,...
 y: array([-8.156287e-02,  3.827157e-02,  1.071574e-01, -1.202194e-03,
       -5.243819e-02, -4.310886e-02, -1.101768e-01, -1.975697e-03,
       -5.380076e-03,  9.543222e-02, -2.704400e-02, -6.535825e-02,...

In [25]:
len(data_torch)

512

In [26]:
len(data_onnx)

768
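
The assertion fails because the two embeddings have different lengths (512 and 768), so no element-wise comparison is possible. One possible explanation, stated here as an assumption rather than something this notebook verifies, is that the TorchScript model includes a final projection to 512 dimensions while the ONNX export stops at the 768-dimensional transformer output. A small helper, sketched in plain Python with a hypothetical name, can report such a mismatch before comparing values:

```python
def compare_embeddings(actual, desired, rtol=1e-3, atol=1e-5):
    """Compare two embeddings, reporting a dimension mismatch before any values.

    Uses the closeness rule |actual - desired| <= atol + rtol * |desired|,
    the same rule documented for numpy.testing.assert_allclose.
    """
    if len(actual) != len(desired):
        return f"dimension mismatch: {len(actual)} vs {len(desired)}"
    mismatched = sum(
        1
        for a, d in zip(actual, desired)
        if abs(a - d) > atol + rtol * abs(d)
    )
    if mismatched:
        return f"{mismatched} of {len(actual)} values differ"
    return "match"


print(compare_embeddings([0.0] * 512, [0.0] * 768))   # dimension mismatch: 512 vs 768
print(compare_embeddings([1.0, 2.0], [1.0, 2.0005]))  # match
print(compare_embeddings([1.0, 2.0], [1.0, 2.5]))     # 1 of 2 values differ
```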

In [17]:
from sentence_transformers import SentenceTransformer
pre_trained_model = SentenceTransformer(model_id)
original_embedding_data = list(
    pre_trained_model.encode(input_sentences, convert_to_numpy=True)
)

In [20]:
for i in range(len(input_sentences)):
    print(i)
    print(np.testing.assert_allclose(original_embedding_data[i], data_torch[i], rtol=1e-03, atol=1e-05))

0


AssertionError: 
Not equal to tolerance rtol=0.001, atol=1e-05

Mismatched elements: 511 / 512 (99.8%)
Max absolute difference: 0.15153284
Max relative difference: 13.80625753
 x: array([ 1.097566e-02,  6.483249e-02, -4.571173e-02,  9.350105e-02,
       -2.485733e-02, -3.051355e-02,  8.830579e-03,  1.258768e-02,
        8.662860e-03, -4.904142e-02,  5.009826e-04, -6.247666e-03,...
 y: array(0.010976)
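
In the failing comparison above, `y` is a single number (`array(0.010976)`). That is what happens when `data_torch`, which holds only the first sentence's embedding, is indexed with `i`: the result is the i-th float of that vector, not the i-th sentence's embedding. A minimal sketch of the difference, using a hand-written response with the same nesting (the values are invented and `embeddings_per_sentence` is a hypothetical helper):

```python
def embeddings_per_sentence(response):
    """Collect one embedding per input sentence from an inference response."""
    return [result["output"][0]["data"] for result in response["inference_results"]]


# Hand-written response with the same nesting as the one indexed above;
# the numbers are made up for illustration.
fake_response = {
    "inference_results": [
        {"output": [{"data": [0.1, 0.2, 0.3]}]},
        {"output": [{"data": [0.4, 0.5, 0.6]}]},
    ]
}

first_only = fake_response["inference_results"][0]["output"][0]["data"]
print(first_only[1])       # 0.2: a single float from the first sentence's vector

all_embeddings = embeddings_per_sentence(fake_response)
print(all_embeddings[1])   # [0.4, 0.5, 0.6]: the second sentence's vector
```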

In [22]:
for i in range(len(input_sentences)):
    print(i)
    print(np.testing.assert_allclose(original_embedding_data[i], data_onnx[i], rtol=1e-03, atol=1e-05))

0


AssertionError: 
Not equal to tolerance rtol=0.001, atol=1e-05

Mismatched elements: 512 / 512 (100%)
Max absolute difference: 0.23379132
Max relative difference: 2.866394
 x: array([ 1.097566e-02,  6.483249e-02, -4.571173e-02,  9.350105e-02,
       -2.485733e-02, -3.051355e-02,  8.830579e-03,  1.258768e-02,
        8.662860e-03, -4.904142e-02,  5.009826e-04, -6.247666e-03,...
 y: array(-0.081563)

In [27]:
ml_client.get_model_info("t6_84IkBjE9Uflh5bwk4")

{'name': 'sentence-transformers/distiluse-base-multilingual-cased-v1',
 'algorithm': 'TEXT_EMBEDDING',
 'model_version': '1',
 'model_format': 'TORCH_SCRIPT',
 'model_state': 'DEPLOYED',
 'model_content_size_in_bytes': 544326391,
 'model_content_hash_value': 'cc154bc7c01f2a4b97d33eb452376bee7e28f48d3385348f22ec214385370965',
 'model_config': {'model_type': 'distilbert',
  'embedding_dimension': 512,
  'framework_type': 'SENTENCE_TRANSFORMERS',
  'all_config': '{"_name_or_path": "/home/latchari/.cache/torch/sentence_transformers/sentence-transformers_distiluse-base-multilingual-cased-v1/", "activation": "gelu", "architectures": ["DistilBertModel"], "attention_dropout": 0.1, "dim": 768, "dropout": 0.1, "hidden_dim": 3072, "initializer_range": 0.02, "max_position_embeddings": 512, "model_type": "distilbert", "n_heads": 12, "n_layers": 6, "pad_token_id": 0, "qa_dropout": 0.1, "seq_classif_dropout": 0.2, "sinusoidal_pos_embds": false, "tie_weights_": true, "torch_dtype": "float32", "transfo

In [28]:
ml_client.get_model_info("ua_94IkBjE9Uflh5fQlz")

{'name': 'sentence-transformers/distiluse-base-multilingual-cased-v1',
 'algorithm': 'TEXT_EMBEDDING',
 'model_version': '1',
 'model_format': 'ONNX',
 'model_state': 'DEPLOYED',
 'model_content_size_in_bytes': 542639825,
 'model_content_hash_value': '9514c1e77338317f91d4a5ee38fbc5ed9dd4dfea612ed71cfd3a4beb6e809199',
 'model_config': {'model_type': 'distilbert',
  'embedding_dimension': 512,
  'framework_type': 'SENTENCE_TRANSFORMERS',
  'all_config': '{"_name_or_path": "/home/latchari/.cache/torch/sentence_transformers/sentence-transformers_distiluse-base-multilingual-cased-v1/", "activation": "gelu", "architectures": ["DistilBertModel"], "attention_dropout": 0.1, "dim": 768, "dropout": 0.1, "hidden_dim": 3072, "initializer_range": 0.02, "max_position_embeddings": 512, "model_type": "distilbert", "n_heads": 12, "n_layers": 6, "pad_token_id": 0, "qa_dropout": 0.1, "seq_classif_dropout": 0.2, "sinusoidal_pos_embds": false, "tie_weights_": true, "torch_dtype": "float32", "transformers_ve

## Note: TorchScript works

In [10]:
input_sentences = ["first sentence", "second sentence", "very very long dksfml smflskdm"]

In [11]:
embedding_output_torch = ml_client.generate_embedding("5dgH44kB2Ly7dmqc3rWn", input_sentences)

In [12]:
embedding_data_torch = [
            embedding_output_torch["inference_results"][i]["output"][0]["data"]
            for i in range(len(input_sentences))
        ]

In [13]:
from sentence_transformers import SentenceTransformer
pre_trained_model = SentenceTransformer(model_id)
original_embedding_data = list(
    pre_trained_model.encode(input_sentences, convert_to_numpy=True)
)

In [15]:
import numpy as np
for i in range(len(input_sentences)):
    print(i)
    print(np.testing.assert_allclose(original_embedding_data[i], embedding_data_torch[i], rtol=1e-03, atol=1e-05))

0
None
1
None
2
None


In [21]:
folder_path='sentence-transformers-onxx/distiluse-base-multilingual-cased-v1'

In [20]:
pre_trained_model = SentenceTransformerModel(model_id=model_id, folder_path='sentence-transformers-onxx/distiluse-base-multilingual-cased-v1', overwrite=True)

In [119]:
from transformers.convert_graph_to_onnx import convert
from pathlib import Path

model = SentenceTransformer(model_id)


model_name = str(model_id.split("/")[-1] + ".onnx")

model_path = os.path.join(folder_path, "onnx", model_name)
        
convert(
    framework="pt",
    model=model_id,
    output=Path(model_path),
    opset=15,
)

ONNX opset version set to: 15
Loading pipeline (model: sentence-transformers/clip-ViT-B-32-multilingual-v1, tokenizer: sentence-transformers/clip-ViT-B-32-multilingual-v1)
Creating folder sentence-transformers-onxx/distiluse-base-multilingual-cased-v1/onnx
Using framework PyTorch: 1.13.1+cu117
Found input input_ids with shape: {0: 'batch', 1: 'sequence'}
Found input attention_mask with shape: {0: 'batch', 1: 'sequence'}
Found output output_0 with shape: {0: 'batch', 1: 'sequence'}
Ensuring inputs are in correct order
head_mask is not present in the generated input list.
Generated inputs order: ['input_ids', 'attention_mask']


In [120]:
import onnx
onnx_model = onnx.load(model_path)
# Check that the model is well formed
onnx.checker.check_model(onnx_model)

# Print a human readable representation of the graph
print(onnx.helper.printable_graph(onnx_model.graph))

graph torch_jit (
  %input_ids[INT64, batchxsequence]
  %attention_mask[INT64, batchxsequence]
) initializers (
  %embeddings.word_embeddings.weight[FLOAT, 119547x768]
  %embeddings.position_embeddings.weight[FLOAT, 512x768]
  %embeddings.LayerNorm.weight[FLOAT, 768]
  %embeddings.LayerNorm.bias[FLOAT, 768]
  %transformer.layer.0.attention.q_lin.bias[FLOAT, 768]
  %transformer.layer.0.attention.k_lin.bias[FLOAT, 768]
  %transformer.layer.0.attention.v_lin.bias[FLOAT, 768]
  %transformer.layer.0.attention.out_lin.bias[FLOAT, 768]
  %transformer.layer.0.sa_layer_norm.weight[FLOAT, 768]
  %transformer.layer.0.sa_layer_norm.bias[FLOAT, 768]
  %transformer.layer.0.ffn.lin1.bias[FLOAT, 3072]
  %transformer.layer.0.ffn.lin2.bias[FLOAT, 768]
  %transformer.layer.0.output_layer_norm.weight[FLOAT, 768]
  %transformer.layer.0.output_layer_norm.bias[FLOAT, 768]
  %transformer.layer.1.attention.q_lin.bias[FLOAT, 768]
  %transformer.layer.1.attention.k_lin.bias[FLOAT, 768]
  %transformer.layer.1.att

In [121]:
import onnxruntime as ort

ort_session = ort.InferenceSession(model_path)

In [122]:
from transformers import AutoTokenizer

In [123]:
autotokenizer = AutoTokenizer.from_pretrained(model_id)

In [124]:
auto_features = autotokenizer(
            input_sentences, return_tensors="pt", padding=True, truncation=True
        )

In [125]:
auto_features 

{'input_ids': tensor([[  101, 10422, 49219,   102,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0],
        [  101, 11132, 49219,   102,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0],
        [  101, 12558, 12558, 11695, 53459, 10107, 10575, 63308, 39709, 10575,
         55333, 10162, 10147,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}

In [126]:
auto_features['input_ids'][0].numpy()

array([  101, 10422, 49219,   102,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0])

In [128]:
ort_session.get_inputs()[1].name

'attention_mask'

In [129]:
def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# compute ONNX Runtime output prediction
ort_inputs = {
    ort_session.get_inputs()[0].name: to_numpy(auto_features['input_ids']),
    ort_session.get_inputs()[1].name: to_numpy(auto_features['attention_mask']),        
             }
ort_outs = ort_session.run(None, ort_inputs)

In [130]:
print(ort_outs)

[array([[[ 0.61120903, -0.40342048,  0.48542613, ..., -0.16863379,
          0.5234538 , -0.08194818],
        [ 0.4675004 , -0.3925998 ,  0.3993502 , ..., -0.23300043,
          0.41198125, -0.1259718 ],
        [ 0.5767249 , -0.4247927 ,  0.44609743, ..., -0.15110734,
          0.48633742, -0.07575865],
        ...,
        [ 0.5615095 , -0.2975685 ,  0.3169654 , ..., -0.08160089,
          0.53997755,  0.04850146],
        [ 0.5764136 , -0.30873904,  0.31131336, ..., -0.09090362,
          0.5504114 ,  0.02720033],
        [ 0.5729804 , -0.28210837,  0.28117096, ..., -0.07087893,
          0.5339788 ,  0.04254216]],

       [[ 0.57384884, -0.37592244,  0.55286735, ..., -0.11817868,
          0.5258734 , -0.05727132],
        [ 0.34844044, -0.33862054,  0.48056483, ..., -0.14881909,
          0.43648356, -0.09156983],
        [ 0.5895266 , -0.37401038,  0.5069119 , ..., -0.12960038,
          0.50383234, -0.0276256 ],
        ...,
        [ 0.50329834, -0.26046965,  0.37284666, ..., 

In [131]:
len(ort_outs[0])

3

In [133]:
ort_outs[0][0].shape

(14, 768)

In [134]:
embedding_data_onnx = [
            ort_outs[0][i]
            for i in range(len(input_sentences))
        ]

In [135]:
import numpy as np
for i in range(len(input_sentences)):
    print(i)
    print(np.testing.assert_allclose(original_embedding_data[i], embedding_data_onnx[i], rtol=1e-03, atol=1e-05))

0


AssertionError: 
Not equal to tolerance rtol=0.001, atol=1e-05

(shapes (512,), (14, 768) mismatch)
 x: array([ 1.097567e-02,  6.483248e-02, -4.571173e-02,  9.350104e-02,
       -2.485733e-02, -3.051357e-02,  8.830560e-03,  1.258769e-02,
        8.662871e-03, -4.904142e-02,  5.009779e-04, -6.247674e-03,...
 y: array([[ 0.611209, -0.40342 ,  0.485426, ..., -0.168634,  0.523454,
        -0.081948],
       [ 0.4675  , -0.3926  ,  0.39935 , ..., -0.233   ,  0.411981,...
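
The shape mismatch shows that the raw ONNX output is token-level: one 768-dimensional vector per token position (14 here), not one vector per sentence. To obtain a sentence embedding, the token vectors have to be pooled. A minimal sketch of mask-aware mean pooling in plain Python follows; whether this model uses mean pooling, and whether further layers follow it, is an assumption that should be checked against the model's Sentence Transformers configuration.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[j] for vec in kept) / len(kept) for j in range(dim)]


# Toy example: 4 token positions, 2 dimensions, last two positions are padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0], [9.0, 9.0]]
mask = [1, 1, 0, 0]

print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Applied to `ort_outs[0][i]` with the matching row of `attention_mask`, this would give one 768-dimensional vector per sentence, which still differs in length from the 512-dimensional reference embedding.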

In [67]:
onnx_model.graph.output

[name: "output_0"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_param: "batch"
      }
      dim {
        dim_param: "sequence"
      }
      dim {
        dim_value: 768
      }
    }
  }
}
]

In [68]:
model_path_2 = os.path.join(folder_path, "onnx2", model_name)
convert(
    framework="pt",
    model=model_id,
    output=Path(model_path_2),
    opset=11,
)

ONNX opset version set to: 11
Loading pipeline (model: sentence-transformers/distiluse-base-multilingual-cased-v1, tokenizer: sentence-transformers/distiluse-base-multilingual-cased-v1)
Creating folder sentence-transformers-onxx/distiluse-base-multilingual-cased-v1/onnx2
Using framework PyTorch: 1.13.1+cu117
Found input input_ids with shape: {0: 'batch', 1: 'sequence'}
Found input attention_mask with shape: {0: 'batch', 1: 'sequence'}
Found output output_0 with shape: {0: 'batch', 1: 'sequence'}
Ensuring inputs are in correct order
head_mask is not present in the generated input list.
Generated inputs order: ['input_ids', 'attention_mask']


In [69]:
onnx_model = onnx.load(model_path_2)

In [70]:
onnx_model.graph.output

[name: "output_0"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_param: "batch"
      }
      dim {
        dim_param: "sequence"
      }
      dim {
        dim_value: 768
      }
    }
  }
}
]

In [71]:
help(onnx_model)

Help on ModelProto in module onnx.onnx_ml_pb2 object:

class ModelProto(google._upb._message.Message, google.protobuf.message.Message)
 |  A ProtocolMessage
 |  
 |  Method resolution order:
 |      ModelProto
 |      google._upb._message.Message
 |      google.protobuf.message.Message
 |      builtins.object
 |  
 |  Data and other attributes defined here:
 |  
 |  DESCRIPTOR = <google._upb._message.Descriptor object>
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from google._upb._message.Message:
 |  
 |  ByteSize(...)
 |      Returns the size of the message in bytes.
 |  
 |  Clear(...)
 |      Clears the message.
 |  
 |  ClearExtension(...)
 |      Clears a message field.
 |  
 |  ClearField(...)
 |      Clears a message field.
 |  
 |  CopyFrom(...)
 |      Copies a protocol message into the current message.
 |  
 |  DiscardUnknownFields(...)
 |      Discards the unknown fields.
 |  
 |  FindInitializationErrors(...)
 |     

In [72]:
from transformers.convert_graph_to_onnx import load_graph_from_args

In [73]:
nlp = load_graph_from_args("feature-extraction", "pt", model_id, None)

Loading pipeline (model: sentence-transformers/distiluse-base-multilingual-cased-v1, tokenizer: sentence-transformers/distiluse-base-multilingual-cased-v1)


In [83]:
help(nlp.model)

Help on DistilBertModel in module transformers.models.distilbert.modeling_distilbert object:

class DistilBertModel(DistilBertPreTrainedModel)
 |  DistilBertModel(config: transformers.configuration_utils.PretrainedConfig)
 |  
 |  The bare DistilBERT encoder/transformer outputting raw hidden-states without any specific head on top.
 |  
 |  This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
 |  library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads
 |  etc.)
 |  
 |  This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
 |  Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
 |  and behavior.
 |  
 |  Parameters:
 |      config ([`DistilBertConfig`]): Model configuration class with all the parameters of the model.
 |          Initializing with a confi

In [97]:
nlp.model.modules

<bound method Module.modules of DistilBertModel(
  (embeddings): Embeddings(
    (word_embeddings): Embedding(119547, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (transformer): Transformer(
    (layer): ModuleList(
      (0): TransformerBlock(
        (attention): MultiHeadSelfAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (q_lin): Linear(in_features=768, out_features=768, bias=True)
          (k_lin): Linear(in_features=768, out_features=768, bias=True)
          (v_lin): Linear(in_features=768, out_features=768, bias=True)
          (out_lin): Linear(in_features=768, out_features=768, bias=True)
        )
        (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (ffn): FFN(
          (dropout): Dropout(p=0.1, inplace=False)
          (lin1): Linear(in_features=768, out_features=3072, bias

In [98]:
nlp.model.get_position_embeddings()

Embedding(512, 768)

In [102]:
nlp.model.get_submodule('Pooling')

AttributeError: DistilBertModel has no attribute `Pooling`
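
The `AttributeError` is expected: `load_graph_from_args` loads the bare `DistilBertModel`, and pooling is not one of its submodules. In Sentence Transformers models, the stages that follow the transformer are typically listed in a `modules.json` file next to the weights. The sketch below parses an illustrative `modules.json`; its content is written by hand for this example and is an assumption, not a copy of the actual model repository.

```python
import json

# Hand-written example in the modules.json layout; not copied from a real model.
MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"},
  {"idx": 2, "name": "2", "path": "2_Dense", "type": "sentence_transformers.models.Dense"}
]
"""


def module_pipeline(modules_text):
    """Return the short class names of the pipeline stages, in execution order."""
    modules = sorted(json.loads(modules_text), key=lambda module: module["idx"])
    return [module["type"].rsplit(".", 1)[-1] for module in modules]


stages = module_pipeline(MODULES_JSON)
print(stages)  # ['Transformer', 'Pooling', 'Dense']

# Stages that an export of the bare transformer would not include.
print([stage for stage in stages if stage != "Transformer"])
```

Any stage after `Transformer` is absent from an export of the bare Hugging Face model, which would be consistent with both the token-level output and the difference in embedding dimension seen earlier.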

In [106]:
model_id = "sentence-transformers/multi-qa-mpnet-base-dot-v1"

model_name = str(model_id.split("/")[-1] + ".onnx")

model_path_3 = os.path.join(folder_path, "onnx3", model_name)
convert(
    framework="pt",
    model=model_id,
    output=Path(model_path_3),
    opset=11,
)

ONNX opset version set to: 11
Loading pipeline (model: sentence-transformers/multi-qa-mpnet-base-dot-v1, tokenizer: sentence-transformers/multi-qa-mpnet-base-dot-v1)
Creating folder sentence-transformers-onxx/distiluse-base-multilingual-cased-v1/onnx3
Using framework PyTorch: 1.13.1+cu117
Found input input_ids with shape: {0: 'batch', 1: 'sequence'}
Found input attention_mask with shape: {0: 'batch', 1: 'sequence'}
Found output output_0 with shape: {0: 'batch', 1: 'sequence'}
Found output output_1 with shape: {0: 'batch'}
Ensuring inputs are in correct order
position_ids is not present in the generated input list.
Generated inputs order: ['input_ids', 'attention_mask']


In [107]:
onnx_model = onnx.load(model_path_3)

In [108]:
onnx_model.graph.output

[name: "output_0"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_param: "batch"
      }
      dim {
        dim_param: "sequence"
      }
      dim {
        dim_value: 768
      }
    }
  }
}
, name: "output_1"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_param: "batch"
      }
      dim {
        dim_value: 768
      }
    }
  }
}
]

In [109]:
nlp = load_graph_from_args("feature-extraction", "pt", model_id, None)

Loading pipeline (model: sentence-transformers/multi-qa-mpnet-base-dot-v1, tokenizer: sentence-transformers/multi-qa-mpnet-base-dot-v1)


In [110]:
nlp.model.modules

<bound method Module.modules of MPNetModel(
  (embeddings): MPNetEmbeddings(
    (word_embeddings): Embedding(30527, 768, padding_idx=1)
    (position_embeddings): Embedding(514, 768, padding_idx=1)
    (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): MPNetEncoder(
    (layer): ModuleList(
      (0): MPNetLayer(
        (attention): MPNetAttention(
          (attn): MPNetSelfAttention(
            (q): Linear(in_features=768, out_features=768, bias=True)
            (k): Linear(in_features=768, out_features=768, bias=True)
            (v): Linear(in_features=768, out_features=768, bias=True)
            (o): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (intermediate): MPNetIntermediate(
     

In [111]:
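# Repeat the ONNX export for a multilingual CLIP text checkpoint.
model_id = "sentence-transformers/clip-ViT-B-32-multilingual-v1"

# The file name is the last segment of the model id plus the ".onnx" suffix.
model_name = model_id.split("/")[-1] + ".onnx"

# `folder_path`, `convert`, and `Path` come from earlier cells.
model_path_4 = os.path.join(folder_path, "onnx4", model_name)
convert(
    framework="pt",
    model=model_id,
    output=Path(model_path_4),
    opset=11,
)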

ONNX opset version set to: 11
Loading pipeline (model: sentence-transformers/clip-ViT-B-32-multilingual-v1, tokenizer: sentence-transformers/clip-ViT-B-32-multilingual-v1)


Creating folder sentence-transformers-onxx/distiluse-base-multilingual-cased-v1/onnx4
Using framework PyTorch: 1.13.1+cu117
Found input input_ids with shape: {0: 'batch', 1: 'sequence'}
Found input attention_mask with shape: {0: 'batch', 1: 'sequence'}
Found output output_0 with shape: {0: 'batch', 1: 'sequence'}
Ensuring inputs are in correct order
head_mask is not present in the generated input list.
Generated inputs order: ['input_ids', 'attention_mask']


In [112]:
onnx_model = onnx.load(model_path_4)

In [117]:
help(onnx_model)

Help on ModelProto in module onnx.onnx_ml_pb2 object:

class ModelProto(google._upb._message.Message, google.protobuf.message.Message)
 |  A ProtocolMessage
 |  
 |  Method resolution order:
 |      ModelProto
 |      google._upb._message.Message
 |      google.protobuf.message.Message
 |      builtins.object
 |  
 |  Data and other attributes defined here:
 |  
 |  DESCRIPTOR = <google._upb._message.Descriptor object>
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from google._upb._message.Message:
 |  
 |  ByteSize(...)
 |      Returns the size of the message in bytes.
 |  
 |  Clear(...)
 |      Clears the message.
 |  
 |  ClearExtension(...)
 |      Clears a message field.
 |  
 |  ClearField(...)
 |      Clears a message field.
 |  
 |  CopyFrom(...)
 |      Copies a protocol message into the current message.
 |  
 |  DiscardUnknownFields(...)
 |      Discards the unknown fields.
 |  
 |  FindInitializationErrors(...)
 |     

In [113]:
onnx_model.graph.output

[name: "output_0"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_param: "batch"
      }
      dim {
        dim_param: "sequence"
      }
      dim {
        dim_value: 768
      }
    }
  }
}
]
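
For this export the graph exposes only `output_0`, the per-token embeddings of shape (batch, sequence, 768), so any sentence-level vector has to be computed outside the graph. The following is a minimal sketch of attention-masked mean pooling, assuming that mean pooling is the method this checkpoint's Sentence Transformers configuration expects (check the model card, which may also list further layers after pooling). The function name `mean_pool` and the toy data are hypothetical, and plain Python lists stand in for real arrays.

```python
from typing import List

Vector = List[float]


def mean_pool(
    token_embeddings: List[List[Vector]],
    attention_mask: List[List[int]],
) -> List[Vector]:
    """Average token vectors per sequence, skipping padded positions.

    `token_embeddings` is shaped (batch, sequence, hidden), like `output_0`;
    `attention_mask` is shaped (batch, sequence) with 1 for real tokens.
    """
    pooled = []
    for tokens, mask in zip(token_embeddings, attention_mask):
        total = [0.0] * len(tokens[0])
        count = 0
        for vector, keep in zip(tokens, mask):
            if keep:
                count += 1
                for i, value in enumerate(vector):
                    total[i] += value
        count = max(count, 1)  # avoid division by zero on an all-padding row
        pooled.append([value / count for value in total])
    return pooled


# Toy batch: 2 sequences, 3 tokens each, hidden size 2.
toy_tokens = [
    [[1.0, 3.0], [3.0, 5.0], [100.0, 100.0]],  # last token is padding
    [[2.0, 2.0], [4.0, 0.0], [6.0, 4.0]],
]
toy_mask = [[1, 1, 0], [1, 1, 1]]

pooled_vectors = mean_pool(toy_tokens, toy_mask)
print(pooled_vectors)
```

Masking matters here because padded positions still produce vectors in `output_0`; averaging them in would shift the embedding of shorter sentences in a batch.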

In [114]:
nlp = load_graph_from_args("feature-extraction", "pt", model_id, None)

Loading pipeline (model: sentence-transformers/clip-ViT-B-32-multilingual-v1, tokenizer: sentence-transformers/clip-ViT-B-32-multilingual-v1)


In [115]:
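# `modules` is referenced without being called, so the output is the bound-method repr, which includes the model architecture.
nlp.model.modules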

<bound method Module.modules of DistilBertModel(
  (embeddings): Embeddings(
    (word_embeddings): Embedding(119547, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (transformer): Transformer(
    (layer): ModuleList(
      (0): TransformerBlock(
        (attention): MultiHeadSelfAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (q_lin): Linear(in_features=768, out_features=768, bias=True)
          (k_lin): Linear(in_features=768, out_features=768, bias=True)
          (v_lin): Linear(in_features=768, out_features=768, bias=True)
          (out_lin): Linear(in_features=768, out_features=768, bias=True)
        )
        (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (ffn): FFN(
          (dropout): Dropout(p=0.1, inplace=False)
          (lin1): Linear(in_features=768, out_features=3072, bias

In [None]:
# Reference: Hugging Face Transformers guide on exporting (serializing) models
# https://huggingface.co/docs/transformers/serialization

In [None]:
# References: ONNX inference with Sentence Transformers
# https://github.com/oborchers/sentence-transformers/blob/master/examples/onnx_inference/onnx_inference.ipynb
# https://github.com/UKPLab/sentence-transformers/pull/668