# TensorFlow: Deploy model to Vespa through ONNX

This tutorial will cover the following steps:

1. Download labeled data containing Vespa ranking features.
2. Create a listwise dataset based on a TensorFlow data pipeline.
3. Train a Learning to Rank (LTR) model using the TensorFlow Ranking framework.
4. Simplify the LTR model to be suitable for ranking in Vespa.
5. Convert the TensorFlow model to the ONNX file format.
6. Create and deploy a Vespa application that uses the TensorFlow model.
7. Feed data to the Vespa application.
8. Ensure that predictions from the model deployed to Vespa match those obtained from the model directly.

## Install packages

In [None]:
!pip3 install -Uqq pyvespa learntorank numpy pandas tensorflow tensorflow_ranking onnx tf2onnx

## Get the data

In [1]:
import pandas as pd

Download labeled data containing Vespa ranking features collected from an MS Marco passage ranking application.

In [2]:
df = pd.read_csv("https://data.vespa.oath.cloud/blog/ranking/train_sample.csv")
df = df[
    ["document_id", 
     "query_id", 
     "label", 
     "fieldMatch(body).queryCompleteness",
     "fieldMatch(body).significance",
     "nativeRank",
    ]
]

In [3]:
df.shape

(100000, 6)

For each `query_id`, there are 9 irrelevant documents with `label = 0` and 1 relevant document with `label = 1`.

In [4]:
df.head(10)

| | document_id | query_id | label | fieldMatch(body).queryCompleteness | fieldMatch(body).significance | nativeRank |
|---|---|---|---|---|---|---|
| 0 | 27061 | 3 | 0 | 0.625 | 0.566311 | 0.042421 |
| 1 | 257 | 3 | 0 | 0.625 | 0.58257 | 0.039192 |
| 2 | 363 | 3 | 0 | 0.5 | 0.46603 | 0.034418 |
| 3 | 22682 | 3 | 0 | 0.625 | 0.566311 | 0.061149 |
| 4 | 160 | 3 | 0 | 0.5 | 0.437808 | 0.035017 |
| 5 | 228 | 3 | 0 | 0.5 | 0.437808 | 0.032697 |
| 6 | 3901893 | 3 | 0 | 0.75 | 0.748064 | 0.074917 |
| 7 | 1142680 | 3 | 1 | 0.75 | 0.748064 | 0.099112 |
| 8 | 141 | 3 | 0 | 0.5 | 0.442879 | 0.038093 |
| 9 | 3060834 | 3 | 0 | 0.75 | 0.763933 | 0.075347 |
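
The one-relevant-in-ten structure can be checked with a `groupby`. The sketch below runs on a small synthetic frame (the column names follow the tutorial, the values are made up), and the helper name `label_counts_per_query` is hypothetical, not part of any library.

```python
import pandas as pd


def label_counts_per_query(frame: pd.DataFrame) -> pd.DataFrame:
    """Count how many documents carry each label value, per query_id."""
    return frame.groupby(["query_id", "label"]).size().unstack(fill_value=0)


# Synthetic stand-in for the tutorial's df: 2 queries with 10 documents each
# and exactly one relevant document (label = 1) per query.
toy = pd.DataFrame(
    {
        "query_id": [3] * 10 + [7] * 10,
        "label": [0] * 9 + [1] + [1] + [0] * 9,
    }
)
counts = label_counts_per_query(toy)
print(counts)
```

Applied to the real `df`, the same helper should show 9 in the `0` column and 1 in the `1` column for every query, if the description above holds for the whole sample.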


## Create a listwise dataset

Define some parameters required to set up the listwise data pipeline.

In [5]:
number_documents_per_query = 10            
feature_names = [                         
    "fieldMatch(body).queryCompleteness", 
    "fieldMatch(body).significance", 
    "nativeRank"
]
number_features = len(feature_names)
batch_size=32

Each feature data point will have the shape equal to `(batch_size, number_documents_per_query, number_features)` and each label data point will have shape equal to `(batch_size, number_documents_per_query)`.

In [None]:
import tensorflow as tf

The code below creates a TensorFlow data pipeline (`tf.data.Dataset`) from our DataFrame and groups the rows by the `query_id` variable to form a listwise dataset. We then configure the data pipeline to shuffle the lists and batch them.

In [7]:
shuffle_buffer_size = 10000
ds = tf.data.Dataset.from_tensor_slices(
    {
        "features": tf.cast(df[feature_names].values, tf.float32),
        "label": tf.cast(df["label"].values, tf.float32),
        "query_id": tf.cast(df["query_id"].values, tf.int64),
    }
)

key_func = lambda x: x["query_id"]
reduce_func = lambda key, dataset: dataset.batch(
    number_documents_per_query, drop_remainder=True
)
listwise_ds = ds.group_by_window(
    key_func=key_func,
    reduce_func=reduce_func,
    window_size=number_documents_per_query,
)
listwise_ds = listwise_ds.map(lambda x: (x["features"], x["label"]))
listwise_ds = listwise_ds.shuffle(buffer_size=shuffle_buffer_size).batch(
    batch_size=batch_size
)

We can see that the shapes of the features and of the labels are as expected.

In [8]:
for d in listwise_ds.take(1):
    print(d[0].shape)
    print(d[1].shape)

(32, 10, 3)
(32, 10)
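
To make the grouping step concrete, here is a NumPy-only sketch of the same idea: rows are grouped by query id into fixed-size lists and incomplete lists are dropped, which is what `drop_remainder=True` does in the pipeline above. The function name `to_listwise` and the toy data are assumptions for illustration. The order of the resulting lists may differ from what `group_by_window` produces.

```python
import numpy as np


def to_listwise(features, labels, query_ids, docs_per_query):
    """Group rows by query id into lists of exactly docs_per_query rows."""
    grouped_features, grouped_labels = [], []
    for qid in np.unique(query_ids):
        rows = np.flatnonzero(query_ids == qid)
        # Keep only complete lists; the remainder is dropped.
        full = len(rows) - len(rows) % docs_per_query
        for start in range(0, full, docs_per_query):
            chunk = rows[start:start + docs_per_query]
            grouped_features.append(features[chunk])
            grouped_labels.append(labels[chunk])
    return np.stack(grouped_features), np.stack(grouped_labels)


# Toy data: queries 1 and 2 have 10 documents each, query 3 has only 4.
rng = np.random.default_rng(0)
toy_query_ids = np.array([1] * 10 + [2] * 10 + [3] * 4)
toy_features = rng.random((24, 3)).astype(np.float32)
toy_labels = np.zeros(24, dtype=np.float32)

list_features, list_labels = to_listwise(
    toy_features, toy_labels, toy_query_ids, docs_per_query=10
)
print(list_features.shape, list_labels.shape)
```

Batching 32 such lists together then gives the `(32, 10, 3)` and `(32, 10)` shapes printed above.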


## Create and compile model

We are going to create a linear model that takes listwise data as input with shape `(batch_size, number_documents_per_query, number_features)` and outputs one prediction per document with shape `(batch_size, number_documents_per_query)`.

In [9]:
input_layer = tf.keras.layers.Input(shape=(number_documents_per_query, number_features))
dense_layer = tf.keras.layers.Dense(
    1,
    use_bias=False,
    activation=None,
    name="dense"
)
output_layer = tf.keras.layers.Reshape((number_documents_per_query,))

In [10]:
model = tf.keras.Sequential(layers=[input_layer, dense_layer, output_layer])

In this tutorial, we want to optimize the [Normalized Discounted Cumulative Gain](https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG) at position 10 (NDCG@10). We then select a loss function that is a smooth approximation of the NDCG metric and create a stateless NDCG@10 metric to use when compiling the model defined above.

In [11]:
import tensorflow_ranking as tfr

ndcg = tfr.keras.metrics.NDCGMetric(topn=10)
def ndcg_stateless(y_true, y_pred):
    """
    Create stateless metric so that we can compute the validation metric 
    from scratch at the end of each epoch.
    """
    ndcg.reset_states()
    return ndcg(y_true, y_pred)

optimizer = tf.keras.optimizers.Adagrad(learning_rate=2)
model.compile(
    optimizer=optimizer,
    loss=tfr.keras.losses.ApproxNDCGLoss(),
    metrics=ndcg_stateless,
)
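
For intuition about what NDCG@10 measures, here is a plain-Python sketch of the metric for a single query. It uses the common exponential gain `2**label - 1` and a `log2` rank discount. We assume this matches the defaults of `tfr.keras.metrics.NDCGMetric` closely enough for illustration, but that is not verified here. The function names are hypothetical.

```python
import math


def dcg_at_k(labels_in_ranked_order, k):
    """Discounted cumulative gain over the top-k positions."""
    return sum(
        (2 ** label - 1) / math.log2(rank + 1)
        for rank, label in enumerate(labels_in_ranked_order[:k], start=1)
    )


def ndcg_at_k(labels, scores, k=10):
    """DCG of the score-induced ranking divided by the DCG of the ideal ranking."""
    ranked = [
        label
        for _, label in sorted(
            zip(scores, labels), key=lambda pair: pair[0], reverse=True
        )
    ]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# One relevant document that the scores place at rank 2.
example_labels = [0, 0, 1, 0]
example_scores = [0.4, 0.9, 0.5, 0.1]
example_ndcg = ndcg_at_k(example_labels, example_scores, k=10)
print(example_ndcg)
```

With a single relevant document per query, as in this dataset, the metric reduces to `1 / log2(1 + rank)` of that document, so it rewards placing the relevant passage as high as possible.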

Use the listwise dataset to fit the model:

In [12]:
history = model.fit(listwise_ds, epochs=20)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


## Simplify model input/output for deployment

After training the model by minimizing a listwise loss function, we can simplify the model before deploying it to Vespa. At inference time, Vespa will evaluate each document individually and use a ranking function to rank documents.

Therefore, the input layer will expect a tensor named `input` with shape equal to `(1, number_features)`.

In [13]:
simpler_model = tf.keras.Sequential(
    [tf.keras.layers.Input(shape=(number_features,), batch_size=1, name="input"), 
     dense_layer
    ]
)
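
Reusing `dense_layer` works because a dense layer acts on the last axis only, so scoring a whole list at once gives the same per-document scores as scoring each document separately. A NumPy sketch of that equivalence, with random weights standing in for the trained kernel (an assumption, not the actual model weights):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 1))              # stand-in for the Dense kernel
listwise_batch = rng.normal(size=(32, 10, 3))  # (batch, documents, features)

# Listwise model: linear layer over the last axis, then reshape to (batch, documents).
listwise_scores = (listwise_batch @ weights).reshape(32, 10)

# Simplified model: score a single document with input shape (1, 3).
single_doc = listwise_batch[4, 7].reshape(1, 3)
single_score = single_doc @ weights            # shape (1, 1)

print(single_score.shape)
print(np.isclose(single_score[0, 0], listwise_scores[4, 7]))
```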

We are going to save the `simpler_model` to disk and then use the tf2onnx tool to convert the model to ONNX format.

In [14]:
simpler_model.save("simpler_keras_model")

INFO:tensorflow:Assets written to: simpler_keras_model/assets




In [15]:
from tf2onnx import convert

!python3 -m tf2onnx.convert --saved-model simpler_keras_model --output simpler_keras_model.onnx

2023-08-08 14:09:40,328 - INFO - Signatures found in model: [serving_default].
2023-08-08 14:09:40,328 - INFO - Output names: ['dense']
Instructions for updating:
This API was designed for TensorFlow v1. See https://www.tensorflow.org/guide/migrate for instructions on how to migrate your code to TensorFlow v2.
2023-08-08 14:09:40,388 - INFO - Using tensorflow=2.13.0, onnx=1.14.0, tf2onnx=1.8.4/cd55bf
2023-08-08 14:09:40,388 - INFO - Using opset <onnx, 9>
2023-08-08 14:09:40,389 - INFO - Computed 0 values for constant folding
2023-08-08 14:09:40,395 - INFO - Optimizing ONNX model
2023-08-08 14:09:40,402 - INFO - After optimization: Identity -5 (5->0)
2023-08-08 14:09:40,403 - INFO - 
2023-08-08 14:09:40,403 - INFO - Successfully converted TensorFlow model simpler_keras_model to ONNX
2023-08-08 14:09:40,403 - INFO 

We can inspect the ONNX model input and output. We first load the ONNX model:

In [16]:
import onnx                  

m = onnx.load("simpler_keras_model.onnx")

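As mentioned before, the model expects a tensor with shape `(1, 3)`. The Keras input named `input` appears in the ONNX graph as `input:0`, and this is the name we need to reference later when wiring the model into Vespa.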

In [17]:
m.graph.input

[name: "input:0"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 3
      }
    }
  }
}
]

The output will be a tensor named `dense` with shape `(1,1)`.

In [18]:
m.graph.output

[name: "dense"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 1
      }
      dim {
        dim_value: 1
      }
    }
  }
}
]

## Define the application package

This section uses the Vespa Python API `pyvespa` to create an application package with a ranking function that uses the TensorFlow model exported to ONNX.

The data used to train the model was derived from a Vespa application based on the MS Marco passage dataset. So, we are going to name the application `msmarco` and start by adding two fields: `id` to hold the document id and `text` to hold the passages from the MS Marco dataset.

`indexing` configuration: We add `"summary"` to the `indexing` parameter because we want to include both the `id` and the `text` field in the query results. The `"attribute"` indicates that the field `id` will be stored in-memory. The `"index"` indicates that Vespa will create a search index for the `text` field.

In [19]:
from vespa.package import ApplicationPackage, Field

app_package = ApplicationPackage(name="msmarco")

app_package.schema.add_fields(
    Field(name="id", type="string", indexing=["summary", "attribute"]),
    Field(name="text", type="string", indexing=["summary", "index"])
)

Note that at each step along the application package definition, we can inspect the content of the Vespa search definition file:

In [20]:
print(app_package.schema.schema_to_text)

schema msmarco {
    document msmarco {
        field id type string {
            indexing: summary | attribute
        }
        field text type string {
            indexing: summary | index
        }
    }
}


Add `simpler_keras_model.onnx` to the schema. 
* The `model_name` is an id that can be used in the ranking function to identify which model to use. 
* The `model_file_path` is the current path of the .onnx file. When deploying the application, `pyvespa` will move the file to the correct location inside the Vespa application package folder.
* The `inputs` maps the names of the inputs contained in the ONNX model to the names of the Vespa sources that will be used as input to the model. In this case we will create a function called `vespa_input` that outputs a tensor of type float with the expected shape `(1, 3)`.
* The `outputs` maps the output name in the ONNX file to the output name that will be recognized by Vespa.

In [21]:
from vespa.package import OnnxModel

app_package.schema.add_model(
    OnnxModel(
        model_name="ltr_tensorflow",
        model_file_path="simpler_keras_model.onnx",
        inputs={"input:0": "vespa_input"},
        outputs={"dense": "dense"},
    )
)

It is possible to see the addition of the `onnx-model` section in the search definition below. Note that the model file is expected to be under the `files` folder inside the final application package folder, but `pyvespa` takes care of the model file placement when deploying the application.

In [22]:
print(app_package.schema.schema_to_text)

schema msmarco {
    document msmarco {
        field id type string {
            indexing: summary | attribute
        }
        field text type string {
            indexing: summary | index
        }
    }
    onnx-model ltr_tensorflow {
        file: files/ltr_tensorflow.onnx
        input input:0: vespa_input
        output dense: dense
    }
}


Add a rank profile named `tensorflow` that uses the TensorFlow model to rank documents. 
* `first_phase`: We use the Vespa ranking feature `onnx` to access the ONNX model named `ltr_tensorflow` and use the output `dense`. We apply `sum` because Vespa requires the relevance score to be a scalar and the output of the ONNX model in this case is a tensor of shape `(1,1)`.
* `vespa_input` function: The ONNX model was trained on the features `fieldMatch(body).queryCompleteness`, `fieldMatch(body).significance` and `nativeRank` collected from the source application. In this application the passages are stored in the `text` field, so we compute the same features on `text` and pack them into the tensor of shape `(1,3)` that the model expects.
* `summary_features`: Summary features allow us to specify Vespa features to be included in the output of a query. In this case, we want access to the model inputs and output to check that the Vespa model evaluation matches the original TensorFlow model.

In [23]:
from vespa.package import RankProfile, Function

app_package.schema.add_rank_profile(
    RankProfile(
        name="tensorflow", 
        first_phase="sum(onnx(ltr_tensorflow).dense)", 
        functions=[
            Function(
                name="vespa_input", 
                expression="tensor<float>(x[1],y[3]):[["
                    "fieldMatch(text).queryCompleteness, "
                    "fieldMatch(text).significance, "
                    "nativeRank(text)"
                "]]"
            )
        ],
        summary_features=[
            "onnx(ltr_tensorflow)", 
            "fieldMatch(text).queryCompleteness", 
            "fieldMatch(text).significance", 
            "nativeRank(text)"
        ]
    )
)
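
To see what this rank profile computes for one document, the sketch below mimics it in plain Python: build the `(1, 3)` input tensor, apply a bias-free linear layer, then sum the `(1, 1)` result into a scalar. The function names mirror the profile for readability and the kernel values are made up, since the trained weights are not shown in this tutorial.

```python
def vespa_input(query_completeness, significance, native_rank):
    """Mimic tensor<float>(x[1],y[3]): one row holding the three features."""
    return [[query_completeness, significance, native_rank]]


def onnx_dense(tensor, kernel):
    """Linear layer without bias: (1, 3) times (3, 1) gives a (1, 1) tensor."""
    return [[sum(f * w[0] for f, w in zip(row, kernel))] for row in tensor]


def first_phase(tensor, kernel):
    """sum(...) collapses the (1, 1) model output into a scalar relevance score."""
    return sum(value for row in onnx_dense(tensor, kernel) for value in row)


toy_kernel = [[2.0], [-1.0], [4.0]]  # hypothetical weights, not the trained ones
toy_input = vespa_input(0.5, 0.25, 0.125)
toy_score = first_phase(toy_input, toy_kernel)
print(onnx_dense(toy_input, toy_kernel), toy_score)
```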

The `rank-profile` called tensorflow can be seen below:

In [24]:
print(app_package.schema.schema_to_text)

schema msmarco {
    document msmarco {
        field id type string {
            indexing: summary | attribute
        }
        field text type string {
            indexing: summary | index
        }
    }
    onnx-model ltr_tensorflow {
        file: files/ltr_tensorflow.onnx
        input input:0: vespa_input
        output dense: dense
    }
    rank-profile tensorflow {
        function vespa_input() {
            expression {
                tensor<float>(x[1],y[3]):[[fieldMatch(text).queryCompleteness, fieldMatch(text).significance, nativeRank(text)]]
            }
        }
        first-phase {
            expression {
                sum(onnx(ltr_tensorflow).dense)
            }
        }
        summary-features {
            onnx(ltr_tensorflow)
            fieldMatch(text).queryCompleteness
            fieldMatch(text).significance
            nativeRank(text)
        }
    }
}


Now that we are done with the application package definition, we can deploy the application:

In [25]:
from vespa.deployment import VespaDocker

vespa_docker = VespaDocker()
app = vespa_docker.deploy(application_package=app_package)

Waiting for configuration server, 0/300 seconds...
Waiting for configuration server, 5/300 seconds...
Waiting for application status, 0/300 seconds...
Waiting for application status, 5/300 seconds...
Waiting for application status, 10/300 seconds...
Waiting for application status, 15/300 seconds...
Waiting for application status, 20/300 seconds...
Waiting for application status, 25/300 seconds...
Finished deployment.


## Feed the application

Once the application is running, it is time to feed MS Marco passage data to it.

In [26]:
from learntorank.passage import PassageData

dataset = PassageData.load()

We are going to use only 10 documents because our goal here is to show that Vespa returns the correct predictions from the TensorFlow model.

In [27]:
data = dataset.get_corpus().head(10)
data.rename(columns={'doc_id': 'id'}, inplace=True)

In [28]:
data.head()

| | id | text |
|---|---|---|
| 0 | 5954248 | Why GameStop is excited for Dragon Age: Inquis... |
| 1 | 7290700 | metaplasia definition: 1. abnormal change of o... |
| 2 | 5465518 | Candice Net Worth. According to the report of ... |
| 3 | 3100518 | Under the Base Closure Act, March AFB was down... |
| 4 | 3207764 | There are a number of career opportunities for... |


Feed the `data` to the application.

In [29]:
result = app.feed_df(df=data, include_id=True)

Successful documents fed: 10/10.
Batch progress: 1/1.


## Validate Vespa predictions

Get a query from the small dev set to validate the Vespa TensorFlow predictions.

In [30]:
query_text = dataset.get_queries(type="dev").iloc[0,1]
query_text = query_text.replace("'", "")

In [31]:
query_text

'why say the sky is the limit'

The code below shows the YQL expression that will be used to select the documents to be ranked.

In [32]:
"select * from sources * where ({{grammar: 'any', defaultIndex: 'text'}}userInput('{}'))".format(query_text)

"select * from sources * where ({grammar: 'any', defaultIndex: 'text'}userInput('why say the sky is the limit'))"

The function `get_vespa_prediction_and_features` will match documents using the YQL expression above and rank the documents with the rank-profile `tensorflow` that we defined in the Vespa application package.

In [33]:
def get_vespa_prediction_and_features(query_text):
    # Send query and extract hits
    hits = app.query(
                body={
                    "yql": "select * from sources * where ({{'grammar': 'any', 'defaultIndex': 'text'}}userInput('{}'));".format(query_text),
                    "ranking": "tensorflow"
                }
            ).hits
    result =[]
    # For each hit, extract the inputs to the model along with model predictions computed by Vespa
    for hit in hits:
        result.append({
            "fieldMatch(text).queryCompleteness": hit["fields"]["summaryfeatures"]["fieldMatch(text).queryCompleteness"],
            "fieldMatch(text).significance": hit["fields"]["summaryfeatures"]["fieldMatch(text).significance"],
            "nativeRank(text)": hit["fields"]["summaryfeatures"]["nativeRank(text)"],
            "vespa_prediction": hit["relevance"],             
        })
    return pd.DataFrame.from_records(result)
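
The YQL templates above rely on a detail of `str.format`: literal braces must be doubled (`{{` and `}}`), while `{}` is the placeholder for the query text. Single quotes are stripped from the query because the text is wrapped in single quotes inside `userInput('...')`. A small sketch that isolates this step, with `build_yql` as a hypothetical helper name:

```python
def build_yql(query_text: str) -> str:
    """Build the YQL string used to match documents on the `text` field."""
    # Remove single quotes so they cannot terminate the userInput('...') literal.
    cleaned = query_text.replace("'", "")
    template = (
        "select * from sources * where "
        "({{grammar: 'any', defaultIndex: 'text'}}userInput('{}'))"
    )
    # {{ and }} become literal braces; {} is replaced by the cleaned query text.
    return template.format(cleaned)


yql = build_yql("what's a vespa")
print(yql)
```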

Inputs and Vespa predictions:

In [34]:
predictions = get_vespa_prediction_and_features(query_text=query_text)
predictions

| | fieldMatch(text).queryCompleteness | fieldMatch(text).significance | nativeRank(text) | vespa_prediction |
|---|---|---|---|---|
| 0 | 0.285714 | 0.199799 | 0.061853 | 0.360788 |
| 1 | 0.571429 | 0.415687 | 0.08694 | -0.12851 |
| 2 | 0.428571 | 0.302071 | 0.065154 | -0.240481 |
| 3 | 0.428571 | 0.302071 | 0.0506 | -0.670632 |
| 4 | 0.428571 | 0.302071 | 0.049802 | -0.694231 |
| 5 | 0.285714 | 0.199799 | 0.025552 | -0.712175 |
| 6 | 0.428571 | 0.302071 | 0.045398 | -0.82439 |


Compute predictions from the TensorFlow model `simpler_model` directly:

In [35]:
predictions["tf_prediction"] = predictions[
    ["fieldMatch(text).queryCompleteness", "fieldMatch(text).significance", "nativeRank(text)"]
].apply(lambda x: simpler_model.predict([x.tolist()])[0][0], axis=1)



In [36]:
predictions

| | fieldMatch(text).queryCompleteness | fieldMatch(text).significance | nativeRank(text) | vespa_prediction | tf_prediction |
|---|---|---|---|---|---|
| 0 | 0.285714 | 0.199799 | 0.061853 | 0.360788 | 0.360788 |
| 1 | 0.571429 | 0.415687 | 0.08694 | -0.12851 | -0.12851 |
| 2 | 0.428571 | 0.302071 | 0.065154 | -0.240481 | -0.240481 |
| 3 | 0.428571 | 0.302071 | 0.0506 | -0.670632 | -0.670632 |
| 4 | 0.428571 | 0.302071 | 0.049802 | -0.694231 | -0.694231 |
| 5 | 0.285714 | 0.199799 | 0.025552 | -0.712175 | -0.712176 |
| 6 | 0.428571 | 0.302071 | 0.045398 | -0.82439 | -0.82439 |


Check that the predictions from the model deployed in Vespa are (almost) equal to the predictions obtained directly from the model.

In [37]:
from numpy.testing import assert_almost_equal

assert_almost_equal(predictions["vespa_prediction"].tolist(), predictions["tf_prediction"].tolist(), 5)
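
The last argument, `5`, is the number of decimals. Per the NumPy documentation, the check passes when `abs(desired - actual) < 1.5 * 10**(-decimal)` holds for every element. The plain-Python sketch below reproduces that rule on three value pairs copied from the table above. `within_decimal` is a hypothetical helper, not a NumPy function.

```python
def within_decimal(actual, desired, decimal=5):
    """Return True when every pair differs by less than 1.5 * 10**(-decimal)."""
    tolerance = 1.5 * 10 ** (-decimal)
    return all(abs(d - a) < tolerance for a, d in zip(actual, desired))


# Rows 0, 1 and 5 from the comparison table; row 5 differs in the sixth decimal.
vespa_values = [0.360788, -0.12851, -0.712175]
tf_values = [0.360788, -0.12851, -0.712176]

passes_at_5 = within_decimal(vespa_values, tf_values, decimal=5)
passes_at_7 = within_decimal(vespa_values, tf_values, decimal=7)
print(passes_at_5, passes_at_7)
```

A difference of about `1e-6`, as in row 5, is within the tolerance at 5 decimals but would fail a stricter check at 7 decimals.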

## Clean environment

In [38]:
import shutil

shutil.rmtree("simpler_keras_model") 
vespa_docker.container.stop(timeout=600)
vespa_docker.container.remove()