In [1]:
# Copyright 2023 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

# Each user is responsible for checking the content of datasets and the
# applicable licenses and determining if suitable for the intended use.

In [2]:
from IPython.display import display

def is_notebook():
    try:
        shell = get_ipython().__class__.__name__
        return shell in ('ZMQInteractiveShell',)  # Jupyter Notebook or JupyterLab
    except NameError:
        return False

<img src="https://developer.download.nvidia.com/notebooks/dlsw-notebooks/merlin_models_01-getting-started/nvidia_logo.png" style="width: 90px; float: right;">

# Getting Started with Merlin Models: Develop a Model for MovieLens

This notebook was created using the latest stable [merlin-tensorflow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow/tags) container.

## Overview

[Merlin Models](https://github.com/NVIDIA-Merlin/models/) is a library for training recommender models. Merlin Models lets data scientists and ML engineers easily train standard RecSys models on their own datasets, producing GPU-accelerated models with best practices baked into the library. It also lets researchers build custom models by combining standard components of deep learning recommender models, and then benchmark their new models on example offline datasets. Merlin Models is part of the [Merlin open source framework](https://developer.nvidia.com/nvidia-merlin).

Core features are:
- Many different recommender system architectures (tabular, two-tower, sequential) or tasks (binary, multi-class classification, multi-task)
- Flexible APIs targeted to both production and research
- Deep integration with the NVIDIA Merlin platform, including NVTabular for ETL and Merlin Systems for model serving


### Learning objectives

- Training [Facebook's DLRM model](https://arxiv.org/pdf/1906.00091.pdf) with our high-level API.
- Understanding the Merlin Models high-level API.

## Downloading and preparing the dataset

In [3]:
import os
import merlin.models.tf as mm

from merlin.datasets.entertainment import get_movielens

2025-06-22 14:10:34.392461: I tensorflow/core/platform/cpu_feature_guard.cc:183] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX, in other operations, rebuild TensorFlow with the appropriate compiler flags.




  warn(f"PyTorch dtype mappings did not load successfully due to an error: {exc.msg}")


[SOK INFO] Import /usr/local/lib/python3.10/dist-packages/merlin_sok-2.0.0-py3.10-linux-x86_64.egg/sparse_operation_kit/lib/libsparse_operation_kit.so
[SOK INFO] Initialize finished, communication tool: horovod


2025-06-22 14:10:38.790682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1638] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1857 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1650 SUPER, pci bus id: 0000:00:10.0, compute capability: 7.5
  from .autonotebook import tqdm as notebook_tqdm


In [4]:
import tensorflow as tf
import json

We provide the `get_movielens()` function as a convenience to download the dataset, perform simple preprocessing, and split the data into training and validation datasets.

In [5]:
input_path = os.environ.get("INPUT_DATA_DIR", os.path.expanduser("~/merlin-models-data/movielens/"))
train, valid = get_movielens(variant="ml-1m", path=input_path)

## Training the DLRM Model with Merlin Models

We define the DLRM model, whose prediction task is binary classification. From the `schema`, the categorical features are identified (and embedded), and the target columns are inferred automatically from the schema tags. We talk more about the schema in the next [example notebook (02)](02-Merlin-Models-and-NVTabular-integration.ipynb).

In [6]:
if is_notebook():
    display(train.to_ddf().compute())
else:
    print(train.to_ddf().compute())

Unnamed: 0,title,gender,age,occupation,zipcode,TE_userId_rating,TE_age_rating,TE_gender_rating,TE_occupation_rating,TE_zipcode_rating,TE_movieId_rating,movieId,userId,genres,rating_binary,rating
0,569,3,8,16,310,0.503433,2.660991,-0.583709,1.784611,-0.929644,0.527114,569,2456,[18],1,5.0
1,305,3,4,5,79,0.250566,0.516771,-0.499787,0.342274,0.581454,0.166685,306,695,"[3, 8]",1,5.0
2,472,4,3,4,844,0.040648,-0.513113,1.743980,-0.760995,-0.151567,-0.436084,474,871,[4],0,3.0
3,805,4,6,10,553,0.619882,0.790233,1.665045,-0.179553,0.624958,-1.016435,800,715,"[3, 10, 11]",1,5.0
4,1412,3,3,8,34,-0.052683,-0.498925,-0.499787,-1.374803,-0.017303,-0.469370,1416,4,"[4, 11]",1,4.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
800163,605,4,3,8,209,0.486966,-0.545133,1.825996,-1.503250,1.101171,0.824894,606,453,"[10, 6]",0,2.0
800164,1382,3,6,7,690,-0.442212,0.869365,-0.583709,0.599160,-0.531175,-0.293844,1380,468,"[11, 7, 6]",0,2.0
800165,1290,3,4,4,836,0.316336,0.529982,-0.606078,-0.752541,0.317677,0.225234,1293,856,"[3, 4]",1,4.0
800166,1640,3,3,4,234,0.561170,-0.513113,-0.561617,-0.760995,0.725950,1.137685,1639,117,[20],1,5.0


In [7]:
if is_notebook():
    display(valid.to_ddf().compute())
else:
    print(valid.to_ddf().compute())

Unnamed: 0,title,gender,age,occupation,zipcode,TE_userId_rating,TE_age_rating,TE_gender_rating,TE_occupation_rating,TE_zipcode_rating,TE_movieId_rating,movieId,userId,genres,rating_binary,rating
0,48,3,8,20,3300,0.379800,2.660991,-0.583709,3.598207,0.419727,0.484735,48,5718,"[14, 12, 3]",1,4.0
1,1157,3,3,7,167,-0.696956,-0.498925,-0.499787,0.601167,-0.234501,-1.028562,1159,393,"[11, 6]",0,3.0
2,485,3,4,10,188,-0.072101,0.531540,-0.561617,-0.069437,-0.485579,0.227166,485,1197,"[4, 8]",0,3.0
3,137,4,3,8,1111,0.094320,-0.563995,1.665045,-1.423207,0.089484,0.865244,137,1058,[7],0,3.0
4,196,3,5,3,33,-1.336400,-1.043748,-0.499787,-0.711312,-0.916885,-1.370038,196,1008,"[9, 12, 3, 17, 7]",0,3.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
200036,273,3,5,3,25,-1.442799,-1.043748,-0.499787,-0.711312,-0.540841,-0.478949,273,619,"[5, 6]",0,3.0
200037,1408,3,4,6,220,-1.491491,0.516771,-0.499787,-0.039889,-0.851700,0.154842,1410,223,[18],1,4.0
200038,638,4,4,5,12,1.325651,0.522950,1.825996,0.247993,-0.810668,-0.070742,638,818,"[3, 4, 8]",1,4.0
200039,181,3,5,9,526,-0.764241,-1.043748,-0.499787,1.279314,-0.903698,0.108854,181,316,"[5, 6]",0,3.0


In [8]:
# Ignoring the rating regression target column, to keep only the rating_binary target column for prediction
schema = train.schema.without(['rating'])
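
For intuition about the two candidate targets: in the rows displayed above, ratings of 4.0 and 5.0 have `rating_binary` equal to 1, while ratings of 2.0 and 3.0 have 0. A minimal sketch of that mapping follows. The `rating > 3` threshold and the helper name `binarize_rating` are assumptions inferred from those rows, not taken from the `get_movielens()` implementation.

```python
def binarize_rating(rating, threshold=3.0):
    """Map an explicit rating to a binary label (assumed rule: rating > threshold)."""
    return 1 if rating > threshold else 0


# Ratings copied from the first five rows of the training frame shown above.
sample_ratings = [5.0, 5.0, 3.0, 5.0, 4.0]
labels = [binarize_rating(r) for r in sample_ratings]
print(labels)
```

The resulting labels agree with the `rating_binary` values of those five rows.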

In [9]:
model = mm.DLRMModel(
    schema,
    embedding_dim=64,
    bottom_block=mm.MLPBlock([128, 64]),
    top_block=mm.MLPBlock([128, 64, 32]),
    prediction_tasks=mm.OutputBlock(schema),
)
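
In the DLRM architecture, the continuous features pass through the bottom MLP, the categorical features are embedded, and the resulting vectors interact through pairwise dot products before the top MLP. This is why the last layer of `bottom_block` (64) matches `embedding_dim` (64) in the cell above. The interaction step can be sketched in pure Python as below. The helper names `dot` and `dlrm_interaction` are illustrative and are not part of the Merlin Models API, and the tiny 2-dimensional vectors are made up for readability.

```python
from itertools import combinations


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def dlrm_interaction(dense_vec, embeddings):
    """Concatenate the dense vector with all pairwise dot products.

    dense_vec:  output of the bottom MLP (same width as each embedding)
    embeddings: one embedding vector per categorical feature
    """
    vectors = [list(dense_vec)] + [list(e) for e in embeddings]
    pairwise = [dot(u, v) for u, v in combinations(vectors, 2)]
    return list(dense_vec) + pairwise


# Toy example: one dense vector and two categorical embeddings, all of width 2.
features = dlrm_interaction([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0]])
print(features)
```

The output of this step is what the top MLP would consume. With `n` vectors in total, the interaction contributes `n * (n - 1) / 2` dot products.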

In [10]:
k = 10
topk_metrics = [
    mm.RecallAt(k),
    mm.PrecisionAt(k),
    mm.NDCGAt(k),
    mm.MRRAt(k),
    mm.AvgPrecisionAt(k),
]

classification_metrics = [
    tf.keras.metrics.BinaryAccuracy(),
    tf.keras.metrics.Precision(),
    tf.keras.metrics.Recall(),
    tf.keras.metrics.AUC(),
]

# TODO: Read about this.
# Usage of the TopKMetricsAggregator helper:
# topk_agg = mm.TopKMetricsAggregator(topk_metrics)

# # New optimizers API. You can use any of https://www.tensorflow.org/api_docs/python/tf/keras/optimizers#classes
# optimizer=tf.keras.optimizers.Adam(
#     learning_rate=0.001,
#     beta_1=0.9,
#     beta_2=0.999,
#     epsilon=1e-07,
#     amsgrad=False,
#     weight_decay=None,
#     clipnorm=None,
#     clipvalue=None,
#     global_clipnorm=None,
#     use_ema=False,
#     ema_momentum=0.99,
#     ema_overwrite_frequency=None
# )

# Legacy optimizers API. You can use any of https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/legacy#classes
optimizer=tf.keras.optimizers.legacy.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
    amsgrad=False,
    decay=0
)

model.compile(
    optimizer=optimizer,    
    # loss=tf.keras.losses.BinaryCrossentropy(), #  Default "loss" is "None". You can use any of https://www.tensorflow.org/api_docs/python/tf/keras/losses#classes
    metrics=topk_metrics + classification_metrics,
)
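
To make the top-k metrics in `topk_metrics` more concrete, here is a minimal pure-Python sketch of precision, recall, and MRR at `k` for a single ranked list. The function names are illustrative, not the Merlin Models implementations, and how Merlin Models groups predictions into ranked lists is not covered here.

```python
def rank_by_score(labels, scores):
    """Return the labels reordered by descending predicted score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return [label for _, label in ranked]


def precision_at_k(relevance, k):
    """Fraction of the top-k positions that hold a relevant item."""
    return sum(relevance[:k]) / k


def recall_at_k(relevance, k):
    """Fraction of all relevant items that appear in the top-k positions."""
    total = sum(relevance)
    return sum(relevance[:k]) / total if total else 0.0


def mrr_at_k(relevance, k):
    """Reciprocal rank of the first relevant item within the top-k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


# Toy data: 1 = relevant, 0 = not relevant, with made-up scores.
labels = [1, 0, 1, 0, 1]
scores = [0.2, 0.9, 0.8, 0.1, 0.7]
ranked = rank_by_score(labels, scores)
print(ranked, precision_at_k(ranked, 3), recall_at_k(ranked, 3), mrr_at_k(ranked, 3))
```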

Next, we train the model.

In [11]:
print("--- Optimizer config ---")
optimizer_config = model.optimizer.get_config()
for key, value in optimizer_config.items():
    print(f"{key}: {value}")

--- Optimizer config ---
name: Adam
learning_rate: 0.001
decay: 0
beta_1: 0.9
beta_2: 0.999
epsilon: 1e-07
amsgrad: False


In [12]:
batch_size=1024
model.fit(train, batch_size=batch_size, epochs=1)

2025-06-22 14:10:43.189601: I tensorflow/core/common_runtime/executor.cc:1209] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype int32
	 [[{{node Placeholder/_0}}]]




<keras.callbacks.History at 0x7f4514aaa620>

We evaluate the model...

In [13]:
metrics = model.evaluate(valid, batch_size=batch_size, return_dict=True)

2025-06-22 14:11:04.106902: I tensorflow/core/common_runtime/executor.cc:1209] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder/_0' with dtype int32
	 [[{{node Placeholder/_0}}]]




... and check the evaluation metrics. Because we removed the `rating` column from the schema, `rating_binary` is the only target, so the model has a single binary classification head.  
The list below shows the metrics we passed to `model.compile()`: binary accuracy, precision, recall, and AUC for classification, plus the top-k ranking metrics at k=10. If you do not pass your own metrics, Merlin Models provides default metrics for each task.

In [14]:
print(json.dumps(metrics, indent=2))

{
  "loss": 0.5224999785423279,
  "binary_accuracy": 0.7363390326499939,
  "precision": 0.7464864253997803,
  "recall": 0.8208440542221069,
  "auc": 0.8069457411766052,
  "recall_at_10": 0.9770406484603882,
  "precision_at_10": 0.9770406484603882,
  "ndcg_at_10": 0.9803875684738159,
  "mrr_at_10": 0.9948979616165161,
  "map_at_10": 0.9660817980766296,
  "regularization_loss": 0.0,
  "loss_batch": 0.5332318544387817
}
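
The `binary_accuracy`, `precision`, and `recall` values above are threshold-based metrics. A minimal pure-Python sketch of how they are computed from predicted probabilities follows, assuming the Keras default threshold of 0.5. The function name `classification_report` and the toy data are illustrative only.

```python
def classification_report(y_true, y_prob, threshold=0.5):
    """Compute accuracy, precision, and recall for binary predictions."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "binary_accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels and made-up predicted probabilities.
report = classification_report([1, 0, 1, 1, 0], [0.9, 0.6, 0.4, 0.8, 0.1])
print(report)
```

AUC is not threshold-based, so it is left out of this sketch.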


## Export Model

### Model Repository

We save the model in the directory structure that Triton Inference Server requires for the TensorFlow [SavedModel](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_repository.html#tensorflow-models) format.

### Model Configuration

We need to let Triton Inference Server know what backend we used in our model, to be able to serve the model.

In our case, we used the [TensorFlow backend](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/tensorflow_backend/README.html#platform) (i.e. `tensorflow_backend`).

There are two options to indicate that.

- Name the model directory as `<model>.<backend>`.
- Create a file called [config.pbtxt](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_configuration.html), and add a field to it called [platform](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/tensorflow_backend/README.html#platform) with the value of the `<backend>`.

In this implementation we used the first option.

#### Note about the `config.pbtxt` content:

Creating and configuring `config.pbtxt` is required unless you use Triton's [Auto-Complete Model Configuration](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/model_configuration.html#auto-generated-model-configuration) feature to infer the `config.pbtxt` automatically, which is what we did in this implementation.
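
The directory layout that the next cell writes can be sketched as follows. The helper name `savedmodel_dir` is hypothetical; the path components mirror the values used in this notebook (`<repository>/<model>.<backend>/<version>/model.savedmodel`).

```python
from pathlib import PurePosixPath


def savedmodel_dir(repository, model_name, backend, version):
    """Build the SavedModel directory path inside a model repository."""
    return PurePosixPath(repository) / f"{model_name}.{backend}" / str(version) / "model.savedmodel"


# Values taken from the export cell in this notebook.
target = savedmodel_dir("/workspace/data/", "dlrm", "tensorflow_backend", "1")
print(target)
```

Printing the path before calling `model.save()` is a cheap way to check that the repository root passed to `tritonserver --model-repository` is the parent of the model directory.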



In [15]:
MODEL_NAME="dlrm"
MODEL_BACKEND="tensorflow_backend"
MODEL_VERSION="1"
DATA_FOLDER = os.environ.get("DATA_FOLDER", "/workspace/data/")
model.save(os.path.join(DATA_FOLDER, f"{MODEL_NAME}.{MODEL_BACKEND}", MODEL_VERSION, "model.savedmodel"))

2025-06-22 14:11:10.587430: I tensorflow/core/common_runtime/executor.cc:1209] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs_8' with dtype int32 and shape [?]
	 [[{{node inputs_8}}]]
2025-06-22 14:11:10.652407: I tensorflow/core/common_runtime/executor.cc:1209] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs_8' with dtype int32 and shape [?]
	 [[{{node inputs_8}}]]
2025-06-22 14:11:11.048944: I tensorflow/core/common_runtime/executor.cc:1209] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs_8' with dtype int32 and shape [?]
	 [[{{node inputs_8}}]]
2025-06-22 14:11

INFO:tensorflow:Assets written to: /workspace/data/dlrm.tensorflow_backend/1/model.savedmodel/assets




## [Deploy The Model Using Triton Inference Server](https://github.com/triton-inference-server/tutorials/blob/main/Quick_Deploy/TensorFlow/README.md)

Run the following command:

```
tritonserver --model-repository=/workspace/data --backend-config=tensorflow,allow-soft-placement=true
```

> You can use the [--backend-config=tensorflow,allow-soft-placement=true](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/tensorflow_backend/README.html#backend-config-tensorflow-allow-soft-placement-boolean) option to instruct TensorFlow to use CPU implementation of an operation when a GPU implementation is not available.

This will serve the model, listening on the following ports:

```
I0712 16:37:18.269464 128 grpc_server.cc:4587] Started GRPCInferenceService at 0.0.0.0:8001
I0712 16:37:18.269956 128 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000
I0712 16:37:18.311686 128 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
```

### Test Triton

Test the [metrics endpoint](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/metrics.html):

```
curl localhost:8002/metrics
```
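
The metrics endpoint returns plain text in the Prometheus exposition format. If you prefer to check it from Python rather than `curl`, a minimal sketch using only the standard library could look like this. The helper names `fetch_metrics` and `parse_metrics` and the sample metric `example_counter` are hypothetical, and the parser assumes samples without trailing timestamps.

```python
import urllib.request


def fetch_metrics(url="http://localhost:8002/metrics", timeout=5):
    """Return the raw text served by the metrics endpoint (needs a running server)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Parse Prometheus text format into {series: value}, skipping comment lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical sample in Prometheus text format, for illustration only.
sample = """# HELP example_counter An illustrative counter
# TYPE example_counter counter
example_counter{model="dlrm",version="1"} 3
"""
print(parse_metrics(sample))
```

With the server running, `parse_metrics(fetch_metrics())` would return the live values instead of the sample.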

## Next steps

In the next example notebooks, we will show the integration with NVTabular and how to explore different recommender models.