<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: right;">

# HugeCTR Continuous Training and Inference Demo (Part I)

## Overview

In HugeCTR version 3.3, we completed the full Hierarchical Parameter Server pipeline, which includes:
1. The interface for dumping parameters from training to Kafka.
2. CPU cache (Redis Cluster / Hash Map / Parallel Hash Map).
3. RocksDB as persistent storage.
4. The embedding cache update mechanism.
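
The components above form a lookup hierarchy. The following is a toy, conceptual sketch of that idea, not HugeCTR code. Plain Python dicts stand in for the GPU embedding cache, the CPU cache, and RocksDB, and a list of `(key, vector)` pairs stands in for updates arriving over Kafka. The class name `TieredEmbeddingStore` and its caching and update behavior are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Tuple

Vector = List[float]


class TieredEmbeddingStore:
    """Toy lookup chain: GPU cache -> volatile CPU cache -> persistent store.

    Plain dicts stand in for the real backends; this is not HugeCTR code.
    """

    def __init__(self, persistent: Dict[int, Vector], default: Vector):
        self.gpu_cache: Dict[int, Vector] = {}  # stand-in for the GPU embedding cache
        self.volatile: Dict[int, Vector] = {}   # stand-in for Redis Cluster / hash map
        self.persistent = persistent            # stand-in for RocksDB
        self.default = default                  # returned for keys found nowhere

    def lookup(self, key: int) -> Vector:
        # Check the fastest tier first and fall back to slower ones.
        for tier in (self.gpu_cache, self.volatile, self.persistent):
            if key in tier:
                self.gpu_cache[key] = tier[key]  # keep a copy in the fastest tier
                return tier[key]
        return self.default

    def apply_updates(self, messages: Iterable[Tuple[int, Vector]]) -> None:
        # `messages` stands in for (key, vector) updates consumed from Kafka.
        for key, vec in messages:
            self.persistent[key] = vec
            self.volatile[key] = vec
            self.gpu_cache.pop(key, None)  # drop the stale cached copy


store = TieredEmbeddingStore({1: [0.1]}, default=[0.0])
print(store.lookup(1))             # served from the persistent tier
store.apply_updates([(1, [0.5])])  # simulated incremental update
print(store.lookup(1))             # now served from the volatile tier
print(store.lookup(99))            # unknown key -> default vector
```

The point of the sketch is the flow: training publishes updated vectors, the serving side writes them into its CPU and persistent tiers, and stale cached copies are refreshed on the next lookup.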


The purpose of this notebook is to show how to do continuous training and inference using the HugeCTR Hierarchical Parameter Server. 


## Table of Contents
- Data Preparation
- Data Preprocessing using Pandas
- Wide&Deep Training Demo
- Wide&Deep Model Inference using Python API
- Wide&Deep Model Continuous Training
- Wide&Deep Model Continuous Inference

## 1. Data preparation

### 1.1 Make a folder to store our data and data processing scripts:

In [1]:
!mkdir criteo_data
!mkdir criteo_script

### 1.2 Download Criteo Dataset

In [2]:
!wget https://storage.googleapis.com/criteo-cail-datasets/day_1.gz

**NOTE**: Replace `1` with a value from [0, 23] to use a different day.

During preprocessing, the data is further reduced to speed up processing, missing values are filled in, and rare feature values are removed (replaced with the missing-value placeholder).
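
Filling missing values and removing rare values can be illustrated on a toy column. The minimal sketch below mirrors the logic in `criteo_script/preprocess.py`, which uses the same categorical placeholder `'80000000'` and keeps only values that occur at least 6 times. The helper name `fill_and_drop_rare` is hypothetical and is not part of the script.

```python
import pandas as pd

CAT_NAN_VALUE = "80000000"  # placeholder for missing categorical features


def fill_and_drop_rare(series: pd.Series, nan_value, min_count: int = 6) -> pd.Series:
    """Fill missing entries, then map values seen fewer than `min_count` times to `nan_value`."""
    filled = series.fillna(nan_value)
    counts = filled.value_counts()
    frequent = set(counts[counts >= min_count].index)
    # Keep frequent values; replace everything else with the placeholder.
    return filled.where(filled.isin(frequent), nan_value)


col = pd.Series(["a"] * 6 + ["b"] * 2 + [None])
print(fill_and_drop_rare(col, CAT_NAN_VALUE).tolist())
# ['a', 'a', 'a', 'a', 'a', 'a', '80000000', '80000000', '80000000']
```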

### 1.3 Write the preprocessing script

In [3]:
%%writefile preprocess.sh

#!/bin/bash

if [[ $# -lt 3 ]]; then
  echo "Usage: preprocess.sh [DATASET_NO.] [DST_DATA_DIR] [SCRIPT_TYPE] [SCRIPT_TYPE_SPECIFIC_ARGS...]"
  exit 2
fi

DST_DATA_DIR=$2

echo "Warning: existing $DST_DATA_DIR is erased"
rm -rf $DST_DATA_DIR

if [[ $3 == "nvt" ]]; then
  if [[ $# -ne 6 ]]; then
    echo "Usage: preprocess.sh [DATASET_NO.] [DST_DATA_DIR] nvt [IS_PARQUET_FORMAT] [IS_CRITEO_MODE] [IS_FEATURE_CROSSED]"
    exit 2
  fi
  echo "Preprocessing script: NVTabular"
elif [[ $3 == "perl" ]]; then
  if [[ $# -ne 4 ]]; then
    echo "Usage: preprocess.sh [DATASET_NO.] [DST_DATA_DIR] perl [NUM_SLOTS]"
    exit 2
  fi
  echo "Preprocessing script: Perl"
elif [[ $3 == "pandas" ]]; then
  if [[ $# -lt 5 ]]; then
    echo "Usage: preprocess.sh [DATASET_NO.] [DST_DATA_DIR] pandas [IS_DENSE_NORMALIZED] [IS_FEATURE_CROSSED] (FILE_LIST_LENGTH)"
    exit 2
  fi
  echo "Preprocessing script: Pandas"
else
  echo "Error: $3 is an invalid script type. Pick one from {nvt, perl, pandas}."
  exit 2
fi

SCRIPT_TYPE=$3

echo "Getting the first few examples from the uncompressed dataset..."
mkdir -p $DST_DATA_DIR/train                         && \
mkdir -p $DST_DATA_DIR/val                           && \
head -n 500000 day_$1 > $DST_DATA_DIR/day_$1_small
if [ $? -ne 0 ]; then
  echo "Warning: fallback to find original compressed data day_$1.gz..."
  echo "Decompressing day_$1.gz..."
  gzip -d -c day_$1.gz > day_$1
  if [ $? -ne 0 ]; then
    echo "Error: failed to decompress the file."
    exit 2
  fi
  head -n 500000 day_$1 > $DST_DATA_DIR/day_$1_small
  if [ $? -ne 0 ]; then
    echo "Error: failed to read day_$1"
    exit 2
  fi
fi

echo "Counting the number of samples in day_$1 dataset..."
total_count=$(wc -l $DST_DATA_DIR/day_$1_small)
total_count=(${total_count})
echo "The first $total_count examples will be used in day_$1 dataset."

echo "Shuffling dataset..."
shuf $DST_DATA_DIR/day_$1_small > $DST_DATA_DIR/day_$1_shuf

train_count=$(( total_count * 8 / 10))
valtest_count=$(( total_count - train_count ))
val_count=$(( valtest_count * 5 / 10 ))
test_count=$(( valtest_count - val_count  ))

split_dataset()
{
  echo "Splitting into $train_count-sample training, $val_count-sample val, and $test_count-sample test datasets..."
  head -n $train_count $DST_DATA_DIR/$1 > $DST_DATA_DIR/train/train.txt          && \
  tail -n $valtest_count $DST_DATA_DIR/$1 > $DST_DATA_DIR/val/valtest.txt        && \
  head -n $val_count $DST_DATA_DIR/val/valtest.txt > $DST_DATA_DIR/val/val.txt   && \
  tail -n $test_count $DST_DATA_DIR/val/valtest.txt > $DST_DATA_DIR/val/test.txt

  if [ $? -ne 0 ]; then
    exit 2
  fi
}

echo "Preprocessing..."
if [[ $SCRIPT_TYPE == "nvt" ]]; then
  IS_PARQUET_FORMAT=$4
  IS_CRITEO_MODE=$5
  FEATURE_CROSS_LIST_OPTION=""
  if [[ ( $IS_CRITEO_MODE -eq 0 ) && ( $6 -eq 1 ) ]]; then
    FEATURE_CROSS_LIST_OPTION="--feature_cross_list C1_C2,C3_C4"
    echo $FEATURE_CROSS_LIST_OPTION
  fi
  split_dataset day_$1_shuf
  python3 criteo_script/preprocess_nvt.py \
    --data_path $DST_DATA_DIR             \
    --out_path $DST_DATA_DIR              \
    --freq_limit 6                        \
    --device_limit_frac 0.5               \
    --device_pool_frac 0.5                \
    --out_files_per_proc 8                \
    --devices "0"                         \
    --num_io_threads 2                    \
    --parquet_format=$IS_PARQUET_FORMAT   \
    --criteo_mode=$IS_CRITEO_MODE         \
    $FEATURE_CROSS_LIST_OPTION

elif [[ $SCRIPT_TYPE == "perl" ]]; then
  NUM_SLOT=$4
  split_dataset day_$1_shuf
  perl criteo_script_legacy/preprocess.pl $DST_DATA_DIR/train/train.txt $DST_DATA_DIR/val/val.txt $DST_DATA_DIR/val/test.txt                      && \
  criteo2hugectr_legacy $NUM_SLOT $DST_DATA_DIR/train/train.txt.out $DST_DATA_DIR/train/sparse_embedding $DST_DATA_DIR/file_list.txt && \
  criteo2hugectr_legacy $NUM_SLOT $DST_DATA_DIR/val/test.txt.out $DST_DATA_DIR/val/sparse_embedding $DST_DATA_DIR/file_list_test.txt

elif [[ $SCRIPT_TYPE == "pandas" ]]; then
  python3 criteo_script/preprocess.py                 \
    --src_csv_path=$DST_DATA_DIR/day_$1_shuf          \
    --dst_csv_path=$DST_DATA_DIR/day_$1_shuf.out      \
    --normalize_dense=$4 --feature_cross=$5      &&   \
  split_dataset day_$1_shuf.out
  NUM_WIDE_KEYS=""
  if [[ $5 -ne 0 ]]; then
    NUM_WIDE_KEYS=2
  fi

  FILE_LIST_LENGTH=""
  if [[ $# -gt 5 ]]; then
    FILE_LIST_LENGTH=$6
  fi

  criteo2hugectr $DST_DATA_DIR/train/train.txt $DST_DATA_DIR/train/sparse_embedding $DST_DATA_DIR/file_list.txt $NUM_WIDE_KEYS $FILE_LIST_LENGTH && \
  criteo2hugectr $DST_DATA_DIR/val/test.txt $DST_DATA_DIR/val/sparse_embedding $DST_DATA_DIR/file_list_test.txt $NUM_WIDE_KEYS $FILE_LIST_LENGTH
fi

if [ $? -ne 0 ]; then
  exit 2
fi

echo "All done!"


Overwriting preprocess.sh


**NOTE**: For this demo, we only read the first 500000 lines of the data.
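
With 500000 lines, the integer arithmetic in `split_dataset` produces an 80/10/10 train/validation/test split. The small Python mirror of that arithmetic below makes the resulting sizes easy to check. The function name `split_counts` is hypothetical. Bash `$(( ))` truncates, while Python `//` floors, but the two agree for non-negative counts.

```python
def split_counts(total_count: int):
    """Return (train, val, test) sizes using the same arithmetic as split_dataset."""
    train_count = total_count * 8 // 10        # 80% for training
    valtest_count = total_count - train_count  # remaining 20%
    val_count = valtest_count * 5 // 10        # half of the remainder
    test_count = valtest_count - val_count     # the rest goes to test
    return train_count, val_count, test_count


print(split_counts(500000))  # (400000, 50000, 50000)
```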

In [4]:
%%writefile criteo_script/preprocess.py

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import argparse
import sys
import os
import logging
import concurrent.futures as cf
from traceback import print_exc

import numpy as np
import pandas as pd
import sklearn.preprocessing as skp

logging.basicConfig(format='%(asctime)s %(message)s')
logging.root.setLevel(logging.NOTSET)

NUM_INTEGER_COLUMNS = 13
NUM_CATEGORICAL_COLUMNS = 26
NUM_TOTAL_COLUMNS = 1 + NUM_INTEGER_COLUMNS + NUM_CATEGORICAL_COLUMNS

MAX_NUM_WORKERS = NUM_TOTAL_COLUMNS

INT_NAN_VALUE = np.iinfo(np.int32).min
CAT_NAN_VALUE = '80000000'

def idx2key(idx):
    if idx == 0:
        return 'label'
    return 'I' + str(idx) if idx <= NUM_INTEGER_COLUMNS else 'C' + str(idx - NUM_INTEGER_COLUMNS)

def _fill_missing_features_and_split(chunk, series_list_dict):
    for cid, col in enumerate(chunk.columns):
        NAN_VALUE = INT_NAN_VALUE if cid <= NUM_INTEGER_COLUMNS else CAT_NAN_VALUE
        result_series = chunk[col].fillna(NAN_VALUE)
        series_list_dict[col].append(result_series)

def _merge_and_transform_series(src_series_list, col, dense_cols,
                                normalize_dense):
    result_series = pd.concat(src_series_list)

    if col != 'label':
        unique_value_counts = result_series.value_counts()
        unique_value_counts = unique_value_counts.loc[unique_value_counts >= 6]
        unique_value_counts = set(unique_value_counts.index.values)
        NAN_VALUE = INT_NAN_VALUE if col.startswith('I') else CAT_NAN_VALUE
        result_series = result_series.apply(
                lambda x: x if x in unique_value_counts else NAN_VALUE)

    if col == 'label' or col in dense_cols:
        result_series = result_series.astype(np.int64)
        le = skp.LabelEncoder()
        result_series = pd.DataFrame(le.fit_transform(result_series))
        if col != 'label':
            result_series = result_series + 1
    else:
        oe = skp.OrdinalEncoder(dtype=np.int64)
        result_series = pd.DataFrame(oe.fit_transform(pd.DataFrame(result_series)))
        result_series = result_series + 1


    if normalize_dense != 0:
        if col in dense_cols:
            mms = skp.MinMaxScaler(feature_range=(0,1))
            result_series = pd.DataFrame(mms.fit_transform(result_series))

    result_series.columns = [col]

    min_max = (np.int64(result_series[col].min()), np.int64(result_series[col].max()))
    if col != 'label':
        logging.info('column {} [{}, {}]'.format(col, str(min_max[0]),str(min_max[1])))

    return [result_series, min_max]

def _convert_to_string(series):
    return series.astype(str)

def _merge_columns_and_feature_cross(series_list, min_max, feature_pairs,
                                     feature_cross):
    name_to_series = dict()
    for series in series_list:
        name_to_series[series.columns[0]] = series.iloc[:,0]
    df = pd.DataFrame(name_to_series)
    cols = [idx2key(idx) for idx in range(0, NUM_TOTAL_COLUMNS)]
    df = df.reindex(columns=cols)

    offset = np.int64(0)
    for col in cols:
        if col != 'label' and col.startswith('I') == False:
            df[col] += offset
            logging.info('column {} offset {}'.format(col, str(offset)))
            offset += min_max[col][1]

    if feature_cross != 0:
        for idx, pair in enumerate(feature_pairs):
            col0 = pair[0]
            col1 = pair[1]

            col1_width = int(min_max[col1][1] - min_max[col1][0] + 1)

            crossed_column_series = df[col0] * col1_width + df[col1]
            oe = skp.OrdinalEncoder(dtype=np.int64)
            crossed_column_series = pd.DataFrame(oe.fit_transform(pd.DataFrame(crossed_column_series)))
            crossed_column_series = crossed_column_series + 1

            crossed_column = col0 + '_' + col1
            df.insert(NUM_INTEGER_COLUMNS + 1 + idx, crossed_column, crossed_column_series)
            crossed_column_max_val = np.int64(df[crossed_column].max())
            logging.info('column {} [{}, {}]'.format(
                crossed_column,
                str(df[crossed_column].min()),
                str(crossed_column_max_val)))
            df[crossed_column] += offset
            logging.info('column {} offset {}'.format(crossed_column, str(offset)))
            offset += crossed_column_max_val

    return df

def _wait_futures_and_reset(futures):
    for future in futures:
        result = future.result()
        if result:
            print(result)
    futures = list()

def _process_chunks(executor, chunks_to_process, op, *argv):
    futures = list()
    for chunk in chunks_to_process:
        argv_list = list(argv)
        argv_list.insert(0, chunk)
        new_argv = tuple(argv_list)
        future = executor.submit(op, *new_argv)
        futures.append(future)
    _wait_futures_and_reset(futures)

def preprocess(src_txt_name, dst_txt_name, normalize_dense, feature_cross):
    cols = [idx2key(idx) for idx in range(0, NUM_TOTAL_COLUMNS)]
    series_list_dict = dict()

    with cf.ThreadPoolExecutor(max_workers=MAX_NUM_WORKERS) as executor:
        logging.info('read a CSV file')
        reader = pd.read_csv(src_txt_name, sep='\t',
                             names=cols,
                             chunksize=131072)

        logging.info('_fill_missing_features_and_split')
        for col in cols:
            series_list_dict[col] = list()
        _process_chunks(executor, reader, _fill_missing_features_and_split,
                        series_list_dict)

    with cf.ProcessPoolExecutor(max_workers=MAX_NUM_WORKERS) as executor:
        logging.info('_merge_and_transform_series')
        futures = list()
        dense_cols = [idx2key(idx+1) for idx in range(NUM_INTEGER_COLUMNS)]
        dst_series_list = list()
        min_max = dict()
        for col, src_series_list in series_list_dict.items():
            future = executor.submit(_merge_and_transform_series,
                                     src_series_list, col, dense_cols,
                                     normalize_dense)
            futures.append(future)

        for future in futures:
            col = None
            for idx, ret in enumerate(future.result()):
                try:
                    if idx == 0:
                        col = ret.columns[0]
                        dst_series_list.append(ret)
                    else:
                        min_max[col] = ret
                except:
                    print_exc()
        futures = list()

        logging.info('_merge_columns_and_feature_cross')
        feature_pairs = [('C1', 'C2'), ('C3', 'C4')]
        df = _merge_columns_and_feature_cross(dst_series_list, min_max, feature_pairs,
                                              feature_cross)

        
        logging.info('_convert_to_string')
        futures = dict()
        for col in cols:
            future = executor.submit(_convert_to_string, df[col])
            futures[col] = future
        if feature_cross != 0:
            for pair in feature_pairs:
                col = pair[0] + '_' + pair[1]
                future = executor.submit(_convert_to_string, df[col])
                futures[col] = future

        logging.info('_store_to_df')
        for col, future in futures.items():
            ret = future.result()
            try:
                df[col] = ret
            except:
                print_exc()
        futures = dict()

        logging.info('write to a CSV file')
        df.to_csv(dst_txt_name, sep=' ', header=False, index=False)

        logging.info('done!')


if __name__ == '__main__':
    arg_parser = argparse.ArgumentParser(description='Preprocessing Criteo Dataset')

    arg_parser.add_argument('--src_csv_path', type=str, required=True)
    arg_parser.add_argument('--dst_csv_path', type=str, required=True)
    arg_parser.add_argument('--normalize_dense', type=int, default=1)
    arg_parser.add_argument('--feature_cross', type=int, default=1)

    args = arg_parser.parse_args()

    src_csv_path = args.src_csv_path
    dst_csv_path = args.dst_csv_path

    normalize_dense = args.normalize_dense
    feature_cross = args.feature_cross

    if os.path.exists(src_csv_path) == False:
        sys.exit('ERROR: the file \'{}\' doesn\'t exist'.format(src_csv_path))

    if os.path.exists(dst_csv_path) == True:
        sys.exit('ERROR: the file \'{}\' exists'.format(dst_csv_path))

    preprocess(src_csv_path, dst_csv_path, normalize_dense, feature_cross)




Overwriting criteo_script/preprocess.py
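
Two steps in `_merge_columns_and_feature_cross` are worth spelling out. First, each categorical column is shifted by a running offset so that all columns share one key space without collisions. Second, a crossed column combines a pair as `col0 * col1_width + col1` and re-encodes the result to `1..n`. The sketch below reproduces both ideas on a toy frame. It swaps `sklearn`'s `OrdinalEncoder` for `pandas.factorize(sort=True)`, which yields the same sorted codes, and crosses the unshifted columns. Because the shift only adds a constant to the combined value, the sorted re-encoding is unchanged. The helper names are hypothetical.

```python
import pandas as pd


def apply_offsets(df: pd.DataFrame, cat_cols):
    """Shift 1-based categorical IDs so each column occupies a disjoint key range."""
    out = df.copy()
    offset = 0
    for col in cat_cols:
        col_max = int(out[col].max())  # largest ID before shifting
        out[col] = out[col] + offset
        offset += col_max              # next column starts after this range
    return out, offset                 # offset == total number of keys used


def cross(df: pd.DataFrame, col0: str, col1: str) -> pd.Series:
    """Give every (col0, col1) value pair its own ID, re-encoded to 1..n in sorted order."""
    col1_width = int(df[col1].max() - df[col1].min() + 1)
    combined = df[col0] * col1_width + df[col1]
    codes, _ = pd.factorize(combined, sort=True)  # 0..n-1, like OrdinalEncoder
    return pd.Series(codes + 1, index=df.index, name=f"{col0}_{col1}")


toy = pd.DataFrame({"C1": [1, 2, 1], "C2": [1, 1, 3]})
shifted, num_keys = apply_offsets(toy, ["C1", "C2"])
print(shifted["C2"].tolist(), num_keys)  # [3, 3, 5] 5
print(cross(toy, "C1", "C2").tolist())   # [1, 3, 2]
```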


### 1.4 Run the preprocess script

In [5]:
!bash preprocess.sh 1 criteo_data pandas 1 1 1

**IMPORTANT NOTES**: 

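Arguments may vary depending on your setup:
- The first argument is the dataset postfix. For instance, if `day_1` is used, the postfix is `1`.
- The second argument, `criteo_data`, is the directory where the preprocessed data is stored.
- The third argument, `pandas`, selects the preprocessing script (`nvt`, `perl`, or `pandas`).
- For `pandas`, the remaining arguments are `IS_DENSE_NORMALIZED`, `IS_FEATURE_CROSSED`, and the optional `FILE_LIST_LENGTH`, in that order.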

### 1.5 Generate data samples for inference 

In [14]:
import pandas as pd
import numpy as np
df = pd.read_table("criteo_data/train/train.txt", header = None, sep= ' ', \
                   names = ['label'] + ['I'+str(i) for i in range(1, 14)] + \
                   ['C1_C2', 'C3_C4'] + ['C'+str(i) for i in range(1, 27)])[:5]
left = df.iloc[:,:14].astype(np.float32)
right = df.iloc[:, 14:].astype(np.int64)
merged = pd.concat([left, right], axis = 1)
merged.to_csv("infer_data.csv", index = False)

## 2. Start the Kafka Broker

**Please refer to the README to start the Kafka Broker properly.**

## 3. Wide&Deep Model Demo

In [8]:
!rm -r *model

In [9]:
%%writefile wdl_demo.py
import hugectr
from mpi4py import MPI
solver = hugectr.CreateSolver(model_name = "wdl",
                              max_eval_batches = 5000,
                              batchsize_eval = 1024,
                              batchsize = 1024,
                              lr = 0.001,
                              vvgpu = [[0]],
                              i64_input_key = False,
                              use_mixed_precision = False,
                              repeat_dataset = False,
                              use_cuda_graph = True,
                              kafka_brockers = "10.23.137.25:9093") # Make sure this is consistent with your Kafka broker.
reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Norm,
                          source = ["criteo_data/file_list."+str(i)+".txt" for i in range(2)],
                          keyset = ["criteo_data/file_list."+str(i)+".keyset" for i in range(2)],
                          eval_source = "criteo_data/file_list.2.txt",
                          check_type = hugectr.Check_t.Sum)
optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam)
hc_config = hugectr.CreateHMemCache(2, 0.5, 0)
etc = hugectr.CreateETC(ps_types = [hugectr.TrainPSType_t.Staged, hugectr.TrainPSType_t.Cached],\
                        sparse_models = ["./wdl_0_sparse_model", "./wdl_1_sparse_model"],\
                        local_paths = ["./"], hmem_cache_configs = [hc_config])
model = hugectr.Model(solver, reader, optimizer, etc)
model.add(hugectr.Input(label_dim = 1, label_name = "label",
                        dense_dim = 13, dense_name = "dense",
                        data_reader_sparse_param_array = 
                        [hugectr.DataReaderSparseParam("wide_data", 2, True, 1),
                        hugectr.DataReaderSparseParam("deep_data", 1, True, 26)]))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash, 
                            workspace_size_per_gpu_in_mb = 23,
                            embedding_vec_size = 1,
                            combiner = "sum",
                            sparse_embedding_name = "sparse_embedding0",
                            bottom_name = "wide_data",
                            optimizer = optimizer))
model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash, 
                            workspace_size_per_gpu_in_mb = 358,
                            embedding_vec_size = 16,
                            combiner = "sum",
                            sparse_embedding_name = "sparse_embedding1",
                            bottom_name = "deep_data",
                            optimizer = optimizer))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                            bottom_names = ["sparse_embedding1"],
                            top_names = ["reshape1"],
                            leading_dim=416))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                            bottom_names = ["sparse_embedding0"],
                            top_names = ["reshape2"],
                            leading_dim=1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                            bottom_names = ["reshape1", "dense"], top_names = ["concat1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["concat1"],
                            top_names = ["fc1"],
                            num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                            bottom_names = ["fc1"],
                            top_names = ["relu1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                            bottom_names = ["relu1"],
                            top_names = ["dropout1"],
                            dropout_rate=0.5))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["dropout1"],
                            top_names = ["fc2"],
                            num_output=1024))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                            bottom_names = ["fc2"],
                            top_names = ["relu2"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                            bottom_names = ["relu2"],
                            top_names = ["dropout2"],
                            dropout_rate=0.5))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                            bottom_names = ["dropout2"],
                            top_names = ["fc3"],
                            num_output=1))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Add,
                            bottom_names = ["fc3", "reshape2"],
                            top_names = ["add1"]))
model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
                            bottom_names = ["add1", "label"],
                            top_names = ["loss"]))
model.compile()
model.summary()
model.graph_to_json(graph_config_file = "wdl.json")
#model.save_params_to_files("wdl")
model.fit(num_epochs = 1, display = 500, eval_interval = 1000)

model.set_source(source = ["criteo_data/file_list."+str(i)+".txt" for i in range(3, 5)], \
                 keyset = ["criteo_data/file_list."+str(i)+".keyset" for i in range(3, 5)], \
                 eval_source = "criteo_data/file_list.9.txt")

model.save_params_to_files("wdl")

Overwriting wdl_demo.py
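
The layer sizes in `wdl_demo.py` follow directly from the input configuration. This short arithmetic check (plain Python, not a HugeCTR API) shows where `leading_dim=416` comes from, what width `fc1` receives after concatenating the 13 dense features, and why the wide branch can be added to the 1-unit `fc3` output.

```python
# Values copied from wdl_demo.py above.
NUM_DEEP_SLOTS = 26     # DataReaderSparseParam("deep_data", 1, True, 26)
DEEP_EMB_VEC_SIZE = 16  # embedding_vec_size of sparse_embedding1
NUM_DENSE = 13          # dense_dim of the Input layer
NUM_WIDE_SLOTS = 1      # DataReaderSparseParam("wide_data", 2, True, 1)
WIDE_EMB_VEC_SIZE = 1   # embedding_vec_size of sparse_embedding0

reshape1_dim = NUM_DEEP_SLOTS * DEEP_EMB_VEC_SIZE   # leading_dim of "reshape1"
concat1_dim = reshape1_dim + NUM_DENSE              # input width of "fc1"
reshape2_dim = NUM_WIDE_SLOTS * WIDE_EMB_VEC_SIZE   # wide output, summed with "fc3"

print(reshape1_dim, concat1_dim, reshape2_dim)  # 416 429 1
```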


In [10]:
!python wdl_demo.py

[HUGECTR][03:34:23][INFO][RANK0]: Empty embedding, trained table will be stored in ./wdl_0_sparse_model
[HUGECTR][03:34:23][INFO][RANK0]: Empty embedding, trained table will be stored in ./wdl_1_sparse_model
HugeCTR Version: 3.2
[HUGECTR][03:34:23][INFO][RANK0]: Initialize model: wdl
[HUGECTR][03:34:23][INFO][RANK0]: Global seed is 337017754
[HUGECTR][03:34:23][INFO][RANK0]: Device to NUMA mapping:
  GPU 0 ->  node 0

[HUGECTR][03:34:25][INFO][RANK0]: Start all2all warmup
[HUGECTR][03:34:25][INFO][RANK0]: End all2all warmup
[HUGECTR][03:34:25][INFO][RANK0]: Using All-reduce algorithm: NCCL
[HUGECTR][03:34:25][INFO][RANK0]: Device 0: Tesla V100-SXM2-32GB
[HUGECTR][03:34:25][DEBUG][RANK0]: Creating Kafka lifetime service.
[HUGECTR][03:34:25][INFO][RANK0]: num of DataReader workers: 12
[HUGECTR][03:34:25][INFO][RANK0]: max_vocabulary_size_per_gpu_=6029312
[HUGECTR][03:34:25][INFO][RANK0]: max_vocabulary_size_per_gpu_=5865472
[HUGECTR][03:34:25][INFO][RANK0]: Graph analysis to resolve tens

## 4. WDL Inference

### 4.1 Inference using the HugeCTR Python API

In [11]:
# Create a folder for RocksDB (-p: no error if it already exists)
!mkdir -p /wdl_infer/rocksdb


**Before running inference, make sure you have started the Redis cluster as described in the README.**

In [12]:
%%writefile 'wdl_predict.py'
from hugectr.inference import InferenceParams, CreateInferenceSession
import hugectr
import pandas as pd
import numpy as np
import sys
from mpi4py import MPI
def wdl_inference(model_name='wdl', network_file='wdl.json', dense_file='wdl_dense_0.model', \
                  embedding_file_list=['wdl_0_sparse_model', 'wdl_1_sparse_model'], data_file='infer_data.csv',\
                  enable_cache=False, rocksdb_path=""):
    CATEGORICAL_COLUMNS=["C1_C2","C3_C4"] + ["C" + str(x) for x in range(1, 27)]
    CONTINUOUS_COLUMNS=["I" + str(x) for x in range(1, 14)]
    LABEL_COLUMNS = ['label']
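    test_df = pd.read_csv(data_file, sep=',')
    config_file = network_file
    # CSR row offsets for 5 samples, one array per embedding table:
    # wide table: 1 slot x 2 keys per sample -> [0, 2, ..., 10]
    # deep table: 26 slots x 1 key per sample -> [0, 1, ..., 130]
    row_ptrs = list(range(0, 11, 2)) + list(range(0, 131))
    dense_features = list(test_df[CONTINUOUS_COLUMNS].values.flatten())
    # Categorical keys as int64, flattened into a single list
    embedding_columns = list(test_df[CATEGORICAL_COLUMNS].astype(np.int64).values.flatten())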

    redisdatabase = hugectr.inference.DistributedDatabaseParams(
        hugectr.DatabaseType_t.redis_cluster,
        address="127.0.0.1:7000,127.0.0.1:7001,127.0.0.1:7002",
        initial_cache_rate=0.2)
    rocksdbdatabase = hugectr.inference.PersistentDatabaseParams(
        hugectr.DatabaseType_t.rocks_db,
        path="/wdl_infer/rocksdb/")
    
    # create parameter server, embedding cache and inference session
    inference_params = InferenceParams(model_name = model_name,
                                max_batchsize = 64,
                                hit_rate_threshold = 0.5,
                                dense_model_file = dense_file,
                                sparse_model_files = embedding_file_list,
                                device_id = 0,
                                use_gpu_embedding_cache = enable_cache,
                                cache_size_percentage = 0.9,
                                i64_input_key = True,
                                use_mixed_precision = False,
                                volatile_db=redisdatabase,
                                persistent_db=rocksdbdatabase)
    inference_session = CreateInferenceSession(config_file, inference_params)
    output = inference_session.predict(dense_features, embedding_columns, row_ptrs)
    print("WDL multi-embedding table inference result is {}".format(output))

wdl_inference()

Overwriting wdl_predict.py
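
The hard-coded `row_ptrs` in `wdl_predict.py` are one CSR row-offset array per embedding table, concatenated in table order. Each (sample, slot) pair is one row. The sketch below derives them from the slot layout in `wdl_demo.py`, assuming fixed-length slots as declared with `is_fixed_length=True`. The function name `build_row_ptrs` is hypothetical. For 5 samples this gives 10 wide keys plus 130 deep keys, which matches the 5 x 28 categorical values in `infer_data.csv`.

```python
def build_row_ptrs(num_samples: int, tables):
    """Concatenate one CSR row-offset array per embedding table.

    `tables` lists (num_slots, keys_per_slot) per table in model order; each
    (sample, slot) pair is one row holding a fixed number of keys.
    """
    row_ptrs = []
    for num_slots, keys_per_slot in tables:
        num_rows = num_samples * num_slots
        row_ptrs += list(range(0, num_rows * keys_per_slot + 1, keys_per_slot))
    return row_ptrs


# wide_data: 1 slot x 2 keys; deep_data: 26 slots x 1 key (from wdl_demo.py)
ptrs = build_row_ptrs(5, [(1, 2), (26, 1)])
print(ptrs == list(range(0, 11, 2)) + list(range(0, 131)))  # True
```

Changing `num_samples` lets you reuse the same layout for a different number of rows in `infer_data.csv`.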


In [15]:
!python wdl_predict.py

[HUGECTR][03:36:30][INFO][RANK0]: default_emb_vec_value is not specified using default: 0.000000
[HUGECTR][03:36:30][INFO][RANK0]: default_emb_vec_value is not specified using default: 0.000000
[HUGECTR][03:36:30][INFO][RANK0]: Creating RedisCluster backend...
[HUGECTR][03:36:30][INFO][RANK0]: Connecting to Redis cluster via 127.0.0.1:7000 ...
[HUGECTR][03:36:30][INFO][RANK0]: Connected to Redis database!
[HUGECTR][03:36:30][INFO][RANK0]: Creating RocksDB backend...
[HUGECTR][03:36:30][INFO][RANK0]: Connecting to RocksDB database...
[HUGECTR][03:36:30][INFO][RANK0]: RocksDB /wdl_infer/rocksdb/, found column family "default".
[HUGECTR][03:36:30][INFO][RANK0]: RocksDB /wdl_infer/rocksdb/, found column family "hctr_et.wdl.sparse_embedding0".
[HUGECTR][03:36:30][INFO][RANK0]: RocksDB /wdl_infer/rocksdb/, found column family "hctr_et.wdl.sparse_embedding1".
[HUGECTR][03:36:31][INFO][RANK0]: Connected to RocksDB database!
[HUGECTR][03:36:31][DEBUG][RANK0]: Redis partition hctr_et.wdl.sparse_

### 4.2 Inference using Triton

**Please refer to the [Triton_Inference.ipynb](./Triton_Inference.ipynb) notebook to start Triton and do the inference.**

## 5. Continue Training WDL Model

In [16]:
%%writefile wdl_continue.py
import hugectr
from mpi4py import MPI
solver = hugectr.CreateSolver(model_name = "wdl",
                              max_eval_batches = 5000,
                              batchsize_eval = 1024,
                              batchsize = 1024,
                              lr = 0.001,
                              vvgpu = [[0]],
                              i64_input_key = False,
                              use_mixed_precision = False,
                              repeat_dataset = False,
                              use_cuda_graph = True,
                              kafka_brockers = "10.23.137.25:9093")
reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Norm,
                          source = ["criteo_data/file_list."+str(i)+".txt" for i in range(6, 9)],
                          keyset = ["criteo_data/file_list."+str(i)+".keyset" for i in range(6, 9)],
                          eval_source = "criteo_data/file_list.9.txt",
                          check_type = hugectr.Check_t.Sum)
optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam)
hc_config = hugectr.CreateHMemCache(2, 0.5, 0)
etc = hugectr.CreateETC(ps_types = [hugectr.TrainPSType_t.Staged, hugectr.TrainPSType_t.Cached],\
                        sparse_models = ["./wdl_0_sparse_model", "./wdl_1_sparse_model"],\
                        local_paths = ["./"], hmem_cache_configs = [hc_config])
model = hugectr.Model(solver, reader, optimizer, etc)
model.construct_from_json(graph_config_file = "wdl.json", include_dense_network = True)
model.compile()
model.load_dense_weights("wdl_dense_0.model")
model.load_dense_optimizer_states("wdl_opt_dense_0.model")

model.summary()
model.graph_to_json(graph_config_file = "wdl.json")
model.fit(num_epochs = 1, display = 500, eval_interval = 1000)
model.dump_incremental_model_2kafka()
model.save_params_to_files("wdl_new")

Overwriting wdl_continue.py
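
Because `wdl_continue.py` resumes from artifacts written by `wdl_demo.py`, a mistyped or missing file only surfaces after the model has been set up. A small pre-flight check can catch this before launching. The helper `missing_files` and the `REQUIRED` list below are hypothetical, not part of HugeCTR. Extend the list with the optimizer-state file your run loads.

```python
import os
from typing import Iterable, List

# Files produced by wdl_demo.py that the later steps read.
REQUIRED = ["wdl.json", "wdl_dense_0.model", "wdl_0_sparse_model", "wdl_1_sparse_model"]


def missing_files(paths: Iterable[str], base_dir: str = ".") -> List[str]:
    """Return the entries of `paths` that do not exist under `base_dir`."""
    return [p for p in paths if not os.path.exists(os.path.join(base_dir, p))]
```

In the notebook directory, `missing_files(REQUIRED)` should return an empty list before you run the next cell.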


In [17]:
!python wdl_continue.py

[HUGECTR][03:37:25][INFO][RANK0]: Use existing embedding: ./wdl_0_sparse_model
[HUGECTR][03:37:25][INFO][RANK0]: Use existing embedding: ./wdl_1_sparse_model
HugeCTR Version: 3.2
[HUGECTR][03:37:25][INFO][RANK0]: Initialize model: wdl
[HUGECTR][03:37:25][INFO][RANK0]: Global seed is 2083265859
[HUGECTR][03:37:25][INFO][RANK0]: Device to NUMA mapping:
  GPU 0 ->  node 0

[HUGECTR][03:37:27][INFO][RANK0]: Start all2all warmup
[HUGECTR][03:37:27][INFO][RANK0]: End all2all warmup
[HUGECTR][03:37:27][INFO][RANK0]: Using All-reduce algorithm: NCCL
[HUGECTR][03:37:27][INFO][RANK0]: Device 0: Tesla V100-SXM2-32GB
[HUGECTR][03:37:27][DEBUG][RANK0]: Creating Kafka lifetime service.
[HUGECTR][03:37:27][INFO][RANK0]: num of DataReader workers: 12
[HUGECTR][03:37:27][INFO][RANK0]: max_num_frequent_categories is not specified using default: 1
[HUGECTR][03:37:27][INFO][RANK0]: max_num_infrequent_samples is not specified using default: -1
[HUGECTR][03:37:27][INFO][RANK0]: p_dup_max is not specified us

## 6. Inference with the New Model

### 6.1 Continuous inference using the Python API

In [18]:
!python wdl_predict.py

[HUGECTR][03:38:09][INFO][RANK0]: default_emb_vec_value is not specified using default: 0.000000
[HUGECTR][03:38:09][INFO][RANK0]: default_emb_vec_value is not specified using default: 0.000000
[HUGECTR][03:38:09][INFO][RANK0]: Creating RedisCluster backend...
[HUGECTR][03:38:09][INFO][RANK0]: Connecting to Redis cluster via 127.0.0.1:7000 ...
[HUGECTR][03:38:09][INFO][RANK0]: Connected to Redis database!
[HUGECTR][03:38:09][INFO][RANK0]: Creating RocksDB backend...
[HUGECTR][03:38:09][INFO][RANK0]: Connecting to RocksDB database...
[HUGECTR][03:38:09][INFO][RANK0]: RocksDB /wdl_infer/rocksdb/, found column family "default".
[HUGECTR][03:38:09][INFO][RANK0]: RocksDB /wdl_infer/rocksdb/, found column family "hctr_et.wdl.sparse_embedding0".
[HUGECTR][03:38:09][INFO][RANK0]: RocksDB /wdl_infer/rocksdb/, found column family "hctr_et.wdl.sparse_embedding1".
[HUGECTR][03:38:09][INFO][RANK0]: Connected to RocksDB database!
[HUGECTR][03:38:09][DEBUG][RANK0]: Redis partition hctr_et.wdl.sparse_

### 6.2 Continuous inference using Triton

**Please refer to the [Triton_Inference.ipynb](./Triton_Inference.ipynb) notebook to do the inference.**