
Issue: Mixtral 8x7b failed to load preprocessing model #525

@christian-ci

Description


System Info

  • x86_64
  • NVIDIA L4 and NVIDIA A10 (same issue on both)
  • Built TensorRT-LLM backend using:
DOCKER_BUILDKIT=1 docker build -t triton_trt_llm -f dockerfile/Dockerfile.trt_llm_backend .

with Commit 4d399bc75426263be9b31b66d42e4db81b73b6f7

  • Quantized Mixtral 8x7b Instruct and built the engine; driver/CUDA versions:
 NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.4

The FP8 engine builds and the main model loads; only the preprocessing model fails.

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Using the HF Mixtral tokenizer downloaded from the repo itself, with an FP8 engine that loads successfully. The problem seems to be with the Mixtral tokenizer specifically, not the Mistral one: I can run Mistral 7b but not 8x7b.

Error:

I0708 17:43:37.351682 111 pb_stub.cc:385] "Failed to initialize Python stub: TypeError: 'NoneType' object is not subscriptable\n\nAt:\n  /app/repo-struct/preprocessing/1/model.py(72): initialize\n"
E0708 17:43:37.898915 111 backend_model.cc:692] "ERROR: Failed to create instance: TypeError: 'NoneType' object is not subscriptable\n\nAt:\n  /app/repo-struct/preprocessing/1/model.py(72): initialize\n"
E0708 17:43:37.899027 111 model_lifecycle.cc:641] "failed to load 'preprocessing' version 1: Internal: TypeError: 'NoneType' object is not subscriptable\n\nAt:\n  /app/repo-struct/preprocessing/1/model.py(72): initialize\n"
I0708 17:43:37.899054 111 model_lifecycle.cc:776] "failed to load 'preprocessing'"
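The "'NoneType' object is not subscriptable" at `model.py(72): initialize` is the classic failure mode of subscripting a config lookup that returned None. A minimal self-contained sketch of how `initialize` can hit this (plain Python, no Triton dependency; `get_output_config_by_name` is re-implemented by hand here to mirror what `triton_python_backend_utils` does, and the output list is abbreviated from the debug dump below):

```python
def get_output_config_by_name(model_config, name):
    """Mimics triton_python_backend_utils.get_output_config_by_name:
    returns the matching output dict, or None if the name is absent."""
    for output in model_config.get("output", []):
        if output["name"] == name:
            return output
    return None

# Output section as dumped in the debug log (abbreviated).
model_config = {
    "output": [
        {"name": "INPUT_ID", "data_type": "TYPE_INT32"},
        {"name": "REQUEST_INPUT_LEN", "data_type": "TYPE_INT32"},
    ]
}

# A name that exists in config.pbtxt resolves fine...
ok = get_output_config_by_name(model_config, "INPUT_ID")
print(ok["data_type"])  # TYPE_INT32

# ...but a name missing from config.pbtxt yields None, and
# subscripting None raises exactly the reported TypeError.
missing = get_output_config_by_name(model_config, "DECODER_INPUT_ID")
try:
    missing["data_type"]
except TypeError as e:
    print(e)  # 'NoneType' object is not subscriptable
```

This matches the debug output below, where the lookups for DECODER_INPUT_ID and REQUEST_DECODER_INPUT_LEN come back as None.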

Debug info:

Initializing model with arguments:
{'model_config': '{"name":"preprocessing","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":64,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false},{"name":"REQUEST_OUTPUT_LEN","data_type":"TYPE_INT32","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false},{"name":"BAD_WORDS_DICT","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true},{"name":"STOP_WORDS_DICT","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true},{"name":"EMBEDDING_BIAS_WORDS","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true},{"name":"EMBEDDING_BIAS_WEIGHTS","data_type":"TYPE_FP32","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true},{"name":"END_ID","data_type":"TYPE_INT32","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true},{"name":"PAD_ID","data_type":"TYPE_INT32","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true}],"output":[{"name":"INPUT_ID","data_type":"TYPE_INT32","dims":[-1],"label_filename":"","is_shape_tensor":false},{"name":"REQUEST_INPUT_LEN","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false},{"name":"BAD_WORDS_IDS","data_type":"TYPE_INT32","dims":[2,-1],"label_filename":"","is_shape_tensor":false},{"name":"STOP_WORDS_IDS","data_type":"TYPE_INT32","dims":[2,-1],"label_filename":"","is_shape_tensor":false},{"name":"EMBEDDING_BIAS","data_type":"TYPE_FP32","dims":[-1],"label_filename":"","is_shape_tensor":false},{"name":"REQUEST_OUTPUT_LEN","data_type":"TYPE_INT32","dims":[-1],"la
bel_filename":"","is_shape_tensor":false},{"name":"OUT_END_ID","data_type":"TYPE_INT32","dims":[-1],"label_filename":"","is_shape_tensor":false},{"name":"OUT_PAD_ID","data_type":"TYPE_INT32","dims":[-1],"label_filename":"","is_shape_tensor":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"instance_group":[{"name":"preprocessing_0","kind":"KIND_CPU","count":1,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"tokenizer_dir":{"string_value":"mistralai/Mixtral-8x7B-Instruct-v0.1"},"add_special_tokens":{"string_value":"false"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': 'preprocessing_0_0', 'model_instance_device_id': '0', 'model_repository': '/app/repo-struct/preprocessing', 'model_version': '1', 'model_name': 'preprocessing'}
Model config loaded:
{'name': 'preprocessing', 'platform': '', 'backend': 'python', 'runtime': '', 'version_policy': {'latest': {'num_versions': 1}}, 'max_batch_size': 64, 'input': [{'name': 'QUERY', 'data_type': 'TYPE_STRING', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': False}, {'name': 'REQUEST_OUTPUT_LEN', 'data_type': 'TYPE_INT32', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': False}, {'name': 'BAD_WORDS_DICT', 'data_type': 'TYPE_STRING', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}, {'name': 'STOP_WORDS_DICT', 'data_type': 'TYPE_STRING', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}, {'name': 'EMBEDDING_BIAS_WORDS', 'data_type': 'TYPE_STRING', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}, {'name': 'EMBEDDING_BIAS_WEIGHTS', 'data_type': 'TYPE_FP32', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}, {'name': 'END_ID', 'data_type': 'TYPE_INT32', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}, {'name': 'PAD_ID', 'data_type': 'TYPE_INT32', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}], 'output': [{'name': 'INPUT_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'REQUEST_INPUT_LEN', 'data_type': 'TYPE_INT32', 'dims': [1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'BAD_WORDS_IDS', 'data_type': 'TYPE_INT32', 'dims': [2, -1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'STOP_WORDS_IDS', 'data_type': 'TYPE_INT32', 'dims': [2, -1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 
'EMBEDDING_BIAS', 'data_type': 'TYPE_FP32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'REQUEST_OUTPUT_LEN', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'OUT_END_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'OUT_PAD_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}], 'batch_input': [], 'batch_output': [], 'optimization': {'priority': 'PRIORITY_DEFAULT', 'input_pinned_memory': {'enable': True}, 'output_pinned_memory': {'enable': True}, 'gather_kernel_buffer_threshold': 0, 'eager_batching': False}, 'instance_group': [{'name': 'preprocessing_0', 'kind': 'KIND_CPU', 'count': 1, 'gpus': [], 'secondary_devices': [], 'profile': [], 'passive': False, 'host_policy': ''}], 'default_model_filename': 'model.py', 'cc_model_filenames': {}, 'metric_tags': {}, 'parameters': {'tokenizer_dir': {'string_value': 'mistralai/Mixtral-8x7B-Instruct-v0.1'}, 'add_special_tokens': {'string_value': 'false'}}, 'model_warmup': []}
Tokenizer directory: mistralai/Mixtral-8x7B-Instruct-v0.1
Add special tokens: False
Tokenizer: LlamaTokenizerFast(name_or_path='mistralai/Mixtral-8x7B-Instruct-v0.1', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>'}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
       0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
       1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
       2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
Pad token: </s>
Tokenizer end ID: 2
Tokenizer pad ID: 2
Input config for EMBEDDING_BIAS_WORDS: {'name': 'EMBEDDING_BIAS_WORDS', 'data_type': 'TYPE_STRING', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}
Input config for EMBEDDING_BIAS_WEIGHTS: {'name': 'EMBEDDING_BIAS_WEIGHTS', 'data_type': 'TYPE_FP32', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}
Output config for INPUT_ID: {'name': 'INPUT_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}
Output config for DECODER_INPUT_ID: None
Output config for REQUEST_INPUT_LEN: {'name': 'REQUEST_INPUT_LEN', 'data_type': 'TYPE_INT32', 'dims': [1], 'label_filename': '', 'is_shape_tensor': False}
Output config for REQUEST_DECODER_INPUT_LEN: None
Output config for BAD_WORDS_IDS: {'name': 'BAD_WORDS_IDS', 'data_type': 'TYPE_INT32', 'dims': [2, -1], 'label_filename': '', 'is_shape_tensor': False}
Output config for STOP_WORDS_IDS: {'name': 'STOP_WORDS_IDS', 'data_type': 'TYPE_INT32', 'dims': [2, -1], 'label_filename': '', 'is_shape_tensor': False}
Output config for OUT_END_ID: {'name': 'OUT_END_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}
Output config for OUT_PAD_ID: {'name': 'OUT_PAD_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}
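Since only DECODER_INPUT_ID and REQUEST_DECODER_INPUT_LEN resolve to None while every other output is present, one plausible fix (an assumption, not confirmed) is that config.pbtxt was generated from an older template and is missing those two outputs; declaring them would look something like:

```
output [
  {
    name: "DECODER_INPUT_ID"
    data_type: TYPE_INT32
    dims: [ -1 ]
  },
  {
    name: "REQUEST_DECODER_INPUT_LEN"
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]
```

The dims here are guesses modeled on the existing INPUT_ID / REQUEST_INPUT_LEN entries; regenerating config.pbtxt from the template matching the checked-out backend commit would be the safer route.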

Expected behavior

The preprocessing model loads and runs.

Actual behavior

Fails to load the preprocessing model.

Additional notes

As you can see in the debug output, it seems it is not accepting the pad_token being set to the EOS token.
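For reference, the `Pad token: </s>` / `Tokenizer pad ID: 2` lines in the debug output are consistent with the common fallback of reusing the EOS token as the pad token when the tokenizer defines none. A minimal sketch of that pattern (using a stand-in object instead of a real `transformers` tokenizer, to keep it self-contained; the real preprocessing code would operate on `AutoTokenizer.from_pretrained(tokenizer_dir)`):

```python
from types import SimpleNamespace

# Stand-in for the loaded Mixtral tokenizer, which ships without a pad token.
tokenizer = SimpleNamespace(
    pad_token=None, pad_token_id=None,
    eos_token="</s>", eos_token_id=2,
)

# Common fallback: pad with EOS when no pad token is defined.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

print(tokenizer.pad_token, tokenizer.pad_token_id)  # </s> 2
```

That the debug log reaches this point and prints the expected pad token suggests the tokenizer itself loaded; the traceback points at the output-config lookup on line 72 instead.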
