Description
System Info
- x86_64
- NVIDIA L4 and NVIDIA A10 (same issue on both)
- Built the TensorRT-LLM backend with
  DOCKER_BUILDKIT=1 docker build -t triton_trt_llm -f dockerfile/Dockerfile.trt_llm_backend .
  at commit 4d399bc75426263be9b31b66d42e4db81b73b6f7
- NVIDIA-SMI 535.183.01, Driver Version 535.183.01, CUDA Version 12.4
- Quantized Mixtral 8x7B Instruct and built an FP8 engine; the engine loads the model, but the preprocessing model does not load.
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Using the HF Mixtral tokenizer downloaded from the repo itself, together with an FP8 engine that loads correctly. The problem appears to be specific to the Mixtral tokenizer rather than the Mistral one: Mistral 7B runs, but Mixtral 8x7B does not.
Error:
I0708 17:43:37.351682 111 pb_stub.cc:385] "Failed to initialize Python stub: TypeError: 'NoneType' object is not subscriptable\n\nAt:\n /app/repo-struct/preprocessing/1/model.py(72): initialize\n"
E0708 17:43:37.898915 111 backend_model.cc:692] "ERROR: Failed to create instance: TypeError: 'NoneType' object is not subscriptable\n\nAt:\n /app/repo-struct/preprocessing/1/model.py(72): initialize\n"
E0708 17:43:37.899027 111 model_lifecycle.cc:641] "failed to load 'preprocessing' version 1: Internal: TypeError: 'NoneType' object is not subscriptable\n\nAt:\n /app/repo-struct/preprocessing/1/model.py(72): initialize\n"
I0708 17:43:37.899054 111 model_lifecycle.cc:776] "failed to load 'preprocessing'"
Info debug:
Initializing model with arguments:
{'model_config': '{"name":"preprocessing","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":64,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false},{"name":"REQUEST_OUTPUT_LEN","data_type":"TYPE_INT32","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false},{"name":"BAD_WORDS_DICT","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true},{"name":"STOP_WORDS_DICT","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true},{"name":"EMBEDDING_BIAS_WORDS","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true},{"name":"EMBEDDING_BIAS_WEIGHTS","data_type":"TYPE_FP32","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true},{"name":"END_ID","data_type":"TYPE_INT32","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true},{"name":"PAD_ID","data_type":"TYPE_INT32","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true}],"output":[{"name":"INPUT_ID","data_type":"TYPE_INT32","dims":[-1],"label_filename":"","is_shape_tensor":false},{"name":"REQUEST_INPUT_LEN","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false},{"name":"BAD_WORDS_IDS","data_type":"TYPE_INT32","dims":[2,-1],"label_filename":"","is_shape_tensor":false},{"name":"STOP_WORDS_IDS","data_type":"TYPE_INT32","dims":[2,-1],"label_filename":"","is_shape_tensor":false},{"name":"EMBEDDING_BIAS","data_type":"TYPE_FP32","dims":[-1],"label_filename":"","is_shape_tensor":false},{"name":"REQUEST_OUTPUT_LEN","data_type":"TYPE_INT32","dims":[-1],"la
bel_filename":"","is_shape_tensor":false},{"name":"OUT_END_ID","data_type":"TYPE_INT32","dims":[-1],"label_filename":"","is_shape_tensor":false},{"name":"OUT_PAD_ID","data_type":"TYPE_INT32","dims":[-1],"label_filename":"","is_shape_tensor":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"instance_group":[{"name":"preprocessing_0","kind":"KIND_CPU","count":1,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"tokenizer_dir":{"string_value":"mistralai/Mixtral-8x7B-Instruct-v0.1"},"add_special_tokens":{"string_value":"false"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': 'preprocessing_0_0', 'model_instance_device_id': '0', 'model_repository': '/app/repo-struct/preprocessing', 'model_version': '1', 'model_name': 'preprocessing'}
Model config loaded:
{'name': 'preprocessing', 'platform': '', 'backend': 'python', 'runtime': '', 'version_policy': {'latest': {'num_versions': 1}}, 'max_batch_size': 64, 'input': [{'name': 'QUERY', 'data_type': 'TYPE_STRING', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': False}, {'name': 'REQUEST_OUTPUT_LEN', 'data_type': 'TYPE_INT32', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': False}, {'name': 'BAD_WORDS_DICT', 'data_type': 'TYPE_STRING', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}, {'name': 'STOP_WORDS_DICT', 'data_type': 'TYPE_STRING', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}, {'name': 'EMBEDDING_BIAS_WORDS', 'data_type': 'TYPE_STRING', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}, {'name': 'EMBEDDING_BIAS_WEIGHTS', 'data_type': 'TYPE_FP32', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}, {'name': 'END_ID', 'data_type': 'TYPE_INT32', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}, {'name': 'PAD_ID', 'data_type': 'TYPE_INT32', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}], 'output': [{'name': 'INPUT_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'REQUEST_INPUT_LEN', 'data_type': 'TYPE_INT32', 'dims': [1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'BAD_WORDS_IDS', 'data_type': 'TYPE_INT32', 'dims': [2, -1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'STOP_WORDS_IDS', 'data_type': 'TYPE_INT32', 'dims': [2, -1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 
'EMBEDDING_BIAS', 'data_type': 'TYPE_FP32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'REQUEST_OUTPUT_LEN', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'OUT_END_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'OUT_PAD_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}], 'batch_input': [], 'batch_output': [], 'optimization': {'priority': 'PRIORITY_DEFAULT', 'input_pinned_memory': {'enable': True}, 'output_pinned_memory': {'enable': True}, 'gather_kernel_buffer_threshold': 0, 'eager_batching': False}, 'instance_group': [{'name': 'preprocessing_0', 'kind': 'KIND_CPU', 'count': 1, 'gpus': [], 'secondary_devices': [], 'profile': [], 'passive': False, 'host_policy': ''}], 'default_model_filename': 'model.py', 'cc_model_filenames': {}, 'metric_tags': {}, 'parameters': {'tokenizer_dir': {'string_value': 'mistralai/Mixtral-8x7B-Instruct-v0.1'}, 'add_special_tokens': {'string_value': 'false'}}, 'model_warmup': []}
Tokenizer directory: mistralai/Mixtral-8x7B-Instruct-v0.1
Add special tokens: False
Tokenizer: LlamaTokenizerFast(name_or_path='mistralai/Mixtral-8x7B-Instruct-v0.1', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>'}, clean_up_tokenization_spaces=False), added_tokens_decoder={
0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
Pad token: </s>
Tokenizer end ID: 2
Tokenizer pad ID: 2
Input config for EMBEDDING_BIAS_WORDS: {'name': 'EMBEDDING_BIAS_WORDS', 'data_type': 'TYPE_STRING', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}
Input config for EMBEDDING_BIAS_WEIGHTS: {'name': 'EMBEDDING_BIAS_WEIGHTS', 'data_type': 'TYPE_FP32', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}
Output config for INPUT_ID: {'name': 'INPUT_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}
Output config for DECODER_INPUT_ID: None
Output config for REQUEST_INPUT_LEN: {'name': 'REQUEST_INPUT_LEN', 'data_type': 'TYPE_INT32', 'dims': [1], 'label_filename': '', 'is_shape_tensor': False}
Output config for REQUEST_DECODER_INPUT_LEN: None
Output config for BAD_WORDS_IDS: {'name': 'BAD_WORDS_IDS', 'data_type': 'TYPE_INT32', 'dims': [2, -1], 'label_filename': '', 'is_shape_tensor': False}
Output config for STOP_WORDS_IDS: {'name': 'STOP_WORDS_IDS', 'data_type': 'TYPE_INT32', 'dims': [2, -1], 'label_filename': '', 'is_shape_tensor': False}
Output config for OUT_END_ID: {'name': 'OUT_END_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}
Output config for OUT_PAD_ID: {'name': 'OUT_PAD_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}
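For reference, the debug lines above show DECODER_INPUT_ID and REQUEST_DECODER_INPUT_LEN resolving to None, i.e. they are not declared in the preprocessing model's config.pbtxt. If model.py at this commit expects them, entries along these lines would be needed (the names are taken from the log; the data types and dims are assumptions, chosen by analogy with INPUT_ID and REQUEST_INPUT_LEN):

```
output [
  {
    name: "DECODER_INPUT_ID"
    data_type: TYPE_INT32
    dims: [ -1 ]
  },
  {
    name: "REQUEST_DECODER_INPUT_LEN"
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]
```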
Expected behavior
The preprocessing model loads and runs.
Actual behavior
Fails to load the preprocessing model.
Additional notes
As the debug output above shows, the tokenizer itself loads and the pad token is set to the EOS token as expected (end ID and pad ID are both 2). The last lookups before the failure return "Output config for DECODER_INPUT_ID: None" and "Output config for REQUEST_DECODER_INPUT_LEN: None", so initialization appears to fail on one of those missing output configs rather than on the pad token.
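The stack trace points at line 72 of model.py's initialize; combined with the "Output config for DECODER_INPUT_ID: None" debug line, a plausible (assumed, not confirmed) cause is subscripting a missing output config. A minimal sketch of that failure mode, using a simplified stand-in for the real pb_utils helper:

```python
# Hypothetical, simplified stand-in for pb_utils.get_output_config_by_name
# (triton_python_backend_utils), to illustrate the failure mode: the helper
# returns None when the requested output name is absent from config.pbtxt.

def get_output_config_by_name(model_config, name):
    for output in model_config["output"]:
        if output["name"] == name:
            return output
    return None  # missing name -> None, not an exception

# Trimmed-down model config: DECODER_INPUT_ID is deliberately absent,
# mirroring the "Output config for DECODER_INPUT_ID: None" debug line.
model_config = {"output": [{"name": "INPUT_ID", "data_type": "TYPE_INT32"}]}

cfg = get_output_config_by_name(model_config, "DECODER_INPUT_ID")
try:
    dtype = cfg["data_type"]  # cfg is None -> TypeError, as in the log
except TypeError as err:
    print(err)  # 'NoneType' object is not subscriptable
```

If this is the cause, the fix would be declaring the missing outputs in config.pbtxt (or using a preprocessing model.py matching the config version), not changing the tokenizer.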