
Issue: Mixtral 8x7b failed to load preprocessing model #525

@christian-ci

Description


System Info

  • x86_64
  • NVIDIA L4 and NVIDIA A10 (same issue on both)
  • Built TensorRT-LLM backend using:
DOCKER_BUILDKIT=1 docker build -t triton_trt_llm -f dockerfile/Dockerfile.trt_llm_backend .

with Commit 4d399bc75426263be9b31b66d42e4db81b73b6f7

  • Quantized Mixtral 8x7b Instruct and built the engine; driver/CUDA versions:
 NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.4

The FP8 engine builds and the main model loads; only the preprocessing model fails.

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Using the HF Mixtral tokenizer downloaded from the repo itself, with an FP8 engine that loads successfully. The problem seems to be with the Mixtral tokenizer specifically, not the Mistral one: I can run Mistral 7b but not 8x7b.

Error:

I0708 17:43:37.351682 111 pb_stub.cc:385] "Failed to initialize Python stub: TypeError: 'NoneType' object is not subscriptable\n\nAt:\n  /app/repo-struct/preprocessing/1/model.py(72): initialize\n"
E0708 17:43:37.898915 111 backend_model.cc:692] "ERROR: Failed to create instance: TypeError: 'NoneType' object is not subscriptable\n\nAt:\n  /app/repo-struct/preprocessing/1/model.py(72): initialize\n"
E0708 17:43:37.899027 111 model_lifecycle.cc:641] "failed to load 'preprocessing' version 1: Internal: TypeError: 'NoneType' object is not subscriptable\n\nAt:\n  /app/repo-struct/preprocessing/1/model.py(72): initialize\n"
I0708 17:43:37.899054 111 model_lifecycle.cc:776] "failed to load 'preprocessing'"
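The "'NoneType' object is not subscriptable" at `model.py(72): initialize` is the classic failure mode of subscripting a config lookup that returned None. A minimal self-contained sketch of how `initialize` can hit this (plain Python, no Triton dependency; `get_output_config_by_name` is re-implemented by hand here to mirror what `triton_python_backend_utils` does, and the output list is abbreviated from the debug dump below):

```python
def get_output_config_by_name(model_config, name):
    """Mimics triton_python_backend_utils.get_output_config_by_name:
    returns the matching output dict, or None if the name is absent."""
    for output in model_config.get("output", []):
        if output["name"] == name:
            return output
    return None

# Output section as dumped in the debug log (abbreviated).
model_config = {
    "output": [
        {"name": "INPUT_ID", "data_type": "TYPE_INT32"},
        {"name": "REQUEST_INPUT_LEN", "data_type": "TYPE_INT32"},
    ]
}

# A name that exists in config.pbtxt resolves fine...
ok = get_output_config_by_name(model_config, "INPUT_ID")
print(ok["data_type"])  # TYPE_INT32

# ...but a name missing from config.pbtxt yields None, and
# subscripting None raises exactly the reported TypeError.
missing = get_output_config_by_name(model_config, "DECODER_INPUT_ID")
try:
    missing["data_type"]
except TypeError as e:
    print(e)  # 'NoneType' object is not subscriptable
```

This matches the debug output below, where the lookups for DECODER_INPUT_ID and REQUEST_DECODER_INPUT_LEN come back as None.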

Debug info:

Initializing model with arguments:
{'model_config': '{"name":"preprocessing","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":64,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false},{"name":"REQUEST_OUTPUT_LEN","data_type":"TYPE_INT32","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false},{"name":"BAD_WORDS_DICT","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true},{"name":"STOP_WORDS_DICT","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true},{"name":"EMBEDDING_BIAS_WORDS","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true},{"name":"EMBEDDING_BIAS_WEIGHTS","data_type":"TYPE_FP32","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true},{"name":"END_ID","data_type":"TYPE_INT32","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true},{"name":"PAD_ID","data_type":"TYPE_INT32","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":true}],"output":[{"name":"INPUT_ID","data_type":"TYPE_INT32","dims":[-1],"label_filename":"","is_shape_tensor":false},{"name":"REQUEST_INPUT_LEN","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false},{"name":"BAD_WORDS_IDS","data_type":"TYPE_INT32","dims":[2,-1],"label_filename":"","is_shape_tensor":false},{"name":"STOP_WORDS_IDS","data_type":"TYPE_INT32","dims":[2,-1],"label_filename":"","is_shape_tensor":false},{"name":"EMBEDDING_BIAS","data_type":"TYPE_FP32","dims":[-1],"label_filename":"","is_shape_tensor":false},{"name":"REQUEST_OUTPUT_LEN","data_type":"TYPE_INT32","dims":[-1],"la
bel_filename":"","is_shape_tensor":false},{"name":"OUT_END_ID","data_type":"TYPE_INT32","dims":[-1],"label_filename":"","is_shape_tensor":false},{"name":"OUT_PAD_ID","data_type":"TYPE_INT32","dims":[-1],"label_filename":"","is_shape_tensor":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"instance_group":[{"name":"preprocessing_0","kind":"KIND_CPU","count":1,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"tokenizer_dir":{"string_value":"mistralai/Mixtral-8x7B-Instruct-v0.1"},"add_special_tokens":{"string_value":"false"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': 'preprocessing_0_0', 'model_instance_device_id': '0', 'model_repository': '/app/repo-struct/preprocessing', 'model_version': '1', 'model_name': 'preprocessing'}
Model config loaded:
{'name': 'preprocessing', 'platform': '', 'backend': 'python', 'runtime': '', 'version_policy': {'latest': {'num_versions': 1}}, 'max_batch_size': 64, 'input': [{'name': 'QUERY', 'data_type': 'TYPE_STRING', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': False}, {'name': 'REQUEST_OUTPUT_LEN', 'data_type': 'TYPE_INT32', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': False}, {'name': 'BAD_WORDS_DICT', 'data_type': 'TYPE_STRING', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}, {'name': 'STOP_WORDS_DICT', 'data_type': 'TYPE_STRING', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}, {'name': 'EMBEDDING_BIAS_WORDS', 'data_type': 'TYPE_STRING', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}, {'name': 'EMBEDDING_BIAS_WEIGHTS', 'data_type': 'TYPE_FP32', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}, {'name': 'END_ID', 'data_type': 'TYPE_INT32', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}, {'name': 'PAD_ID', 'data_type': 'TYPE_INT32', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}], 'output': [{'name': 'INPUT_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'REQUEST_INPUT_LEN', 'data_type': 'TYPE_INT32', 'dims': [1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'BAD_WORDS_IDS', 'data_type': 'TYPE_INT32', 'dims': [2, -1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'STOP_WORDS_IDS', 'data_type': 'TYPE_INT32', 'dims': [2, -1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 
'EMBEDDING_BIAS', 'data_type': 'TYPE_FP32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'REQUEST_OUTPUT_LEN', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'OUT_END_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}, {'name': 'OUT_PAD_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}], 'batch_input': [], 'batch_output': [], 'optimization': {'priority': 'PRIORITY_DEFAULT', 'input_pinned_memory': {'enable': True}, 'output_pinned_memory': {'enable': True}, 'gather_kernel_buffer_threshold': 0, 'eager_batching': False}, 'instance_group': [{'name': 'preprocessing_0', 'kind': 'KIND_CPU', 'count': 1, 'gpus': [], 'secondary_devices': [], 'profile': [], 'passive': False, 'host_policy': ''}], 'default_model_filename': 'model.py', 'cc_model_filenames': {}, 'metric_tags': {}, 'parameters': {'tokenizer_dir': {'string_value': 'mistralai/Mixtral-8x7B-Instruct-v0.1'}, 'add_special_tokens': {'string_value': 'false'}}, 'model_warmup': []}
Tokenizer directory: mistralai/Mixtral-8x7B-Instruct-v0.1
Add special tokens: False
Tokenizer: LlamaTokenizerFast(name_or_path='mistralai/Mixtral-8x7B-Instruct-v0.1', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>'}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
       0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
       1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
       2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
Pad token: </s>
Tokenizer end ID: 2
Tokenizer pad ID: 2
Input config for EMBEDDING_BIAS_WORDS: {'name': 'EMBEDDING_BIAS_WORDS', 'data_type': 'TYPE_STRING', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}
Input config for EMBEDDING_BIAS_WEIGHTS: {'name': 'EMBEDDING_BIAS_WEIGHTS', 'data_type': 'TYPE_FP32', 'format': 'FORMAT_NONE', 'dims': [-1], 'is_shape_tensor': False, 'allow_ragged_batch': False, 'optional': True}
Output config for INPUT_ID: {'name': 'INPUT_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}
Output config for DECODER_INPUT_ID: None
Output config for REQUEST_INPUT_LEN: {'name': 'REQUEST_INPUT_LEN', 'data_type': 'TYPE_INT32', 'dims': [1], 'label_filename': '', 'is_shape_tensor': False}
Output config for REQUEST_DECODER_INPUT_LEN: None
Output config for BAD_WORDS_IDS: {'name': 'BAD_WORDS_IDS', 'data_type': 'TYPE_INT32', 'dims': [2, -1], 'label_filename': '', 'is_shape_tensor': False}
Output config for STOP_WORDS_IDS: {'name': 'STOP_WORDS_IDS', 'data_type': 'TYPE_INT32', 'dims': [2, -1], 'label_filename': '', 'is_shape_tensor': False}
Output config for OUT_END_ID: {'name': 'OUT_END_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}
Output config for OUT_PAD_ID: {'name': 'OUT_PAD_ID', 'data_type': 'TYPE_INT32', 'dims': [-1], 'label_filename': '', 'is_shape_tensor': False}
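Since only DECODER_INPUT_ID and REQUEST_DECODER_INPUT_LEN resolve to None while every other output is present, one plausible fix (an assumption, not confirmed) is that config.pbtxt was generated from an older template and is missing those two outputs; declaring them would look something like:

```
output [
  {
    name: "DECODER_INPUT_ID"
    data_type: TYPE_INT32
    dims: [ -1 ]
  },
  {
    name: "REQUEST_DECODER_INPUT_LEN"
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]
```

The dims here are guesses modeled on the existing INPUT_ID / REQUEST_INPUT_LEN entries; regenerating config.pbtxt from the template matching the checked-out backend commit would be the safer route.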

Expected behavior

The preprocessing model loads and runs.

Actual behavior

Fails to load the preprocessing model.

Additional notes

As you can see in the debug output, it seems it is not accepting the pad_token being set to the EOS token.
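For reference, the `Pad token: </s>` / `Tokenizer pad ID: 2` lines in the debug output are consistent with the common fallback of reusing the EOS token as the pad token when the tokenizer defines none. A minimal sketch of that pattern (using a stand-in object instead of a real `transformers` tokenizer, to keep it self-contained; the real preprocessing code would operate on `AutoTokenizer.from_pretrained(tokenizer_dir)`):

```python
from types import SimpleNamespace

# Stand-in for the loaded Mixtral tokenizer, which ships without a pad token.
tokenizer = SimpleNamespace(
    pad_token=None, pad_token_id=None,
    eos_token="</s>", eos_token_id=2,
)

# Common fallback: pad with EOS when no pad token is defined.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

print(tokenizer.pad_token, tokenizer.pad_token_id)  # </s> 2
```

That the debug log reaches this point and prints the expected pad token suggests the tokenizer itself loaded; the traceback points at the output-config lookup on line 72 instead.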
