Description
I am trying to bring up a model ensemble with a Baichuan TRT engine.
With a modified tokenizer inside the preprocessing and postprocessing models, tritonserver can load the full pipeline, but processing a client request raises a segmentation fault on the server side. Could you please take a look? Thanks.
- Used the Baichuan2-13B-Chat model from HF.
- Used TensorRT-LLM v0.5.0 (https://github.com/NVIDIA/TensorRT-LLM/tree/release/0.5.0/examples/baichuan) to convert the model to a TRT engine:
export DTYPE=bf16
export TP=1
export PP=1
export MAX_BATCH_SIZE=8
export MAX_INPUT_LEN=512
python3 build.py --model_version v2_13b \
--model_dir $MODEL_DIR \
--dtype ${DTYPE} \
--use_gemm_plugin ${DTYPE} \
--use_gpt_attention_plugin ${DTYPE} \
--use_inflight_batching \
--remove_input_padding \
--enable_context_fmha \
--paged_kv_cache \
--max_batch_size ${MAX_BATCH_SIZE} \
--max_input_len ${MAX_INPUT_LEN} \
--max_output_len ${MAX_INPUT_LEN} \
--world_size 1 \
--output_dir $TARGET_DIR
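For context, the Triton model repository follows the standard tensorrtllm_backend ensemble layout (a sketch; the directory names assume the all_models/inflight_batcher_llm template shipped with the backend):

```text
triton_model_repo/
├── preprocessing/    # Python backend; model.py tokenizes the prompt
├── tensorrt_llm/     # TensorRT-LLM backend pointing at the built engine
├── postprocessing/   # Python backend; model.py detokenizes the output ids
└── ensemble/         # config.pbtxt wiring the three models together
```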
- Modified the tokenizer inside the preprocessing and postprocessing models; see model.py. A hedged sketch of the change is shown below.
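A minimal sketch of what the preprocessing model.py change looks like, assuming the standard QUERY/INPUT_ID/REQUEST_INPUT_LEN tensor names from the inflight_batcher_llm preprocessing template and a local Baichuan checkpoint path; this is illustrative, not the exact code from this setup:

```python
import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import AutoTokenizer

class TritonPythonModel:
    def initialize(self, args):
        # Assumed checkpoint location inside the container.
        self.tokenizer = AutoTokenizer.from_pretrained(
            "/models/Baichuan2-13B-Chat",
            trust_remote_code=True,  # Baichuan ships a custom tokenizer class
            use_fast=False,
        )
        self.pad_id = self.tokenizer.pad_token_id or self.tokenizer.eos_token_id

    def execute(self, requests):
        responses = []
        for request in requests:
            # QUERY is a TYPE_STRING tensor of shape [batch, 1].
            query = pb_utils.get_input_tensor_by_name(request, "QUERY").as_numpy()
            # Tokenize each prompt; INPUT_ID must be int32 for the TRT-LLM backend.
            ids = [
                np.array(self.tokenizer.encode(q[0].decode()), dtype=np.int32)
                for q in query
            ]
            lengths = np.array([[len(x)] for x in ids], dtype=np.int32)
            max_len = max(len(x) for x in ids)
            input_ids = np.full((len(ids), max_len), self.pad_id, dtype=np.int32)
            for i, x in enumerate(ids):
                input_ids[i, : len(x)] = x
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[
                    pb_utils.Tensor("INPUT_ID", input_ids),
                    pb_utils.Tensor("REQUEST_INPUT_LEN", lengths),
                ])
            )
        return responses
```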
- Adjusted the package version inside nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3: pip3 install tokenizers==0.13.3.
- Started /opt/tritonserver/bin/tritonserver and sent a curl request from the client side; a hedged example of the launch and request follows.
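Roughly what was run (a sketch; the model-repository path and prompt are assumptions, and the request uses the generate endpoint added in Triton 23.10):

```bash
# Assumed launch command; the --model-repository path is a placeholder.
/opt/tritonserver/bin/tritonserver --model-repository=/tensorrtllm_backend/triton_model_repo

# Assumed client request against the ensemble model.
curl -X POST localhost:8000/v2/models/ensemble/generate -d \
  '{"text_input": "What is machine learning?", "max_tokens": 64, "bad_words": "", "stop_words": ""}'
```

The server side got the following error: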
I1105 03:11:25.675011 587 infer_request.cc:117] [request id: <id_unknown>] Setting state from INITIALIZED to PENDING
I1105 03:11:25.675204 587 infer_request.cc:117] [request id: <id_unknown>] Setting state from EXECUTING to RELEASED
I1105 03:11:25.675198 587 infer_request.cc:117] [request id: <id_unknown>] Setting state from PENDING to EXECUTING
I1105 03:11:25.675244 587 python_be.cc:2321] TRITONBACKEND_ModelInstanceExecute: model instance name preprocessing_0_0 released 1 requests
I1105 03:11:25.675329 587 libtensorrtllm.cc:91] ModelInstanceState::getRequestBooleanInputTensor: user did not not provide stop input for the request
I1105 03:11:25.675380 587 libtensorrtllm.cc:91] ModelInstanceState::getRequestBooleanInputTensor: user did not not provide streaming input for the request
I1105 03:11:25.675408 587 infer_request.cc:117] [request id: <id_unknown>] Setting state from EXECUTING to RELEASED
I1105 03:11:25.675432 587 pinned_memory_manager.cc:191] pinned memory deallocation: addr 0x7fd98a0000c0
I1105 03:11:25.675454 587 pinned_memory_manager.cc:191] pinned memory deallocation: addr 0x7fd98a0000f0
I1105 03:11:25.675476 587 pinned_memory_manager.cc:191] pinned memory deallocation: addr 0x7fd98a000090
Signal (11) received.
0# 0x000055B12500C13D in /opt/tritonserver/bin/tritonserver
1# 0x00007FDBE33A2520 in /usr/lib/x86_64-linux-gnu/libc.so.6
2# 0x00007FD96287DBB0 in /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
3# 0x00007FD962849E07 in /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
4# 0x00007FD962851008 in /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
5# 0x00007FD962851722 in /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
6# 0x00007FD96283B241 in /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
7# 0x00007FD96283C38A in /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
8# 0x00007FDBE3664253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
9# 0x00007FDBE33F4AC3 in /usr/lib/x86_64-linux-gnu/libc.so.6
10# clone in /usr/lib/x86_64-linux-gnu/libc.so.6
Segmentation fault (core dumped)
root@smc:/tensorrtllm_backend# I1105 03:11:27.178736 599 pb_stub.cc:1815] Non-graceful termination detected.
I1105 03:11:27.257417 613 pb_stub.cc:1815] Non-graceful termination detected.

I checked the source code in libtensorrtllm.cc, and the two warnings about the unset stop and streaming inputs should be harmless.
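For completeness, those optional tensors can also be supplied explicitly through the KServe v2 infer endpoint to rule them out (a hedged sketch, assuming the standard inflight_batcher_llm input names and datatypes; the token ids and lengths are placeholders):

```bash
curl -X POST localhost:8000/v2/models/tensorrt_llm/infer -d '{
  "inputs": [
    {"name": "input_ids", "datatype": "INT32", "shape": [1, 4], "data": [1, 2, 3, 4]},
    {"name": "input_lengths", "datatype": "INT32", "shape": [1, 1], "data": [4]},
    {"name": "request_output_len", "datatype": "UINT32", "shape": [1, 1], "data": [16]},
    {"name": "stop", "datatype": "BOOL", "shape": [1, 1], "data": [false]},
    {"name": "streaming", "datatype": "BOOL", "shape": [1, 1], "data": [false]}
  ]
}'
```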
Any directions on fixing the segmentation fault issue? Thanks.