Feature request: add Baichuan support #111

@chrjxj

Description

I am trying to bring up a model ensemble with a Baichuan TensorRT engine.
With the tokenizer modified inside the preprocess and postprocess models, tritonserver can load the full pipeline, but processing a client request raises a segmentation fault on the server side. Could you please take a look? Thanks.
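
For context, the model repository follows the standard tensorrtllm_backend ensemble layout (the directory names below are the example defaults, not necessarily my exact paths):

triton_model_repo/
├── preprocessing/    # Python backend: tokenizer encode (modified for Baichuan)
├── tensorrt_llm/     # TensorRT-LLM engine produced by build.py below
├── postprocessing/   # Python backend: tokenizer decode (modified for Baichuan)
└── ensemble/         # config.pbtxt wiring the three models together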

export DTYPE=bf16
export TP=1
export PP=1
export MAX_BATCH_SIZE=8
export MAX_INPUT_LEN=512

python3 build.py --model_version v2_13b \
    --model_dir $MODEL_DIR \
    --dtype ${DTYPE} \
    --use_gemm_plugin ${DTYPE} \
    --use_gpt_attention_plugin ${DTYPE} \
    --use_inflight_batching \
    --remove_input_padding \
    --enable_context_fmha \
    --paged_kv_cache \
    --max_batch_size ${MAX_BATCH_SIZE} \
    --max_input_len ${MAX_INPUT_LEN} \
    --max_output_len ${MAX_INPUT_LEN} \
    --world_size 1 \
    --output_dir $TARGET_DIR
  • modified the tokenizer inside the preprocess and postprocess models; see model.py (a sketch of the change follows the error log below)

  • adjusted the tokenizers package version inside nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3: pip3 install tokenizers==0.13.3

  • started /opt/tritonserver/bin/tritonserver and sent a curl request from the client side (example request at the end). The server side got the following error:

I1105 03:11:25.675011 587 infer_request.cc:117] [request id: <id_unknown>] Setting state from INITIALIZED to PENDING
I1105 03:11:25.675204 587 infer_request.cc:117] [request id: <id_unknown>] Setting state from EXECUTING to RELEASED
I1105 03:11:25.675198 587 infer_request.cc:117] [request id: <id_unknown>] Setting state from PENDING to EXECUTING
I1105 03:11:25.675244 587 python_be.cc:2321] TRITONBACKEND_ModelInstanceExecute: model instance name preprocessing_0_0 released 1 requests
I1105 03:11:25.675329 587 libtensorrtllm.cc:91] ModelInstanceState::getRequestBooleanInputTensor: user did not not provide stop input for the request
I1105 03:11:25.675380 587 libtensorrtllm.cc:91] ModelInstanceState::getRequestBooleanInputTensor: user did not not provide streaming input for the request
I1105 03:11:25.675408 587 infer_request.cc:117] [request id: <id_unknown>] Setting state from EXECUTING to RELEASED
I1105 03:11:25.675432 587 pinned_memory_manager.cc:191] pinned memory deallocation: addr 0x7fd98a0000c0
I1105 03:11:25.675454 587 pinned_memory_manager.cc:191] pinned memory deallocation: addr 0x7fd98a0000f0
I1105 03:11:25.675476 587 pinned_memory_manager.cc:191] pinned memory deallocation: addr 0x7fd98a000090
Signal (11) received.
 0# 0x000055B12500C13D in /opt/tritonserver/bin/tritonserver
 1# 0x00007FDBE33A2520 in /usr/lib/x86_64-linux-gnu/libc.so.6
 2# 0x00007FD96287DBB0 in /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
 3# 0x00007FD962849E07 in /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
 4# 0x00007FD962851008 in /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
 5# 0x00007FD962851722 in /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
 6# 0x00007FD96283B241 in /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
 7# 0x00007FD96283C38A in /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so
 8# 0x00007FDBE3664253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
 9# 0x00007FDBE33F4AC3 in /usr/lib/x86_64-linux-gnu/libc.so.6
10# clone in /usr/lib/x86_64-linux-gnu/libc.so.6

Segmentation fault (core dumped)
root@smc:/tensorrtllm_backend#
root@smc:/tensorrtllm_backend# I1105 03:11:27.178736 599 pb_stub.cc:1815]  Non-graceful termination detected.
I1105 03:11:27.257417 613 pb_stub.cc:1815]  Non-graceful termination detected.
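
For reference, the tokenizer change in the preprocessing model.py is roughly the following (a minimal sketch, not the exact diff: the tokenizer_dir value is a placeholder, _encode is an illustrative helper, and I am assuming the stock TritonPythonModel boilerplate from the tensorrtllm_backend example around it; Baichuan ships a custom tokenizer class, so it needs trust_remote_code=True and the slow use_fast=False path, which is why tokenizers was pinned to 0.13.3):

import numpy as np
from transformers import AutoTokenizer


class TritonPythonModel:

    def initialize(self, args):
        tokenizer_dir = "/models/Baichuan2-13B-Chat"  # placeholder path, not my exact one
        # Baichuan uses a custom sentencepiece tokenizer class, so
        # trust_remote_code is required and only the slow tokenizer exists.
        self.tokenizer = AutoTokenizer.from_pretrained(
            tokenizer_dir, use_fast=False, trust_remote_code=True)
        self.tokenizer.pad_token = self.tokenizer.eos_token

    def _encode(self, prompt: str) -> np.ndarray:
        # Illustrative helper: encode one prompt into the INPUT_ID
        # tensor shape fed to the tensorrt_llm model.
        ids = self.tokenizer.encode(prompt, add_special_tokens=True)
        return np.array([ids], dtype=np.int32)

The postprocessing model.py gets the mirror-image change: load the same tokenizer and decode the output ids with tokenizer.decode(..., skip_special_tokens=True).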

I checked the source code in libtensorrtllm.cc, and the two warnings about the unset stop and streaming inputs should be fine, since those inputs are optional.

Any directions on fixing the segmentation fault? Thanks.
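
For completeness, this is the kind of request that triggers the crash (a sketch, assuming the HTTP endpoint on the default port 8000 and the generate endpoint available in the 23.10 container; field names follow the tensorrtllm_backend example, and the prompt is arbitrary):

curl -X POST localhost:8000/v2/models/ensemble/generate -d \
    '{"text_input": "What is machine learning?", "max_tokens": 64, "bad_words": "", "stop_words": ""}'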
