Status: Open
Labels: bug (Something isn't working)
Description
System Info
Environment
- CPU architecture: x86_64
- GPUs: 4× 80GB H100
- TensorRT-LLM backend v0.10.0
- Docker image: triton_trt_llm (built using docker, from TensorRT-LLM backend option 2)
- Nvidia driver: 535.183.01
- OS: Ubuntu 22.04
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
First, I cloned the repo and built the TensorRT-LLM backend image according to option 2.
I also git cloned Mixtral-8x7B-Instruct-v0.1 into /data.
Executed the image with
docker run -it --net host --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -v /path/to/tensorrtllm_backend:/tensorrtllm_backend -v /data:/data triton_trt_llm bash

Installed TensorRT-LLM inside the image to generate the model engines:
cd tensorrt_llm &&
bash docker/common/install_cmake.sh &&
export PATH=/usr/local/cmake/bin:$PATH &&
python3 ./scripts/build_wheel.py --trt_root="/usr/local/tensorrt" &&
pip3 install ./build/tensorrt_llm*.whl

Generated checkpoints and built engines with tp_size=4:
python tensorrt_llm/examples/llama/convert_checkpoint.py --model_dir /data/Mixtral-8x7B-Instruct-v0.1 \
--output_dir /data/checkpoint_mixtral_tp_4 \
--dtype float16 \
--tp_size 4
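For context, `--tp_size 4` shards each weight matrix across the four GPUs during checkpoint conversion. A minimal sketch of column-wise tensor-parallel sharding (illustrative only; `shard_columns` is a hypothetical name, not TensorRT-LLM's actual conversion code):

```python
import numpy as np

def shard_columns(weight: np.ndarray, tp_size: int) -> list[np.ndarray]:
    """Split a weight matrix column-wise into tp_size equal shards, one per rank."""
    assert weight.shape[1] % tp_size == 0, "hidden dim must divide evenly by tp_size"
    return np.split(weight, tp_size, axis=1)

# A toy 4x8 weight split across tp_size=4 ranks -> four 4x2 shards.
w = np.arange(32).reshape(4, 8)
shards = shard_columns(w, tp_size=4)
print([s.shape for s in shards])  # -> [(4, 2), (4, 2), (4, 2), (4, 2)]
```

Concatenating the shards back along axis 1 recovers the original matrix, which is why the per-rank engines must all be present at serve time.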
trtllm-build --checkpoint_dir /data/checkpoint_mixtral_tp_4 \
--output_dir /data/build_mixtral_tp_4 \
--gemm_plugin float16 \
  --max_batch_size 10

Set up the model repository:
cp all_models/inflight_batcher_llm/ triton_model_repo -r
cp /data/build_mixtral_tp_4/* triton_model_repo/1
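With tp_size=4, the build step should have produced one serialized engine per rank. A quick sanity check before filling the templates (a sketch; the `rank*.engine` naming is an assumption about trtllm-build's output, so adjust the glob if the filenames differ):

```python
import glob
import os
import tempfile

def count_rank_engines(engine_dir: str) -> int:
    """Count rank-numbered serialized engines in a build directory."""
    return len(glob.glob(os.path.join(engine_dir, "rank*.engine")))

# Demo on a throwaway directory standing in for /data/build_mixtral_tp_4.
with tempfile.TemporaryDirectory() as d:
    for rank in range(4):
        open(os.path.join(d, f"rank{rank}.engine"), "w").close()
    assert count_rank_engines(d) == 4  # one engine per TP rank
```

If the count does not match `--world_size`, the server's ranks will fail to load before it ever binds its ports.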
python3 tools/fill_template.py -i triton_model_repo/preprocessing/config.pbtxt tokenizer_dir:/data/Mixtral-8x7B-Instruct-v0.1,triton_max_batch_size:10,preprocessing_instance_count:1
python3 tools/fill_template.py -i triton_model_repo/postprocessing/config.pbtxt tokenizer_dir:/data/Mixtral-8x7B-Instruct-v0.1,triton_max_batch_size:10,postprocessing_instance_count:1
python3 tools/fill_template.py -i triton_model_repo/tensorrt_llm_bls/config.pbtxt triton_max_batch_size:10,decoupled_mode:True,bls_instance_count:1,accumulate_tokens:False
python3 tools/fill_template.py -i triton_model_repo/ensemble/config.pbtxt triton_max_batch_size:10
python3 tools/fill_template.py -i triton_model_repo/tensorrt_llm/config.pbtxt triton_backend:tensorrtllm,triton_max_batch_size:10,decoupled_mode:False,max_beam_width:1,engine_dir:/data/build_mixtral_tp_4,max_tokens_in_paged_kv_cache:2560,max_attention_window_size:2560,kv_cache_free_gpu_mem_fraction:0.5,exclude_input_in_output:True,enable_kv_cache_reuse:False,batching_strategy:inflight_fused_batching,max_queue_delay_microseconds:0

Launched the Triton server:
python3 scripts/launch_triton_server.py --world_size=4 --model_repo=/tensorrtllm_backend/triton_model_repo

Expected behavior
Expected to see the server come up, logging:
I0919 14:52:10.475738 293 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
I0919 14:52:10.475968 293 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
I0919 14:52:10.517138 293 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002

Actual behavior
Couldn't launch the Triton server.
Error:
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
terminate called after throwing an instance of 'boost::interprocess::lock_exception'
what(): boost::interprocess::lock_exception
terminate called after throwing an instance of 'boost::interprocess::lock_exception'
[h100-2:00149] *** Process received signal ***
[h100-2:00149] Signal: Aborted (6)
[h100-2:00149] Signal code: (-6)
what(): boost::interprocess::lock_exception
[h100-2:00149] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f71a871e520]
[h100-2:00149] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f71a87729fc]
[h100-2:00149] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f71a871e476]
[h100-2:00149] [ 3] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f71a87047f3]
[h100-2:00149] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2b9e)[0x7f71a89a7b9e]
[h100-2:00149] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c)[0x7f71a89b320c]
[h100-2:00149] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xad1e9)[0x7f71a89b21e9]
[h100-2:00149] [ 7] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0x99)[0x7f71a89b2959]
[h100-2:00149] [ 8] /usr/lib/x86_64-linux-gnu/libgcc_s.so.1(+0x16884)[0x7f71aa7f6884]
[h100-2:00149] [ 9] /usr/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_RaiseException+0x311)[0x7f71aa7f6f41]
[h100-2:00149] [10] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(__cxa_throw+0x3b)[0x7f71a89b34cb]
[h100-2:00149] [11] /opt/tritonserver/backends/python/libtriton_python.so(+0x87bfa)[0x7f719c12cbfa]
[h100-2:00149] [12] /opt/tritonserver/backends/python/libtriton_python.so(+0x7800c)[0x7f719c11d00c]
[h100-2:00149] [13] /opt/tritonserver/backends/python/libtriton_python.so(+0x7ed06)[0x7f719c123d06]
[h100-2:00149] [14] /opt/tritonserver/backends/python/libtriton_python.so(+0x9930a)[0x7f719c13e30a]
[h100-2:00149] [15] /opt/tritonserver/backends/python/libtriton_python.so(+0x853b3)[0x7f719c12a3b3]
[h100-2:00149] [16] /opt/tritonserver/backends/python/libtriton_python.so(+0x3c4c4)[0x7f719c0e14c4]
[h100-2:00149] [17] /opt/tritonserver/backends/python/libtriton_python.so(TRITONBACKEND_ModelInstanceInitialize+0x4ec)[0x7f719c0e1d0c]
[h100-2:00149] [18] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1af096)[0x7f71a9124096]
[h100-2:00149] [19] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1b02d6)[0x7f71a91252d6]
[h100-2:00149] [20] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x1928e5)[0x7f71a91078e5]
[h100-2:00149] [21] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x192f26)[0x7f71a9107f26]
[h100-2:00149] [22] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x19f81d)[0x7f71a911481d]
[h100-2:00149] [23] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x99ee8)[0x7f71a8775ee8]
[h100-2:00149] [24] /opt/tritonserver/bin/../lib/libtritonserver.so(+0x18afee)[0x7f71a90fffee]
[h100-2:00149] [25] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253)[0x7f71a89e1253]
[h100-2:00149] [26] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f71a8770ac3]
[h100-2:00149] [27] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x44)[0x7f71a8801a04]
[h100-2:00149] *** End of error message ***
terminate called after throwing an instance of 'boost::interprocess::lock_exception'
what(): boost::interprocess::lock_exception
terminate called after throwing an instance of 'boost::interprocess::lock_exception'
what(): boost::interprocess::lock_exception
terminate called after throwing an instance of 'boost::interprocess::lock_exception'
what(): boost::interprocess::lock_exception
terminate called after throwing an instance of 'boost::interprocess::lock_exception'
what(): boost::interprocess::lock_exception
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
terminate called after throwing an instance of 'boost::interprocess::lock_exception'
what(): boost::interprocess::lock_exception
terminate called after throwing an instance of 'boost::interprocess::lock_exception'
what(): boost::interprocess::lock_exception
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node h100-2 exited on signal 6 (Aborted).

Additional notes
I also tried building and serving the model with other images, but they didn't work either.
Any help would be much appreciated.