baichuan2-13b exec error #62

@zhanglv0209

Description

Launching the Triton server for Baichuan2-13B with world_size=2 fails to load all models; the full log is below.

root@GPU-26:/tensorrtllm_backend/tensorrtllm_backend/tensorrtllm_backend# CUDA_VISIBLE_DEVICES=0,3 python3 ./scripts/launch_triton_server.py --world_size=2 --model_repo=/tensorrtllm_backend/tensorrtllm_backend/triton_model_repo
root@GPU-26:/tensorrtllm_backend/tensorrtllm_backend/tensorrtllm_backend# I1030 09:08:30.450984 1398 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x7f3edc000000' with size 268435456
I1030 09:08:30.454704 1398 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I1030 09:08:30.454713 1398 cuda_memory_manager.cc:107] CUDA memory pool is created on device 1 with size 67108864
I1030 09:08:30.472889 1399 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x7ff556000000' with size 268435456
I1030 09:08:30.492939 1399 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I1030 09:08:30.492958 1399 cuda_memory_manager.cc:107] CUDA memory pool is created on device 1 with size 67108864
I1030 09:08:30.796900 1398 model_lifecycle.cc:461] loading: tensorrt_llm:2
I1030 09:08:30.796944 1398 model_lifecycle.cc:461] loading: preprocessing:1
I1030 09:08:30.796963 1398 model_lifecycle.cc:461] loading: postprocessing:1
I1030 09:08:30.815125 1399 model_lifecycle.cc:461] loading: tensorrt_llm:2
I1030 09:08:30.815165 1399 model_lifecycle.cc:461] loading: preprocessing:1
I1030 09:08:30.815184 1399 model_lifecycle.cc:461] loading: postprocessing:1
[TensorRT-LLM][WARNING] max_tokens_in_paged_kv_cache is not specified, will use default value
[TensorRT-LLM][WARNING] Cannot find parameter with name: batch_scheduler_policy
[TensorRT-LLM][WARNING] kv_cache_free_gpu_mem_fraction is not specified, will use default value of 0.85 or max_tokens_in_paged_kv_cache
[TensorRT-LLM][WARNING] max_num_sequences is not specified, will be set to the TRT engine max_batch_size
[TensorRT-LLM][WARNING] enable_trt_overlap is not specified, will be set to true
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be number, but is null
[TensorRT-LLM][WARNING] Optional value for parameter max_num_tokens will not be set.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
I1030 09:08:30.880197 1398 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)
I1030 09:08:30.880584 1398 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)
[TensorRT-LLM][WARNING] max_tokens_in_paged_kv_cache is not specified, will use default value
[TensorRT-LLM][WARNING] Cannot find parameter with name: batch_scheduler_policy
[TensorRT-LLM][WARNING] kv_cache_free_gpu_mem_fraction is not specified, will use default value of 0.85 or max_tokens_in_paged_kv_cache
[TensorRT-LLM][WARNING] max_num_sequences is not specified, will be set to the TRT engine max_batch_size
[TensorRT-LLM][WARNING] enable_trt_overlap is not specified, will be set to true
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be number, but is null
[TensorRT-LLM][WARNING] Optional value for parameter max_num_tokens will not be set.
[TensorRT-LLM][INFO] Initializing MPI with thread mode 1
I1030 09:08:30.897152 1399 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)
I1030 09:08:30.897646 1399 python_be.cc:2199] TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)
[TensorRT-LLM][INFO] MPI size: 2, rank: 0
[TensorRT-LLM][INFO] MPI size: 2, rank: 1
I1030 09:08:31.500529 1399 pb_stub.cc:325] Failed to initialize Python stub: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.

At:
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained
/tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/postprocessing/1/model.py(65): initialize

I1030 09:08:31.503903 1398 pb_stub.cc:325] Failed to initialize Python stub: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.

At:
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained
/tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/postprocessing/1/model.py(65): initialize

E1030 09:08:31.669733 1398 backend_model.cc:634] ERROR: Failed to create instance: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.

At:
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained
/tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/postprocessing/1/model.py(65): initialize

E1030 09:08:31.669950 1398 model_lifecycle.cc:621] failed to load 'postprocessing' version 1: Internal: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.

At:
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained
/tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/postprocessing/1/model.py(65): initialize

I1030 09:08:31.669989 1398 model_lifecycle.cc:756] failed to load 'postprocessing'
E1030 09:08:31.686577 1399 backend_model.cc:634] ERROR: Failed to create instance: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.

At:
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained
/tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/postprocessing/1/model.py(65): initialize

E1030 09:08:31.686780 1399 model_lifecycle.cc:621] failed to load 'postprocessing' version 1: Internal: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.

At:
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained
/tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/postprocessing/1/model.py(65): initialize

I1030 09:08:31.686815 1399 model_lifecycle.cc:756] failed to load 'postprocessing'
I1030 09:08:32.939433 1398 pb_stub.cc:325] Failed to initialize Python stub: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.

At:
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained
/tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/preprocessing/1/model.py(69): initialize

I1030 09:08:32.948539 1399 pb_stub.cc:325] Failed to initialize Python stub: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.

At:
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained
/tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/preprocessing/1/model.py(69): initialize

E1030 09:08:33.484284 1399 backend_model.cc:634] ERROR: Failed to create instance: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.

At:
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained
/tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/preprocessing/1/model.py(69): initialize

E1030 09:08:33.484479 1399 model_lifecycle.cc:621] failed to load 'preprocessing' version 1: Internal: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.

At:
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained
/tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/preprocessing/1/model.py(69): initialize

I1030 09:08:33.484529 1399 model_lifecycle.cc:756] failed to load 'preprocessing'
E1030 09:08:33.497454 1398 backend_model.cc:634] ERROR: Failed to create instance: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.

At:
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained
/tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/preprocessing/1/model.py(69): initialize

E1030 09:08:33.497550 1398 model_lifecycle.cc:621] failed to load 'preprocessing' version 1: Internal: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.

At:
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained
/tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/preprocessing/1/model.py(69): initialize

I1030 09:08:33.497571 1398 model_lifecycle.cc:756] failed to load 'preprocessing'
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 2
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 1
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 1
[TensorRT-LLM][INFO] Loaded engine size: 7653 MiB
[TensorRT-LLM][INFO] TRTGptModel maxNumSequences: 2
[TensorRT-LLM][INFO] TRTGptModel maxBatchSize: 1
[TensorRT-LLM][INFO] TRTGptModel enableTrtOverlap: 1
[TensorRT-LLM][INFO] Loaded engine size: 7653 MiB
[TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 8683, GPU 26856 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 8685, GPU 26866 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 8683, GPU 56434 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 8685, GPU 56444 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +7649, now: CPU 0, GPU 7649 (MiB)
[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +7649, now: CPU 0, GPU 7649 (MiB)
E1030 09:08:44.006947 1398 backend_model.cc:634] ERROR: Failed to create instance: unexpected error when creating modelInstanceState: TrtGptModelInflightBatching requires GPT attention plugin with packed input and paged KV cache.
E1030 09:08:44.007090 1398 model_lifecycle.cc:621] failed to load 'tensorrt_llm' version 2: Internal: unexpected error when creating modelInstanceState: TrtGptModelInflightBatching requires GPT attention plugin with packed input and paged KV cache.
I1030 09:08:44.007124 1398 model_lifecycle.cc:756] failed to load 'tensorrt_llm'
E1030 09:08:44.007384 1398 model_repository_manager.cc:563] Invalid argument: ensemble 'ensemble' depends on 'postprocessing' which has no loaded version. Model 'postprocessing' loading failed with error: version 1 is at UNAVAILABLE state: Internal: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.

At:
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained
/tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/postprocessing/1/model.py(65): initialize
;
I1030 09:08:44.007619 1398 server.cc:592]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1030 09:08:44.007837 1398 server.cc:619]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
| tensorrtllm | /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-comput |
| | | e-capability":"6.000000","default-max-batch-size":"4"}} |
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-comput |
| | | e-capability":"6.000000","shm-region-prefix-name":"prefix0_","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+

I1030 09:08:44.008106 1398 server.cc:662]
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| postprocessing | 1 | UNAVAILABLE: Internal: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported. |
| | | |
| | | At: |
| | | /usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained |
| | | /tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/postprocessing/1/model.py(65): initialize |
| preprocessing | 1 | UNAVAILABLE: Internal: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported. |
| | | |
| | | At: |
| | | /usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained |
| | | /tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/preprocessing/1/model.py(69): initialize |
| tensorrt_llm | 2 | UNAVAILABLE: Internal: unexpected error when creating modelInstanceState: TrtGptModelInflightBatching requires GPT attention plugin with packed input and pa |
| | | ged KV cache. |
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+

E1030 09:08:44.013076 1399 backend_model.cc:634] ERROR: Failed to create instance: unexpected error when creating modelInstanceState: TrtGptModelInflightBatching requires GPT attention plugin with packed input and paged KV cache.
E1030 09:08:44.013151 1399 model_lifecycle.cc:621] failed to load 'tensorrt_llm' version 2: Internal: unexpected error when creating modelInstanceState: TrtGptModelInflightBatching requires GPT attention plugin with packed input and paged KV cache.
I1030 09:08:44.013168 1399 model_lifecycle.cc:756] failed to load 'tensorrt_llm'
E1030 09:08:44.013322 1399 model_repository_manager.cc:563] Invalid argument: ensemble 'ensemble' depends on 'postprocessing' which has no loaded version. Model 'postprocessing' loading failed with error: version 1 is at UNAVAILABLE state: Internal: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported.

At:
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained
/tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/postprocessing/1/model.py(65): initialize
;
I1030 09:08:44.013473 1399 server.cc:592]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1030 09:08:44.013625 1399 server.cc:619]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
| tensorrtllm | /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-comput |
| | | e-capability":"6.000000","default-max-batch-size":"4"}} |
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-comput |
| | | e-capability":"6.000000","shm-region-prefix-name":"prefix1_","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------+

I1030 09:08:44.013837 1399 server.cc:662]
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| postprocessing | 1 | UNAVAILABLE: Internal: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported. |
| | | |
| | | At: |
| | | /usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained |
| | | /tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/postprocessing/1/model.py(65): initialize |
| preprocessing | 1 | UNAVAILABLE: Internal: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported. |
| | | |
| | | At: |
| | | /usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py(748): from_pretrained |
| | | /tensorrtllm_backend/tensorrtllm_backend/triton_model_repo/preprocessing/1/model.py(69): initialize |
| tensorrt_llm | 2 | UNAVAILABLE: Internal: unexpected error when creating modelInstanceState: TrtGptModelInflightBatching requires GPT attention plugin with packed input and pa |
| | | ged KV cache. |
+----------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1030 09:08:44.073052 1398 metrics.cc:817] Collecting metrics for GPU 0: NVIDIA A800 80GB PCIe
I1030 09:08:44.073087 1398 metrics.cc:817] Collecting metrics for GPU 1: NVIDIA A800 80GB PCIe
I1030 09:08:44.076025 1398 metrics.cc:710] Collecting CPU metrics
I1030 09:08:44.076206 1398 tritonserver.cc:2458]
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.39.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_me |
| | mory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /tensorrtllm_backend/tensorrtllm_backend/triton_model_repo |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| cuda_memory_pool_byte_size{1} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+

I1030 09:08:44.076217 1398 server.cc:293] Waiting for in-flight requests to complete.
I1030 09:08:44.076221 1398 server.cc:309] Timeout 30: Found 0 model versions that have in-flight inferences
I1030 09:08:44.076228 1398 server.cc:324] All models are stopped, unloading models
I1030 09:08:44.076234 1398 server.cc:331] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
I1030 09:08:44.077887 1399 metrics.cc:817] Collecting metrics for GPU 0: NVIDIA A800 80GB PCIe
I1030 09:08:44.077913 1399 metrics.cc:817] Collecting metrics for GPU 1: NVIDIA A800 80GB PCIe
I1030 09:08:44.078310 1399 metrics.cc:710] Collecting CPU metrics
I1030 09:08:44.078464 1399 tritonserver.cc:2458]
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.39.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_me |
| | mory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /tensorrtllm_backend/tensorrtllm_backend/triton_model_repo |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| cuda_memory_pool_byte_size{1} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+

I1030 09:08:44.078473 1399 server.cc:293] Waiting for in-flight requests to complete.
I1030 09:08:44.078477 1399 server.cc:309] Timeout 30: Found 0 model versions that have in-flight inferences
I1030 09:08:44.078484 1399 server.cc:324] All models are stopped, unloading models
I1030 09:08:44.078488 1399 server.cc:331] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
error: creating server: Internal - failed to load all models

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[9409,1],0]
Exit code: 1

===========

I am using Baichuan2-13B.
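For reference, the log shows two independent failures. The first is the tokenizer error: the preprocessing and postprocessing model.py scripts load the tokenizer through transformers' AutoTokenizer, but BaichuanTokenizer is custom code shipped inside the Baichuan2 checkpoint, and AutoTokenizer can only resolve such classes when remote code is trusted. A minimal sketch of the likely fix, assuming model.py reads the tokenizer directory from config.pbtxt as the stock scripts do (the path below is a hypothetical placeholder):

```python
# Sketch for preprocessing/1/model.py and postprocessing/1/model.py (initialize).
# Assumption: tokenizer_dir is read from the model's config.pbtxt parameters,
# as in the stock tensorrtllm_backend scripts; only trust_remote_code is new here.
from transformers import AutoTokenizer

tokenizer_dir = "/path/to/Baichuan2-13B"  # hypothetical checkpoint directory

tokenizer = AutoTokenizer.from_pretrained(
    tokenizer_dir,
    trust_remote_code=True,  # lets transformers import the custom BaichuanTokenizer
)
```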

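The second failure comes from the tensorrt_llm model itself: the backend is configured for in-flight batching (TrtGptModelInflightBatching), which requires an engine built with the GPT attention plugin, packed input (padding removed), and a paged KV cache, and this engine was apparently built without them. As far as I can tell there are two ways out. One is to rebuild the engine with those features enabled, roughly like the following (flag spellings follow the TensorRT-LLM Baichuan example and should be verified against your local build.py; paths are placeholders):

python3 build.py --model_dir /path/to/Baichuan2-13B --model_version v2_13b --dtype float16 --use_gpt_attention_plugin float16 --remove_input_padding --paged_kv_cache --world_size 2 --output_dir ./engines

The other is to keep the current engine and switch the scheduling mode: in triton_model_repo/tensorrt_llm/config.pbtxt, change the gpt_model_type parameter from in-flight batching to V1, which does not require the plugin/packed-input/paged-KV-cache combination.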