
Assertion failed: Cannot determine if hopper is specialised without a selected config at runtime (Latest commit) #529

@christian-ci

Description


System Info

  • x86_64
  • NVIDIA L4
  • Built TensorRT-LLM backend using:
DOCKER_BUILDKIT=1 docker build -t triton_trt_llm -f dockerfile/Dockerfile.trt_llm_backend .
  • TensorRT-LLM version: 0.12.0.dev2024070900
    with Commit 6053a5dcf49a51174bcb28b598856c421e896e29
  • Engine built:
trtllm-build --checkpoint_dir /models/mixtral-8x7b-instruct-fp8-tp4-quant \
             --output_dir /models/mixtral-8x7b-instruct-engine-fp8-tp4 \
             --use_custom_all_reduce disable \
             --use_fp8_context_fmha disable \
             --workers 4

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Build the image using latest commit 6053a5dcf49a51174bcb28b598856c421e896e29 on 8x NVIDIA L4 GPUs (AWS g6.48xlarge), using the Ubuntu Deep Learning GPU AMI (the NVIDIA Deep Learning AMI has no support for G6 instances):

DOCKER_BUILDKIT=1 docker build -t triton_trt_llm -f dockerfile/Dockerfile.trt_llm_backend .

Quantize mixtral-8x7b-instruct to fp8 and then build engine:

trtllm-build --checkpoint_dir /models/mixtral-8x7b-instruct-fp8-tp4-quant \
             --output_dir /models/mixtral-8x7b-instruct-engine-fp8-tp4 \
             --use_custom_all_reduce disable \
             --use_fp8_context_fmha disable \
             --workers 4
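
For reference, the fp8 TP4 checkpoint consumed by `trtllm-build` above was presumably produced with TensorRT-LLM's quantization example script; the sketch below assumes the `examples/quantization` layout of the repo, and the script path, input path, and flags are assumptions, not taken from the original report:

```bash
# Sketch only: produces the fp8 TP4 checkpoint that trtllm-build consumes above.
# Script location and paths are assumptions; adjust to your checkout.
python3 examples/quantization/quantize.py \
    --model_dir /models/mixtral-8x7b-instruct \
    --qformat fp8 \
    --kv_cache_dtype fp8 \
    --tp_size 4 \
    --output_dir /models/mixtral-8x7b-instruct-fp8-tp4-quant
```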

Adjust the config files as needed: decoupled mode, the tokenizer, and inflight_fused_batching with the max_utilization scheduler policy.
Load the model and run it.
Then try to run inference.
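
The config adjustments described above likely correspond to settings such as the following in the backend's `tensorrt_llm/config.pbtxt` template. This is a sketch using parameter names from the tensorrtllm_backend templates; the exact values shown are assumptions, not from the report:

```protobuf
# tensorrt_llm/config.pbtxt (excerpt) -- values are illustrative
model_transaction_policy {
  decoupled: true
}
parameters: {
  key: "gpt_model_type"
  value: { string_value: "inflight_fused_batching" }
}
parameters: {
  key: "batch_scheduler_policy"
  value: { string_value: "max_utilization" }
}
```

Loading the model would then typically be done with the backend's launch helper, e.g. `python3 scripts/launch_triton_server.py --world_size 4 --model_repo <repo>`, with `--world_size` matching the TP=4 engine.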

Expected behavior

Inference runs successfully.

Actual behavior

I0709 16:10:56.837763 111 http_server.cc:362] "Started Metrics Service at 0.0.0.0:8002"
[TensorRT-LLM][WARNING] Device 0 peer access Device 1 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 2 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 3 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 4 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 5 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 6 is not available.
[TensorRT-LLM][WARNING] Device 0 peer access Device 7 is not available.
terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'
what(): [TensorRT-LLM][ERROR] Assertion failed: Cannot determine if hopper is specialised without a selected config (/app/tensorrt_llm/cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_template.h:427)
1 0x71c34c806661 /app/tensorrt_llm/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so.10(+0x80661) [0x71c34c806661]
2 0x71c34d1e0480 /app/tensorrt_llm/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so.10(+0xa5a480) [0x71c34d1e0480]
3 0x71c34d1e04e1 /app/tensorrt_llm/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so.10(+0xa5a4e1) [0x71c34d1e04e1]
4 0x71c34ca0013f /app/tensorrt_llm/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so.10(+0x27a13f) [0x71c34ca0013f]
5 0x71c34ca1417d /app/tensorrt_llm/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so.10(+0x28e17d) [0x71c34ca1417d]
6 0x71c34c8da42d tensorrt_llm::plugins::MixtureOfExpertsPlugin::enqueue(nvinfer1::PluginTensorDesc const*, nvinfer1::PluginTensorDesc const*, void const* const*, void* const*, void*, CUstream_st*) + 925
7 0x71c319fcfa8c /usr/local/tensorrt/lib/libnvinfer.so.10(+0x109fa8c) [0x71c319fcfa8c]
8 0x71c319f74657 /usr/local/tensorrt/lib/libnvinfer.so.10(+0x1044657) [0x71c319f74657]
9 0x71c319f760c1 /usr/local/tensorrt/lib/libnvinfer.so.10(+0x10460c1) [0x71c319f760c1]
10 0x71c376f4e9d4 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::executeContext(int) + 52
11 0x71c376f5c72b tensorrt_llm::batch_manager::TrtGptModelInflightBatching::executeStep(std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, std::vector<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&, int) + 235
12 0x71c376f5c896 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::executeBatch(tensorrt_llm::batch_manager::ScheduledRequests const&) + 246
13 0x71c376f5d219 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > const&) + 2409
14 0x71c376f7d9fb tensorrt_llm::executor::Executor::Impl::forwardAsync(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 411
15 0x71c376f8124d tensorrt_llm::executor::Executor::Impl::executionLoop() + 301
16 0x71c517bdb253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x71c517bdb253]
17 0x71c51796aac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x71c51796aac3]
18 0x71c5179fba04 clone + 68
terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'

Additional notes

None

Labels: bug