Hi @riyajatar37003, this is an interesting observation, I suppose the model files behind config-1 and config-2 are exactly the same? Can you share the commands that launched the perf_analyzer for config 1 and 2, and the output? Could it be because the input data, batch size and/or concurrency range are different between the runs on config 1 and 2?
I am passing these config.pbtxt files.
config-1:

```
name: "model_onnx"
backend: "onnxruntime"
max_batch_size: 128
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
instance_group [
  {
    count: 4
    kind: KIND_GPU
  }
]
```
config-2:

```
name: "bge_reranker_v2_onnx"
backend: "onnxruntime"
max_batch_size: 0
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1, -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1, -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ -1, 1 ]
  }
]
instance_group [
  {
    count: 4
    kind: KIND_GPU
  }
]
```
When I ran Model Analyzer with these configs, I found that config-2 gave about 25% more throughput.
My understanding is that specifying `max_batch_size: 0` means the model is not batchable by Triton, i.e. the batch dimension (if any) must be listed explicitly in `dims` and batching is left to the client.
Can anyone explain exactly what is happening with these two configs?
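For context, here is my reading of the shape semantics as an illustrative sketch (this is my own simplification in Python, not Triton code; the function name is made up): with `max_batch_size > 0` the `dims` in config.pbtxt exclude the batch dimension, which Triton prepends and may fill via dynamic batching, while with `max_batch_size: 0` the `dims` are used verbatim.

```python
# Illustrative sketch of Triton's shape handling (simplified, not Triton code).
# With max_batch_size > 0 the config's dims exclude the batch dimension,
# which Triton prepends; with max_batch_size == 0 the dims are used as-is.
def full_input_shape(dims, max_batch_size):
    """Return the full tensor shape a client request carries (-1 = variable)."""
    if max_batch_size > 0:
        return [-1] + list(dims)  # implicit, variable batch dimension
    return list(dims)             # dims already include any batch dimension

# config-1: max_batch_size 128, dims [-1]     -> requests shaped [batch, seq_len]
print(full_input_shape([-1], 128))    # [-1, -1]
# config-2: max_batch_size 0,   dims [-1, -1] -> same shape, no server-side batching
print(full_input_shape([-1, -1], 0))  # [-1, -1]
```

If that reading is right, both configs accept the same `[batch, seq_len]` tensors, and the throughput difference would come from how Triton schedules and batches the requests rather than from the model itself.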