Hi @riyajatar37003, this is an interesting observation, I suppose the model files behind config-1 and config-2 are exactly the same? Can you share the commands that launched the perf_analyzer for config 1 and 2, and the output? Could it be because the input data, batch size and/or concurrency range are different between the runs on config 1 and 2?
I am passing these config.pbtxt files.
config-1:

```
name: "model_onnx"
backend: "onnxruntime"
max_batch_size: 128
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
instance_group [
  {
    count: 4
    kind: KIND_GPU
  }
]
```
config-2:

```
name: "bge_reranker_v2_onnx"
backend: "onnxruntime"
max_batch_size: 0
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1, -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1, -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ -1, 1 ]
  }
]
instance_group [
  {
    count: 4
    kind: KIND_GPU
  }
]
```
When I ran Model Analyzer with these configs, I found that config-2 gave about 25% more throughput.
My understanding is that specifying `max_batch_size: 0` means the model is not batchable by Triton, i.e. the batch dimension (if any) must be listed explicitly in `dims` and batching is left to the client.
Can anyone explain exactly what is happening with these two configs?
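For context, here is my reading of the shape semantics as an illustrative sketch (this is my own simplification in Python, not Triton code; the function name is made up): with `max_batch_size > 0` the `dims` in config.pbtxt exclude the batch dimension, which Triton prepends and may fill via dynamic batching, while with `max_batch_size: 0` the `dims` are used verbatim.

```python
# Illustrative sketch of Triton's shape handling (simplified, not Triton code).
# With max_batch_size > 0 the config's dims exclude the batch dimension,
# which Triton prepends; with max_batch_size == 0 the dims are used as-is.
def full_input_shape(dims, max_batch_size):
    """Return the full tensor shape a client request carries (-1 = variable)."""
    if max_batch_size > 0:
        return [-1] + list(dims)  # implicit, variable batch dimension
    return list(dims)             # dims already include any batch dimension

# config-1: max_batch_size 128, dims [-1]     -> requests shaped [batch, seq_len]
print(full_input_shape([-1], 128))    # [-1, -1]
# config-2: max_batch_size 0,   dims [-1, -1] -> same shape, no server-side batching
print(full_input_shape([-1, -1], 0))  # [-1, -1]
```

If that reading is right, both configs accept the same `[batch, seq_len]` tensors, and the throughput difference would come from how Triton schedules and batches the requests rather than from the model itself.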