Open
Labels
triaged: Issue has been triaged by maintainers
Description
In vLLM, the "N" parameter controls how many inference results are returned for a single request.
For example, when "N" is 2, we get results like the following:
"Choices": [
  {
    "FinishReason": "stop",
    "Index": 0,
    "Logprobs": {
      "TextOffset": [],
      "TokenLogprobs": [],
      "Tokens": []
    },
    "Text": "(arr, left, right):"
  },
  {
    "FinishReason": "stop",
    "Index": 1,
    "Logprobs": {
      "TextOffset": [],
      "TokenLogprobs": [],
      "Tokens": []
    },
    "Text": "(arr, low, high):"
  }
]
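To make the expected output concrete, here is a minimal Python sketch of how a client might read the response shape shown above. The field names ("Choices", "Index", "Text") are taken from the example, and the fragment is wrapped in braces so that it parses as JSON. The helper name `completion_texts` is hypothetical, and this is not a vLLM or Triton API.

```python
import json

# Response fragment from the example above, wrapped in braces to form valid JSON.
response_text = """
{
  "Choices": [
    {
      "FinishReason": "stop",
      "Index": 0,
      "Logprobs": {"TextOffset": [], "TokenLogprobs": [], "Tokens": []},
      "Text": "(arr, left, right):"
    },
    {
      "FinishReason": "stop",
      "Index": 1,
      "Logprobs": {"TextOffset": [], "TokenLogprobs": [], "Tokens": []},
      "Text": "(arr, low, high):"
    }
  ]
}
"""


def completion_texts(response: dict) -> list[str]:
    """Return the generated texts, ordered by each choice's Index field.

    Hypothetical helper for illustration only.
    """
    choices = sorted(response["Choices"], key=lambda choice: choice["Index"])
    return [choice["Text"] for choice in choices]


texts = completion_texts(json.loads(response_text))
for index, text in enumerate(texts):
    print(index, text)
```

With "N" set to 2, the list has two entries, one per choice.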
In Triton, how can I set the parameters to get multiple results for one request, as in the example above?
Thanks.