How to get more than one inference result with one request? #74

In vLLM, the "N" parameter sets how many inference results are returned for a single request.

For example, when "N" is 2, we get results like the following:

```
"Choices": [
  {
    "FinishReason": "stop",
    "Index": 0,
    "Logprobs": {
      "TextOffset": [],
      "TokenLogprobs": [],
      "Tokens": []
    },
    "Text": "(arr, left, right):"
  },
  {
    "FinishReason": "stop",
    "Index": 1,
    "Logprobs": {
      "TextOffset": [],
      "TokenLogprobs": [],
      "Tokens": []
    },
    "Text": "(arr, low, high):"
  }
]
```
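To make the expected behaviour concrete, here is a minimal Python sketch of what I mean: one request that asks for two completions, and one response that carries both. It assumes an OpenAI-style completion payload where the count is passed as `n`. The model name `my-model`, the prompt `def quicksort`, and the helper names `build_completion_request` and `extract_texts` are hypothetical and only for illustration. The response field names are copied from the output above.

```python
import json
from typing import List


def build_completion_request(prompt: str, n: int, model: str = "my-model") -> str:
    """Serialize an OpenAI-style completion request asking for n completions.

    The payload keys are an assumption about the serving endpoint;
    "my-model" is a placeholder, not a real model name.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    payload = {"model": model, "prompt": prompt, "n": n, "max_tokens": 16}
    return json.dumps(payload)


def extract_texts(response: dict) -> List[str]:
    """Return the generated texts, ordered by each choice's "Index" field."""
    choices = sorted(response["Choices"], key=lambda choice: choice["Index"])
    return [choice["Text"] for choice in choices]


# A response shaped like the output above ("Logprobs" omitted for brevity).
sample_response = {
    "Choices": [
        {"FinishReason": "stop", "Index": 0, "Text": "(arr, left, right):"},
        {"FinishReason": "stop", "Index": 1, "Text": "(arr, low, high):"},
    ]
}

# Hypothetical prompt; a single request carries n=2.
request_body = build_completion_request("def quicksort", n=2)
texts = extract_texts(sample_response)
```

The point is that a single request carries `n`, and the response holds one entry per requested completion, each identified by its `Index`.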

In Triton, how can I set the parameters to achieve the same effect?

Thanks.
