change the configuration of engineArgs in config.pbtxt #72

Open: wants to merge 7 commits into main

Conversation

activezhao

No description provided.

Comment on lines 33 to 68
# We need to use decoupled transaction policy for saturating
# vLLM engine for max throughput.
# TODO [DLIS:5233]: Allow asynchronous execution to lift this
# restriction for cases where there is exactly a single response
# to a single request.
model_transaction_policy {
  decoupled: True
}

input [
  {
    name: "text_input"
    data_type: TYPE_STRING
    dims: [ 1 ]
  },
  {
    name: "stream"
    data_type: TYPE_BOOL
    dims: [ 1 ]
    optional: true
  },
  {
    name: "sampling_parameters"
    data_type: TYPE_STRING
    dims: [ 1 ]
    optional: true
  }
]

output [
  {
    name: "text_output"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]
Collaborator

We've recently added autocomplete to vLLM's model.py in this PR: triton-inference-server/vllm_backend#20. Thus, this entry can be eliminated from config.pbtxt.

Author

> We've recently added autocomplete to vLLM's model.py in this PR: triton-inference-server/vllm_backend#20. Thus, this entry can be eliminated from config.pbtxt.

Hi @oandreeva-nv Yes, I have seen the auto_complete_config code.

But in my opinion, this may not be the best way to handle the parameters, because users will not know which parameters they should prepare from the config.pbtxt entries alone unless they read model.py.

This approach does simplify config.pbtxt, but I don't think it is a good experience for users, especially those who are used to Triton Server. Within the Triton Server ecosystem, this processing logic may be confusing, or at least it feels that way to me.

Collaborator

IIRC, inputs, outputs, and model_transaction_policy do not change between different vLLM models, so it might be helpful to auto-complete these fields. That was the motivation.
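
For context, a rough sketch of the auto-completion style added in triton-inference-server/vllm_backend#20 is shown below. It is illustrative rather than the exact upstream code, but it shows why the input/output/model_transaction_policy entries can be dropped from config.pbtxt: the Python model declares them itself.

```python
# Sketch only: mirrors the shape of the vllm_backend auto-complete logic,
# not the exact upstream implementation.
class TritonPythonModel:
    @staticmethod
    def auto_complete_config(auto_complete_model_config):
        inputs = [
            {"name": "text_input", "data_type": "TYPE_STRING", "dims": [1]},
            {"name": "stream", "data_type": "TYPE_BOOL", "dims": [1], "optional": True},
            {"name": "sampling_parameters", "data_type": "TYPE_STRING", "dims": [1], "optional": True},
        ]
        outputs = [{"name": "text_output", "data_type": "TYPE_STRING", "dims": [-1]}]

        for inp in inputs:
            auto_complete_model_config.add_input(inp)
        for out in outputs:
            auto_complete_model_config.add_output(out)

        # vLLM streams responses, so the decoupled transaction policy is required.
        auto_complete_model_config.set_model_transaction_policy(dict(decoupled=True))
        return auto_complete_model_config
```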

if k in model_engine_args:
    model_engine_args[k] = float(model_engine_args[k])

int_keys = ["seed", "pipeline_parallel_size", "tensor_parallel_size", "block_size",
Collaborator

I see what you are doing. My only concern is that the vLLM repository is under active development and these fields may change in the future.

Author

> I see what you are doing. My only concern is that the vLLM repository is under active development and these fields may change in the future.

@oandreeva-nv This is a very good point. In fact, it is just like the sampling-parameters processing logic.

We cannot elegantly handle parameters that may change dynamically, but generally speaking, they do not change very often.

We can address this problem by shipping matching versions of Triton Server + vLLM + model.py.

In fact, 23.10-vllm-python-py3 has already been released. The combination of Triton Server + vLLM + model.py is self-contained and runs very well. You can even change the model.py in /opt/tritonserver/backends/vllm of the vllm_backend container if you need to.
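
As a concrete illustration of the approach discussed in this thread, the snippet below shows one way model.py could read engineArgs from config.pbtxt parameters and cast them, assuming the usual Triton Python backend convention that initialize() receives the model configuration as a JSON string under args["model_config"] and that every parameter value arrives as a string_value. The key lists and the helper name are illustrative, not the PR's exact code.

```python
import json

def extract_engine_args(args):
    """Sketch: collect engineArgs from config.pbtxt `parameters` and cast types."""
    model_config = json.loads(args["model_config"])

    # config.pbtxt parameters are exposed as {"key": {"string_value": "..."}}.
    params = model_config.get("parameters", {})
    engine_args = {k: v["string_value"] for k, v in params.items()}

    float_keys = ["gpu_memory_utilization"]                      # illustrative
    int_keys = ["seed", "pipeline_parallel_size",
                "tensor_parallel_size", "block_size"]            # illustrative
    bool_keys = ["trust_remote_code", "disable_log_requests"]    # illustrative

    for k in float_keys:
        if k in engine_args:
            engine_args[k] = float(engine_args[k])
    for k in int_keys:
        if k in engine_args:
            engine_args[k] = int(engine_args[k])
    for k in bool_keys:
        if k in engine_args:
            engine_args[k] = engine_args[k].lower() == "true"

    return engine_args
```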

Collaborator

@oandreeva-nv left a comment

I would like to keep the major part of this tutorial as is, since it shows how to use Triton's vllm_backend. Thus, please don't change the core part of the README; you are welcome to add an extra section, though.

On the other hand, it seems like this work is more of a feature request, i.e. allowing vllm_backend to set engineArgs from parameters specified in config.pbtxt and eliminating the dependency on model.json. If this is something you would like added to our vllm_backend, please open an issue on the main Triton issues page.

]

# The configuration of engineArgs
parameters {
Collaborator

can we do something like:

parameters [
    {
        key: ...
        value: {...}
    },
     {
        key: ...
        value: {...}
    },
    ...
]

Author

> can we do something like: parameters [ { key: ... value: {...} }, ... ]

@oandreeva-nv Yes, of course we can. But in fact, using separate parameters entries is a relatively common usage. Can we do the same here?

For example:
https://github.com/triton-inference-server/tensorrtllm_backend/blob/47b609b670d6bb33a5ff113d98ad8a44d961c5c6/all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt#L155

Collaborator

The final decision is up to you; it just makes more sense to me to keep a list in parameters rather than multiple parameters entries with one parameter each.
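
For concreteness, the list form being suggested here might look roughly like the following in config.pbtxt. The keys and values are illustrative engineArgs only; note that every value must be a string_value, which is why model.py has to cast them afterwards.

```
parameters [
  {
    key: "model"
    value: { string_value: "facebook/opt-125m" }
  },
  {
    key: "tensor_parallel_size"
    value: { string_value: "1" }
  },
  {
    key: "gpu_memory_utilization"
    value: { string_value: "0.9" }
  }
]
```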

@nv-kmcgill53

CLA has been accepted by the Triton Team. This PR can be merged at our discretion.

@activezhao
Author

> CLA has been accepted by the Triton Team. This PR can be merged at our discretion.

@nv-kmcgill53 OK, thank you so much.

@oandreeva-nv
Collaborator

Hi @activezhao, thanks for your work! I was thinking about the structure of this tutorial. I would like to keep the original version of this tutorial, which utilizes the vllm_backend repo, under the QuickDeploy section. How about we move this tutorial under a new folder, Customization, with a new README and the suggested model.py and config.pbtxt? Then, in the README you can describe what new methods you've added to model.py and config.pbtxt. What do you think about this idea?

@activezhao
Author

activezhao commented Nov 28, 2023

> How about we move this tutorial under a new folder Customization with a new README and suggested model.py and config.pbtxt?

Hi @oandreeva-nv Cool, that sounds pretty good; it's a good idea.

And could you please help us build the basic code structure, and then we can fill things in?

@activezhao
Author

Hi @oandreeva-nv I have created a folder named Customization with a new README and the suggested model.py and config.pbtxt.

Could you please review the code?

Thanks


## Step 1: Prepare your model repository

To use Triton, we need to build a model repository. A sample model repository for deploying `facebook/opt-125m` using vLLM in Triton is
Collaborator

Please clarify that in this tutorial we are using the Python backend + vLLM, so we probably also need to install vLLM in the environment.

Collaborator

Since we are not using the vllm backend, how about we target this tutorial at the general Triton container rather than the vLLM one?

Author

> Please clarify that in this tutorial we are using the Python backend + vLLM, so we probably also need to install vLLM in the environment.

Hi @oandreeva-nv In fact, I have used nvcr.io/nvidia/tritonserver:23.10-vllm-python-py3 with the model.py + config.pbtxt I modified in our production environment, and it is running well.

So I wonder whether I could change the README.md into a detailed best-practices guide?

What do you think?

Collaborator

Sure, feel free to do as you see fit. I can take a look at the final README.
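
The Step 1 excerpt above is cut off before the sample layout, but under the Python backend + vLLM approach discussed in this thread the model repository typically looks roughly like this (the folder and model names are illustrative, not the final structure of this PR):

```
model_repository/
└── vllm/
    ├── 1/
    │   └── model.py      # Python backend model wrapping vLLM's AsyncLLMEngine
    └── config.pbtxt      # inputs/outputs plus the engineArgs parameters discussed above
```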

@oandreeva-nv
Collaborator

Hi @activezhao, the pre-commit hook failed unfortunately; could you please fix the formatting?

@activezhao
Author

> Hi @activezhao, the pre-commit hook failed unfortunately; could you please fix the formatting?

OK, I will fix them.

@activezhao
Author

Hi @oandreeva-nv I have just updated the README and adjusted the directory structure; could you please help review the code?
Thanks
