
[Doc] Add doc to state our model support policy #3948

Merged
10 commits merged into vllm-project:main from doc_model_policy on Apr 10, 2024

Conversation

youkaichao (Member)

Part of #3780.

youkaichao (Member, Author) commented Apr 9, 2024

Two TODOs:

  1. How do we list the test status of models? Adding one more column would make the already large table very crowded. And it might offend some contributors when they see their models are not tested :(
  2. It seems we only have a strict consistency test for facebook/opt-125m and meta-llama/Llama-2-7b-hf, under tests/basic_correctness/test_basic_correctness.py. An output sensibility test is on the way in #3730 ("[CI/Build] A perplexity-computing test for the FP8 KV cache system", originally used in the context of PR #3290). The runtime functionality test works for some models, but I don't have a complete list yet; currently I do a grep search over the tests and examples folders to collect all the model names. (A minimal sketch of this kind of greedy-decoding consistency check is shown below.)
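For reference, here is a minimal sketch of what such a greedy-decoding consistency check looks like, assuming a working vLLM install. It is not the actual vLLM test code; the model name, prompt, and token count are illustrative.

```python
# Sketch of a strict-consistency check: generate with greedy decoding in both
# vLLM and HuggingFace Transformers and require identical generated token ids.
# Illustrative only; the real checks live under tests/ in the vLLM repo.
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "facebook/opt-125m"          # small model, purely for illustration
PROMPT = "The capital of France is"
NUM_TOKENS = 32

# Greedy decoding with vLLM (temperature=0.0 means greedy).
llm = LLM(model=MODEL)
params = SamplingParams(temperature=0.0, max_tokens=NUM_TOKENS)
vllm_ids = list(llm.generate([PROMPT], params)[0].outputs[0].token_ids)

# Greedy decoding with HuggingFace Transformers.
tokenizer = AutoTokenizer.from_pretrained(MODEL)
hf_model = AutoModelForCausalLM.from_pretrained(MODEL)
inputs = tokenizer(PROMPT, return_tensors="pt")
hf_out = hf_model.generate(**inputs, max_new_tokens=NUM_TOKENS, do_sample=False)
hf_ids = hf_out[0][inputs["input_ids"].shape[1]:].tolist()

# Strict consistency: the generated token ids must match exactly.
assert vllm_ids == hf_ids, f"vLLM {vllm_ids} != HF {hf_ids}"
```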

youkaichao requested a review from simon-mo on April 9, 2024, 20:12

We have the following levels of testing for models:

1. **Strict consistency**: We compare the output of the model with the output of the model in the HuggingFace Transformers library under greedy decoding. This is the most stringent test. The following models fall under this category:
Collaborator

QQ: This list could be out of date very easily. Should we just link to a test file instead?

youkaichao (Member, Author)

That is also an option. But essentially it does not tell users what models are tested. Or we can tell users to grep our tests & examples folder to see if a model is tested? 🤣

Collaborator

second @rkooo567 that a static list on the doc is probably not ideal - maybe we can refer to the CI?

rkooo567 (Collaborator), Apr 10, 2024

One simple solution I can think of is to centralize this constant to tested_model.py (new file) and link to this file instead?
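For illustration, such a centralized file might look roughly like the sketch below. This is hypothetical: the file does not exist, the constant names are assumptions, and the model names are simply the ones mentioned elsewhere in this thread.

```python
# tested_model.py -- hypothetical sketch of the file suggested above,
# so the docs could link to one place instead of duplicating the list.

# Models covered by the strict consistency test (greedy-decoding comparison
# against HuggingFace Transformers).
STRICT_CONSISTENCY_MODELS = [
    "facebook/opt-125m",
    "meta-llama/Llama-2-7b-hf",
]

# Models exercised by runtime functionality tests and examples.
RUNTIME_FUNCTIONALITY_MODELS = [
    "EleutherAI/pythia-70m",
    "bigcode/tiny_starcoder_py",
    "gpt2",
]
```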


ywang96 (Collaborator) left a comment

Left a few comments! Thank you very much for putting this down in the doc!

5 review comments on docs/source/models/supported_models.rst (outdated, resolved)
- EleutherAI/pythia-70m
- bigcode/tiny_starcoder_py
- gpt2
4. **Community feedback**: We rely on the community to provide feedback on the models. If a model is broken or not working as expected, we encourage users to raise issues to report it or open pull requests to fix it. The rest of the models fall under this category.
Collaborator

The naming here doesn't feel like a level but rather a general guideline - do you mean if certain models are not working at all (due to layer changes, kernel changes, etc), then we rely on the community to fix these models?

youkaichao (Member, Author)

> do you mean if certain models are not working at all (due to layer changes, kernel changes, etc), then we rely on the community to fix these models

Currently I would say yes. Because we don't test them, it is possible they are broken. But we will do our best to maintain them, e.g. when we make some change in the vLLM core, we typically update the model files. It's best-effort anyway.


At vLLM, we are committed to facilitating the integration and support of third-party models within our ecosystem. Our approach is designed to balance the need for robustness and the practical limitations of supporting a wide range of models. Here’s how we manage third-party model support:

1. **Community-Driven Support**: We encourage community contributions for adding new models. When a user requests support for a new model, we welcome pull requests (PRs) from the community. These contributions are evaluated primarily on the sensibility of the output they generate, rather than strict consistency with existing implementations such as those in transformers.
Collaborator

Probably also worth pointing out that a basic sensibility test is also required in the model support PR.

youkaichao (Member, Author)

A sensibility report is required in the PR. However, as for adding it to the test suite, there is a strong concern about the time and resource burden it would add to our CI system.

Collaborator

Totally makes sense - I think an attached report from the author in the PR is fine for now, and we don't need to build it into our CI.

docs/source/models/supported_models.rst (outdated review comment, resolved)


youkaichao (Member, Author)

@ywang96 @rkooo567 please check ff1ae0d, where I leave URLs instead of a static list of supported models.

youkaichao (Member, Author)

@ywang96 thanks for the detailed review!


We have the following levels of testing for models:

1. **Strict Consistency**: We compare the output of the model with the output of the model in the HuggingFace Transformers library under greedy decoding. This is the most stringent test. Please refer to the https://github.com/vllm-project/vllm/tree/main/tests/basic_correctness folder for the models that have passed this test.
Collaborator

nit: but I feel like linking these files, https://github.com/vllm-project/vllm/blob/main/tests/models/test_models.py (small models) and https://github.com/vllm-project/vllm/blob/main/tests/models/test_big_models.py (big models), is better because they have a stricter consistency check (basically they check tokens up to 96, whereas the basic correctness test only checks the first 5).

youkaichao (Member, Author)

fixed in d59e482

We have the following levels of testing for models:

1. **Strict Consistency**: We compare the output of the model with the output of the model in the HuggingFace Transformers library under greedy decoding. This is the most stringent test. Please refer to the https://github.com/vllm-project/vllm/tree/main/tests/basic_correctness folder for the models that have passed this test.
2. **Output Sensibility**: We check if the output of the model is sensible and coherent, by measuring the perplexity of the output and checking for any obvious errors. This is a less stringent test.
Collaborator

Also consider adding a link?

youkaichao (Member, Author)

We don't have any test for Output Sensibility yet.
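For reference, a minimal sketch of what a perplexity-based sensibility check could look like is shown below. It is not part of vLLM's test suite; the scorer model, the sample text, and the threshold are assumptions.

```python
# Sketch of an output-sensibility check: score a generated completion with a
# reference language model and require its perplexity to stay below a loose
# threshold. Illustrative only; not part of vLLM's CI.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

SCORER = "gpt2"                # reference LM, chosen purely for illustration
GENERATED_TEXT = "Paris is the capital of France and one of its largest cities."
PPL_THRESHOLD = 100.0          # arbitrary bound; incoherent text scores far higher

tokenizer = AutoTokenizer.from_pretrained(SCORER)
model = AutoModelForCausalLM.from_pretrained(SCORER)

# Causal-LM loss over the generated text; exp(loss) is its perplexity.
input_ids = tokenizer(GENERATED_TEXT, return_tensors="pt").input_ids
with torch.no_grad():
    loss = model(input_ids, labels=input_ids).loss
perplexity = torch.exp(loss).item()

assert perplexity < PPL_THRESHOLD, f"output looks incoherent (perplexity={perplexity:.1f})"
```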

rkooo567 (Collaborator)

Looks pretty good to me! just some nits

youkaichao requested a review from ywang96 on April 10, 2024, 15:51
ywang96 (Collaborator) left a comment

LGTM! Left some final nits to add names to the links (otherwise they will all show as vllm-project/vllm on the doc).

2 review comments on docs/source/models/supported_models.rst (outdated, resolved)
youkaichao and others added 2 commits on April 10, 2024, 09:24 (both co-authored by Roger Wang <136131678+ywang96@users.noreply.github.com>)
ywang96 enabled auto-merge (squash) on April 10, 2024, 16:29
ywang96 merged commit e353974 into vllm-project:main on Apr 10, 2024
35 checks passed
youkaichao deleted the doc_model_policy branch on April 10, 2024, 17:06
SageMoore pushed a commit to neuralmagic/nm-vllm that referenced this pull request on Apr 11, 2024
andy-neuma pushed a commit to neuralmagic/nm-vllm that referenced this pull request on Apr 12, 2024
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request on Apr 22, 2024
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request on Jun 3, 2024