
[DOC] Add additional comments for LLMEngine and AsyncLLMEngine #1011

Merged Jan 12, 2024 (13 commits)
Conversation

@litone01 (Contributor) commented Sep 11, 2023

For the following functions:

  • LLMEngine: add_request, abort_request, step and _init_cache
  • AsyncLLMEngine: generate and abort

Summary of changes:

  1. Added Sphinx documentation pages for LLMEngine and AsyncLLMEngine. To address questions related to memory usage, documentation for _init_cache from LLMEngine is also added.
  2. To complement the existing comments, added implementation details and examples for the functions where necessary.
  3. Added a diagram to explain how step() performs one decoding iteration.

TODO:

  • Fix the linting error
  • Update the documentation to reflect the changes introduced during the past week.

Discussion:

  1. Is there any suggestion for how the diagram for the step() function may be improved? For example, should it be illustrated in the context of an example prompt? Any advice on making it more informative for readers would be appreciated.

Partially addresses the documentation issues in #244.

Could you kindly review this PR? Thanks! @zhuohan123 @WoosukKwon cc @LiuXiaoxuanPKU

@litone01 (Contributor, Author)

Closing the PR for now to fix the TODOs.

@litone01 litone01 closed this Sep 11, 2023
@litone01 litone01 reopened this Sep 11, 2023
@zhuohan123 zhuohan123 added the "documentation" label Sep 12, 2023
@WoosukKwon WoosukKwon self-requested a review October 2, 2023 17:58
@WoosukKwon (Collaborator)

Hi @litone01, sorry for the very late review, and many thanks for the contribution. I tried the PR on my laptop and got this error:

WARNING: autodoc: failed to import class 'async_llm_engine.AsyncLLMEngine' from module 'vllm.engine'; the following exception was raised:
No module named 'torch'

The generated doc does not contain anything under LLMEngine and AsyncLLMEngine. Could you take a look at this?

@litone01 (Contributor, Author) commented Oct 3, 2023


Thanks @WoosukKwon !

Is it because you are not using the correct Python environment? Building the docs requires the Python dependencies used for compilation and development, e.g. the torch package, to be present in the environment. It works fine on my local setup.
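As a quick aid for checking this, the doc build's import-time dependencies can be probed before running Sphinx. A minimal sketch (the module list is an assumption for illustration):

```python
# Hypothetical pre-flight check for the doc-build environment:
# autodoc can only document vllm if its import-time dependencies
# resolve, so probe them before running `make html`.
import importlib.util

required = ("torch", "vllm")
# find_spec returns None (without importing) when a top-level module
# is absent from the environment.
status = {name: importlib.util.find_spec(name) is not None for name in required}
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'MISSING - activate the dev environment'}")
```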

Also, let me fix the merge conflicts. There seems to be a compilation issue (different from the one Woosuk showed) after rebasing onto the latest changes; let me try to resolve that.

@litone01 (Contributor, Author) commented Oct 3, 2023

Could I seek your advice on the following error during doc compilation that happens after my latest rebase?

WARNING: autodoc: failed to import class 'async_llm_engine.AsyncLLMEngine' from module 'vllm.engine'; the following exception was raised:

cannot import name 'cuda_utils' from partially initialized module 'vllm' (most likely due to a circular import) (/Users/jerry/projects/vllm/vllm/__init__.py)

I see three possible reasons:

  1. There is a circular import. However, a code search suggests cuda_utils is only used in utils.py, and we did not add anything to cuda_utils from utils.py. Could it be caused by how the extension is attached in setup.py?
  2. There is a name conflict with another Python module. However, renaming the module from vllm.cuda_utils to vllm.cuda_utils2 in setup.py and utils.py does not resolve the issue.
  3. Something is wrong with Sphinx autodoc. I need to investigate further.

Thank you!
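For what it's worth, guess (1) is easy to reproduce in isolation. The sketch below (all module names hypothetical) builds a tiny package whose `__init__.py` pulls in a submodule that imports a missing name back from the still-initializing package, which yields the same "partially initialized module" error shape:

```python
# Minimal reproduction of the "partially initialized module" error.
# "demo_pkg" and its contents are hypothetical stand-ins for vllm.
import os
import sys
import tempfile

pkg_dir = tempfile.mkdtemp()
pkg = os.path.join(pkg_dir, "demo_pkg")
os.makedirs(pkg)

# __init__.py imports a submodule that, in turn, imports a name back
# from the still-initializing package -- the same shape as vllm's
# __init__.py vs. the missing compiled vllm.cuda_utils extension.
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("from demo_pkg import engine\n")
with open(os.path.join(pkg, "engine.py"), "w") as f:
    f.write("from demo_pkg import cuda_utils\n")  # does not exist yet

sys.path.insert(0, pkg_dir)
err = ""
try:
    import demo_pkg  # noqa: F401
except ImportError as e:
    err = str(e)
# Typically reports: cannot import name 'cuda_utils' from partially
# initialized module 'demo_pkg' (most likely due to a circular import)
print(err)
```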

@simon-mo (Collaborator)

I think this is because cuda_utils is a pre-built shared object: it needs a GPU device to build. A practical fix would be to introduce mocks for these modules so Sphinx can safely ignore them. Please take a look at how Ray solved the same challenge:

https://github.com/ray-project/ray/blob/57c7988ff50f3912aa59da6c90fe74d64b35c63c/doc/source/conf.py#L461-L527

@litone01 are you able to continue to work on this? Thank you!
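For reference, Sphinx's built-in `autodoc_mock_imports` option covers the common case without hand-rolled mocks: listed modules are replaced with mock objects at import time, so autodoc can import the package on a machine with no GPU or compiled extensions. A minimal sketch of what this could look like in `docs/source/conf.py` (the module names are illustrative assumptions, not the PR's actual change):

```python
# Hypothetical excerpt from docs/source/conf.py.
# autodoc_mock_imports is a standard sphinx.ext.autodoc option: each
# listed module is mocked during the doc build, so importing vllm does
# not require these heavy or GPU-only dependencies to be installed.
autodoc_mock_imports = [
    "torch",
    "vllm.cuda_utils",  # pre-built shared object that needs a GPU to build
]
```

This is lighter-weight than Ray's manual `mock.MagicMock` approach linked above, at the cost of less control over how the mocked attributes behave.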

@litone01 (Contributor, Author) commented Nov 30, 2023


Thanks for the suggestion! Sure, I will take a closer look as soon as I have the bandwidth. I may need slightly more time to update the documentation with the latest code again.

Thanks!

@simon-mo simon-mo merged commit 6549aef into vllm-project:main Jan 12, 2024
2 of 4 checks passed
5 participants