[Core] Refactoring sampler and support prompt logprob for chunked prefill #4309

rkooo567 · 2024-04-24T00:20:24Z

Summary:

Refactor the sampling metadata and the sampler. More concretely:

Introduce a SequenceGroupToSample class instead of combining multiple parallel list data structures.
Prepare the prefill/decode indices ahead of time and reuse them, instead of rewriting the index-looping logic again and again.
Move prepare_sample to SamplingMetadata.
Remove all indexing logic that assumes a request is entirely prefill or entirely decode.
Introduce do_sample to SequenceGroupMetadata. If it is set to False, sampling and sample logprob calculation for the corresponding seq_group are skipped.
Improve docstrings and rename confusing variables.
Remove perform_sampling because it is leaky and overlaps with do_sample. is_driver_worker is now used directly for the same purpose.
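
To illustrate the "prepare indices ahead of time" idea, here is a minimal sketch. It is not the code in this PR: the ScheduledGroup and GroupToSample classes, their fields, and prepare_groups are hypothetical simplifications, and the sketch assumes the model produces one logits row per scheduled token.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScheduledGroup:
    """Hypothetical scheduler output for one sequence group in a batch."""
    num_query_tokens: int  # logits rows this group contributes to the batch
    is_prompt: bool        # True while the group is still in prefill
    do_sample: bool        # False for a prefill chunk that is not the last one
    wants_prompt_logprobs: bool = False


@dataclass
class GroupToSample:
    """Hypothetical per-group record holding precomputed row indices."""
    is_prompt: bool
    do_sample: bool
    prompt_logprob_indices: List[int] = field(default_factory=list)
    sample_indices: List[int] = field(default_factory=list)


def prepare_groups(scheduled: List[ScheduledGroup]) -> List[GroupToSample]:
    """Compute, once per batch, which logits rows each group needs."""
    groups: List[GroupToSample] = []
    row = 0  # offset of the current group's first row in the flat logits
    for sg in scheduled:
        out = GroupToSample(is_prompt=sg.is_prompt, do_sample=sg.do_sample)
        if sg.is_prompt:
            # When sampling, the last row predicts the next token; every
            # other row belongs to a prompt position.
            n_prompt_rows = sg.num_query_tokens - (1 if sg.do_sample else 0)
            if sg.wants_prompt_logprobs:
                out.prompt_logprob_indices = list(range(row, row + n_prompt_rows))
            if sg.do_sample:
                out.sample_indices = [row + sg.num_query_tokens - 1]
        elif sg.do_sample:
            # Decode: one row per running sequence in the group.
            out.sample_indices = list(range(row, row + sg.num_query_tokens))
        row += sg.num_query_tokens
        groups.append(out)
    return groups
```

With the indices stored on each group, downstream code can read them directly instead of re-deriving offsets from prompt lengths, and a group with do_sample set to False simply has no sample indices.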

Fix prompt logprob for chunked prefill

The existing logic strongly assumes that a seq group contains either an entire prefill or a decode.
Allow prompt logprobs to be updated incrementally.
Allow sampling to be skipped for a chunked prefill (since it is not required).
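
The incremental update can be pictured with a small, self-contained sketch. The helper names (chunk_prompt, run_chunked_prefill, logprob_at) are hypothetical and stand in for the real engine; the assumption is only that each prefill chunk yields logprobs for its own prompt positions and that sampling is needed once, after the final chunk.

```python
from typing import Callable, List, Tuple


def chunk_prompt(prompt_len: int, chunk_size: int) -> List[Tuple[int, int, bool]]:
    """Split a prompt into (start, end, is_last_chunk) ranges."""
    chunks = []
    for start in range(0, prompt_len, chunk_size):
        end = min(start + chunk_size, prompt_len)
        chunks.append((start, end, end == prompt_len))
    return chunks


def run_chunked_prefill(
    prompt_len: int,
    chunk_size: int,
    logprob_at: Callable[[int], float],  # stand-in for the model's output
) -> Tuple[List[float], int]:
    """Accumulate prompt logprobs chunk by chunk; sample on the last chunk only."""
    prompt_logprobs: List[float] = []
    sample_calls = 0
    for start, end, is_last in chunk_prompt(prompt_len, chunk_size):
        # Append this chunk's logprobs to what earlier chunks produced.
        prompt_logprobs.extend(logprob_at(pos) for pos in range(start, end))
        do_sample = is_last  # intermediate chunks skip sampling entirely
        if do_sample:
            sample_calls += 1
    return prompt_logprobs, sample_calls
```

Under these assumptions, the accumulated prompt logprobs are the same whether the prompt is processed in one pass or in several chunks, and sampling runs exactly once per prompt.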

vllm/engine/llm_engine.py

vllm/engine/output_processor/multi_step.py

rkooo567 · 2024-04-25T09:43:37Z

vllm/engine/output_processor/single_step.py

-            seq_group.sampling_params.detokenize and self.detokenizer:
+    def process_prompt_logprob(self, seq_group: SequenceGroup,
+                               outputs: List[SequenceGroupOutput]) -> None:
+        assert len(outputs) == 1, ("Single step should only has 1 output.")


cc @cadedaniel is this assumption correct?

rkooo567 · 2024-04-25T09:45:41Z

vllm/worker/cpu_model_runner.py

-            generators=generators,
-        )
-        return sampling_metadata
+    # def _prepare_sample(


Will remove after confirming tests are passed

rkooo567 · 2024-04-25T09:45:45Z

vllm/worker/neuron_model_runner.py

-            generators=generators,
-        )
-        return sampling_metadata
+    # def _prepare_sample(


Will remove after confirming tests are passed

zhuohan123

Thank you for the refactor SangBin! The code looks much better now. I left some small comments. But in general the code looks pretty good to me.

zhuohan123 · 2024-04-25T20:50:23Z

vllm/model_executor/layers/sampler.py


+        seq_group_idx = categorized_seq_group_idx[sampling_type]


seq_group_idx -> seq_group_ids? Originally I would like to emphasize this is a list of IDs.

Reverted to ids. I thought it was confusing because each seq_group already has a request_id (and the two don't match), but I have no strong preference.

vllm/model_executor/layers/sampler.py

vllm/model_executor/sampling_metadata.py

vllm/engine/output_processor/multi_step.py

rkooo567 · 2024-04-26T00:19:34Z

@zhuohan123 thanks for the review! All comments are addressed!

rkooo567 · 2024-04-26T13:54:39Z

Yay. Thanks for the review again @zhuohan123 !! Time to refactor the model runner...

rkooo567 added 30 commits April 12, 2024 01:58

test added

89a160c

ip

e24af16

Merge branch 'main' into chunked-prefill-logprob-fix

f378d71

op

c308f34

ip

354317e

working

0401dd4

fix prompt logprob

38d3da7

fixed

0fcff9f

.

9301cfb

working

d67442c

e2e working

3326277

fixed a bug

3d23c21

working

f244a15

ip

875fd77

Merge branch 'main' into chunked-prefill-skip-sampling

0ee56dc

ip

5a53e76

ip

82be572

.,

7908284

hopefully it works

32e12be

Merge branch 'main' into chunked-prefill-skip-sampling

10c67f4

Merge branch 'main' into chunked-prefill-skip-sampling

85c4b70

.

275d306

Merge branch 'main' into skip-sampling-comment

e8ee28f

.

ec8140e

refactoring

4dccf3e

ip

e442304

working e2e

f7b9587

ip

d200157

working

40859c6

Merge branch 'main' into skip-sampling-comment

5af9320

Merge branch 'main' into skip-sampling-comment

f118dee

rkooo567 assigned zhuohan123 and simon-mo Apr 24, 2024

rkooo567 added 2 commits April 23, 2024 23:17

Merge branch 'main' into skip-sampling-comment

13e0492

sould work now

b2cfb5e

rkooo567 mentioned this pull request Apr 24, 2024

[Core] Refactoring sampler and support prompt logprob for chunked prefill rkooo567/vllm#19

Closed

rkooo567 added 5 commits April 24, 2024 07:43

fully working

7e8cd20

.

a827179

Merge branch 'main' into skip-sampling-comment

99f2561

ip

82db8d0

clean up done

33a6100

rkooo567 commented Apr 25, 2024

rkooo567 added 2 commits April 25, 2024 03:27

.

036c62d

remove unused code

ce2170c

This was referenced Apr 25, 2024

[Feature]: Enable chunked prefill for neuron & cpu workers #4364

Open

[RFC] Upstream Chunked Prefill #3130

Closed

done

bffaf61

zhuohan123 approved these changes Apr 25, 2024

rkooo567 added 2 commits April 25, 2024 17:13

Merge branch 'main' into skip-sampling-comment

1afa6f1

done

7ecf381

zhuohan123 enabled auto-merge (squash) April 26, 2024 00:27

rkooo567 mentioned this pull request Apr 26, 2024

[Speculative decoding] Support target-model logprobs #4378

Merged

zhuohan123 merged commit 603ad84 into vllm-project:main Apr 26, 2024
48 checks passed

robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request May 6, 2024

[Core] Refactoring sampler and support prompt logprob for chunked pre…

ee654c9

…fill (vllm-project#4309)

z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request May 7, 2024

[Core] Refactoring sampler and support prompt logprob for chunked pre…

ad2f90d

…fill (vllm-project#4309)

dtrifiro mentioned this pull request May 15, 2024

bump ubi base image tag opendatahub-io/vllm#24

Merged

mawong-amd pushed a commit to ROCm/vllm that referenced this pull request Jun 3, 2024

[Core] Refactoring sampler and support prompt logprob for chunked pre…

94f44da

…fill (vllm-project#4309)

toslunar mentioned this pull request Jun 3, 2024

[Bug]: prompt_logprobs=0 raises AssertionError #5213

Closed
