
Use of logits_processors has become very slow in v0.3.2 #3087

Closed
saattrupdan opened this issue Feb 28, 2024 · 5 comments · Fixed by #3099
saattrupdan commented Feb 28, 2024

I am using vLLM together with outlines for structured generation.

After upgrading to v0.3.2, generation became very slow, and RAM usage now leads to OOM crashes.

Here is a minimal example:

from vllm import LLM, SamplingParams
from outlines.serve.vllm import JSONLogitsProcessor
from pydantic import BaseModel, conlist
import datetime as dt

class Output(BaseModel):
    names: conlist(str, max_length=5)
    organizations: conlist(str, max_length=5)
    locations: conlist(str, max_length=5)
    miscellaneous: conlist(str, max_length=5)

llm = LLM('mistralai/Mistral-7B-v0.1', max_model_len=10_000, gpu_memory_utilization=0.9)
logits_processor = JSONLogitsProcessor(schema=Output, llm=llm.llm_engine)
logits_processor.fsm.vocabulary = list(logits_processor.fsm.vocabulary)
prompt = """
Locate all the names, organizations, locations and other miscellaneous entities in the following sentence: 
"Charles went and saw Anna at the coffee shop Starbucks, which was based in a small town in Germany called Essen."
"""
sampling_params = SamplingParams(max_tokens=128, temperature=0, logits_processors=[logits_processor])

t0 = dt.datetime.now()
llm.generate([prompt] * 256, sampling_params=sampling_params)
time_elapsed = (dt.datetime.now() - t0).total_seconds()
print(f"Generation took {time_elapsed:,} seconds.")

When I run the above with vllm==0.3.1, generation takes 58 seconds and uses ~6GB of memory. If I upgrade vllm to v0.3.2 (with no other packages changed), generation suddenly takes 418 seconds and uses ~18GB of memory. Almost all of that time is spent stalling: nothing is generated while memory usage slowly climbs, until generation finally begins.

I tried installing a forked version of outlines to check whether the stalling came from the internals of the JSONLogitsProcessor, but the processor is only called after the stalling finishes, so this appears to be a vLLM issue.

saattrupdan (Author) commented Feb 28, 2024

This seems to be due to the deepcopy of the SamplingParams on this line in the LLMEngine, which also copies the logits processors; in my case these take up a considerable amount of memory. The deepcopy was added two weeks ago in this PR, which is part of the v0.3.2 release.
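To make the cost concrete, here is a minimal, self-contained sketch (using hypothetical stand-in classes, not vLLM's actual SamplingParams or processor types) of how a per-request deepcopy of sampling params holding a heavy logits processor multiplies both time and memory:

```python
import copy
import time

class HeavyProcessor:
    """Stand-in for a logits processor that caches large state (e.g. an FSM vocabulary)."""
    def __init__(self, size=1_000_000):
        self.table = list(range(size))  # megabytes of cached state

class Params:
    """Stand-in for a SamplingParams-like object holding a list of processors."""
    def __init__(self, logits_processors):
        self.temperature = 0.0
        self.logits_processors = logits_processors

proc = HeavyProcessor()
params = Params([proc])

t0 = time.perf_counter()
# One deepcopy per sequence in the batch duplicates the processor's state each time.
copies = [copy.deepcopy(params) for _ in range(8)]
elapsed = time.perf_counter() - t0

# Every copy holds its own duplicate of the heavy state rather than a shared reference:
assert all(c.logits_processors[0] is not proc for c in copies)
```

With 256 prompts, as in the reproduction above, this duplication would happen hundreds of times, which is consistent with the observed stall and memory growth.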

Tagging relevant people to that PR: @njhill @Yard1

simon-mo (Collaborator) commented

@Yard1 I think this should be fixed by the next release, especially since we are shipping #2819

Yard1 (Collaborator) commented Feb 28, 2024

Hmm, I see. We should probably make the logits processors exempt from the deepcopy (unless #2819 already fixes that)
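One way such an exemption could look (a sketch only, not vLLM's actual fix, which landed in #3099): temporarily detach the processors, deepcopy the rest of the params, then re-attach the same processor objects so they are shared rather than duplicated. The helper name and the `logits_processors` attribute shape here are assumptions for illustration:

```python
import copy

def clone_params_sharing_processors(params):
    """Deep-copy a params-like object while sharing, not copying, its logits processors.

    `params` is any object with a `logits_processors` attribute; this is a
    hypothetical helper, not vLLM's API.
    """
    processors = params.logits_processors
    try:
        params.logits_processors = None           # detach before the deepcopy
        new_params = copy.deepcopy(params)
    finally:
        params.logits_processors = processors     # always restore the original
    new_params.logits_processors = processors     # shared reference, no duplicated state
    return new_params
```

Sharing is only safe if the processors are either stateless across sequences or explicitly designed to be shared; stateful FSM-based processors may still need per-sequence handling.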

njhill (Member) commented Feb 28, 2024

Ah, yes, sorry about this. I can open a PR to do what @Yard1 suggests.

njhill (Member) commented Feb 29, 2024

@Yard1 @simon-mo @saattrupdan fix is in #3099
