[Bug]: The openai deployment model takes twice as long to deploy as fastapi's approach to offline inference. #5154

LIUKAI0815 · 2024-05-31T10:18:12Z

Your current environment

vllm=0.4.2

🐛 Describe the bug

Same input and output, openai is slow

openai:  python -m vllm.entrypoints.openai.api_server  

fastapi:
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(model="facebook/opt-125m")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

The text was updated successfully, but these errors were encountered:

DarkLight1337 · 2024-05-31T13:19:10Z

To better understand the issue, please show how you timed the code. You might have included the startup time of the offline inference LLM which would make it an unfair comparison.

LIUKAI0815 · 2024-06-27T01:24:03Z

I made the wrong comparison

LIUKAI0815 added the bug Something isn't working label May 31, 2024

DarkLight1337 closed this as not planned Won't fix, can't repro, duplicate, stale Jun 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: The openai deployment model takes twice as long to deploy as fastapi's approach to offline inference. #5154

[Bug]: The openai deployment model takes twice as long to deploy as fastapi's approach to offline inference. #5154

LIUKAI0815 commented May 31, 2024 •

edited

Loading

DarkLight1337 commented May 31, 2024 •

edited

Loading

LIUKAI0815 commented Jun 27, 2024

[Bug]: The openai deployment model takes twice as long to deploy as fastapi's approach to offline inference. #5154

[Bug]: The openai deployment model takes twice as long to deploy as fastapi's approach to offline inference. #5154

Comments

LIUKAI0815 commented May 31, 2024 • edited Loading

Your current environment

🐛 Describe the bug

DarkLight1337 commented May 31, 2024 • edited Loading

LIUKAI0815 commented Jun 27, 2024

LIUKAI0815 commented May 31, 2024 •

edited

Loading

DarkLight1337 commented May 31, 2024 •

edited

Loading