Why is online serving slower than offline serving? #2019
Comments
@irasin Hello, regarding #2257 (comment): in my latest test, when using AsyncLLMEngine I observed large fluctuations in GPU-Util (0-100%), but throughput was high. Previously, when using LLMEngine with bs=1, utilization was stable at 80-90%. What are your thoughts on this? I am running Llama 70B on 8*A800 80G, and in both scenarios memory usage is approximately 74.72 GB per GPU (gpu_memory_utilization=90%). I'm also curious about the reasons behind such high memory consumption.
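For context on the memory question: vLLM profiles the model at startup and pre-allocates paged KV-cache blocks up to the gpu_memory_utilization fraction, so roughly 90% of each card is claimed even at bs=1. A minimal sketch of lowering that fraction with the offline API, assuming a Llama-2-70B checkpoint name purely for illustration:

```python
# Minimal sketch (not from this thread): vLLM reserves KV-cache memory up to
# gpu_memory_utilization at startup, independent of the current batch size.
# Lowering the fraction reduces the reported memory usage at the cost of fewer
# cache blocks (and thus a smaller maximum concurrent batch).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-hf",  # assumed checkpoint name for illustration
    tensor_parallel_size=8,             # 8 * A800 80G as in the report above
    gpu_memory_utilization=0.7,         # try a lower fraction than the default 0.9
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```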
Same issue here, online inference is almost half as fast as offline inference.
Hello @irasin, are there any new thoughts on this issue? I encountered the same thing: the online speed is ~0.49x of the offline batch in tokens/s. Much appreciated for any suggestions!
I have observed the same issue.
+1, have observed this also; currently just living with it.
I think it's slower due to internet latency.
Have you done any benchmark on this?
Confused +1
2 similar comments:
Confused +1
Confused +1
The performance should be fine now that the server is run in a separate process.
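As a reference point for comparing the two paths, here is a minimal sketch of timing a single request against vLLM's OpenAI-compatible server, assuming it was started with something like `python -m vllm.entrypoints.openai.api_server --model Open-Orca/Mistral-7B-OpenOrca` on the default port 8000; the prompt and parameters are placeholders, not from this thread:

```python
# Minimal sketch: time one request against the /v1/completions endpoint of
# vLLM's OpenAI-compatible server. The measured time includes full generation
# plus JSON serialization and the HTTP round trip.
import time
import requests

payload = {
    "model": "Open-Orca/Mistral-7B-OpenOrca",
    "prompt": "Explain KV-cache paging in one sentence.",  # placeholder prompt
    "max_tokens": 128,
    "temperature": 0.0,
}

start = time.perf_counter()
resp = requests.post("http://localhost:8000/v1/completions", json=payload, timeout=600)
elapsed = time.perf_counter() - start

resp.raise_for_status()
text = resp.json()["choices"][0]["text"]
print(f"{elapsed:.2f}s end-to-end, {len(text)} chars returned")
```

Running the same prompt with the same SamplingParams through the offline LLM API and comparing wall-clock time isolates how much of the gap is server/HTTP overhead versus engine behavior.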
Offline serving:
Online serving (FastAPI):
log: INFO 12-11 21:50:36 llm_engine.py:649] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 0.0%
INFO 12-11 21:50:41 async_llm_engine.py:111] Finished request 261ddff3312f44cd8ee1c52a6acd10e6.
Why is the response about 2 seconds slower when served through FastAPI?
The parameters and the prompt are the same.
"Open-Orca/Mistral-7B-OpenOrca" this model same issue
and any llama2 model same issue
python: 3.10.12
cuda_version: 12.0
gpu: A100 40G
library list attached (my library list.txt)
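Since the original "online serving (FastAPI)" code is not shown above, here is a minimal sketch of what that path typically looks like with AsyncLLMEngine, using the vLLM API from around the time of this issue; the endpoint name, request fields, and model are illustrative, not the reporter's actual code:

```python
# Minimal sketch of a FastAPI wrapper around vLLM's AsyncLLMEngine (non-streaming).
# Names below (/generate, GenRequest) are illustrative assumptions.
import uuid

from fastapi import FastAPI
from pydantic import BaseModel
from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

app = FastAPI()
engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(model="Open-Orca/Mistral-7B-OpenOrca")
)

class GenRequest(BaseModel):
    prompt: str
    max_tokens: int = 128

@app.post("/generate")
async def generate(req: GenRequest):
    params = SamplingParams(max_tokens=req.max_tokens, temperature=0.0)
    request_id = str(uuid.uuid4())
    final = None
    # The async generator yields partial RequestOutputs; keep the last (finished) one.
    async for output in engine.generate(req.prompt, params, request_id):
        final = output
    return {"text": final.outputs[0].text}
```

Note that in this non-streaming setup the generator is consumed to completion before the response is sent, so the client-side latency includes full generation plus FastAPI/JSON serialization and the network round trip on top of what the offline path measures.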