[FEAT] Model is always loaded in VRAM #32
Labels: enhancement (New feature or request)

Comments
Thanks for opening your first issue here! Be sure to follow the relevant issue templates, or risk having this issue marked as invalid.
This issue has been automatically marked as stale because it has not had recent activity. This might be due to missing feedback from OP. It will be closed if no further activity occurs. Thank you for your contributions.
mg-dev25 left a comment (linuxserver/docker-faster-whisper#32):
@ecker00 What I do for my infra is to stop the container to free the VRAM; a service like Sablier can help you do it automatically.

ecker00 replied:
This is my home assistant voice, so I kind of need it available at all times, but I don't mind waiting a few seconds for the model to load on first wake-up after being inactive.
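For reference, the workaround amounts to stopping the container to release the VRAM and starting it again when the service is needed. A minimal sketch using the Docker SDK for Python (docker-py); the container name `faster-whisper` is an assumption, adjust to your setup:

```python
# Sketch of the manual workaround; assumes `pip install docker` and a
# container named "faster-whisper" (an assumed name, not project-mandated).
import docker

client = docker.from_env()
container = client.containers.get("faster-whisper")

container.stop()   # stopping the container frees the VRAM held by the model
# ...later, when transcription is needed again:
container.start()  # the model is reloaded when the container starts
```

Sablier automates the same stop/start cycle based on incoming traffic, so the model only occupies VRAM while requests are actually arriving.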
Is this a new feature request?
Wanted change
Save GPU VRAM when not in use. VRAM is a valuable resource, so it should be possible to configure a `keep_alive` value. For example, with Ollama it is configured like this:

- `keep_alive=-1` keeps the model in memory indefinitely
- `keep_alive=0` unloads the model after each use
- `keep_alive=60` keeps the model in memory for 1 minute after use

This could be an environment variable, defaulting to `-1` so it is not a breaking change for anyone.
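For illustration only (this is not project code), a minimal sketch of how such a `keep_alive` option could behave in-process, assuming a `KEEP_ALIVE` environment variable and faster-whisper's `WhisperModel`; that dropping the last reference is enough for the CTranslate2 backend to release the VRAM is also an assumption here:

```python
# Hypothetical keep_alive wrapper; the env var name, the unload helper,
# and the timer behaviour are illustrative assumptions, not existing code.
import gc
import os
import threading

from faster_whisper import WhisperModel

KEEP_ALIVE = int(os.environ.get("KEEP_ALIVE", "-1"))  # -1: keep loaded indefinitely

_model = None
_timer = None
_lock = threading.Lock()

def _unload():
    """Drop the model so its VRAM can be reclaimed."""
    global _model
    with _lock:
        _model = None  # drop the last reference to the CTranslate2 model
        gc.collect()   # encourage the backend to free the GPU memory now

def transcribe(audio_path):
    global _model, _timer
    with _lock:
        if _timer is not None:
            _timer.cancel()  # a request arrived; postpone any pending unload
        if _model is None:   # lazy-load on first use after an unload
            _model = WhisperModel("small", device="cuda", compute_type="float16")
        model = _model
    segments, _info = model.transcribe(audio_path)
    results = list(segments)  # transcription is lazy; consume before unloading
    if KEEP_ALIVE == 0:
        _unload()  # unload after each use
    elif KEEP_ALIVE > 0:
        _timer = threading.Timer(KEEP_ALIVE, _unload)  # unload after N idle seconds
        _timer.daemon = True
        _timer.start()
    return results
```

With `KEEP_ALIVE=-1` neither unload path runs, which preserves today's always-loaded behaviour.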
Reason for change
Right now the model is loaded into memory as soon as the container starts, and stays there even when not in use.
Proposed code change
No response