[FEAT] Model is always loaded in VRAM #32
Labels: enhancement (New feature or request)

Comments
Thanks for opening your first issue here! Be sure to follow the relevant issue templates, or risk having this issue marked as invalid.
This issue has been automatically marked as stale because it has not had recent activity. This might be due to missing feedback from OP. It will be closed if no further activity occurs. Thank you for your contributions.
mg-dev25 left a comment (linuxserver/docker-faster-whisper#32):
@ecker00 What I do for my infra is to stop the container to free the VRAM; a service like Sablier can help you do it automatically.

ecker00 replied:
This is my home assistant voice, so I kind of need it available at all times, but I don't mind waiting a few seconds for the model to load on first wake-up after being inactive.
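For reference, the workaround amounts to stopping the container to release the VRAM and starting it again when the service is needed. A minimal sketch using the Docker SDK for Python (docker-py); the container name `faster-whisper` is an assumption, adjust to your setup:

```python
# Sketch of the manual workaround; assumes `pip install docker` and a
# container named "faster-whisper" (an assumed name, not project-mandated).
import docker

client = docker.from_env()
container = client.containers.get("faster-whisper")

container.stop()   # stopping the container frees the VRAM held by the model
# ...later, when transcription is needed again:
container.start()  # the model is reloaded when the container starts
```

Sablier automates the same stop/start cycle based on incoming traffic, so the model only occupies VRAM while requests are actually arriving.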
Is this a new feature request?
Wanted change
Save GPU VRAM when not in use. VRAM is a valuable resource, so it should be possible to configure a `keep_alive` value. For example, with Ollama it is configured like this:

- `keep_alive=-1` keeps the model in memory indefinitely
- `keep_alive=0` unloads the model after each use
- `keep_alive=60` keeps the model in memory for 1 minute after use

This could be an environment variable, defaulting to `-1` so it is not a breaking change for anyone.
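For illustration only (this is not project code), a minimal sketch of how such a `keep_alive` option could behave in-process, assuming a `KEEP_ALIVE` environment variable and faster-whisper's `WhisperModel`; that dropping the last reference is enough for the CTranslate2 backend to release the VRAM is also an assumption here:

```python
# Hypothetical keep_alive wrapper; the env var name, the unload helper,
# and the timer behaviour are illustrative assumptions, not existing code.
import gc
import os
import threading

from faster_whisper import WhisperModel

KEEP_ALIVE = int(os.environ.get("KEEP_ALIVE", "-1"))  # -1: keep loaded indefinitely

_model = None
_timer = None
_lock = threading.Lock()

def _unload():
    """Drop the model so its VRAM can be reclaimed."""
    global _model
    with _lock:
        _model = None  # drop the last reference to the CTranslate2 model
        gc.collect()   # encourage the backend to free the GPU memory now

def transcribe(audio_path):
    global _model, _timer
    with _lock:
        if _timer is not None:
            _timer.cancel()  # a request arrived; postpone any pending unload
        if _model is None:   # lazy-load on first use after an unload
            _model = WhisperModel("small", device="cuda", compute_type="float16")
        model = _model
    segments, _info = model.transcribe(audio_path)
    results = list(segments)  # transcription is lazy; consume before unloading
    if KEEP_ALIVE == 0:
        _unload()  # unload after each use
    elif KEEP_ALIVE > 0:
        _timer = threading.Timer(KEEP_ALIVE, _unload)  # unload after N idle seconds
        _timer.daemon = True
        _timer.start()
    return results
```

With `KEEP_ALIVE=-1` neither unload path runs, which preserves today's always-loaded behaviour.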
Reason for change
Right now the model is loaded into memory as soon as the container starts, and stays there even when not in use.
Proposed code change
No response