A per-model "reverse proxy" which redirects requests to multiple ollama servers.
This is a minimal-effort implementation of a reverse proxy for ollama. It mainly accepts chat and generation requests and, depending on the model requested, transfers the payload to the server that has been specifically assigned to run that model. Refer to the API section below for the list of currently supported endpoints.
```sh
go run ./*.go --level=trace --address 0.0.0.0:11434 \
  --proxy=llama3.2-vision=http://server-02:11434 \
  --proxy=deepseek-r1:14b=http://server-01:11434
```
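Assuming the proxy started above is reachable on localhost:11434, a client only ever talks to the proxy; the `model` field in the request body determines which backend receives it (here `llama3.2-vision`, routed to `http://server-02:11434`). A minimal sketch using ollama's standard chat endpoint:

```sh
# Hypothetical request: gollamas reads the "model" field and forwards the
# payload unchanged to the server configured for llama3.2-vision.
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2-vision",
  "messages": [{ "role": "user", "content": "Describe the colours of a sunset." }]
}'
```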
Official images are available on Docker Hub and ghcr.io. You can run the latest image from either:
- Docker Hub:
  ```sh
  docker run -it -e GOLLAMAS_PROXIES="llama3.2-vision=http://server:11434,deepseek-r1:14b=http://server2:11434" slawoc/gollamas:latest
  ```
- ghcr.io:
  ```sh
  docker run -it -e GOLLAMAS_PROXIES="llama3.2-vision=http://server:11434,deepseek-r1:14b=http://server2:11434" ghcr.io/slawo/gollamas:latest
  ```
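As a rough sketch (the network name, container names and plain ollama backends are assumptions for illustration, and the proxy container is assumed to listen on ollama's default port 11434), the proxy can be placed in front of two ollama containers on a shared Docker network:

```sh
# Illustrative setup only: two ollama backends plus gollamas on one network.
docker network create llm-net
docker run -d --name server  --network llm-net ollama/ollama
docker run -d --name server2 --network llm-net ollama/ollama
docker run -it --network llm-net -p 11434:11434 \
  -e GOLLAMAS_PROXIES="llama3.2-vision=http://server:11434,deepseek-r1:14b=http://server2:11434" \
  slawoc/gollamas:latest
```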
There are various scenarios this project attempts to address; here is a list of the features currently implemented (see the request sketch after this list for the keep-alive and context-size options):
- Manage models
  - Map model aliases to existing model names (some tools only allow a pre-defined set of models)
  - Set that by default only the configured models are returned when listing models
  - Set a flag to also return models as aliases
  - Set an option to allow requests to currently running models (i.e. the server has an additional model running)
- Keep models in memory
  - Preload models (ensure the model is loaded upon startup)
  - Ping models (keep the model loaded)
  - Add config to enforce model keep-alive globally `"keep_alive": -1` (if it is worth adding functionality for servers without `OLLAMA_KEEP_ALIVE=-1`)
  - Add config to override model keep-alive per model/server `"keep_alive": -1`
- Set a fixed context size `"options": { "num_ctx": 4096 }`
  - Add config to set a default context size (if missing) in each request `"options": { "num_ctx": 4096 }`
  - Add config to set a default context size (if missing) per model/server `"options": { "num_ctx": 4096 }`
  - Add config to enforce the context size in each request `"options": { "num_ctx": 4096 }`
  - Add config to enforce the context size per model/server `"options": { "num_ctx": 4096 }`
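`keep_alive` and `options.num_ctx` in the snippets above are standard fields of ollama's generate/chat requests; since gollamas forwards the payload to the selected backend, a client can already set them per request. A minimal sketch (model and prompt are placeholders):

```sh
# Keep the model loaded indefinitely and pin the context window to 4096
# tokens for this request; gollamas forwards these fields untouched.
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:14b",
  "prompt": "Why is the sky blue?",
  "keep_alive": -1,
  "options": { "num_ctx": 4096 }
}'
```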
Not all endpoints are covered. In particular, endpoints which deal with the customisation and creation of models are not supported until there is a clear use case for them.
- Supported endpoints
  - `GET /`
  - `GET /api/tags`
  - `GET /api/ps`
  - `GET /api/version`
  - `GET /v1/models`
  - `GET /v1/models/:model`
  - `HEAD /`
  - `HEAD /api/tags`
  - `HEAD /api/version`
  - `POST /api/chat`
  - `POST /api/embed`
  - `POST /api/embeddings`
  - `POST /api/generate`
  - `POST /api/pull`
  - `POST /api/show`
  - `POST /v1/chat/completions`
  - `POST /v1/completions`
  - `POST /v1/embeddings`
- Not supported
  - `HEAD /api/blobs/:digest`
  - `DELETE /api/delete`
  - `POST /api/blobs/:digest`
  - `POST /api/copy`
  - `POST /api/create`
  - `POST /api/push`
The server relies on existing ollama models and middlewares to speed up the development of the initial implementation.
Only requests which have a `model` (or the deprecated `name`) field are transferred to the right server.
When possible, other endpoints hit all configured servers and either select one answer (i.e. the lowest available version) or combine the results into one response (i.e. lists of models).
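For example, assuming the proxy is reachable on localhost:11434, the two fan-out behaviours can be seen on the version and tags endpoints:

```sh
# /api/version hits all backends and reports the lowest version available;
# /api/tags combines the backends' model lists into a single response.
curl http://localhost:11434/api/version
curl http://localhost:11434/api/tags
```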