This is sort of a gimmicky local reverse proxy for use with chat completion APIs. It consumes different APIs (currently only OpenAI-likes, OpenRouter, and Gemini) and exposes a common OpenAI-like API (with streaming support!).
Look at this SillyTavern Custom OpenAI configuration and the Available Models list:
The URL points to my local firerouter instance. It offers ST two models: GPT 4.1 in its raw form, and GPT 4.1 with temperature 2. How does it do this?
First, the `modelProviders` section in the `config.yaml` looks like this:
```yaml
modelProviders:
  or:
    type: "genericoai"
    url: "https://openrouter.ai/api/v1"
    keyProvider: "myORKey"
    models:
      gpt-4.1-raw:
        name: "openai/gpt-4.1"
      gpt-4.1-with-temp-2:
        name: "openai/gpt-4.1"
        processor: setTempTo2
```
As you can see, the names used here are what's exposed on the other end to ST. In terms of configuration proper, they're both just OpenRouter GPT 4.1, except the latter has a `processor`.
A processor is a thing that alters a request before it's sent. This is what `setTempTo2` looks like in the `config.yaml`:
```yaml
processors:
  setTempTo2:
    type: "overridesamplers"
    temperature: 2
```
"overridesamplers" does what it sounds like it does. You can run multiple processors randomly or in order, and you won't be stopped from doing something stupid like overriding temp 5 times before settling on a number you like, or lying about what your processors do in their names.
Finally, you configure your keys by defining keyProviders. Like this:
```yaml
keyProviders:
  myORKey:
    type: "literal"
    key: "sk-or-v1-your-actual-key-here-lol"
```
You can also just inline your keyProvider and your processors instead of defining them on the top level and then invoking them by their names.
Like this:
```yaml
modelProviders:
  or:
    type: "genericoai"
    url: "https://openrouter.ai/api/v1"
    keyProvider:
      type: "literal"
      key: "sk-or-v1-your-actual-key-here-lol"
    models:
      gpt-4.1-raw:
        name: "openai/gpt-4.1"
      gpt-4.1-with-temp-2:
        name: "openai/gpt-4.1"
        processor:
          - type: "whitespace" # cleans up most whitespace; breaks ASCII art and code
          - type: "overridesamplers"
            temperature: 2
```
And, as you can also notice above, your model's `processor` (or your top-level processor) can be an array of `ProcessorConfiguration`! (This is a shorthand syntax for creating ChainProcessors. Feel free to use the normal form if YAML object arrays scare you.)
Consider now:
```yaml
modelProviders:
  or:
    type: "genericoai"
    url: "https://openrouter.ai/api/v1"
    keyProvider:
      type: "literal"
      key: "sk-or-v1-your-actual-key-here-lol"
    models:
      gpt-4.1:
        name: "openai/gpt-4.1"
      qwen-3-32b:
        name: "qwen/qwen3-32b"
  random:
    type: "random"
    modelList: # models are weighted equally if you use a modelList
      - or/gpt-4.1
      - or/qwen-3-32b
  random-2:
    type: "random"
    modelWeights: # or you can assign arbitrary positive weights!
      "or/gpt-4.1": 0.4
      "or/qwen-3-32b": 0.6
```
The `random` provider type is basically the true reason this project exists: it allows for random routing between your previously configured models.
So, for example, you can randomly distribute requests between GPT-4.1 and ChatGPT-4o-Latest to try and increase your response variety, or between something like Claude Opus 4 and Claude Sonnet 4 to lower your average request costs, or even between multiple variations of the same model with different processor chains!
There is similarly a `random` processor you can use to randomize your processor chains specifically.
Git clone the project normally (like you did ST), install deps with `npm i`, build with `npm run build`, and run with `npm run start`. The server listens by default on http://127.0.0.1:3000/v1.
Make sure to copy `config.example.yaml` into `config.yaml` and fill out your configuration.
Remember to rebuild after git pulling!
There's no auth. There will be no auth. This isn't fit for anything other than strictly local deployments. It will remain like this.
This is the main configuration object for the entire application.
| Property | Type | Default | Description |
|---|---|---|---|
| `port` | `number` | `3000` | The port number on which the server will listen. |
| `keyProviders` | `Map<string, KeyProviderConfiguration>` | (empty map) | A map of named key providers. The key is a unique name you choose, and the value is the provider's configuration object. |
| `modelProviders` | `Map<string, ModelProviderConfiguration>` | (Required) | A map of named model providers. The key is a unique name you choose (e.g., `"or"`), and the value is the provider's configuration. |
| `processors` | `Map<string, ProcessorConfiguration>` | (empty map) | A map of named processors. The key is a unique name, and the value is a processor configuration (or an array of them, which is shorthand for a chain). |
| `streamingInterval` | `number` | `0` | Forces each character in the stream to wait `streamingInterval` before being flushed to the client. |
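Putting the table together, a minimal complete `config.yaml` might look like this (it reuses the example values from earlier in this README; all names and keys are illustrative):

```yaml
port: 3000
streamingInterval: 0
keyProviders:
  myORKey:
    type: "literal"
    key: "sk-or-v1-your-actual-key-here-lol"
processors:
  setTempTo2:
    type: "overridesamplers"
    temperature: 2
modelProviders:
  or:
    type: "genericoai"
    url: "https://openrouter.ai/api/v1"
    keyProvider: "myORKey"
    models:
      gpt-4.1:
        name: "openai/gpt-4.1"
        processor: setTempTo2
```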
Loads an API key from a system environment variable.
type: "environment"
| Property | Type | Required | Description |
|---|---|---|---|
| `type` | `string` | Yes | Must be `"environment"`. |
| `envVar` | `string` | Yes | The name of the environment variable to read the key from. |
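For example, to read your key from an environment variable instead of hardcoding it (the provider and variable names are illustrative):

```yaml
keyProviders:
  myORKey:
    type: "environment"
    envVar: "OPENROUTER_API_KEY"
```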
Uses a key that is directly embedded in the configuration file.
type: "literal"
| Property | Type | Required | Description |
|---|---|---|---|
| `type` | `string` | Yes | Must be `"literal"`. |
| `key` | `string` | Yes | The actual API key string. |
| Property | Type | Required | Description |
|---|---|---|---|
| `name` | `string` | Yes | The model name on the API firerouter is consuming (like OpenRouter). |
| `processor` | `string` or `ProcessorConfiguration` | No | The name of a processor (defined in the top-level `processors` map) or a `ProcessorConfiguration` to apply to requests. |
| Property | Type | Required | Description |
|---|---|---|---|
| `keyProvider` | `string` or `KeyProviderConfiguration` | Yes | A key provider, either named or inline. MUST be present even if the provider requires no keys. |
Use this for any service that exposes an OpenAI-compatible API, such as OpenAI itself, local engines like `llamacpp`, or other compatible services.
type: "genericoai"
| Property | Type | Default | Description |
|---|---|---|---|
| `type` | `string` | (Required) | Must be `"genericoai"`. |
| `url` | `string` | `https://api.openai.com/v1` | The API URL up to `/v1`. |
| `models` | `Map<string, ModelConfiguration>` | (Required) | The models to load under this provider. |
| `addMistralPrefix` | `boolean` | `false` | Adds the Mistral `prefix` field to your prefill. |
| `addMoonshotPartial` | `boolean` | `false` | Adds the Moonshot `partial` field to your prefill. |
A dedicated provider for connecting to Google's Gemini models.
type: "gemini"
| Property | Type | Default | Description |
|---|---|---|---|
| `type` | `string` | (Required) | Must be `"gemini"`. |
| `url` | `string` | `https://generativelanguage.googleapis.com/v1beta/models` | The base URL for the Gemini API. |
| `models` | `Map<string, ModelConfiguration>` | (Required) | The models to load under this provider. |
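A sketch of a Gemini provider relying on the default URL (the provider name, key provider name, and model ids here are illustrative, not prescribed by firerouter):

```yaml
modelProviders:
  gem:
    type: "gemini"
    keyProvider: "myGeminiKey"
    models:
      gemini-pro:
        name: "gemini-2.5-pro"
```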
Use this for any service that exposes an OpenAI-like API that omits the `messages` field in favor of a `prompt` string. Like OpenRouter.
type: "textcomp"
| Property | Type | Default | Description |
|---|---|---|---|
| `type` | `string` | (Required) | Must be `"textcomp"`. |
| `url` | `string` | `https://api.openai.com/v1` | The API URL up to `/v1`. |
| `models` | `Map<string, ModelConfiguration>` | (Required) | The models to load under this provider. |
| `template` | `string` | (Required) | Nunjucks template for turning the OAI message array into a text completion prompt. |
| `processOutputWhitespace` | `boolean` | `false` | Applies the Whitespace Processor to the prompt after compilation. |
The template has access to the complete request object.
Here's a quick example of a TextCompProvider configuration:
```yaml
modelProviders:
  or-text:
    type: "textcomp"
    url: "https://openrouter.ai/api/v1"
    keyProvider: "myORKey"
    template: "
      {{messages[0].content}}{# the raw sysprompt left here after squashing #}
      [STORY START]
      USER: Please start us off with a nice opening to set the tone and style.
      {% for message in messages.slice(1) %}
      {% if message.role == 'user' %}
      USER: {{message.content}}
      {% else %}
      NARRATION: {{message.content}}
      {% endif %}
      {% endfor %}
      NARRATION:"
    processOutputWhitespace: true
    extraStopStrings: [ "USER:" ]
    models:
      kimi-k2:
        name: "moonshotai/kimi-k2"
        processor:
          - type: "nodanglingsys"
          - type: "squash"
```
Deciding what to keep in your card/preset and what to place in the prompt template is left to you.
Consider using squash and trusting `processOutputWhitespace` to unmangle your prompts.
Also, look up https://yaml-multiline.info/ to remember YAML's behavior with multiline strings.
A meta-provider that randomly selects one of its configured models for each request, optionally using weights.
Requires a `keyProvider`, even though it won't be used. Just give it a fake made-up key to love and cherish.
type: "random"
| Property | Type | Required | Description |
|---|---|---|---|
| `type` | `string` | Yes | Must be `"random"`. |
| `modelList` | `string[]` | No | A list of model provider names to choose from uniformly. Either use this or `modelWeights`. |
| `modelWeights` | `Map<string, number>` | No | A map where keys are model provider names and values are their selection weights. Higher weights are more likely to be chosen. Overrides `modelList` if both are present. |
A simple provider for testing and debugging. It responds with a fixed, pre-defined sentence.
Requires a `keyProvider`, even though it won't be used. Just give it a fake made-up key to love and cherish.
type: "trivial"
| Property | Type | Default | Description |
|---|---|---|---|
| `type` | `string` | (Required) | Must be `"trivial"`. |
| `output` | `string` | `Yahallo! Some extra padding to make this longer lol.` | The static string to return in every response. |
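A sketch of a trivial provider for smoke-testing your setup (the provider name, key, and output string are illustrative; only the fields documented above are used):

```yaml
modelProviders:
  debug:
    type: "trivial"
    keyProvider:
      type: "literal"
      key: "fake-key-to-love-and-cherish"
    output: "Yahallo! But customized."
```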
Ensures that once a non-system message appears in the chat history, all subsequent system messages are converted to the `user` role.
type: "nodanglingsys"
| Property | Type | Required | Description |
|---|---|---|---|
| `type` | `string` | Yes | Must be `"nodanglingsys"`. |
A simple processor that transforms every system message into a user message.
type: "nosys"
| Property | Type | Required | Description |
|---|---|---|---|
| `type` | `string` | Yes | Must be `"nosys"`. |
Overrides or unsets sampler parameters (like temperature, top_p, etc.) for a request. To remove a sampler that was sent by the client, set its value to `"unset"`.
type: "overridesamplers"
| Property | Type | Required | Description |
|---|---|---|---|
| `type` | `string` | Yes | Must be `"overridesamplers"`. |
| `temperature` | `number` or `"unset"` | No | The temperature value to set, or `"unset"` to remove. |
| `topP` | `number` or `"unset"` | No | The top_p value to set, or `"unset"` to remove. |
| `topK` | `number` or `"unset"` | No | The top_k value to set, or `"unset"` to remove. |
| `topA` | `number` or `"unset"` | No | The top_a value to set, or `"unset"` to remove. |
| `minP` | `number` or `"unset"` | No | The min_p value to set, or `"unset"` to remove. |
| `frequencyPenalty` | `number` or `"unset"` | No | The frequency_penalty value to set, or `"unset"` to remove. |
| `repetitionPenalty` | `number` or `"unset"` | No | The repetition_penalty value to set, or `"unset"` to remove. |
| `presencePenalty` | `number` or `"unset"` | No | The presence_penalty value to set, or `"unset"` to remove. |
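For instance, a processor that pins the temperature while stripping whatever top_p the client sent (the processor name is illustrative):

```yaml
processors:
  pinTempStripTopP:
    type: "overridesamplers"
    temperature: 1.2
    topP: "unset"
```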
Applies a regular expression find-and-replace operation on the content of every message in the request.
type: "regex"
| Property | Type | Required | Description |
|---|---|---|---|
| `type` | `string` | Yes | Must be `"regex"`. |
| `pattern` | `string` | Yes | The regular expression pattern to search for (do not wrap it in `/`). |
| `flags` | `string` | No | Regex flags (e.g., `"g"` for global, `"i"` for case-insensitive). |
| `replacement` | `string` | Yes | The string to replace the matched pattern with. |
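A sketch of a regex processor that renames a character across every message in the request (the processor name, pattern, and replacement are illustrative):

```yaml
processors:
  renameCharacter:
    type: "regex"
    pattern: "Elara"
    flags: "g"
    replacement: "Mary"
```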
A meta-processor that randomly selects one processor from a list to execute.
type: "random"
| Property | Type | Required | Description |
|---|---|---|---|
| `type` | `string` | Yes | Must be `"random"`. |
| `processorList` | `ProcessorConfiguration[]` | No | An array of processor configurations to choose from randomly with equal weights. Either use this or `processorWeights`. |
| `processorWeights` | `{ weight: number, config: ProcessorConfiguration }[]` | No | An array of objects with `weight` (number) and `config` (ProcessorConfiguration) properties for weighted random selection. Overrides `processorList` if both are present. |
`processorWeights` usage example:
```yaml
type: random
processorWeights:
  - weight: 2
    config:
      type: nosys
  - weight: 3
    config:
      type: nodanglingsys
```
Converts all messages following the first assistant message to either user or assistant role (based on configuration).
Does not guarantee equivalent behavior to the proper noass extension. But in principle, if a preset has no assistant prompts, and the card has a greeting, the first assistant message should act as a marker for the beginning of the chat history, and then we have regular noass behavior.
type: "noass"
| Property | Type | Required | Description |
|---|---|---|---|
| `type` | `string` | Yes | Must be `"noass"`. |
| `role` | `string` | Yes | The role to convert the subsequent messages to: `"user"` or `"assistant"`. |
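For example, converting everything after the first assistant message to the user role (the processor name is illustrative):

```yaml
processors:
  noassToUser:
    type: "noass"
    role: "user"
```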
Combines consecutive messages of the same role(s) into a single message, joining their content with a specified string.
type: "squash"
| Property | Type | Default | Description |
|---|---|---|---|
| `type` | `string` | (Required) | Must be `"squash"`. |
| `squashString` | `string` | `"\n\n"` | The string used to join the content of consecutive messages. |
| `roles` | `string[]` | (Required) | Array of roles to squash: `"user"`, `"assistant"`, `"system"`, `"developer"`. |
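For example, merging runs of consecutive user and system messages with the default separator (the processor name is illustrative; `squashString` is omitted, so the `"\n\n"` default applies):

```yaml
processors:
  squashUserSys:
    type: "squash"
    roles: [ "user", "system" ]
```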
Inserts a new message at a specified position in the message array.
type: "insertmessage"
| Property | Type | Required | Description |
|---|---|---|---|
| `type` | `string` | Yes | Must be `"insertmessage"`. |
| `role` | `string` | Yes | The role of the inserted message: `"user"`, `"assistant"`, `"system"`, or `"developer"`. |
| `content` | `string` | Yes | The content of the message to insert. |
| `position` | `number` | Yes | The position to insert the message at (negative positions work; uses normal splice logic). |
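For example, splicing a system note in at the very start of the message array (the processor name and content are illustrative):

```yaml
processors:
  prependNote:
    type: "insertmessage"
    role: "system"
    content: "Stay in character."
    position: 0
```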
Does common sense whitespace processing on prompts.
Specifically, for every segment involving two or more sequential whitespace characters:
- if it contains two newlines, the segment is converted into just the two newlines
- if it contains a newline, the segment is converted into just the newline
- if it contains no newlines, it becomes a single space
Obviously breaks code formatting and ASCII, but solid for generic RPing.
type: "whitespace"
| Property | Type | Required | Description |
|---|---|---|---|
| `type` | `string` | Yes | Must be `"whitespace"`. |
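To illustrate the rules above, a worked example (hand-traced, not captured program output; `\n` marks newlines):

```
input:  "The  quick\n\n\n   brown   fox."
output: "The quick\n\nbrown fox."
```

The double space becomes one space, the newline-heavy run collapses to exactly two newlines, and the remaining space runs each become a single space.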
A meta-processor that runs multiple processors in sequence as a single processor unit.
Intended for use with the `random` processor. If you're not using `random`, this will just clutter your `config.yaml`.
type: "chain"
| Property | Type | Required | Description |
|---|---|---|---|
| `type` | `string` | Yes | Must be `"chain"`. |
| `processors` | `ProcessorConfiguration[]` | Yes | An array of processor configurations to run in sequence. |
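A sketch of chain in its intended habitat, wrapped in a `random` processor so each request gets one of two whole sequences (the processor name and chain contents are illustrative):

```yaml
processors:
  coinFlip:
    type: "random"
    processorList:
      - type: "chain"
        processors:
          - type: "nosys"
          - type: "overridesamplers"
            temperature: 1.5
      - type: "nodanglingsys"
```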