Deploys a KServe-based inference service and serving runtime for use on Red Hat OpenShift AI (RHOAI).
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| inferenceService.affinity | object | `{}` | Affinity rules for scheduling the inference service pod |
| inferenceService.maxReplicas | int | `1` | Maximum number of predictor replicas |
| inferenceService.minReplicas | int | `1` | Minimum number of predictor replicas |
| inferenceService.name | string | `"cpu-inference-service"` | Name of the InferenceService resource |
| inferenceService.resources.limits.cpu | string | `"8"` | CPU limit for the inference container |
| inferenceService.resources.limits.memory | string | `"16Gi"` | Memory limit for the inference container |
| inferenceService.resources.requests.cpu | string | `"4"` | CPU request for the inference container |
| inferenceService.resources.requests.memory | string | `"8Gi"` | Memory request for the inference container |
| inferenceService.tolerations | object | `{}` | Tolerations for scheduling the inference service pod |
| model.downloader.image | string | `"registry.access.redhat.com/ubi10/python-312-minimal:10.0"` | Image used to download the model |
| model.filename | string | `"mistral-7b-instruct-v0.2.Q5_0.gguf"` | Model file to fetch from the repository |
| model.repository | string | `"TheBloke/Mistral-7B-Instruct-v0.2-GGUF"` | Hugging Face repository that hosts the model |
| model.storage.mountPath | string | `"/models"` | Mount path for the model storage volume |
| servingRuntime.args[0] | string | `"--model"` | Argument flag passed to the runtime server |
| servingRuntime.args[1] | string | `"/models/mistral-7b-instruct-v0.2.Q5_0.gguf"` | Model path passed to the runtime server; matches model.storage.mountPath plus model.filename |
| servingRuntime.image | string | `"ghcr.io/ggml-org/llama.cpp:server"` | Container image for the serving runtime |
| servingRuntime.modelFormat | string | `"llama.cpp"` | Model format the runtime advertises to KServe |
| servingRuntime.name | string | `"cpu-runtime"` | Name of the ServingRuntime resource |
| servingRuntime.port | int | `8080` | Port the runtime server listens on |
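
To illustrate how these values fit together, below is a rough sketch of the kind of KServe resources a chart like this renders. It is an approximation built from the defaults in the table above, not the chart's actual templates.

```yaml
# Illustrative sketch only -- approximates the rendered KServe resources.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: cpu-runtime                              # servingRuntime.name
spec:
  supportedModelFormats:
    - name: llama.cpp                            # servingRuntime.modelFormat
  containers:
    - name: kserve-container
      image: ghcr.io/ggml-org/llama.cpp:server   # servingRuntime.image
      args:
        - "--model"                              # servingRuntime.args[0]
        - "/models/mistral-7b-instruct-v0.2.Q5_0.gguf"  # servingRuntime.args[1]
      ports:
        - containerPort: 8080                    # servingRuntime.port
          protocol: TCP
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: cpu-inference-service                    # inferenceService.name
spec:
  predictor:
    minReplicas: 1                               # inferenceService.minReplicas
    maxReplicas: 1                               # inferenceService.maxReplicas
    model:
      modelFormat:
        name: llama.cpp
      runtime: cpu-runtime                       # binds to the ServingRuntime above
      resources:                                 # inferenceService.resources
        requests:
          cpu: "4"
          memory: "8Gi"
        limits:
          cpu: "8"
          memory: "16Gi"
```

Note that the default `servingRuntime.args[1]` is `model.storage.mountPath` joined with `model.filename`; if you override either value, keep the argument in sync.
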
This chart requires a values secret file for the pattern that uses it. The file must be named `values-secret-<your_pattern_dir>.yaml` and placed in your home directory (NOT in the pattern repository, where it would be committed to Git). For example, if your pattern lives in a local directory named `rag-llm`, the file should be located at `~/values-secret-rag-llm.yaml`. At minimum, it must contain a Hugging Face token for a user authorized to access the model specified by the `model.repository` value:
```yaml
secrets:
  - name: huggingface
    fields:
      - name: token
        value: hf_xxxxxxxxxxx
```
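
For example, to point the chart at a different quantization of the same model (a hypothetical override; the alternate filename is illustrative, and a gated or private repository additionally requires that the token above have access to it), you might set:

```yaml
model:
  repository: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
  filename: mistral-7b-instruct-v0.2.Q4_K_M.gguf   # hypothetical alternate quantization

servingRuntime:
  args:
    - "--model"
    - "/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf"   # keep in sync with model.storage.mountPath + model.filename
```
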
Autogenerated from chart metadata using helm-docs v1.14.2