A simple document-based RAG application that uses Small Language Models (SLMs) such as Microsoft Phi-3, Falcon 7B, or Mistral 7B to answer questions about the content of documents.
This RAG application uses the new Kubernetes AI toolchain operator (Kaito), a Kubernetes operator provided as a managed add-on for Azure Kubernetes Service (AKS) that simplifies running open-source software (OSS) AI models on your AKS clusters.
Kaito follows the classic Kubernetes Custom Resource Definition (CRD)/controller design pattern. The user manages a workspace custom resource that describes the GPU requirements and the inference specification. Kaito controllers automate the deployment by reconciling the workspace custom resource.
Under the hood, Kaito uses Karpenter to automatically provision the necessary GPU nodes based on the specification in the workspace custom resource and sets up the inference server as an endpoint for your AI models. This add-on reduces onboarding time and allows you to focus on AI model usage and development rather than infrastructure setup.
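To make the workspace custom resource more concrete, the following script writes a minimal Workspace manifest modeled on the falcon-7b example published in the upstream Kaito repository; it is a sketch, not the manifest this solution's Terraform code actually deploys. The apiVersion (kaito.sh/v1alpha1), the workspace name, the apps label, the file name, and the Standard_NC12s_v3 instance type are assumptions to check against the Kaito version installed by the AKS add-on.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal Kaito Workspace manifest to disk.
# Assumptions: apiVersion kaito.sh/v1alpha1, the "falcon-7b" preset, and the
# Standard_NC12s_v3 GPU VM size; adjust all three for your cluster and model.
set -eu

manifest="workspace-falcon-7b.yaml"   # hypothetical file name

cat > "$manifest" <<'EOF'
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-falcon-7b
resource:
  # GPU requirements: the VM size Kaito provisions GPU nodes with
  instanceType: "Standard_NC12s_v3"
  labelSelector:
    matchLabels:
      apps: falcon-7b
inference:
  # Inference specification: a preset model configuration shipped with Kaito
  preset:
    name: "falcon-7b"
EOF

echo "Wrote $manifest"
```

You could then submit the file with kubectl apply -f workspace-falcon-7b.yaml and follow the reconciliation with kubectl get workspace workspace-falcon-7b -w. The resource section carries the GPU requirements and the inference section carries the model specification, matching the split described above.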
The major components of Kaito include:
- Workspace Controller: This controller reconciles the workspace custom resource, creates machine custom resources to trigger node auto-provisioning, and creates the inference workload (deployment or statefulset) based on the model preset configurations.
- Node Provisioner Controller: This controller, named gpu-provisioner in the Kaito Helm chart, interacts with the workspace controller using the machine CRD from Karpenter. It integrates with Azure Kubernetes Service (AKS) APIs to add new GPU nodes to the AKS cluster. Note that the gpu-provisioner is an open-source component maintained in the Kaito repository and can be replaced by other controllers supporting Karpenter-core APIs.
Using Kaito greatly simplifies the workflow of onboarding large AI inference models into Kubernetes, allowing you to focus on AI model usage and development without the hassle of infrastructure setup.
Running open-source LLMs or SLMs with Kaito offers several significant benefits, including:
- Automated GPU node provisioning and configuration: Kaito will automatically provision and configure GPU nodes for you. This can help reduce the operational burden of managing GPU nodes, configuring them for Kubernetes, and tuning model deployment parameters to fit GPU profiles.
- Reduced cost: Kaito can help you save money by splitting inferencing across lower-end GPU nodes, which may also be more readily available and cheaper than high-end GPU nodes.
- Support for popular open-source LLMs: Kaito offers preset configurations for popular open-source LLMs. This can help you deploy and manage open-source LLMs on AKS and integrate them with your intelligent applications.
- Fine-grained control: You keep full control over data security and privacy and full transparency into model development and configuration, and you can fine-tune the model to fit your specific use case.
- Network and data security: You can ensure these models are ring-fenced within your organization's network and/or ensure the data never leaves the Kubernetes cluster.
The following diagram shows the high-level architecture of the Kaito-RAG solution:
Before you begin, make sure the following prerequisites are in place:
- An active Azure subscription. If you don't have one, create a free Azure account before you begin.
- Visual Studio Code installed on one of the supported platforms, along with the HashiCorp Terraform and C# Dev Kit extensions.
- Azure CLI version 2.59.0 or later installed. To install or upgrade, see Install Azure CLI.
- aks-preview Azure CLI extension version 2.0.0b8 or later installed.
- Terraform v1.9.0 or later.
- The deployment must be started by a user who has sufficient permissions to assign roles, such as User Access Administrator or Owner.
- Your Azure account also needs Microsoft.Resources/deployments/write permissions at the subscription level.
- During deployment, the script creates an application registration in Microsoft Entra ID. Please verify that your user account has the necessary privileges.
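The tool-version and role prerequisites above can be checked with a short script. This is a minimal sketch, assuming a bash-compatible shell, a sort that supports -V, and an active az login session; version_ge, has_role_assignment_rights, and check_prerequisites are hypothetical helper names, not part of this repository. It only looks for the Owner and User Access Administrator roles and does not verify the Microsoft.Resources/deployments/write permission or Entra ID app-registration privileges.

```shell
#!/usr/bin/env bash
# Sketch of a prerequisite check. Assumes az and terraform are on PATH and
# that you have already signed in with `az login`.

# version_ge A B: succeed if version A >= version B, using `sort -V` ordering.
# Caveat: sort -V places 2.0.0 *before* 2.0.0b8 (unlike Python-style
# pre-release rules), so double-check results involving pre-release versions.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# has_role_assignment_rights: succeed if the signed-in user holds Owner or
# User Access Administrator at (or above) the current subscription scope.
has_role_assignment_rights() {
  local user_id sub_id roles
  user_id="$(az ad signed-in-user show --query id -o tsv)"
  sub_id="$(az account show --query id -o tsv)"
  # --include-inherited adds assignments made at parent scopes;
  # --include-groups adds assignments granted through group membership.
  roles="$(az role assignment list --assignee "$user_id" \
    --scope "/subscriptions/$sub_id" --include-inherited --include-groups \
    --query "[].roleDefinitionName" -o tsv)"
  printf '%s\n' "$roles" | grep -qxE 'Owner|User Access Administrator'
}

# check_prerequisites: print every unmet requirement; non-zero if any fail.
check_prerequisites() {
  local failed=0 v
  v="$(az version --query '"azure-cli"' -o tsv)"
  version_ge "$v" 2.59.0 || { echo "Azure CLI $v < 2.59.0"; failed=1; }
  v="$(az extension show --name aks-preview --query version -o tsv 2>/dev/null)"
  version_ge "${v:-0}" 2.0.0b8 || { echo "aks-preview ${v:-missing} < 2.0.0b8"; failed=1; }
  v="$(terraform version | head -n1 | sed 's/^Terraform v//')"
  version_ge "$v" 1.9.0 || { echo "Terraform $v < 1.9.0"; failed=1; }
  has_role_assignment_rights || { echo "Missing Owner or User Access Administrator role"; failed=1; }
  return "$failed"
}
```

After sourcing the script, run check_prerequisites; it reports each failed check instead of stopping at the first one, so you can fix everything in a single pass.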
Before creating an AKS cluster with Kaito support, you must enable the AIToolchainOperatorPreview feature flag on your subscription.
- Register the AIToolchainOperatorPreview feature flag using the az feature register command:

  az feature register --namespace "Microsoft.ContainerService" --name "AIToolchainOperatorPreview"

  It takes a few minutes for the registration to complete. Please be patient!
- Verify the registration using the az feature show command:

  az feature show --namespace "Microsoft.ContainerService" --name "AIToolchainOperatorPreview"

  Wait until the status changes from Registering to Registered:

  {
    "id": "/subscriptions/…/providers/Microsoft.Features/providers/Microsoft.ContainerService/features/AIToolchainOperatorPreview",
    "name": "Microsoft.ContainerService/AIToolchainOperatorPreview",
    "properties": {
      "state": "Registered"
    },
    "type": "Microsoft.Features/providers/features"
  }
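The register-and-verify steps above can be automated with a polling loop. This sketch assumes a bash-compatible shell; feature_state and wait_for_feature are hypothetical helper names, and the defaults of 60 attempts at 30-second intervals are arbitrary. Once the feature reports Registered, it refreshes the Microsoft.ContainerService resource provider with az provider register, the follow-up step the AKS documentation recommends after registering a preview feature.

```shell
#!/usr/bin/env bash
# Sketch: wait for a preview feature to reach "Registered", then refresh the
# resource provider. Helper names and timing defaults are assumptions.

# feature_state NAMESPACE NAME: print the feature's state (e.g. Registering).
feature_state() {
  az feature show --namespace "$1" --name "$2" --query properties.state -o tsv
}

# wait_for_feature NAMESPACE NAME [MAX_ATTEMPTS] [DELAY_SECONDS]
# Returns 0 once the feature is Registered, 1 if all attempts are used up.
wait_for_feature() {
  local ns="$1" name="$2" max="${3:-60}" delay="${4:-30}" i=1 state
  while [ "$i" -le "$max" ]; do
    state="$(feature_state "$ns" "$name")"
    echo "[$i/$max] $ns/$name: $state"
    if [ "$state" = "Registered" ]; then
      # Refresh the provider registration so the new feature takes effect.
      az provider register --namespace "$ns"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Timed out waiting for $ns/$name to register" >&2
  return 1
}
```

For this solution you would call it as wait_for_feature Microsoft.ContainerService AIToolchainOperatorPreview after running the az feature register command.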
The Kaito-RAG solution provides Terraform scripts to deploy the infrastructure on your Azure subscription. Please review the variable definitions and their default values before deployment to ensure that they suit your needs and requirements.
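Terraform scripts like these are usually deployed with the standard init, plan, and apply sequence. In this sketch, TF_DIR (defaulting to ./terraform) is a hypothetical placeholder for wherever the solution's Terraform files actually live, and main.tfplan is an arbitrary plan file name. Terraform automatically loads terraform.tfvars and *.auto.tfvars from its working directory, so values you adjust there are picked up without extra flags; a differently named file would need -var-file on the plan command.

```shell
#!/usr/bin/env bash
# Sketch: standard Terraform workflow for the solution's scripts.
# TF_DIR is a hypothetical placeholder; point it at the repository's
# actual Terraform folder. Requires a prior `az login`.
set -eu

TF_DIR="${TF_DIR:-./terraform}"   # hypothetical path

deploy() {
  # -chdir makes Terraform run as if started inside TF_DIR, so the plan
  # file and any *.tfvars files are resolved relative to that folder.
  terraform -chdir="$TF_DIR" init
  # Inspect the plan output before applying; it lists every resource change.
  terraform -chdir="$TF_DIR" plan -out=main.tfplan
  # Applying a saved plan executes exactly what was reviewed.
  terraform -chdir="$TF_DIR" apply main.tfplan
}
```

Saving the plan and applying that file, rather than running a bare terraform apply, ensures the deployment matches the changes you reviewed.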
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct.
For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos is subject to those third parties' policies.