37 changes: 17 additions & 20 deletions content/patterns/rag-llm-gitops/_index.md
date: 2024-07-25
tier: tested
summary: The goal of this demo is to showcase a Chatbot LLM application augmented with data from Red Hat product documentation running on Red Hat OpenShift. It deploys an LLM application that connects to multiple LLM providers such as OpenAI, Hugging Face, and NVIDIA NIM. The application generates a project proposal for a Red Hat product.
rh_products:
  - Red Hat OpenShift Container Platform
  - Red Hat OpenShift GitOps
  - Red Hat OpenShift AI
partners:
  - EDB
  - Elastic
industries:
  - General
aliases: /ai/
# uncomment once this exists
# pattern_logo: retail.png
ci: ragllm

## Introduction

This deployment is based on the _Validated Patterns framework_, using GitOps for
seamless provisioning of all operators and applications. It deploys a Chatbot
application that harnesses the power of Large Language Models (LLMs) combined
with the Retrieval-Augmented Generation (RAG) framework.

The pattern uses [Red Hat OpenShift AI](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai) to deploy and serve LLM models at scale.

The pattern provides several options for the RAG vector store, including EDB Postgres (the default), Elasticsearch,
Redis, and Microsoft SQL Server. The vector store holds embeddings of Red Hat product documentation, which the
application, running on Red Hat OpenShift Container Platform, uses to generate project proposals for specific Red Hat products.

## Demo Description & Architecture

The application generates a project proposal for a Red Hat product.
- Leveraging [Red Hat OpenShift AI](https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai) to deploy and serve LLM models powered by NVIDIA GPU accelerator.
- LLM Application augmented with content from Red Hat product documentation.
- Multiple LLM providers (OpenAI, Hugging Face, NVIDIA).
- Vector Database, such as EDB Postgres, Elasticsearch, or Microsoft SQL Server, to store embeddings of Red Hat product documentation.
- Monitoring dashboard to provide key metrics such as ratings.
- GitOps setup to deploy e2e demo (frontend / vector database / served models).


_Figure 3. Schematic diagram for workflow of RAG demo with Red Hat OpenShift._


#### RAG Data Ingestion

![ingestion](https://gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/images/schematic-diagrams/rag-demo-vp-ingress-sd.png)

_Figure 4. Schematic diagram for Ingestion of data for RAG._


#### RAG Augmented Query


![query](/images/rag-llm-gitops/rag-augmented-query.png)

_Figure 5. Schematic diagram for RAG demo augmented query._

In Figure 5, we can see the RAG augmented query flow. The `granite-3.3-8b-instruct` model is used for
language processing. LangChain is used to integrate the different tools of the LLM-based
application and to process the PDF files and web pages. A vector database
provider, such as EDB Postgres for Kubernetes (or Elasticsearch), is used to
store vectors. vLLM is used to serve the `granite-3.3-8b-instruct` model. Gradio is
used for the user interface, and object storage is used to store the language model and other
datasets. Solution components are deployed as microservices in the Red Hat
OpenShift Container Platform cluster.

#### Download diagrams

View and download all of the diagrams above in our open source tooling site.

[Open Diagrams](https://www.redhat.com/architect/portfolio/tool/index.html?#gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/diagrams/rag-demo-vp.drawio)
_Figure 6. Proposed demo architecture with OpenShift AI_

### Components deployed

- **vLLM Inference Server:** The pattern deploys a vLLM server that serves the `ibm-granite/granite-3.3-8b-instruct` model. The server requires a GPU node.
- **EDB Postgres for Kubernetes / Redis Server:** A vector database server is deployed to store vector embeddings created from Red Hat product documentation.
- **Populate VectorDb Job:** The job creates the embeddings and populates the vector database.
- **LLM Application:** This is a Chatbot application that can generate a project proposal by augmenting the LLM with the Red Hat product documentation stored in the vector database.
- **Prometheus:** Deploys a Prometheus instance to store the various metrics from the LLM application and the vLLM inference server.
- **Grafana:** Deploys Grafana application to visualize the metrics.


![Overview](https://gitlab.com/osspa/portfolio-architecture-examples/-/raw/main/images/intro-marketectures/rag-demo-vp-marketing-slide.png)

_Figure 1. Overview of the validated pattern for RAG Demo with Red Hat OpenShift_
23 changes: 16 additions & 7 deletions content/patterns/rag-llm-gitops/deploying-different-db.md
---
title: Deploying a different database
weight: 12
aliases: /rag-llm-gitops/deploy-different-db/
---

# Deploying a different database

This pattern supports several types of vector databases: EDB Postgres for Kubernetes, Elasticsearch, Redis, Microsoft SQL Server, and the cloud-deployed Azure SQL Server. By default the pattern deploys EDB Postgres for Kubernetes as the vector database. To use a different vector database, change the `global.db.type` parameter to the appropriate value (for example, `ELASTIC` or `MSSQL`) in your local branch in `values-global.yaml`.

```yaml
---
global:
  pattern: rag-llm-gitops
  options:
    useCSV: false
    syncPolicy: Automatic
    installPlanApproval: Automatic
  # Possible values for RAG vector DB db.type:
  # REDIS -> Redis (Local chart deploy)
  # EDB -> PGVector (Local chart deploy)
  # ELASTIC -> Elasticsearch (Local chart deploy)
  # MSSQL -> MS SQL Server (Local chart deploy)
  # AZURESQL -> Azure SQL (Pre-existing in Azure)
  db:
    index: docs
    type: EDB
  # Models used by the inference service (should be a HuggingFace model ID)
  model:
    vllm: ibm-granite/granite-3.3-8b-instruct
    embedding: sentence-transformers/all-mpnet-base-v2
  storageClass: gp3-csi
main:
  clusterGroupName: hub
  multiSourceConfig:
    enabled: true
    clusterGroupChartVersion: 0.9.*
```


This file is also where you can update both the LLM model served by the vLLM inference service and the embedding model used by the vector database.
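For example, to change the served models, you could edit the `model` section of `values-global.yaml` in your local branch. This is a minimal sketch using the default model IDs from the example above; any compatible Hugging Face model ID should work, subject to the capacity of your GPU nodes:

```yaml
global:
  model:
    # LLM served by the vLLM inference service (Hugging Face model ID)
    vllm: ibm-granite/granite-3.3-8b-instruct
    # Embedding model used when populating the vector database
    embedding: sentence-transformers/all-mpnet-base-v2
```

After committing and pushing the change to your branch, GitOps reconciles the updated values into the deployed services.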
47 changes: 29 additions & 18 deletions content/patterns/rag-llm-gitops/getting-started.md
aliases: /rag-llm-gitops/getting-started/
- You have the OpenShift Container Platform installation program and the pull secret for your cluster. You can get these from [Install OpenShift on AWS with installer-provisioned infrastructure](https://console.redhat.com/openshift/install/aws/installer-provisioned).
- A Red Hat OpenShift cluster running in AWS.

It is also possible to deploy the RAG-LLM GitOps pattern to Azure. Because these docs focus mostly on the AWS deployment, see [RAG-LLM pattern on Microsoft Azure](https://validatedpatterns.io/patterns/azure-rag-llm-gitops/)
for more details about installing this pattern on Azure.

## Procedure

1. Create the installation configuration file using the steps described in [Creating the installation configuration file](https://docs.openshift.com/container-platform/latest/installing/installing_aws/ipi/installing-aws-customizations.html#installation-initializing_installing-aws-customizations).

> **Note:**
> Supported regions are `us-east-1`, `us-east-2`, `us-west-1`, `us-west-2`, `ca-central-1`, `sa-east-1`, `eu-west-1`, `eu-west-2`, `eu-west-3`, `eu-central-1`, `eu-north-1`, `ap-northeast-1`, `ap-northeast-2`, `ap-northeast-3`, `ap-southeast-1`, `ap-southeast-2`, and `ap-south-1`. For more information about installing on AWS, see [Installation methods](https://docs.openshift.com/container-platform/latest/installing/installing_aws/preparing-to-install-on-aws.html).

2. Customize the generated `install-config.yaml`, creating one control plane node with instance type `m5.2xlarge` and 3 worker nodes with instance type `m5.2xlarge`. A sample YAML file is shown here:

```yaml
additionalTrustBundlePolicy: Proxyonly
apiVersion: v1
baseDomain: aws.validatedpatterns.io
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    aws:
      type: m5.2xlarge
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  # ...
metadata:
  name: kevstestcluster
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: us-east-1
publish: External
pullSecret: "<pull-secret>"
sshKey: |
  ssh-ed25519 <public-key> someuser@redhat.com
```
```sh
$ git clone git@github.com:your-username/rag-llm-gitops.git
```

5. Go to your repository: Ensure you are in the root directory of your git repository by using the following command:

```sh
$ cd rag-llm-gitops
```

6. Create a local copy of the secret values file by running the following command:

```sh
$ cp values-secret.yaml.template ~/values-secret-rag-llm-gitops.yaml
```

> **Note:**
> For this demo, editing this file is unnecessary as the default configuration works out of the box upon installation.

7. Add the remote upstream repository by running the following command:

```sh
$ git remote add -f upstream git@github.com:validatedpatterns/rag-llm-gitops.git
```

8. Create a local branch by running the following command:

```sh
$ git checkout -b my-test-branch main
```

9. By default the pattern deploys EDB Postgres for Kubernetes as the vector database. To deploy Elasticsearch, change the `global.db.type` parameter to `ELASTIC` in your local branch in `values-global.yaml`. For more information, see [Deploying a different database](/rag-llm-gitops/deploy-different-db/).
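    A minimal sketch of that change in `values-global.yaml` (only `db.type` differs from the default):

    ```yaml
    global:
      db:
        index: docs
        type: ELASTIC # default: EDB
    ```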

10. By default, the instance type for the GPU nodes is `g5.2xlarge`. Follow [Customize GPU provisioning nodes](/rag-llm-gitops/gpuprovisioning/) to change the GPU instance types.

```sh
$ git push origin my-test-branch
```

12. Ensure you have logged in to the cluster at both the command line and the console by using the login credentials presented to you when you installed the cluster. For example:

```sh
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.demo1.openshift4-beta-abcorp.com
INFO Login to the console with user: kubeadmin, password: <provided>
```

13. Add GPU nodes to your existing cluster deployment by running the following command:

```sh
$ ./pattern.sh make create-gpu-machineset
```

> **Note:**
> You may need to create a `config` file in the `~/.aws` directory and populate it with the default region name.
>
> 1. Run the following:
>
> ```sh
> vi ~/.aws/config
> ```
>
> 2. Add the following:
>
> ```sh
> [default]
> region = us-east-1
> ```

> **Note:**
> This deploys everything you need to run the demo application, including the NVIDIA GPU Operator and the Node Feature Discovery Operator used to determine your GPU nodes.

## Verify the Installation

- Click the `Generate` button; a project proposal should be generated. The project proposal also contains references to the RAG content, and the document can be downloaded as a PDF.

![Routes](/images/rag-llm-gitops/proposal.png)

