# RAG / Agentic / MCP demos

This notebook contains a series of demos showcasing workflows from various agentic use cases on a Llama Stack deployment. The code for these demos can be found in the [opendatahub-io/llama-stack-on-ocp](https://github.com/opendatahub-io/llama-stack-on-ocp) repo. Thank you to members of the ET org for helping to create these!

These demos were built to show that Llama Stack, a strategic element of Red Hat's AI vision, is compatible with components of Red Hat's existing productized stack.

The demos are ordered by increasing complexity, starting with low-hanging fruit and moving on to more complex workflows with chained operations.

## Level 1: Foundational RAG (Low Difficulty)

The core concept behind all these use cases is **basic retrieval from a defined knowledge base**. They demonstrate that Llama Stack provides all the primitives needed to build comprehensive RAG applications without an agentic workflow when one is not required.

The tech stack used for this demo:

- VectorDB: Milvus Lite (out of the box)
- Inferencing: vLLM
- LLM: **<NOT SURE WHAT WAS USED @ ILYA>**
- UI: Llama Stack Playground
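The retrieval step these queries rely on can be illustrated without a running server. The following is a toy sketch only: it assumes a bag-of-words cosine similarity in place of the embedding model and Milvus Lite index the demo uses, and the `DOCUMENTS` contents and the `retrieve`/`build_prompt` helpers are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; in the demo these chunks would be stored in Milvus Lite
DOCUMENTS = {
    "gizmoplus_manual": "GizmoPlus error code E-404: power cycle the device, then reset the network settings.",
    "llm_paper": "The paper argues that LLMs speed up boilerplate coding but require careful code review.",
    "hr_policy": "Remote work eligibility depends on role and location.",
}

def tokenize(text: str) -> Counter:
    """Lower-cased word counts; a crude stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[tuple[str, float]]:
    """Return the top-k (doc_id, score) pairs ranked by similarity to the query."""
    q = tokenize(query)
    scored = [(doc_id, cosine(q, tokenize(text))) for doc_id, text in DOCUMENTS.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

def build_prompt(query: str, k: int = 1) -> str:
    """Prepend the retrieved chunks to the question, as a RAG pipeline does before calling the LLM."""
    context = "\n".join(DOCUMENTS[doc_id] for doc_id, _ in retrieve(query, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

For example, `retrieve("troubleshooting steps for error code E-404 on GizmoPlus")` ranks `gizmoplus_manual` first. In the demo, the equivalent lookup runs against the vector database rather than an in-memory dictionary.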

### Query 1: `Summarize the main arguments presented in the research paper "The Impact of LLMs on Software Development" (Johnson et al., 2023) that we uploaded.`

In [3]:
print("some_python_code_for_l1_q1")

some_python_code_for_l1_q1


### Query 2: `What are the standard troubleshooting steps for error code E-404 on the 'GizmoPlus' device?`

In [4]:
print("some_python_code_for_l1_q2")

some_python_code_for_l1_q2


## Level 2: Simple Agentic (Medium-Low Difficulty)

The core concept behind all these use cases is **basic single-tool usage**. 

The tech stack used for this demo:

- Inferencing: vLLM
- LLM: **<NOT SURE WHAT WAS USED>**
- Tools: built-in tools (websearch)
- UI: Llama Stack Playground
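In these demos the model decides when to call the web search tool and the Llama Stack server executes it. The dispatch step underneath single-tool usage can be sketched in plain Python, assuming the model emits a JSON object with `tool_name` and `arguments` keys. That JSON shape, the `TOOLS` registry, and the `web_search` stub are hypothetical illustrations, not Llama Stack's actual wire format or API.

```python
import json
from typing import Any, Callable

# Registry mapping tool names to the Python callables that implement them
TOOLS: dict[str, Callable[..., Any]] = {}

def tool(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a function under the given tool name."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        return fn
    return register

@tool("web_search")
def web_search(query: str, max_results: int = 3) -> list[str]:
    # Hypothetical stub: a real deployment would call a search provider here
    return [f"result {i + 1} for '{query}'" for i in range(max_results)]

def dispatch_tool_call(raw_model_output: str) -> Any:
    """Parse a JSON tool call emitted by the model and run the matching tool."""
    call = json.loads(raw_model_output)
    name = call["tool_name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))
```

Rejecting unknown tool names keeps a hallucinated tool call from failing silently; the tool result would then be fed back to the model to compose the final answer.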

### Query 1: `Search for recent news articles about advancements in quantum computing.`

In [5]:
print("some_python_code_for_l2_q1")

some_python_code_for_l2_q1


### Query 2: `Find coffee shops near my current location that are open now and have Wi-Fi.`

In [6]:
print("some_python_code_for_l2_q2")

some_python_code_for_l2_q2


### Query 3: `What’s latest in OpenShift?`

In [7]:
print("some_python_code_for_l2_q3")

some_python_code_for_l2_q3


## Level 3: Agentic RAG (Medium)

The core concept of the following use cases is that **retrieval directly informs or enables subsequent agent actions**. 

The tech stack used for this demo:

- Inferencing: vLLM
- LLM: **<NOT SURE WHAT WAS USED @ ILYA>**
- Tools: built-in tools (websearch) and the RAG tool
- UI: Llama Stack Playground
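What separates this level from Level 1 is that the retrieval result decides what the agent does next. A minimal sketch of that decision, assuming a simple word-overlap score and a fixed threshold: if the internal knowledge base covers the question, answer from it; otherwise fall back to web search. The `KNOWLEDGE_BASE` entries, the threshold, and the action strings are hypothetical; in the demo the LLM makes this choice between the RAG tool and websearch.

```python
import re
from dataclasses import dataclass

@dataclass
class Hit:
    doc_id: str
    text: str
    score: float

# Hypothetical internal documents (placeholder text, not real policy content)
KNOWLEDGE_BASE = {
    "hr_remote_work": "Sample text: remote work eligibility rules for employees based in California.",
    "gizmoplus_e404": "For E-404 on GizmoPlus, power cycle the device and reset network settings.",
}

def words(text: str) -> set[str]:
    """Lower-cased word set with punctuation stripped."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

def keyword_search(query: str) -> Hit:
    """Return the document containing the largest fraction of the query's words."""
    q = words(query)
    best = Hit("", "", 0.0)
    for doc_id, text in KNOWLEDGE_BASE.items():
        score = len(q & words(text)) / len(q) if q else 0.0
        if score > best.score:
            best = Hit(doc_id, text, score)
    return best

def plan_next_action(query: str, threshold: float = 0.3) -> str:
    """Let the retrieval result decide the agent's next step."""
    hit = keyword_search(query)
    if hit.score >= threshold:
        return f"answer_from_docs:{hit.doc_id}"
    return "call_tool:web_search"
```

A question about California remote work maps to `answer_from_docs:hr_remote_work`, while one about the latest OpenShift release finds no matching document and routes to web search.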

### Query 1: `Summarize the main arguments presented in the research paper "The Impact of LLMs on Software Development" (Johnson et al., 2023) that we uploaded.`

In [8]:
print("some_python_code_for_l3_q1")

some_python_code_for_l3_q1


### Query 2: `Based on our customer support knowledge base, what are the standard troubleshooting steps for error code E-404 on the GizmoPlus device?`

In [9]:
print("some_python_code_for_l3_q2")

some_python_code_for_l3_q2


### Query 3: `What does our company's HR policy document state about remote work eligibility for employees based in California?`

In [10]:
print("some_python_code_for_l3_q3")

some_python_code_for_l3_q3


## Level 4: Agentic & MCP (Medium Difficulty)

The core concepts behind all these use cases are **sequential tool calls** or **conditional logic** within the context of an agentic workflow.
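In the agentic version, the LLM decides the order of tool calls and whether to continue. The control flow it has to reproduce for Query 1 below can be written out explicitly. This is a sketch with stubbed tools: `get_cluster_status` and `create_pod` are hypothetical callables, not the tool names exposed by any OpenShift or MCP server.

```python
from typing import Callable

def run_conditional_chain(
    get_cluster_status: Callable[[], str],
    create_pod: Callable[[str, str], str],
    pod_name: str = "test-pod",
    namespace: str = "dev",
) -> list[str]:
    """Call the status tool first and call create_pod only if the cluster is running."""
    steps = []
    status = get_cluster_status()
    steps.append(f"get_cluster_status -> {status}")
    if status.lower() == "running":
        steps.append(f"create_pod -> {create_pod(pod_name, namespace)}")
    else:
        steps.append("skipped create_pod: cluster is not running")
    return steps

if __name__ == "__main__":
    # Stubbed tools standing in for real cluster calls
    for line in run_conditional_chain(lambda: "Running", lambda name, ns: f"created {ns}/{name}"):
        print(line)
```

Passing the tools in as arguments keeps the chain testable with stubs; an agent framework instead lets the model choose the next call after reading each tool result.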

### Query 1: (Agentic) `Check the status of my OpenShift cluster. If it’s running, create a new pod named test-pod in the dev namespace.`

In [12]:
print("some_python_code_for_l4_q1")

some_python_code_for_l4_q1


### Query 2: (Agentic) `Search for the latest Red Hat OpenShift version on the Red Hat website. Summarize the version number and draft a short email to my team.`

In [13]:
# Code below follows the examples at: https://llama-stack.readthedocs.io/en/latest/building_applications
# Requires: pip install llama-stack-client python-dotenv
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client import LlamaStackClient
import argparse
import logging
import os
from dotenv import load_dotenv

# Read REMOTE_BASE_URL, REMOTE_MCP_URL and TAVILY_API_KEY from a local .env file
load_dotenv()

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
# Attach the handler only once so re-running the cell does not duplicate log lines
if not logger.handlers:
    stream_handler = logging.StreamHandler()
    stream_handler.setLevel(logging.INFO)
    stream_handler.setFormatter(logging.Formatter('%(message)s'))
    logger.addHandler(stream_handler)


parser = argparse.ArgumentParser()
parser.add_argument("-r", "--remote", help="Uses the remote_url", action="store_true")
parser.add_argument("-s", "--session-info-on-exit", help="Prints agent session info on exit", action="store_true")
parser.add_argument("-a", "--auto", help="Automatically runs examples, and does not start a chat session", action="store_true")
# parse_known_args() ignores the extra arguments Jupyter passes to the kernel (e.g. "-f kernel.json"),
# which would make parse_args() exit with an error inside a notebook
args, _ = parser.parse_known_args()

# model="meta-llama/Llama-3.2-3B-Instruct"
model="ibm-granite/granite-3.2-8b-instruct"

# Connect to a Llama Stack server (remote URLs come from the environment)
if args.remote:
    base_url = os.getenv("REMOTE_BASE_URL")
    mcp_url = os.getenv("REMOTE_MCP_URL")
else:
    base_url="http://localhost:8321"
    mcp_url="http://host.containers.internal:8000/sse"  # MCP endpoint, not used by this query

client = LlamaStackClient(
    base_url=base_url,
    # API key for the Tavily search provider behind builtin::websearch
    provider_data={
        "tavily_search_api_key": os.getenv("TAVILY_API_KEY")
    })
logger.info(f"Connected to Llama Stack server @ {base_url} \n")

agent = Agent(
    client=client,
    model=model,
    instructions = """You are a helpful assistant. You have access to a number of tools.
    Whenever a tool is called, be sure to return the response in a friendly and helpful tone.
    When you are asked to search the web you must use a tool.
    """ ,
    tools=["builtin::websearch"],
    tool_config={"tool_choice":"auto"},
    sampling_params={"max_tokens":4096}
)


session_id = agent.create_session(session_name="Draft_email_with_latest_OCP_version")
prompt = """Search the web for the latest Red Hat OpenShift version on the Red Hat website. Summarize the version number and draft an email to convey this information."""
turn_response = agent.create_turn(
    messages=[
        {
            "role":"user",
            "content": prompt
        }
    ],
    session_id=session_id,
    stream=True,
)
# Print the agent's streamed steps (inference, tool calls, final answer) as they arrive
for log in EventLogger().log(turn_response):
    log.print()



### Query 3: (MCP) `Review OpenShift logs for pods pod-123 and pod-456. Categorize each as ‘Normal’ or ‘Error’. If any show ‘Error’, send a Slack message to the ops team. Otherwise, show a simple summary.`

In [14]:
print("some_python_code_for_l4_q3")

some_python_code_for_l4_q3
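The categorize-then-notify logic in this query can be sketched with stubs. In the demo an LLM reads the logs and calls tools through MCP; this sketch swaps in a keyword regex for the model's judgement, and `ERROR_PATTERN`, `categorize_logs`, `triage`, and the `send_slack_message` callable are hypothetical names, not part of any MCP server's API.

```python
import re
from typing import Callable

# Keyword heuristic standing in for the LLM's classification of each log
ERROR_PATTERN = re.compile(r"\b(error|exception|crashloopbackoff|oomkilled)\b", re.IGNORECASE)

def categorize_logs(pod_logs: dict[str, str]) -> dict[str, str]:
    """Label each pod 'Error' if its log matches ERROR_PATTERN, otherwise 'Normal'."""
    return {pod: ("Error" if ERROR_PATTERN.search(log) else "Normal") for pod, log in pod_logs.items()}

def triage(pod_logs: dict[str, str], send_slack_message: Callable[[str], None]) -> str:
    """Notify the ops team if any pod is in 'Error'; otherwise return a one-line summary."""
    categories = categorize_logs(pod_logs)
    failing = sorted(pod for pod, label in categories.items() if label == "Error")
    if failing:
        send_slack_message(f"Pods in Error state: {', '.join(failing)}")
        return "notified"
    return "; ".join(f"{pod}: {label}" for pod, label in sorted(categories.items()))
```

Injecting `send_slack_message` as a callable lets the branch be exercised with a list's `append` in place of a real Slack integration.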
