# Introduction

This example notebook demonstrates using `llamastack` with `gofannon`. It is _heavily_ based on (cribbed from) the examples found [here](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/getting_started.ipynb) and [here](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/agent_with_tools.py).

Like other framework examples, we'll substitute the `gofannon` Google Search tool. The documentation here may seem sparse on the `llamastack` implementation.
That is on purpose: we direct the reader to the original examples for more information.

## Install Llamastack

In [1]:
import os
import subprocess
import time

!pip install uv

if "UV_SYSTEM_PYTHON" in os.environ:
  del os.environ["UV_SYSTEM_PYTHON"]

# this command installs all the dependencies needed for the llama stack server with the together inference provider
!uv run --with llama-stack llama stack build --template together --image-type venv

def run_llama_stack_server_background():
    log_file = open("llama_stack_server.log", "w")
    process = subprocess.Popen(
        "uv run --with llama-stack llama stack run together --image-type venv",
        shell=True,
        stdout=log_file,
        stderr=log_file,
        text=True
    )

    print(f"Starting Llama Stack server with PID: {process.pid}")
    return process

def wait_for_server_to_start():
    import requests
    from requests.exceptions import ConnectionError

    url = "http://0.0.0.0:8321/v1/health"
    max_retries = 30
    retry_interval = 1

    print("Waiting for server to start", end="")
    for _ in range(max_retries):
        try:
            response = requests.get(url)
            if response.status_code == 200:
                print("\nServer is ready!")
                return True
        except ConnectionError:
            pass  # server is not accepting connections yet
        # wait before retrying, whether the request failed or returned a non-200 status
        print(".", end="", flush=True)
        time.sleep(retry_interval)

    print("\nServer failed to start after", max_retries * retry_interval, "seconds")
    return False


# use this helper if needed to kill the server
def kill_llama_stack_server():
    # Kill any existing llama stack server processes
    os.system("ps aux | grep -v grep | grep llama_stack.distribution.server.server | awk '{print $2}' | xargs kill -9")

## Start The Server

In [2]:
server_process = run_llama_stack_server_background()
assert wait_for_server_to_start()

Starting Llama Stack server with PID: 1193
Waiting for server to start...............
Server is ready!
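
The wait helper above is a poll-with-retry loop. As a rough sketch of the same pattern, separated from `requests` so it can be tried without a running server, something like the following should work. The names `poll_until` and `ready_on_third_call` are ours, made up for illustration; they are not part of `llamastack`.

```python
import time
from typing import Callable


def poll_until(check: Callable[[], bool], max_retries: int = 30, retry_interval: float = 1.0) -> bool:
    """Call `check` until it returns True or the retries run out."""
    for attempt in range(max_retries):
        try:
            if check():
                return True
        except Exception:
            # Treat any error (e.g. connection refused) as "not ready yet".
            pass
        # Sleep after every failed attempt except the last one.
        if attempt < max_retries - 1:
            time.sleep(retry_interval)
    return False


# Stand-in check (hypothetical) that only succeeds on its third call.
calls = {"n": 0}


def ready_on_third_call() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


result = poll_until(ready_on_third_call, max_retries=5, retry_interval=0.01)
print(result, calls["n"])
```

In the notebook, `check` could be a small function that requests the health URL and returns whether the status code is 200.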


## Install and Test the Llamastack Client

This uses [together.ai](https://www.together.ai/). You can sign up for their service and get $1 worth of free credits (at the time of writing). People have suggested we use ollama instead, but then you would need a T4 environment for this colab (instead of the CPU runtime it currently uses). That also requires you to sign up for a service, and has the additional drawback that the notebook would no longer be testable (GitHub Actions don't provide GPUs).

Sorry. If you're really upset, the outputs are left in place so you can see them without running it yourself.

In [3]:
!pip install llama-stack-client --quiet

In [4]:
from llama_stack_client import LlamaStackClient
from google.colab import userdata

client = LlamaStackClient(
    base_url="http://0.0.0.0:8321",
    provider_data = {
        "together_api_key": userdata.get('together_api_key')
    }
)
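
`google.colab.userdata` only exists inside Colab. If you run this notebook elsewhere, one option is a small lookup that falls back to an environment variable. This is a sketch under our own assumptions: `get_secret` is a hypothetical helper, and reusing the secret's name as the environment variable name is just a convention we picked.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab's userdata if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            value = userdata.get(name)
            if value:
                return value
        except Exception:
            # Secret not defined in Colab, or access not granted: fall through.
            pass

    value = os.environ.get(name)
    if not value:
        raise KeyError(f"No secret named {name!r} in Colab userdata or the environment")
    return value


# Demo with a placeholder value (not a real key).
os.environ.setdefault("demo_secret", "not-a-real-key")
print(get_secret("demo_secret"))
```

With this in place, `userdata.get('together_api_key')` could be swapped for `get_secret('together_api_key')`.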

## List Available Models

This ensures we are connected.

In [5]:
from rich.pretty import pprint

print("Available models:")
for m in client.models.list():
    print(f"- {m.identifier}")

print("----")
print("Available shields (safety models):")
for s in client.shields.list():
    print(s.identifier)
print("----")

Available models:
- meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
- meta-llama/Llama-3.1-8B-Instruct
- meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
- meta-llama/Llama-3.1-70B-Instruct
- meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
- meta-llama/Llama-3.1-405B-Instruct-FP8
- meta-llama/Llama-3.2-3B-Instruct-Turbo
- meta-llama/Llama-3.2-3B-Instruct
- meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo
- meta-llama/Llama-3.2-11B-Vision-Instruct
- meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo
- meta-llama/Llama-3.2-90B-Vision-Instruct
- meta-llama/Llama-3.3-70B-Instruct-Turbo
- meta-llama/Llama-3.3-70B-Instruct
- meta-llama/Meta-Llama-Guard-3-8B
- meta-llama/Llama-Guard-3-8B
- meta-llama/Llama-Guard-3-11B-Vision-Turbo
- meta-llama/Llama-Guard-3-11B-Vision
- togethercomputer/m2-bert-80M-8k-retrieval
- togethercomputer/m2-bert-80M-32k-retrieval
- meta-llama/Llama-4-Scout-17B-16E-Instruct
- together/meta-llama/Llama-4-Scout-17B-16E-Instruct
- meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
- meta
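
Since later cells hard-code a model ID, it can help to check that ID against the list before using it. A minimal sketch follows; `require_model` is a hypothetical helper of ours, and the short list is a hand-copied subset of the output above.

```python
from typing import Iterable


def require_model(available_ids: Iterable[str], wanted: str) -> str:
    """Return `wanted` if the server lists it, otherwise raise a readable error."""
    ids = list(available_ids)
    if wanted in ids:
        return wanted
    raise ValueError(f"Model {wanted!r} not available; server lists {len(ids)} models")


# Hand-copied subset of the identifiers printed above.
available = [
    "meta-llama/Llama-3.2-3B-Instruct-Turbo",
    "meta-llama/Llama-3.2-3B-Instruct",
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",
]

model_id = require_model(available, "meta-llama/Llama-3.2-3B-Instruct-Turbo")
print(model_id)
```

In the notebook, `available` would instead be `[m.identifier for m in client.models.list()]`.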

## Run a test inference

This small inference on a small model ensures that we are in fact talking to the llama stack.

In [6]:
model_id = "meta-llama/Llama-3.2-3B-Instruct-Turbo"

response = client.inference.chat_completion(
    model_id=model_id,
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Write a two-sentence poem about llama."},
    ],
)

print(response.completion_message.content)

Here is a two-sentence poem about a llama:

With soft fur and gentle eyes,
The llama roams, a gentle surprise.


# `gofannon` + llamastack


## Install

Note that we simply install `gofannon`; we don't need any extras.

In [1]:
!pip install gofannon --quiet

## Create the Tool

Here we will create and configure the Google Search Tool and then export it to a llamastack-compliant tool.

In [9]:
from gofannon.google_search.google_search import GoogleSearch

google_search = GoogleSearch(api_key=userdata.get("google_search"), engine_id="75be790deec0c42f3")
google_search_for_llama_stack = google_search.export_to_llamastack()
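
For orientation: as far as we know, `llama-stack-client` agents can also take plain Python functions as client tools, using the type hints and a `:param:` docstring to describe the tool to the model. The sketch below is a hand-written stand-in in that spirit. It is not what `export_to_llamastack()` actually returns. `fake_search` and its canned result are hypothetical, and the output layout only mimics the Title / Snippet / Link format visible in the tool response further down.

```python
def fake_search(query: str) -> str:
    """Return canned search results for a query.

    :param query: The search query string.
    :returns: Results formatted as Title / Snippet / Link blocks.
    """
    # Canned placeholder data; a real tool would call a search API here.
    results = [
        {
            "title": f"Example result for {query}",
            "snippet": "Placeholder snippet text.",
            "link": "https://example.com/result",
        }
    ]
    return "\n\n".join(
        f"Title: {r['title']}\nSnippet: {r['snippet']}\nLink: {r['link']}"
        for r in results
    )


print(fake_search("42nd president of the United States"))
```

The point of using `gofannon` is that you don't have to write and document such a function yourself; the export call does the wrapping.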

## Create Agent with our Tool

We will create our agent with our Google Search Tool.

Note that Llamastack (and Llama LLMs) have a default search tool, and the example we're copying from also has another search tool. But for demonstration purposes, the Google Search Tool is very convenient.

After we create our agent, we ask who the 42nd president of the United States was. We see that the tool is successfully called and the agent appropriately responds, "Bill Clinton".

In [12]:
from llama_stack_client import LlamaStackClient, Agent, AgentEventLogger
from termcolor import colored

agent = Agent(
        client,
        model="meta-llama/Llama-3.2-3B-Instruct-Turbo",
        instructions="You are a helpful assistant. Use the tools you have access to for providing relevant answers.",
        sampling_params={
            "strategy": {"type": "top_p", "temperature": 1.0, "top_p": 0.9},
        },
        tools=[
            google_search_for_llama_stack,
        ],
    )
session_id = agent.create_session("test-session")

user_prompts = [
        "Who was the 42nd president of the United States?",
    ]

for prompt in user_prompts:
    print(colored(f"User> {prompt}", "cyan"))
    response = agent.create_turn(
        messages=[{"role": "user", "content": prompt}],
        session_id=session_id,
    )

    for log in AgentEventLogger().log(response):
        log.print()


User> Who was the 42nd president of the United States?
inference> [google_search(query="42nd president of the United States")]
tool_execution> Tool:google_search Args:{'query': '42nd president of the United States'}
tool_execution> Tool:google_search Response:"Title: Bill Clinton - Wikipedia\nSnippet: William Jefferson Clinton (né Blythe; born August 19, 1946) is an American politician and lawyer who was the 42nd president of the United States from 1993 ...\nLink: https://en.wikipedia.org/wiki/Bill_Clinton\n\nTitle: William J. Clinton | The White House\nSnippet: Bill Clinton is an American politician from Arkansas who served as the 42nd President of the United States (1993-2001).\nLink: https://bidenwhitehouse.archives.gov/about-the-white-house/presidents/william-j-clinton/\n\nTitle: William J. Clinton | whitehouse.gov\nSnippet: Bill Clinton is an American politician from Arkansas who served as the 42nd President of the United States (1993-2001).\nLink: https://obamawhitehouse.archives