# Level 0: Getting Started with Llama Stack

This tutorial walks through the steps needed to set up your environment so that you can run our sample notebooks and build your own Llama Stack client applications. It covers installing and importing the required libraries, setting the essential configuration parameters, and initializing the connection to the Llama Stack server. More advanced sections of this notebook address the setup for RAG and MCP applications.

## Prerequisites

Before starting, make sure you have a running instance of the Llama Stack server (local or remote). The RAG notebooks additionally require at least one preconfigured vector DB. For detailed server setup instructions, refer to our [Remote Setup Guide](../../../kubernetes/README.md) and [Local Setup Guide](../../../local_setup_guide.md), as well as the official [Llama Stack tutorials](https://llama-stack.readthedocs.io/en/latest/getting_started/index.html).

## Installing Dependencies

This code requires `llama-stack` and `llama-stack-client`, both at version `0.2.2`. Let's begin by installing them:

In [1]:
!pip install llama-stack-client==0.2.2 llama-stack==0.2.2

Collecting llama-stack==0.2.2
  Using cached llama_stack-0.2.2-py3-none-any.whl.metadata (18 kB)
Using cached llama_stack-0.2.2-py3-none-any.whl (3.7 MB)
Installing collected packages: llama-stack
  Attempting uninstall: llama-stack
    Found existing installation: llama_stack 0.2.1
    Uninstalling llama_stack-0.2.1:
      Successfully uninstalled llama_stack-0.2.1
Successfully installed llama-stack-0.2.2


## Setting the Environment Variables

Copy [`.env.example`](../../../.env.example) to a new file called `.env` and set all the relevant environment variables listed below.


### Environment variables required for all demos
- `REMOTE_BASE_URL`: the URL of the remote Llama Stack server. Read only when `REMOTE` is not 'False'.
- `INFERENCE_MODEL_ID`: the ID of the model to use for inference. The model must be available on your server.
- `LOCAL_SERVER_PORT` (optional): the port of the locally running Llama Stack server. Read only when `REMOTE` is 'False'. Defaults to 8321.
- `REMOTE` (optional): selects between a remote ('True') and a locally running ('False') instance of the Llama Stack server. The setup code below treats any value other than 'False' as 'True'. This is useful for switching between a local development environment and a remote deployment (e.g., a Kubernetes cluster). Defaults to True.
- `TEMPERATURE` (optional): the temperature to use during inference. A value of 0.0 selects greedy decoding. Defaults to 0.0.
- `TOP_P` (optional): the top_p parameter to use during inference. Applied only when `TEMPERATURE` is greater than 0.0. Defaults to 0.95.
- `MAX_TOKENS` (optional): the maximum number of tokens that can be generated in the completion. Defaults to 4096.
- `STREAM` (optional): set this to True to stream the output of the model/agent and False otherwise. Any value other than 'False' is treated as 'True'. Defaults to True.
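
Before running the remaining cells, it can help to confirm that the required variables are actually set. The following is a minimal sketch, not part of the original notebook: the helper name `missing_env_vars` is hypothetical, and it assumes (based on the connection cell further down) that `REMOTE_BASE_URL` is only needed when `REMOTE` is not 'False'.

```python
import os


def missing_env_vars(env):
    """Return the names of required variables that are unset or empty.

    `env` is any mapping of variable names to string values, e.g. os.environ.
    This helper is hypothetical and only illustrates the rules listed above.
    """
    required = ["INFERENCE_MODEL_ID"]
    # Assumption: the remote URL is only read when REMOTE is not 'False'.
    if env.get("REMOTE", "True") != "False":
        required.append("REMOTE_BASE_URL")
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(os.environ)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```

Passing the environment in as a mapping keeps the check easy to try with a plain dictionary before touching the real `.env` file.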

## Necessary Imports

In [1]:
# for accessing the environment variables
import os
from dotenv import load_dotenv
load_dotenv()

# for communication with Llama Stack
from llama_stack_client import LlamaStackClient

# pretty printing of the results returned from the model/agent
import sys
sys.path.append('..')  # make the `src` package in the parent directory importable
from src.utils import step_printer
from termcolor import cprint

## Setting Up the Server Connection

Establish the connection to the Llama Stack server by initializing the wrapper client object according to the predefined settings.

In [2]:
remote = os.getenv("REMOTE", "True")

if remote == "False":
    local_port = os.getenv("LOCAL_SERVER_PORT", "8321")
    base_url = f"http://localhost:{local_port}"
else:  # any value other than 'False' is treated as 'True'
    base_url = os.getenv("REMOTE_BASE_URL")
    if not base_url:
        raise ValueError("REMOTE_BASE_URL must be set unless REMOTE is 'False'")


# Tavily search API key is required for some of our demos and must be provided to the client upon initialization.
# We will cover it in the agentic demos that use the respective tool. Please ignore this parameter for all other demos.
tavily_search_api_key = os.getenv("TAVILY_SEARCH_API_KEY")
if tavily_search_api_key is None:
    provider_data = None
else:
    provider_data = {"tavily_search_api_key": tavily_search_api_key}


client = LlamaStackClient(
    base_url=base_url,
    provider_data=provider_data
)
    
print(f"Connected to Llama Stack server @ {base_url}")

Connected to Llama Stack server @ http://localhost:8321
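
Creating the client object does not by itself confirm that the server is reachable or that the configured model is registered. One way to check both is to list the models known to the server. The sketch below is an illustration, not part of the original notebook: the helper names are hypothetical, and it assumes that `client.models.list()` returns objects exposing an `identifier` attribute, as in the 0.2.x client; other versions may differ.

```python
def list_model_ids(client):
    """Return the identifiers of all models registered on the server.

    Assumption: each entry returned by client.models.list() has an
    `identifier` attribute.
    """
    return [model.identifier for model in client.models.list()]


def check_model_available(client, model_id):
    """Return True if model_id is among the server's registered models."""
    return model_id in list_model_ids(client)
```

In this notebook the check could be called as `check_model_available(client, os.getenv("INFERENCE_MODEL_ID"))`. A connection error raised by the call would indicate that `base_url` does not point at a running server.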


## Initializing the Inference Parameters

Fetch the inference-related parameters from the corresponding environment variables and convert them to the format Llama Stack expects.

In [3]:
# model_id will later be used to pass the name of the desired inference model to Llama Stack Agents/Inference APIs
model_id = os.getenv("INFERENCE_MODEL_ID")

temperature = float(os.getenv("TEMPERATURE", 0.0))
if temperature > 0.0:
    top_p = float(os.getenv("TOP_P", 0.95))
    strategy = {"type": "top_p", "temperature": temperature, "top_p": top_p}
else:
    strategy = {"type": "greedy"}

max_tokens = int(os.getenv("MAX_TOKENS", 4096))

# sampling_params will later be used to pass the parameters to Llama Stack Agents/Inference APIs
sampling_params = {
    "strategy": strategy,
    "max_tokens": max_tokens,
}

stream_env = os.getenv("STREAM", "True")
# the Boolean 'stream' parameter will later be passed to Llama Stack Agents/Inference APIs
# any value other than 'False' is treated as 'True'
stream = (stream_env != "False")

print(f"Inference Parameters:\n\tModel: {model_id}\n\tSampling Parameters: {sampling_params}\n\tstream: {stream}")

Inference Parameters:
	Model: ibm-granite/granite-3.2-8b-instruct
	Sampling Parameters: {'strategy': {'type': 'greedy'}, 'max_tokens': 4096}
	stream: True
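
The output above shows the greedy case, which is what the defaults produce. To see how the strategy changes with a non-zero temperature, the same logic can be wrapped in a function and fed a plain dictionary instead of the real environment. This is a sketch for illustration only; the function name `build_sampling_params` is hypothetical and not part of the notebook.

```python
def build_sampling_params(env):
    """Mirror the cell above, reading values from a mapping instead of os.environ."""
    temperature = float(env.get("TEMPERATURE", 0.0))
    if temperature > 0.0:
        # top_p sampling is selected only for a positive temperature
        top_p = float(env.get("TOP_P", 0.95))
        strategy = {"type": "top_p", "temperature": temperature, "top_p": top_p}
    else:
        # a temperature of 0.0 falls back to greedy decoding; TOP_P is ignored
        strategy = {"type": "greedy"}
    return {"strategy": strategy, "max_tokens": int(env.get("MAX_TOKENS", 4096))}


print(build_sampling_params({}))
# {'strategy': {'type': 'greedy'}, 'max_tokens': 4096}
print(build_sampling_params({"TEMPERATURE": "0.7", "MAX_TOKENS": "512"}))
# {'strategy': {'type': 'top_p', 'temperature': 0.7, 'top_p': 0.95}, 'max_tokens': 512}
```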
