## 4. Image Understanding with Llama 3.2

Below is a complete example of using Together's Llama Stack 0.1 server at https://llama-stack.together.ai to ask Llama 3.2 questions about an image.

### 4.1 Setup and helpers

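With the Llama Stack client 0.1 (`llama-stack-client`) installed, the cells below download the example image, define two image helpers (one that displays the image and one that encodes it as a base64 data URL), and set the Together Llama Stack server URL and the Llama 3.2 model name.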


In [3]:
!curl -O https://raw.githubusercontent.com/meta-llama/llama-models/refs/heads/main/Llama_Repo.jpeg

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  275k  100  275k    0     0   780k      0 --:--:-- --:--:-- --:--:--  780k
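
Before encoding the download, it can be worth confirming that the file really is a JPEG, since a failed download may leave an HTML error page under the same name. The following is a minimal sketch, not part of the original notebook: the helper names `looks_like_jpeg` and `file_looks_like_jpeg` are hypothetical, and the check only inspects the first three bytes (the JPEG start-of-image marker `FF D8` followed by `FF`).

```python
# JPEG files start with the SOI marker (FF D8) followed by FF, the first byte of the next marker.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the byte string begins with the JPEG signature."""
    return data.startswith(JPEG_MAGIC)


def file_looks_like_jpeg(path: str) -> bool:
    """Read only the first three bytes of a file and check them."""
    with open(path, "rb") as image_file:
        return looks_like_jpeg(image_file.read(3))


# In-memory demonstration with a JPEG-style header and a PNG header.
print(looks_like_jpeg(b"\xff\xd8\xff\xe0\x00\x10JFIF"))  # True
print(looks_like_jpeg(b"\x89PNG\r\n\x1a\n"))             # False
```

In the notebook, `file_looks_like_jpeg("Llama_Repo.jpeg")` would be the call to run after the `curl` cell.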


In [None]:
# NBVAL_SKIP
from PIL import Image
import matplotlib.pyplot as plt

def display_image(path):
    img = Image.open(path)
    plt.imshow(img)
    plt.axis('off')
    plt.show()

display_image("Llama_Repo.jpeg")

In [None]:
import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        base64_string = base64.b64encode(image_file.read()).decode("utf-8")
        # The example image is a JPEG, so the data URL declares image/jpeg.
        base64_url = f"data:image/jpeg;base64,{base64_string}"
        return base64_url
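
The helper above hardcodes the MIME type in the data URL, which is only right for one image format. As a hedged alternative, the sketch below infers the type from the file name using Python's standard `mimetypes` module. The names `to_data_url` and `encode_image_auto` are hypothetical and not part of the notebook or of Llama Stack. The approach assumes the file extension is trustworthy.

```python
import base64
import mimetypes


def to_data_url(data: bytes, filename: str) -> str:
    """Build a base64 data URL, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {filename!r}")
    payload = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{payload}"


def encode_image_auto(image_path: str) -> str:
    """File-based variant of to_data_url (hypothetical helper)."""
    with open(image_path, "rb") as image_file:
        return to_data_url(image_file.read(), image_path)


# In-memory demonstration using only the first three bytes of a JPEG header.
example_url = to_data_url(b"\xff\xd8\xff", "Llama_Repo.jpeg")
print(example_url)  # data:image/jpeg;base64,/9j/
```

Splitting the byte-level function from the file-reading wrapper keeps the encoding logic easy to check without touching the disk.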

In [None]:
from llama_stack_client import LlamaStackClient

LLAMA_STACK_API_TOGETHER_URL = "https://llama-stack.together.ai"
LLAMA32_11B_INSTRUCT = "meta-llama/Llama-3.2-11B-Vision-Instruct"

### 4.2 Using Llama Stack Chat API

The code below uses the `chat_completion` call of Llama Stack 0.1's inference API to send the image and a text prompt to Llama 3.2 in a single user message:

In [None]:
# stream=False returns the whole response at once, so no event logger is needed here.
async def run_main(image_path: str, prompt: str):
    client = LlamaStackClient(
        base_url=LLAMA_STACK_API_TOGETHER_URL,
    )

    message = {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": {
                     "url": {
                          "uri": encode_image(image_path)
                     }
                }
            },
            {
                "type": "text",
                "text": prompt,
            }
        ]
    }

    response = client.inference.chat_completion(
        messages=[message],
        model_id=LLAMA32_11B_INSTRUCT,
        stream=False,
    )

    print(response.completion_message.content.lower().strip())

In [None]:
await run_main("Llama_Repo.jpeg",
     "How many different colors are those llamas? "
     "What are those colors?")
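
The chat call above and the Agent call in section 4.3 both send the same message shape: one `image` content item carrying a data URL, followed by one `text` item carrying the prompt. A small sketch of factoring that shape into a helper is shown below. The name `build_image_message` is hypothetical, and the dictionary layout is copied from the cells in this notebook rather than from any other specification.

```python
from typing import Any, Dict


def build_image_message(data_url: str, prompt: str) -> Dict[str, Any]:
    """Return a user message with one image item and one text item,
    mirroring the message layout used in this notebook."""
    return {
        "role": "user",
        "content": [
            {"type": "image", "image": {"url": {"uri": data_url}}},
            {"type": "text", "text": prompt},
        ],
    }


# Demonstration with a placeholder data URL; in the notebook this would be
# the value returned by encode_image("Llama_Repo.jpeg").
message = build_image_message(
    "data:image/jpeg;base64,/9j/",
    "How many different colors are those llamas? What are those colors?",
)
print([item["type"] for item in message["content"]])  # ['image', 'text']
```

The resulting dictionary could then be passed as `messages=[message]` in either section.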

### 4.3 Using Llama Stack Agent API

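The code below uses Llama Stack 0.1's Agent API to ask Llama 3.2 the same question. It creates an agent with no tool groups, opens a session, and sends the image and prompt as a single turn: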

In [None]:
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig

async def run_main(image_path, prompt):
    base64_image = encode_image(image_path)

    client = LlamaStackClient(
        base_url=LLAMA_STACK_API_TOGETHER_URL,
    )

    agent_config = AgentConfig(
        model=LLAMA32_11B_INSTRUCT,
        instructions="You are a helpful assistant",
        enable_session_persistence=False,
        toolgroups=[],
    )

    agent = Agent(client, agent_config)
    session_id = agent.create_session("test-session")

    response = agent.create_turn(
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": {
                         "url": {
                              "uri": base64_image
                         }
                    }
                },
                {
                    "type": "text",
                    "text": prompt,
                }
            ]
        }],
        session_id=session_id,
    )

    for log in EventLogger().log(response):
        log.print()

In [None]:
await run_main("Llama_Repo.jpeg",
         "How many different colors are those llamas? "
         "What are those colors?")