## Setup Dependencies

In [1]:
!pip install groq
!pip install -U llama-stack

Collecting groq
  Downloading groq-0.18.0-py3-none-any.whl.metadata (14 kB)
Downloading groq-0.18.0-py3-none-any.whl (121 kB)
Installing collected packages: groq
Successfully installed groq-0.18.0
Collecting llama-stack
  Downloading llama_stack-0.1.5.1-py3-none-any.whl.metadata (15 kB)
Collecting blobfile (from llama-stack)
  Downloading blobfile-3.0.0-py3-none-any.whl.metadata (15 kB)
Collecting fire (from llama-stack)
  Downloading fire-0.7.0.tar.gz (87 kB)
  Preparing metadata (setup.py) ... done
Collecting huggingface-hub (from llama-stack)
  Downloading huggingface_hub-0.29.1-py3-none-any.whl.metadata (13 kB)
Collecting llama-models>=0.1.5rc3 (from llama-stack)
  Downloading llama_models-0.1.5-py3-none-any.whl.metadata (8.6 kB)
Collecting llama-stack-client>=0.1.5rc3 (from llama-stack)
  Downloading llama_stack_client-0.1.5-py3-none-any.whl.metadata (15 kB)
Collecting termcolor (from llama-stack)
  Downloading termcolor-2.5.0-py3-none-any.whl.metadata (6.1 kB)
Collecting ti

In [2]:
!UV_SYSTEM_PYTHON=1 llama stack build --template together --image-type venv

uv is not installed, trying to install it.
Collecting uv
  Downloading uv-0.6.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Downloading uv-0.6.3-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.2/16.2 MB 34.6 MB/s eta 0:00:00
Installing collected packages: uv
Successfully installed uv-0.6.3
Installing dependencies in system Python environment
Using Python 3.12.7 environment at: /home/vkeval/anaconda3
Audited 1 package in 1.07s
Installing pip dependencies
Using Python 3.12.7 environment at: /home/vkeval/anaconda3
Resolved 119 packages in 1.58s
Prepared 50 packages in 9.35s
Uninstalled 3 packages in 99ms
Installed 50 p

In [2]:
!pip install yt-dlp pytubefix youtube-transcript-api

Collecting yt-dlp
  Downloading yt_dlp-2025.2.19-py3-none-any.whl.metadata (171 kB)
Collecting pytubefix
  Downloading pytubefix-8.12.2-py3-none-any.whl.metadata (5.3 kB)
Collecting youtube-transcript-api
  Downloading youtube_transcript_api-0.6.3-py3-none-any.whl.metadata (17 kB)
Downloading yt_dlp-2025.2.19-py3-none-any.whl (3.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 7.0 MB/s eta 0:00:00
Downloading pytubefix-8.12.2-py3-none-any.whl (730 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 730.7/730.7 kB 5.8 MB/s eta 0:00:00
Downloading youtube_transcript_api-0.6.3-py3-none-any.whl (622 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 622.3/622.3 kB 2.9 MB/s eta 0:00:00
Installing collected packages: yt-dlp, pytubefix, youtube-transcript-api
Successfully installed pytubefix-8.12.2 youtube-transcript-api-0.6.3 yt-dlp-

## Setup Tools

In [3]:
import pytubefix  # used by get_transcript(fast=True)
import yt_dlp
from dataclasses import dataclass
from datetime import datetime
from youtube_transcript_api import YouTubeTranscriptApi

@dataclass
class VideoMetadata:
    title : str
    upload_data : str
    duration_s : int
    url : str

def search_youtube(search_query, num_queries=2):
    ydl_opts = {
        "default_search": f"ytsearch{num_queries}",
        "quiet": True,
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        search_results = ydl.extract_info(search_query, download=False)

    # yt-dlp reports upload_date as a YYYYMMDD string, not a Unix timestamp.
    return [
        VideoMetadata(
            x['title'],
            datetime.strptime(x['upload_date'], '%Y%m%d').strftime('%m/%d/%Y'),
            x['duration'],
            x['webpage_url'],
        )
        for x in search_results['entries']
    ]

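def get_transcript(url, fast=True):
    # Resolve the video ID, then fetch its transcript segments.
    if fast:
        # pytubefix parses the ID from the URL itself.
        vid_id = pytubefix.YouTube(url).video_id
    else:
        # Fall back to yt-dlp, which fetches the video's metadata to find the ID.
        with yt_dlp.YoutubeDL({'quiet':True}) as ydl:
            vid_id = ydl.extract_info(url, download=False)['id']
    return YouTubeTranscriptApi.get_transcript(vid_id)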
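
`YouTubeTranscriptApi.get_transcript` returns a list of segments, each a dict with `text`, `start`, and `duration` keys, so the result usually has to be flattened before it goes into a prompt. A minimal sketch of that step follows. The helper name `transcript_to_text` and its `max_chars` parameter are assumptions made for illustration and are not part of the original notebook.

```python
def transcript_to_text(segments, max_chars=None):
    """Join transcript segments into one string, optionally truncated.

    Hypothetical helper: `segments` is assumed to be the list of dicts
    returned by get_transcript(), each carrying a 'text' key.
    """
    # Skip empty segments and normalise surrounding whitespace.
    parts = [seg["text"].strip() for seg in segments if seg["text"].strip()]
    text = " ".join(parts)
    if max_chars is not None:
        # Crude character cap to keep the prompt small.
        text = text[:max_chars]
    return text


# Hand-written sample in the same shape as a real transcript.
sample = [
    {"text": "Hello and welcome", "start": 0.0, "duration": 1.5},
    {"text": "to the channel.", "start": 1.5, "duration": 1.2},
]
print(transcript_to_text(sample))
```

With a real video, `transcript_to_text(get_transcript(url))` would produce the flattened text. The character cap is a rough stand-in for token-based truncation.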

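The Groq cell that follows requests `deepseek-r1-distill-llama-70b`, and the reply it prints opens with a `<think>` block of reasoning before the answer. A minimal sketch for separating the two is given here, assuming the reasoning is wrapped in a single `<think>...</think>` pair at the start of the text. The helper name `split_reasoning` is hypothetical and not part of the Groq SDK.

```python
import re

# Assumption: the reply starts with exactly one <think>...</think> block.
_THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer) from a reply that may start with a <think> block."""
    match = _THINK_RE.match(text)
    if match is None:
        # No complete <think> block: treat the whole reply as the answer.
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()


reasoning, answer = split_reasoning(
    "<think>\nLatency is delay.\n</think>\n\nLow latency matters."
)
print(answer)
```

Applied to the cell's output, this would be `split_reasoning(chat_completion.choices[0].message.content)`. A reply cut off before the closing tag is returned unchanged as the answer.
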
In [1]:
import os
from groq import Groq

with open('api_key', 'r') as f:
    api_key = f.readline().strip()  # drop the trailing newline

client = Groq(
    api_key=api_key,  # read from the local 'api_key' file instead of the GROQ_API_KEY environment variable
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Explain the importance of low latency LLMs",
        }
    ],
    model="deepseek-r1-distill-llama-70b",
)
print(chat_completion.choices[0].message.content)

<think>
Okay, so I'm trying to understand why low latency is important for large language models (LLMs). I remember reading that latency refers to the delay before a response is received, so low latency means faster responses. But why is that a big deal for LLMs?

First, I think about where LLMs are used. They're in things like chatbots, virtual assistants, and maybe even in real-time applications. So if someone is using a chatbot and asks a question, they don't want to wait a long time for an answer. If the latency is high, the user experience would be slow and frustrating. That makes sense. People expect quick responses, especially when they're interacting in real-time, like in a conversation.

Then there's real-time applications. I'm not entirely sure what qualifies as a real-time application, but maybe things like live translation or live subtitles. If you're translating speech in real-time, any delay could make the translation useless because the speaker has already moved on. So l