<center><a href="https://www.nvidia.com/en-us/training/"><img src="https://dli-lms.s3.amazonaws.com/assets/general/DLI_Header_White.png" width="400" height="186" /></a></center>

# <font color="#76b900"> **Excerpt From Notebook 7:** Towards Orchestration and Agentics</font>

In [1]:
# !pip install --upgrade langchain-unstructured "langchain<0.3.0" "langchain_core<0.3.0" langchain_nvidia_ai_endpoints

In [56]:
import requests
from langchain_nvidia_ai_endpoints import ChatNVIDIA

## USE THIS ONE TO START OUT WITH. NOTE ITS INTENDED USE AS A VISUAL LANGUAGE MODEL FIRST
# model_path="http://localhost:9000/v1"
## USE THIS ONE FOR GENERAL USE AS A SMALL BUT GENERAL-PURPOSE CHAT MODEL BEING RUN LOCALLY VIA NIM
# model_path="http://nim:8000/v1"
## USE THIS ONE FOR ACCESS TO THE CATALOG OF RUNNING NIM MODELS IN `build.nvidia.com`
model_path="http://llm_client:9000/v1"

model_name = requests.get(f"{model_path}/models").json().get("data", [{}])[0].get("id")
%env NVIDIA_BASE_URL=$model_path
%env NVIDIA_DEFAULT_MODEL=$model_path
%env NVIDIA_DEFAULT_MODE=open

if "llm_client" in model_path:
    model_name = "mistralai/mistral-large-2-instruct"
    %env NVIDIA_MODEL_NAME=$model_name
else:
    %env NVIDIA_MODEL_NAME=$model_name

llm = ChatNVIDIA(model=model_name, base_url=model_path, max_tokens=5000, temperature=0)
llm.model = model_name

env: NVIDIA_BASE_URL=http://llm_client:9000/v1
env: NVIDIA_DEFAULT_MODEL=http://llm_client:9000/v1
env: NVIDIA_DEFAULT_MODE=open
env: NVIDIA_MODEL_NAME=mistralai/mistral-large-2-instruct
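
The model lookup in the cell above indexes `[0]` into the `data` list, which raises `IndexError` if the endpoint returns an empty list. A minimal defensive sketch follows, assuming the response has the `{"data": [{"id": ...}]}` shape that the cell implies; `first_model_id` is a hypothetical helper, not part of `langchain_nvidia_ai_endpoints`.

```python
from typing import Optional

def first_model_id(payload: dict) -> Optional[str]:
    """Return the id of the first listed model, or None if none are listed."""
    # Assumed payload shape: {"data": [{"id": "<model name>"}, ...]}
    models = payload.get("data") or []
    return models[0].get("id") if models else None

print(first_model_id({"data": [{"id": "mistralai/mistral-large-2-instruct"}]}))
print(first_model_id({"data": []}))
```

In the notebook this could be called as `first_model_id(requests.get(f"{model_path}/models").json())`, with a fallback model name when it returns `None`.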


<hr>
<br>

## **Part 7.1:** Structuring LLM Workflows

Prompt engineering is the art and science of crafting inputs that guide LLMs to produce desired outputs. Since the models are inherently stochastic parrots at heart - meaning they output probabilistic responses based on their input and training data - you can expect certain tasks to be far easier than others for even the most powerful models. Fortunately, we know exactly what most of these models are best at: **"summarization" or synthesis-style tasks**.

**The reasons for favoring short-form generation are actually quite simple:**
- Long-form text generation is harder to keep on track, and LLMs are likely to derail due to the accumulating error of autoregressive sampling.
- Most developers want to conserve generation length and would rather have accidentally-overshort responses than accidentally-overlong ones.
- In chat applications (chat models are usually the most popular defaults, and chat is the most popular instruction format to work with), shorter, concise, "summary-like" responses are often preferred in real-world scenarios.

**With that being said, long context reasoning is valued greatly:**
- The ability to supply long inputs (for example, sample outputs) that reinforce generation style/format is highly desirable from an end-user perspective.
- Even if the generative prior for responses is towards short outputs, the accumulation of said short outputs can become very long very quickly.

**For these reasons, most models tend to be trained for short-form generation and long-form context ingestion.** This makes LLMs well suited to tasks like summarization and long-form question-answering, which at their core boil down to the problem of **knowledge synthesis** or **distillation**. This idea was already explored in code in the previous notebook, but it's important to formalize and keep in mind going forward.
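
The synthesis pattern described here can be sketched without any model at all: condense each long input into a short output, then condense the collected short outputs once more. This is only an illustration under stated assumptions. `synthesize` and `first_words` are hypothetical names, and `first_words` is a trivial stand-in for an LLM call.

```python
from typing import Callable, Iterable

def synthesize(documents: Iterable[str], summarize: Callable[[str], str]) -> str:
    """Summarize each document on its own, then summarize the joined summaries."""
    # Step 1: each long input becomes one short output.
    partial_summaries = [summarize(doc) for doc in documents]
    # Step 2: the accumulated short outputs become the context for a final pass.
    return summarize("\n\n".join(partial_summaries))

def first_words(text: str, n: int = 3) -> str:
    """Hypothetical stand-in for an LLM call: keep only the first n words."""
    return " ".join(text.split()[:n])

print(synthesize(["alpha beta gamma delta", "one two three four"], first_words))
# prints "alpha beta gamma"
```

Swapping `first_words` for a call into a real chain keeps the same two-step shape.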

In [57]:
import os 
workdir = ".."
filenames = [v for v in sorted(os.listdir(workdir)) if v.endswith(".ipynb")]
filenames

['00_jupyterlab.ipynb',
 '01_microservices.ipynb',
 '02_llms.ipynb',
 '03_langchain_intro.ipynb',
 '04_running_state.ipynb',
 '05_documents.ipynb',
 '06_embeddings.ipynb',
 '07_vectorstores.ipynb',
 '08_evaluation.ipynb',
 '09_langserve.ipynb',
 '64_guardrails.ipynb',
 '99_table_of_contents.ipynb']

In [58]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from jupyter_tools import FileLister

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", (
        "You are a helpful DLI Chatbot who can request and reason about notebooks."
        " Be as concise as necessary, but follow directions as best as you can."
        " Please help the user out by answering any of their questions and following their instructions."
    )),
    ("human", "Here is the notebook I want you to work with: {full_context}. Remembering this, start the conversation over."),
    ("ai", "Awesome! I will work with this as context and will restart the conversation."),
    ("placeholder", "{messages}")
])

def compute_context(state: dict):
    return FileLister().to_string(files=state.get("filenames"), workdir=workdir)

pipeline = (
    RunnablePassthrough.assign(full_context = compute_context)
    | chat_prompt 
    | llm 
    | StrOutputParser()
)

chat_state = {
    "filenames": filenames, 
    "messages": [("human", 
        "Can you give me a summary of the course, making sure to mention every notebook?"
        " Do a paragraph per notebook, and finish by explaining big-picture ideas to help an"
        " instructor explain the material and understand which parts of the course to refer to when addressing questions."
    )],
}

short_summary = ""
for chunk in pipeline.stream(chat_state):
    print(chunk, end="")
    short_summary += chunk

 Sure! Here is a summary of the course, with a paragraph for each notebook:

**Notebook 0: Jupyterlab.ipynb**
This notebook introduces the JupyterLab interface and its components, such as the menu bar, file browser, and main work area. It also explains how to execute code in a code cell and how to handle input requests and system messages.

**Notebook 1: The Course Environment**
This notebook explains the course environment, including the microservices and their roles. It also discusses the importance of the course environment for the learning objectives and how to access the running Jupyter Labs session.

**Notebook 2: Special Syntax**
This notebook covers special syntax used in the course, such as implicit string concatenation and implicit string concatenation with parentheses. It also explains how to handle input requests and system messages.

**Notebook 3: LangChain Expression Language**
This notebook introduces LangChain, a popular LLM orchestration library. It explains the core s

In [60]:
message_prompt = (
    "Give me a structured summary of the course, making sure to note all sections and points."
    f"\n\nHere is a summary of all of the notebooks: \n\n{short_summary}\n\n"
    "Output only the summary of the currently-provided notebook, but explain the logical connections to the rest of the course."
    " Make sure the descriptions are also dense in key words that would be useful for searching through the notebooks via a bibliography."
    " Only output the following format, with no other structures, extra newlines, or info not grounded in context:\n<notebook>.ipynb"
    "\n - <Section 1 Name (As Seen In Notebook)>: Decent Description, including frameworks used, important topics, etc."
    "\n - ..."
    "\n - Main Ideas and Relevance To Course: Decent Description, dense with key features/frameworks/topics for bibliographic/semantic lookup."
    "\n - Important Code: Types of syntaxes, variables, etc that are more specific to this notebook, like classes, variables, topics, terms, etc."
    "\n - Connections to previous notebooks: Decent Description, dense with key features/frameworks/topics for bibliographic/semantic lookup."
    "\n - Relevant Images: Brief descriptions of images (i.e. <img src='/imgs/url.png'>, no non-image files) with local-scope URLs. Replace /img/ with /file=img/."
)

chat_states = []

for name in filenames:
    chat_states += [{
        "filenames": [name],
        "messages": [("human", message_prompt)]
    }]

summaries = pipeline.batch(chat_states)
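
Even when a prompt asks for a bare format, a chat model may wrap its reply in a Markdown code fence. A small post-processing sketch, assuming at most one wrapping fence per response, is shown here; `strip_code_fence` is a hypothetical helper and not part of LangChain.

```python
def strip_code_fence(text: str) -> str:
    """Remove a single wrapping Markdown code fence, if present."""
    lines = text.strip().splitlines()
    # Drop an opening fence such as ``` or ```text.
    if lines and lines[0].startswith("```"):
        lines = lines[1:]
    # Drop a closing fence if the response was not truncated before it.
    if lines and lines[-1].strip() == "```":
        lines = lines[:-1]
    return "\n".join(lines)

wrapped = " ```\n06_embeddings.ipynb\n - Part 5: Wrap-Up: Summary and next steps.\n```"
print(strip_code_fence(wrapped))
```

It could be applied as `summaries = [strip_code_fence(s) for s in summaries]` before the summaries are joined.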

In [69]:
print(summaries[6])

 ```
06_embeddings.ipynb
 - Part 1: Refreshing On Embedding Models: Introduction to embedding models, latent embeddings, word embeddings, sentence/document embeddings, decoder models, and encoder models.
 - Part 2: Using An NVIDIAEmbeddings Model: Practical application of NVIDIAEmbeddings model, query and document embedding, similarity checks, and visualization of cross-similarity matrix.
 - Part 3: A Synthetic - But More Realistic - Example: Expanding queries into longer-form documents, comparing embeddings of longer documents.
 - Part 4: Embeddings For Semantic Guardrails: Introduction to semantic guardrailing using embeddings.
 - Part 5: Wrap-Up: Summary and next steps.
 - Main Ideas and Relevance To Course: Embedding models, semantic reasoning, NVIDIAEmbeddings, query and document embedding, similarity checks, semantic guardrailing.
 - Important Code: NVIDIAEmbeddings, embed_query, embed_documents, cosine_similarity, plot_cross_similarity_matrix.
 - Connections to previous notebook

In [62]:
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

import PIL.Image  # `import PIL` alone does not guarantee the `Image` submodule is loaded
import io
import base64

slide_dir = "../slides"
slide_names = [
    img_name for img_name in sorted(os.listdir(slide_dir))
    if any(img_name.lower().endswith(ext) for ext in (".jpeg", ".jpg", ".png"))
]

def encode_image_to_base64(image, format="JPEG"):
    """Encode a PIL image (JPEG by default) as a base64 data-URI string."""
    buf = io.BytesIO()
    image.save(buf, format=format)
    base64_bytes = base64.b64encode(buf.getvalue())
    base64_string = base64_bytes.decode('utf-8')
    return f"data:image/{format.lower()};base64,{base64_string}"

vision_prompt = ChatPromptTemplate.from_messages([
    ("human", 'Describe the image, being accurate but compact: <img src="{encoding}" />'), 
    #  [
    #     {"type": "image_url", "image_url": {"url": "{encoding}"}},
    #     {"type": "text", "text": "Describe what is going on in this slide. Be specific."},
    # ]),
])

vlm_inputs = []

for img_name in slide_names:
    img = PIL.Image.open(os.path.join(slide_dir, img_name))
    ext = "PNG" if img_name.lower().endswith("png") else "JPEG" 
    img_encoding = encode_image_to_base64(img, ext.upper())
    vlm_inputs += [{"encoding": img_encoding}]

vlm = ChatNVIDIA(
    model="microsoft/phi-3.5-vision-instruct", 
    base_url="https://integrate.api.nvidia.com/v1", 
    max_tokens=5000, 
    temperature=0
)

vlm_pipe = vision_prompt | vlm | StrOutputParser()

slide_descriptions = vlm_pipe.batch(vlm_inputs)
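
The image encoding above produces a `data:image/<format>;base64,<payload>` string. The sketch below round-trips raw bytes through that same layout using only the standard library, which can help when checking an encoding before sending it to a model. `to_data_uri` and `from_data_uri` are hypothetical helpers, and the three sample bytes are just the usual JPEG file signature, not a full image.

```python
import base64
from typing import Tuple

def to_data_uri(raw: bytes, mime: str = "image/jpeg") -> str:
    """Wrap raw bytes in a base64 data URI."""
    payload = base64.b64encode(raw).decode("utf-8")
    return f"data:{mime};base64,{payload}"

def from_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split a base64 data URI back into (mime type, raw bytes)."""
    header, _, payload = uri.partition(",")
    mime = header[len("data:"):].split(";")[0]
    return mime, base64.b64decode(payload)

uri = to_data_uri(b"\xff\xd8\xff")
print(uri)  # prints "data:image/jpeg;base64,/9j/"
print(from_data_uri(uri))
```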



In [65]:
# slide_descriptions[11] = slide_descriptions[11][:slide_descriptions[11].index(" Inside this box")]
# slide_descriptions[35] = slide_descriptions[35][:slide_descriptions[35].index(" The first column lists")]
# The first column lists

# print(slide_names[11])
# print(slide_descriptions[11])

long_slide_summaries = '\n'.join(': '.join(list(v)) for v in zip(slide_names, slide_descriptions))
long_nbook_summaries = '\n'.join(summaries)
long_nbook_summaries = long_nbook_summaries.replace("src='file=", "src='/file=")
# long_slide_summaries

In [76]:
input_message = [("system", 
    f"\n\nHere is a summary of all of the notebooks of a course: \n\n{long_nbook_summaries}\n\n"
    f"\n\nHere is the slide deck of the course:\n\n{long_slide_summaries}"
 ), ("user", 
    "Output a condensed summary covering all of the slides and tying them to the narrative of the course."
    " Make sure the descriptions are also dense in key words that would be useful for searching through the notebooks via a bibliography."
    " Output the response in the following format only:\n"
    "<img src='/file=slides/slide01.jpg'>: Introduction slide featuring the NVIDIA logo and the topic 'Building RAG Agents with LLMs'.\n"
)]

slide_outline = ""
for chunk in (llm | StrOutputParser()).stream(input_message):
    print(chunk, end="")
    slide_outline += chunk

 <img src='/file=slides/slide01.jpg'>: Introduction slide featuring the NVIDIA logo and the topic 'Building RAG Agents with LLMs'.
<img src='/file=slides/slide02.jpg'>: 3D rendering of large language models, illustrating components like central server, documents, speech bubbles, and databases.
<img src='/file=slides/slide03.jpg'>: Concept of Dialog/Retrieval Agents in LLMs with context and control, showing user interaction, agent retrieval, and response generation.
<img src='/file=slides/slide05.jpg'>: Promotional graphic for LangChain, highlighting components like 'Context', 'Tools', 'Test', 'Extraction', 'Storage & Indexing', and 'Agents'.
<img src='/file=slides/slide06.jpg'>: Promotional graphic for NVIDIA's AI capabilities, showcasing various AI applications like gaming, graphics, and simulations.
<img src='/file=slides/slide07.jpg'>: Prerequisites for 'RAG Agents in Production', listing prior LLM/LangChain exposure, intermediate Python experience, and web engineering exposure.
<im

In [49]:
# slide_outline = slide_outline[:slide_outline.index("### Continued Summary for Slides 84 and Up")]
# slide_outline

In [48]:
# for chunk in (llm | StrOutputParser()).stream(input_message + [
#     ("ai", slide_outline), 
#     ("user", 
#          "Please continue for slides 84 and up, using the exact same brief format like the following:"
#          " \n- **Slide 83**: Diagram illustrating GPU-Accelerating Vector Stores, featuring a bar chart comparing the search throughput of different vector search algorithms on a DEEP-100M vector store. <img src='/file=slides/slide83.jpg'>"
#     )
# ]):
#     print(chunk, end="")
#     slide_outline += chunk

In [51]:
# slide_outline = slide_outline.replace(" ### Continued Slide Summaries\n\n", "")
# slide_outline = slide_outline[:slide_outline.index("\n### Course Narrative")]
# slide_outline = slide_outline.replace("'sidle-turbo' and 'genemu-7b'", "'sdxl-turbo' and 'gemini-7b'")
# print(slide_outline)
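
The commented-out cleanup cells trim the outline at a marker with `str.index`, which raises `ValueError` when the marker is absent from a given generation. A hedged alternative using `str.partition`, which leaves the text unchanged in that case, might look like this; `truncate_at` is a hypothetical helper name.

```python
def truncate_at(text: str, marker: str) -> str:
    """Return the text before the first occurrence of marker, or all of it if absent."""
    # str.partition returns (before, marker, after); `before` is the whole
    # string when the marker is not found.
    return text.partition(marker)[0]

outline = "<img src='/file=slides/slide01.jpg'>: Intro.\n### Course Narrative\nExtra text"
print(truncate_at(outline, "\n### Course Narrative"))
print(truncate_at(outline, "### Not Present") == outline)
```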

In [77]:
## Note, we're going to need these generations for the assessment, so save them here
nbsummary = {
    "course": "NVIDIA Deep Learning Institute's Instructor-Led Course titled \"Building RAG Agents with LLMs\"",
    "summary": long_nbook_summaries,
    "slides": slide_outline,
    "filenames": filenames,
}
        
import json

# Use a context manager so the file is flushed and closed after writing
with open('notebook_chunks.json', "w") as fp:
    json.dump(nbsummary, fp, sort_keys=True, indent=4)

# with open('notebook_chunks.json', 'r') as fp:
#     nbsummary = json.load(fp)
