# Serial Chain Agent Workflow
Author: [Zain Hasan](https://x.com/ZainHasan6)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/togethercomputer/together-cookbook/blob/main/Agents/Serial_Chain_Agent_Workflow.ipynb)

## Introduction

In this notebook, we'll create an LLM agent workflow that will produce an audio podcast file using the contents of a provided PDF.

In this **serial chain agent workflow**, we will call multiple LLMs consecutively (hence the name serial chain), where each later call consumes the outputs of earlier calls. The input to our workflow will be the raw extracted PDF text. The LLM chain outputs a JSON object that stores the lines our podcast host and guest will say, and a final text-to-speech step turns those lines into audio.

Our serial chain will include the following steps:

1. LLM Call 1: Clean and extract details given source text extracted from a PDF
2. LLM Call 2: Generate an outline given extracted information and the source text
3. LLM Call 3: Generate a script given the information from step 1 and outline from step 2. This script will be a structured JSON object
4. Text-to-Speech Model Call: Text-to-Speech model to generate the podcast using the script

Before we implement the PDF to podcast workflow, let's first understand how simple the serial chain agent workflow is to implement.

## Serial Prompt Chain Agent Workflow

The serial prompt chain agent workflow takes the input prompt and processes it with consecutive LLM calls, feeding the output of each LLM call to the next one down the chain. Optionally, you can add a check between consecutive LLM calls that decides whether to break out of the chain early and return the current response, or to terminate the workflow entirely.

The diagram below details the workflow:

<img src="../images/serial_chain.png" width="1000">

Now let's see this workflow in code:

## Setup and Utils

In [1]:
# For MAC OS X
#!brew install ffmpeg
# For Linux
!apt install ffmpeg

E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?


In [2]:
# Install libraries
!apt install libasound2-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg
!pip install -qU pydantic together pypdf ffmpeg-python cartesia

E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?


In [7]:
# Import libraries
import json
import together
from together import Together

from typing import Any, Optional, Dict, List, Literal
from pydantic import Field, BaseModel, ValidationError

TOGETHER_API_KEY = "--Your API Key--"

client = Together(api_key= TOGETHER_API_KEY)

In [9]:
# Simple LLM call helper function
def run_llm(user_prompt : str, model : str, system_prompt : Optional[str] = None):
    """ Run the language model with the given user prompt and system prompt. """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    
    messages.append({"role": "user", "content": user_prompt})
    
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.7,
        max_tokens=4000,        
    )

    return response.choices[0].message.content

# Simple JSON mode LLM call helper function
def JSON_llm(user_prompt : str, schema : BaseModel, system_prompt : Optional[str] = None):
    """ Run a language model with the given user prompt and system prompt, and return a structured JSON object. """
    try:
        messages = []
        if system_prompt:
            messages.append({"role": "system", "content": system_prompt})
        
        messages.append({"role": "user", "content": user_prompt})
        
        extract = client.chat.completions.create(
            messages=messages,
            model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
            response_format={
                "type": "json_object",
                "schema": schema.model_json_schema(),
            },
        )
        
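        # Parse the reply and validate it against the schema, so that a
        # malformed reply surfaces as a clear error
        response = json.loads(extract.choices[0].message.content)
        schema.model_validate(response)
        return response
        
    except (json.JSONDecodeError, ValidationError) as e:
        raise ValueError(f"Schema validation failed: {str(e)}") from e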

### Simple Agent Implementation

In [10]:
def serial_chain_workflow(input_query: str, prompt_chain : List[str]) -> List[str]:
    """Run a serial chain of LLM calls to address the `input_query` 
    using the prompts specified in the list `prompt_chain`.

    Outputs the chain of responses from the LLM models.
    """
    response_chain = [] # Will store the responses from the LLM models
    response = input_query
    for i, prompt in enumerate(prompt_chain):
        print(f"STEP {i+1}\n")
        response = run_llm(f"{prompt}\nInput:\n{response}", model='meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo')
        response_chain.append(response)
        print(f"{response}\n")
    return response_chain
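
The optional check between calls described earlier is not part of `serial_chain_workflow` above. A minimal sketch of one way to add it follows, assuming a hypothetical `gate` callable that returns `False` to stop the chain early. The `llm` parameter is a stand-in for `run_llm`, passed in so the sketch runs without an API key; neither hook is part of the original notebook.

```python
from typing import Callable, List, Optional

def gated_serial_chain(
    input_query: str,
    prompt_chain: List[str],
    llm: Callable[[str], str],
    gate: Optional[Callable[[int, str], bool]] = None,
) -> List[str]:
    """Run prompts in series; stop early if `gate` rejects a response.

    `llm` and `gate` are hypothetical hooks used for illustration only.
    """
    response_chain: List[str] = []
    response = input_query
    for i, prompt in enumerate(prompt_chain):
        response = llm(f"{prompt}\nInput:\n{response}")
        response_chain.append(response)
        # The gate sees the step index and the latest response
        if gate is not None and not gate(i, response):
            break
    return response_chain


# Stand-in "LLM": upper-cases the last line of its prompt and appends "!"
def fake_llm(prompt: str) -> str:
    return prompt.splitlines()[-1].upper() + "!"

# Stop as soon as a response grows past 6 characters
responses = gated_serial_chain(
    "hello",
    ["step a", "step b", "step c"],
    llm=fake_llm,
    gate=lambda i, r: len(r) <= 6,
)
print(responses)  # ['HELLO!', 'HELLO!!']
```

The third prompt never runs because the gate rejects the second response. In a real chain the gate could itself be an LLM call or a simple validity check on the intermediate output.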

In [11]:
# Toy example run of prompt chain agent workflow
question = "Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?"

prompt_chain = ["""Given the math problem, ONLY extract any relevant numerical information and how it can be used.""",
                """Given the numerical information extracted, ONLY express the steps you would take to solve the problem.""",
                """Given the steps, express the final answer to the problem."""]

responses = serial_chain_workflow(question, prompt_chain)

final_answer = responses[-1]

STEP 1

$12 (hourly wage), 
50 minutes (time worked)

To find the earnings, we need to convert 50 minutes to hours and multiply it by the hourly wage.

STEP 2

1. Convert 50 minutes to hours by dividing by 60 (since 1 hour = 60 minutes).
   50 minutes / 60 = 0.83 hours

2. Multiply the converted hours by the hourly wage.
   0.83 hours * $12/hour = earnings

STEP 3

To find the earnings, we need to multiply the converted hours by the hourly wage.

0.83 hours * $12/hour = earnings
earnings = 0.83 * 12
earnings = $9.96
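
Note that the chain rounded 50/60 to 0.83 in step 2 and carried that rounded value into step 3, which is how it arrived at $9.96. A quick check with exact fractions, written as a sketch independent of the notebook's helpers, shows the unrounded result:

```python
from fractions import Fraction

hourly_wage = 12                    # dollars per hour
hours_worked = Fraction(50, 60)     # 50 minutes, kept exact

exact_earnings = hourly_wage * hours_worked
rounded_earnings = round(12 * 0.83, 2)  # the value the chain computed

print(exact_earnings)    # 10
print(rounded_earnings)  # 9.96
```

Because each step only sees the previous step's output, an early rounding or extraction slip propagates through the rest of the chain, which is one reason to inspect intermediate outputs.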



## Podcast Generation Using Prompt Chain Agent Workflow

Now that we've solved a toy example using the prompt chaining agent workflow, let's implement a bespoke version of the same workflow to handle a more complicated task: generating an audio podcast file from the contents of a PDF!

In [12]:
# The system prompt will be the same for all LLMs in the chain
SYSTEM_PROMPT = """You are an experienced world-class podcast producer tasked with transforming the provided 
input text into an engaging and informative podcast.

You are to follow a step by step methodical process to generate the final podcast which involves:
1. Reading and extracting relevant information and snippets from the source document.
2. Using the relevant information compiled in step 1, creating an outline document containing brainstormed ideas, summarized topics that should be covered, questions and how to guide the conversation 
3. Using the details from step 1 and 2 you then need to put together a script for the podcast.
"""

Below we define a class that will control the structure that our script will be generated with.

We want the script to be a list of `DialogueItem`s, each holding a single line and the speaker for that line. We also want to give the LLM throwaway tokens in the form of a scratchpad so that it can generate better-quality lines.

In [13]:
class DialogueItem(BaseModel):
    """A single dialogue item."""

    speaker: Literal["Host (Jane)", "Guest"]
    text: str


class Dialogue(BaseModel):
    """The dialogue between the host and guest."""

    scratchpad: str
    name_of_guest: str
    dialogue: List[DialogueItem]

#### PDF Content Extraction

The input to our agent workflow will be content extracted from a PDF.

In [14]:
!wget https://arxiv.org/pdf/2406.04692
!mv 2406.04692 MoA.pdf

--2025-03-30 13:02:50--  https://arxiv.org/pdf/2406.04692
Resolving arxiv.org (arxiv.org)... 151.101.131.42, 151.101.195.42, 151.101.67.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.131.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1157463 (1.1M) [application/pdf]
Saving to: ‘2406.04692’


2025-03-30 13:02:50 (24.2 MB/s) - ‘2406.04692’ saved [1157463/1157463]



In [15]:
#import pathlib
from pathlib import Path
from pypdf import PdfReader

def get_PDF_text(file : str):
    """Extract the text of every page in the PDF at `file`."""
    text = ''

    # Read the PDF file and extract text
    try:
        with Path(file).open("rb") as f:
            reader = PdfReader(f)
            text = "\n\n".join([page.extract_text() for page in reader.pages])
    except Exception as e:
        # Exceptions must be exception instances, not plain strings
        raise RuntimeError(f"Error reading the PDF file: {str(e)}") from e

    # The context length limit of the model is 131,072 tokens. As a rough
    # character-based proxy for that limit, reject PDFs over 400,000 characters.
    if len(text) > 400_000:
        raise ValueError("The PDF is too long. Please upload a PDF with fewer than 400,000 characters.")

    return text

text = get_PDF_text('./MoA.pdf')
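
The 400,000-character cutoff is a proxy for the model's token limit. A rough sketch of that reasoning follows, assuming about 4 characters per token. That ratio is a common rule of thumb for English text, not a figure from this notebook, and the real count depends on the tokenizer. `estimate_tokens` and `fits_context` are hypothetical helper names.

```python
CONTEXT_LIMIT_TOKENS = 131_072   # context length stated in the notebook
CHARS_PER_TOKEN = 4              # ASSUMPTION: rough average for English text

def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Crude token estimate from character count (ceiling division)."""
    return -(-len(text) // chars_per_token)

def fits_context(text: str, reserved_tokens: int = 4_000) -> bool:
    """Check whether the text likely fits, leaving room for the reply.

    `reserved_tokens` mirrors the max_tokens=4000 used in run_llm.
    """
    return estimate_tokens(text) + reserved_tokens <= CONTEXT_LIMIT_TOKENS

sample = "x" * 400_000
print(estimate_tokens(sample))   # 100000
print(fits_context(sample))      # True
```

Keep in mind that the outline step sends both the source document and the extracted details, so the prompt there is longer than the PDF text alone.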

### Now let's chain each LLM call

We will run each LLM in series and examine the intermediate outputs. This is a good idea when first setting up the agent workflow, so that you can optimize and improve each step.

Steps:

1. LLM Call 1: Clean and extract details given source text extracted from a PDF
2. LLM Call 2: Generate an outline given extracted information and the source text
3. LLM Call 3: Generate a script given the information from step 1 and outline from step 2. This script will be a structured JSON object
4. Text-to-Speech Model Call: Text-to-Speech model to generate the podcast using the script

### Step 1: Extract details from the source document

In [16]:
# Since we want every LLM on the chain to execute slightly different tasks, we will define different user prompts for each LLM

CLEAN_EXTRACT_DETAILS = """The first step you need to perform is to extract details from the source document that are informative
and listeners will find useful to understand the source document better.

The input may be unstructured or messy, sourced from PDFs or web pages. 

Your goal is to extract the most interesting and insightful content for a compelling podcast discussion.

Source Document: {source_doc}
"""

source_doc = text

extracted_details = run_llm(CLEAN_EXTRACT_DETAILS.format(source_doc=source_doc), 
                            model='meta-llama/Llama-3.3-70B-Instruct-Turbo', 
                            system_prompt=SYSTEM_PROMPT)
print(extracted_details)

### Extracting Relevant Information and Snippets from the Source Document

The provided source document discusses a novel approach to enhancing the capabilities of large language models (LLMs) by leveraging the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) methodology. Key points and insights extracted from the document include:

1. **Introduction to MoA**: The MoA approach is designed to harness the strengths of multiple LLMs to improve their reasoning and language generation capabilities. It involves constructing a layered architecture where each layer comprises multiple LLM agents, with each agent taking the outputs from agents in the previous layer as auxiliary information to generate its response.

2. **Collaborativeness of LLMs**: The document highlights the phenomenon of collaborativeness among LLMs, where models tend to generate better responses when provided with outputs from other models, even if those outputs are of lower quality. This phenomenon is

### Step 2: Generate an outline based on the extracted details

In [17]:
OUTLINE_PROMPT = """The second step is to use the extracted information from the source document to write an outline and brainstorm ideas.

The source document and extracted details are provided below:

Extracted Details: {extracted_details}

Source Document: {source_doc}

Steps to follow when generating an outline and brainstorming ideas for the discussion in the podcast:

1. Analyze the Input:
   Carefully examine the extracted details in the text above, identifying key topics, points, and 
   interesting facts or anecdotes that could drive an engaging podcast conversation. 
   Disregard irrelevant information.

2. Brainstorm Ideas:
   Creatively brainstorm ways to present the key points engagingly. 
   
   Consider:
   - Analogies, storytelling techniques, or hypothetical scenarios to make content relatable
   - Ways to make complex topics accessible to a general audience
   - Thought-provoking questions to explore during the podcast
   - Creative approaches to fill any gaps in the information
   - Make sure that all important details extracted above are covered in the outline that you draft
"""

outline = run_llm(OUTLINE_PROMPT.format(extracted_details=extracted_details, source_doc=source_doc),
                    model='meta-llama/Llama-3.3-70B-Instruct-Turbo', 
                    system_prompt=SYSTEM_PROMPT)

print(outline)

### Podcast Outline: Exploring the Mixture-of-Agents (MoA) Methodology for Enhancing Large Language Models (LLMs)

#### Introduction (5 minutes)
- **Brief Overview of LLMs**: Introduce the concept of Large Language Models, their current capabilities, and limitations.
- **Introduction to MoA**: Explain the Mixture-of-Agents methodology, its purpose, and how it leverages the strengths of multiple LLMs.

#### Segment 1: Understanding MoA (15 minutes)
- **Collaborativeness of LLMs**: Discuss the phenomenon where LLMs generate better responses when provided with outputs from other models.
- **MoA Structure**: Describe the layered architecture of MoA, how each layer processes inputs, and generates outputs based on previous layer outputs.
- **Key Components**: Highlight the importance of proposer and aggregator models within the MoA framework.

#### Segment 2: Evaluation and Results (15 minutes)
- **Benchmark Performance**: Discuss the performance of MoA on benchmarks like AlpacaEval 2.0, MT-

### Step 3: Generate a script

Below we define a class that will control the structure that our script will be generated with.

We want the script to be a list of `DialogueItem`s, each holding a single line and the speaker for that line. We also want to give the LLM throwaway tokens in the form of a scratchpad so that it can generate better-quality lines.

In [18]:
class DialogueItem(BaseModel):
    """A single dialogue item."""

    speaker: Literal["Host (Jane)", "Guest"]
    text: str


class Dialogue(BaseModel):
    """The dialogue between the host and guest."""

    scratchpad: str
    name_of_guest: str
    dialogue: List[DialogueItem]
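
To see what this schema accepts and rejects, here is a small sketch using a hand-written payload rather than model output. It assumes pydantic v2 (the notebook already relies on `model_json_schema`, a v2 method), and the sample values are invented for illustration.

```python
from typing import List, Literal
from pydantic import BaseModel, ValidationError

class DialogueItem(BaseModel):
    speaker: Literal["Host (Jane)", "Guest"]
    text: str

class Dialogue(BaseModel):
    scratchpad: str
    name_of_guest: str
    dialogue: List[DialogueItem]

# Invented sample payload, not real model output
good = {
    "scratchpad": "notes",
    "name_of_guest": "Example Guest",
    "dialogue": [{"speaker": "Host (Jane)", "text": "Welcome!"}],
}
parsed = Dialogue.model_validate(good)
print(parsed.dialogue[0].speaker)  # Host (Jane)

# A speaker outside the Literal choices is rejected
bad = {**good, "dialogue": [{"speaker": "Narrator", "text": "Hi"}]}
try:
    Dialogue.model_validate(bad)
    rejected = False
except ValidationError:
    rejected = True
print(rejected)  # True

# The JSON schema passed via response_format lists the required keys
required_keys = sorted(Dialogue.model_json_schema()["required"])
print(required_keys)  # ['dialogue', 'name_of_guest', 'scratchpad']
```

Restricting `speaker` to two literal values is what later lets the audio step map each line to one of two voices.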

In [19]:
# Generate a JSON structured script based on the extracted details and the outline

SCRIPT_PROMPT = """The last step is to use the extracted details and the ideas brainstormed in the outline below to craft
a script for the podcast.

Extracted Details: {extracted_details}

Using the outline provided here: {outline}

Steps to follow when generating the script:

 1. **Craft the Dialogue:**
   Develop a natural, conversational flow between the host (Jane) and the guest speaker (the author or an expert on the topic).
   In the `<scratchpad>`, creatively brainstorm ways to present the key points engagingly.
   
   Incorporate:
   - The best ideas from your brainstorming session
   - Clear explanations of complex topics
   - An engaging and lively tone to captivate listeners
   - A balance of information and entertainment

   Rules for the dialogue:
   - The host (Jane) always initiates the conversation and interviews the guest
   - Include thoughtful questions from the host to guide the discussion
   - Incorporate natural speech patterns, including occasional verbal fillers (e.g., "Uhh", "Hmmm", "um," "well," "you know")
   - Allow for natural interruptions and back-and-forth between host and guest - this is very important to make the conversation feel authentic
   - Ensure the guest's responses are substantiated by the input text, avoiding unsupported claims
   - Maintain a PG-rated conversation appropriate for all audiences
   - Avoid any marketing or self-promotional content from the guest
   - The host concludes the conversation

2. **Summarize Key Insights:**
   Naturally weave a summary of key points into the closing part of the dialogue. This should feel like a casual conversation rather than a formal recap, reinforcing the main takeaways before signing off.

3. **Maintain Authenticity:**
   Throughout the script, strive for authenticity in the conversation. Include:
   - Moments of genuine curiosity or surprise from the host
   - Instances where the guest might briefly struggle to articulate a complex idea
   - Light-hearted moments or humor when appropriate
   - Brief personal anecdotes or examples that relate to the topic (within the bounds of the input text)

4. **Consider Pacing and Structure:**
   Ensure the dialogue has a natural ebb and flow:
   - Start with a strong hook to grab the listener's attention
   - Gradually build complexity as the conversation progresses
   - Include brief "breather" moments for listeners to absorb complex information
   - For complicated concepts, reasking similar questions framed from a different perspective is recommended
   - End on a high note, perhaps with a thought-provoking question or a call-to-action for listeners

IMPORTANT RULE: Each line of dialogue should be no more than 300 characters (i.e., short enough to be spoken within 30 seconds)

Remember: Always reply in valid JSON format, without code blocks. Begin directly with the JSON output.
"""

script = JSON_llm(SCRIPT_PROMPT.format(extracted_details=extracted_details, outline=outline),
                    Dialogue,
                    system_prompt=SYSTEM_PROMPT)

script

{'scratchpad': "Let's create a compelling podcast script about the Mixture-of-Agents (MoA) methodology for enhancing Large Language Models (LLMs).",
 'name_of_guest': 'Dr. Maria Hernandez, AI Researcher',
 'dialogue': [{'speaker': 'Host (Jane)',
   'text': "Welcome to today's episode of 'AI Explained'! I'm your host, Jane. Joining me is Dr. Maria Hernandez, an expert in AI research. Dr. Hernandez, thanks for being here!"},
  {'speaker': 'Guest',
   'text': "Thanks, Jane! I'm excited to discuss the Mixture-of-Agents methodology and its potential to revolutionize Large Language Models."},
  {'speaker': 'Host (Jane)',
   'text': "Let's dive right in. Can you explain what Large Language Models are and their current limitations?"},
  {'speaker': 'Guest',
   'text': 'Large Language Models, or LLMs, are AI systems that can process and generate human-like language. However, they have limitations, such as struggling with complex reasoning and generating coherent text.'},
  {'speaker': 'Host (Ja
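
The prompt asks for lines of at most 300 characters and for the host to open and close the conversation, but the `Dialogue` schema does not encode either rule. A minimal sketch of a post-hoc check follows. `check_script` is a hypothetical helper, and `sample_script` is a made-up stand-in for the model's output.

```python
from typing import Dict, List

MAX_LINE_CHARS = 300  # limit stated in SCRIPT_PROMPT

def check_script(script: Dict) -> List[str]:
    """Return a list of human-readable problems found in the script."""
    problems: List[str] = []
    dialogue = script.get("dialogue", [])
    if not dialogue:
        return ["dialogue is empty"]
    for i, line in enumerate(dialogue):
        if len(line["text"]) > MAX_LINE_CHARS:
            problems.append(f"line {i} has {len(line['text'])} characters")
    # The prompt says the host opens and concludes the conversation
    if dialogue[0]["speaker"] != "Host (Jane)":
        problems.append("first line is not spoken by the host")
    if dialogue[-1]["speaker"] != "Host (Jane)":
        problems.append("last line is not spoken by the host")
    return problems

# Made-up sample, not real model output
sample_script = {
    "scratchpad": "",
    "name_of_guest": "Example Guest",
    "dialogue": [
        {"speaker": "Host (Jane)", "text": "Welcome to the show!"},
        {"speaker": "Guest", "text": "x" * 301},
    ],
}
print(check_script(sample_script))
# ['line 1 has 301 characters', 'last line is not spoken by the host']
```

A non-empty result could be used to decide whether to regenerate the script before spending time on audio generation.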

In [20]:
# We can also define a function that encapsulates the entire workflow for generating a podcast script from a PDF file.
def prompt_chain_podcast_workflow(file : str):
    text = get_PDF_text(file)
    source_doc = text
    
    extracted_details = run_llm(CLEAN_EXTRACT_DETAILS.format(source_doc=source_doc), 
                            model='meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo', 
                            system_prompt=SYSTEM_PROMPT)
    
    outline = run_llm(OUTLINE_PROMPT.format(extracted_details=extracted_details, source_doc=source_doc),
                    model='meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo',
                    system_prompt=SYSTEM_PROMPT)
    
    script = JSON_llm(SCRIPT_PROMPT.format(extracted_details=extracted_details, outline=outline),
                    Dialogue,
                    system_prompt=SYSTEM_PROMPT)
    return script

### Step 4: Generate audio podcast

We can loop through the lines in the script and generate audio for each one by calling the TTS model with the voice assigned to that line's speaker. The raw audio for every line is appended to the same PCM file, and once the script finishes, we convert that file to a WAV file, ready to be played.

In [21]:
import requests

def generate_audio(text: str, voice: str):
    """Generate audio from text using the specified voice model."""
    url = "https://api.together.ai/v1/audio/generations"
    
    headers = {
        "Authorization": f"Bearer {TOGETHER_API_KEY}"
    }
    
    data = {
        "input": text,
        "voice": voice,
        "response_format": "raw",
        "response_encoding": "pcm_f32le",
        "sample_rate": 44100,
        "stream": False,
        "model": "cartesia/sonic"
    }
    
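    response = requests.post(url, headers=headers, json=data)
    # Fail loudly on HTTP errors instead of writing an error body into the audio file
    response.raise_for_status()

    return response.content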

In [22]:
import ffmpeg

host_id = "laidback woman" # Jane - host
guest_id = "customer support man" # Guest

with open("podcast.pcm", "wb") as f:
    for line in script['dialogue']:
        if line['speaker'] == "Guest":
            voice_id = guest_id
        else:
            voice_id = host_id

        #print(f"{voice_id}: {line['text']}")
        raw_audio = generate_audio('-' + line['text'], voice_id) # the leading "-" adds a pause between speakers

        f.write(raw_audio)

# Convert the raw PCM bytes to a WAV file.
ffmpeg.input("podcast.pcm", format="f32le").output("podcast.wav").run()
print("Podcast generated successfully!")


(None, None)
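
If `ffmpeg` is not available, a similar conversion can be approximated with the standard library. The sketch below is an alternative to the notebook's approach, not what it uses. It assumes the PCM data holds mono 32-bit float samples at 44,100 Hz (matching the `pcm_f32le` and `sample_rate` values requested above) and converts them to 16-bit integers, since Python's `wave` module writes integer PCM. `pcm_f32le_to_wav` is a hypothetical helper name.

```python
import os
import struct
import tempfile
import wave
from array import array

def pcm_f32le_to_wav(pcm_bytes: bytes, wav_path: str, sample_rate: int = 44100) -> float:
    """Write little-endian float32 mono PCM as a 16-bit WAV; return its duration in seconds."""
    n = len(pcm_bytes) // 4  # 4 bytes per float32 sample; ignore a trailing partial sample
    floats = struct.unpack(f"<{n}f", pcm_bytes[: n * 4])
    # Clamp each sample to [-1, 1], then scale to the int16 range
    ints = array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in floats))
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)      # ASSUMPTION: mono audio
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(ints.tobytes())
    return n / sample_rate

# Tiny synthetic buffer; the last value is deliberately out of range
demo_pcm = struct.pack("<4f", 0.0, 0.5, -1.0, 2.0)
out_path = os.path.join(tempfile.gettempdir(), "demo_podcast.wav")
seconds = pcm_f32le_to_wav(demo_pcm, out_path, sample_rate=4)
print(seconds)  # 1.0
```

For the podcast itself, the call would look like `pcm_f32le_to_wav(open("podcast.pcm", "rb").read(), "podcast.wav")`. Converting to 16-bit loses some precision compared with the float output that `ffmpeg` can preserve.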

In [None]:
from IPython.display import Audio

Audio("podcast.wav")