# Mixture of Agents: Under the Hood

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/togethercomputer/together-cookbook/blob/main/MoA_Under_The_Hood.ipynb)

## Introduction

In this notebook we will see how the [Mixture of Agents (MoA) architecture](https://arxiv.org/abs/2406.04692) works under the hood. MoA is a way to combine several open source models so that together they can reach state-of-the-art (SoTA) performance on complex reasoning tasks.

In a MoA setup, instead of prompting a single LLM, we pass the same prompt to multiple hand-picked LLMs. Their responses are then aggregated and passed forward for refinement, either to further layers of parallel reference LLMs or, finally, to an aggregator LLM that combines the individual responses and composes a superior final output.
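The layered flow described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used later in this notebook: the names `Model`, `with_references`, `mixture_of_agents`, `stub`, and `counting_aggregator` are hypothetical, the "models" are stub functions so that nothing calls an API, and each layer runs sequentially here, whereas the notebook runs the reference calls concurrently.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a prompt to a response string.
Model = Callable[[str], str]


def with_references(prompt: str, references: List[str]) -> str:
    """Append numbered reference responses (if any) to the original prompt."""
    if not references:
        return prompt
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(references))
    return f"{prompt}\n\nResponses from models:\n{numbered}"


def mixture_of_agents(prompt: str, layers: List[List[Model]], aggregator: Model) -> str:
    """Run each layer of reference models in turn, then the aggregator."""
    references: List[str] = []
    for layer in layers:
        # Every model in a layer sees the prompt plus the previous layer's responses.
        layer_prompt = with_references(prompt, references)
        references = [model(layer_prompt) for model in layer]
    return aggregator(with_references(prompt, references))


def stub(name: str) -> Model:
    """Stand-in for a reference LLM (assumption: a real one would call an API)."""
    return lambda prompt: f"answer from {name}"


def counting_aggregator(prompt: str) -> str:
    """Stand-in aggregator that only reports how many responses it received."""
    n = sum(1 for line in prompt.splitlines() if line[:1].isdigit())
    return f"aggregated {n} responses"


final = mixture_of_agents(
    "What is the capital of France?",
    layers=[[stub("A"), stub("B"), stub("C")]],
    aggregator=counting_aggregator,
)
print(final)
```

Adding a second inner list to `layers` gives a deeper setup in which the second layer's models refine the first layer's responses before the aggregator sees them.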

In this notebook we will code up this architecture from scratch and examine the intermediate responses for a sample problem from the MATH dataset.

We will also make the equivalent call using the new `MoA-1` endpoint.

<img src="images/MoA_full.png" width="750">

## Install Libraries

In [1]:
!pip install -qU together

## 2 Layer MoA Architecture

To make this concrete, in this notebook we will implement one round of reference response generation by 4 LLMs, followed by a 2nd layer consisting of a single aggregator LLM.

Here `LLM 1 = WizardLM 8x22B`, `LLM 2 = Qwen2.5 72B Instruct`, `LLM 3 = Gemma 2 27B`, `LLM 4 = LLaMA 3.3 70B Instruct`,
and the `Aggregator LLM = Qwen2.5 72B Instruct`.

<img src="images/MoA_small.png" width="500">

In [None]:
# Initialize connection to Together client

import asyncio
import os
from together import AsyncTogether, Together

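# Read the API key from the environment instead of hard-coding it
TOGETHERAI_API_KEY = os.environ.get("TOGETHER_API_KEY")

client = Together(api_key=TOGETHERAI_API_KEY)
async_client = AsyncTogether(api_key=TOGETHERAI_API_KEY)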

In [None]:
# Let's set up the MoA architecture with reference LLMs = WizardLM 8x22B, Qwen2.5 72B Instruct, Gemma 2 27B, LLaMA 3.3 70B Instruct

reference_models = [
    "microsoft/WizardLM-2-8x22B",
    "Qwen/Qwen2.5-72B-Instruct-Turbo",
    "google/gemma-2-27b-it",
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",
]

# This is a sample problem sourced from the MATH dataset: https://huggingface.co/datasets/lighteval/MATH?row=9
# Raw string so the LaTeX backslashes are kept as-is without invalid-escape warnings
user_prompt = r"""Tim wants to invest some money in a bank which compounds quarterly
with an annual interest rate of $7\%$. To the nearest dollar, how much money should he
invest if he wants a total of $\$60,\!000$ at the end of $5$ years?"""

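from together.error import RateLimitError

# The function below is called once per reference LLM; the calls are run concurrently
async def run_llm(model):
    """Run a single LLM call with a reference model, retrying on rate limits."""
    for sleep_time in [1, 2, 4]:
        try:
            response = await async_client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": user_prompt}],
                temperature=0.7,
                max_tokens=512,
            )
            return response.choices[0].message.content
        except RateLimitError as e:
            # Back off before the next attempt
            print(e)
            await asyncio.sleep(sleep_time)
    # All attempts were rate limited, so fail loudly instead of returning an unbound value
    raise RuntimeError(f"{model} was still rate limited after 3 attempts")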

In [None]:
# Generate intermediate reference responses
results = await asyncio.gather(*[run_llm(model) for model in reference_models])

In [10]:
len(results)

4

## Let's Examine the Intermediate Responses

Notice how some LLMs don't finish within the `max_tokens` limit, while others get the answer wrong.

In [16]:
for proposal in results:
  print(proposal)
  print(20*'====')

 Let's think step by step.To solve this problem, we will use the formula for compound interest, which is:

\[ A = P \left(1 + \frac{r}{n}\right)^{nt} \]

where:
- \( A \) is the amount of money accumulated after \( n \) years, including interest.
- \( P \) is the principal amount (the initial amount of money).
- \( r \) is the annual interest rate (in decimal form).
- \( n \) is the number of times that interest is compounded per year.
- \( t \) is the time the money is invested for, in years.

Given in the problem:
- The final amount \( A \) that Tim wants is $60,000.
- The annual interest rate \( r \) is 7%, which we convert to a decimal by dividing by 100, giving us 0.07.
- The interest is compounded quarterly, so \( n \) is 4.
- Tim wants the money after \( t \) = 5 years.

We need to find the principal \( P \) that Tim should invest.

Rearranging the compound interest formula to solve for \( P \), we get:

\[ P = \frac{A}{\left(1 + \frac{r}{n}\right)^{nt}} \]

Now, let's plug in t

## Aggregator Model

Here we concatenate the proposed responses above into the aggregator's system prompt and then prompt the aggregator with the original user query.
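As a small illustration of that concatenation step, the sketch below builds the message list from a system prompt, a list of responses, and the user query. The helper name `build_aggregator_messages` and the sample strings are hypothetical; the numbering scheme mirrors the one used in the cell that follows.

```python
from typing import Dict, List


def build_aggregator_messages(
    system_prompt: str, responses: List[str], user_prompt: str
) -> List[Dict[str, str]]:
    """Number the reference responses and append them to the system prompt."""
    numbered = "\n".join(
        f"{i + 1}. {response}" for i, response in enumerate(responses)
    )
    return [
        {"role": "system", "content": system_prompt + "\n" + numbered},
        {"role": "user", "content": user_prompt},
    ]


# Made-up inputs, only to show the shape of the result
messages = build_aggregator_messages(
    "Responses from models:",
    ["first answer", "second answer"],
    "What is 2 + 2?",
)
print(messages[0]["content"])
```

Keeping the references in the system message and the original question in the user message means the aggregator still answers the user's query directly, using the proposals as context.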

In [None]:
aggregator_model = "Qwen/Qwen2.5-72B-Instruct-Turbo"

aggregator_system_prompt = """You have been provided with a set of responses from various open-source models to the latest user query.
Your task is to synthesize these responses into a single, high-quality response. It is crucial to critically evaluate the information
provided in these responses, recognizing that some of it may be biased or incorrect. Your response should not simply replicate the
given answers but should offer a refined, accurate, and comprehensive reply to the instruction. Ensure your response is well-structured,
coherent, and adheres to the highest standards of accuracy and reliability.

Responses from models:"""

final_stream = client.chat.completions.create(
   model=aggregator_model,
   messages=[
       {
           "role": "system",
           "content": aggregator_system_prompt + "\n" + "\n".join(
               f"{i+1}. {str(element)}" 
               for i, element in enumerate(results)
           )
       },
       {
           "role": "user",
           "content": user_prompt
       }
   ],
   stream=True
)

for chunk in final_stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

To determine how much Tim should invest to have a total of $60,000 at the end of 5 years with an annual interest rate of 7% compounded quarterly, we can use the compound interest formula:

\[ A = P \left(1 + \frac{r}{n}\right)^{nt} \]

where:
- \( A \) is the amount of money accumulated after \( t \) years, including interest.
- \( P \) is the principal amount (the initial amount of money).
- \( r \) is the annual interest rate (in decimal form).
- \( n \) is the number of times that interest is compounded per year.
- \( t \) is the time the money is invested for, in years.

Given:
- \( A = 60,000 \)
- \( r = 0.07 \)
- \( n = 4 \) (since the interest is compounded quarterly)
- \( t = 5 \)

We need to find \( P \). Rearranging the formula to solve for \( P \):

\[ P = \frac{A}{\left(1 + \frac{r}{n}\right)^{nt}} \]

Substituting the given values:

\[ P = \frac{60,000}{\left(1 + \frac{0.07}{4}\right)^{4 \times 5}} \]
\[ P = \frac{60,000}{\left(1 + 0.0175\right)^{20}} \]
\[ P = \frac{60,00

## Solution:

Recall the formula $A=P\left(1+\frac{r}{n}\right)^{nt}$, where $A$ is the end balance, $P$ is the principal, $r$ is the interest rate, $t$ is the number of years, and $n$ is the number of times the interest is compounded in a year. This formula represents the idea that the interest is compounded every $1/n$ years with the rate of $r/n$. Substituting the given information, we have \[60,\!000=P\left(1+\frac{0.07}{4}\right)^{4 \cdot 5}.\]Solving for $P$ gives $P=42409.474...$, which rounded to the nearest dollar is $\boxed{\$42409}$.
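The arithmetic in the reference solution can be checked with a few lines of Python. This is only a sanity check of the formula above; the variable names follow the symbols in the solution.

```python
A = 60_000  # target end balance in dollars
r = 0.07    # annual interest rate
n = 4       # compounding periods per year (quarterly)
t = 5       # number of years

# Solve A = P * (1 + r/n)**(n*t) for the principal P
P = A / (1 + r / n) ** (n * t)
print(round(P))
```

Rounding `P` to the nearest dollar reproduces the boxed answer of 42409.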

## Using the MoA-1 Endpoint

All of this functionality is what gets carried out behind the scenes when you call the MoA-1 endpoint!

In [None]:
import os
from together import Together

# Read the API key from the environment; never hard-code secrets in a notebook
client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

user_prompt = r"""Tim wants to invest some money in a bank which compounds quarterly
with an annual interest rate of $7\%$. To the nearest dollar, how much money should he
invest if he wants a total of $\$60,\!000$ at the end of $5$ years?"""

response = client.chat.completions.create(
    model="togethercomputer/MoA-1",
    messages=[
        {
            "role": "user",
            "content": user_prompt,
        }
    ],
    max_tokens=512,
    temperature=0.7,
)

print(response.choices[0].message.content)

To determine how much money Tim should invest today to have a total of $60,000 at the end of 5 years with an annual interest rate of 7% compounded quarterly, we can use the formula for compound interest:

\[ A = P \left(1 + \frac{r}{n}\right)^{nt} \]

where:
- \( A \) is the amount of money accumulated after \( t \) years, including interest.
- \( P \) is the principal amount (the initial amount of money).
- \( r \) is the annual interest rate (decimal).
- \( n \) is the number of times interest is compounded per year.
- \( t \) is the time the money is invested for in years.

Given:
- \( A = 60,000 \)
- \( r = 0.07 \)
- \( n = 4 \) (since the interest is compounded quarterly)
- \( t = 5 \)

We need to solve for \( P \):

\[ 60,000 = P \left(1 + \frac{0.07}{4}\right)^{4 \times 5} \]

First, calculate the interest rate per quarter:

\[ \frac{0.07}{4} = 0.0175 \]

Next, calculate the total number of compounding periods:

\[ 4 \times 5 = 20 \]

Now, substitute these values into the formul