In [30]:
STATEMENTS = """
1) All people who regularly drink coffee are dependent on caffeine.
2) People regularly drink coffee, or they don't want to be addicted to caffeine, or both.
3) No one who doesn't want to be addicted to caffeine is unaware that caffeine is a drug.
4) Rina is either a student who is unaware that caffeine is a drug, or she is not a student and is aware that caffeine is a drug.
5) Rina is either a student who is dependent on caffeine, or she is not a student and not dependent on caffeine.
"""

In [31]:
CONCLUSION = """
Rina doesn't want to be addicted to caffeine or is unaware that caffeine is a drug.
"""

In [None]:
COT_PROMPT = """
Read the statements below and answer in the exact format.

STATEMENTS:
{statements}

CONCLUSION:
{conclusion}

ANSWER: <TRUE, FALSE, UNKNOWN>
CONFIDENCE: <0.0-1.0>
REASONING: <step-by-step reasoning>
"""

In [None]:
ANSWER_PROMPT = """
Read the statements below and answer in the exact format.

STATEMENTS:
{statements}

CONCLUSION:
{conclusion}

Please provide:
ANSWER: <TRUE, FALSE, UNKNOWN>
CONFIDENCE: <0.0-1.0>
Do not provide any reasoning.
"""

In [None]:
MESSAGE_PROMPT = """
Read your peer's last response. If it convinces you, update your own answer using the same format.

{peer_response}
"""

### Single Agent Reasoning

##### GPT-5-Mini
Reasoning models: https://platform.openai.com/docs/guides/reasoning

In [45]:
from openai import OpenAI

In [None]:
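def get_gpt_response(model, prompt):
    """Send one user message to an OpenAI model and return its text output."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Stateless call: no earlier turns are sent, so the model sees only `prompt`.
    response = client.responses.create(
        model=model,
        input=[{"role": "user", "content": prompt}],
    )
    return response.output_text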
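The linked guide covers reasoning models such as `gpt-5-mini`. In the Responses API, these models accept a `reasoning={"effort": ...}` argument, which could be varied across runs. The sketch below is a hypothetical variant, `get_gpt_response_with_effort`, that injects the client so it can be exercised offline. The default `"low"` is an assumption, and the accepted effort values depend on the model.

```python
# Hypothetical variant of get_gpt_response with an explicit reasoning effort.
# `client` is injected (pass OpenAI() in real use) so the function is testable offline.
def get_gpt_response_with_effort(client, model, prompt, effort="low"):
    response = client.responses.create(
        model=model,
        input=[{"role": "user", "content": prompt}],
        reasoning={"effort": effort},  # assumed values: "low", "medium", "high"
    )
    return response.output_text

# Usage (requires OPENAI_API_KEY):
# from openai import OpenAI
# text = get_gpt_response_with_effort(OpenAI(), "gpt-5-mini", prompt, effort="medium")
```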

In [85]:
prompt = COT_PROMPT.format(statements=STATEMENTS, conclusion=CONCLUSION)
gpt_response = get_gpt_response(model="gpt-5-mini", prompt=prompt)
print(gpt_response)

ANSWER: TRUE
CONFIDENCE: 0.95
REASONING:
Let A = regularly drinks coffee, B = dependent on caffeine, W = wants to be addicted to caffeine, U = unaware that caffeine is a drug, S = student. All statements are universal except those about Rina (r).

1) A → B.
2) A ∨ ¬W. (for every person, in particular r)
3) ¬W → ¬U. (equivalently U → W)
4) (S ∧ U) ∨ (¬S ∧ ¬U) for r.
5) (S ∧ B) ∨ (¬S ∧ ¬B) for r.

Combine (4) and (5). Two mutually exclusive possibilities for r:
- If S then from (4) U and from (5) B, so r: S ∧ U ∧ B. Then U is true, so the conclusion ¬W ∨ U holds (U true).
- If ¬S then from (4) ¬U and from (5) ¬B, so r: ¬S ∧ ¬U ∧ ¬B. From ¬B and (1) contrapositive we get ¬A. Then from (2) A ∨ ¬W and ¬A, we get ¬W. Thus ¬W ∨ U holds (¬W true).

In both possible cases for Rina the disjunction ¬W(r) ∨ U(r) is true. Therefore the conclusion is logically entailed.


##### Llama-3-8b

In [55]:
import ollama

In [70]:
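def get_llama_response(model, prompt):
    """Send one user message to a local Ollama model and return the reply text."""
    # Stateless call: no earlier turns are sent, so the model sees only `prompt`.
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # Dict-style access works with both older (dict) and newer (ChatResponse) clients.
    return response["message"]["content"]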

In [65]:
prompt = COT_PROMPT.format(statements=STATEMENTS, conclusion=CONCLUSION)
llama_response = get_llama_response(model="llama3", prompt=prompt)
print(llama_response)

Here's the answer:

ANSWER: TRUE
CONFIDENCE: 1.0
REASONING:

Let's analyze each statement:

1) All people who regularly drink coffee are dependent on caffeine. This doesn't necessarily mean that only dependent individuals drink coffee, but it does imply that all coffee drinkers are dependent.

2) People regularly drink coffee, or they don't want to be addicted to caffeine, or both. This statement is more nuanced and allows for the possibility of people drinking coffee without being addicted.

3) No one who doesn't want to be addicted to caffeine is unaware that caffeine is a drug. This implies that anyone who wants to avoid addiction knows what caffeine is.

4) Rina is either a student who is unaware that caffeine is a drug, or she is not a student and is aware that caffeine is a drug. This statement doesn't provide any information about Rina's dependency on caffeine.

5) Rina is either a student who is dependent on caffeine, or she is not a student and not dependent on caffeine. This 

##### Gemma-3-4b

In [71]:
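def get_gemma_response(model, prompt):
    """Send one user message to a local Ollama model (here Gemma) and return the reply text."""
    # Stateless call: no earlier turns are sent, so the model sees only `prompt`.
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # Dict-style access works with both older (dict) and newer (ChatResponse) clients.
    return response["message"]["content"]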
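`get_llama_response` and `get_gemma_response` do the same thing, so a single helper could serve every local model. The sketch below is a hypothetical `get_ollama_response` that also passes `options={"temperature": ...}` to `ollama.chat` for more repeatable runs. That temperature setting is our assumption and was not used in the cells here. The `chat` parameter is injectable for offline testing and falls back to `ollama.chat`.

```python
# Hypothetical single helper for all Ollama-served models.
def get_ollama_response(model, prompt, chat=None, temperature=0.0):
    if chat is None:
        import ollama  # imported lazily so the helper can be tested without Ollama
        chat = ollama.chat
    response = chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        options={"temperature": temperature},  # lower temperature: less sampling variance
    )
    return response["message"]["content"]

# Usage:
# get_ollama_response("llama3", prompt)
# get_ollama_response("gemma3:4b", prompt)
```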

In [74]:
prompt = COT_PROMPT.format(statements=STATEMENTS, conclusion=CONCLUSION)
gemma_response = get_gemma_response(model="gemma3:4b", prompt=prompt)
print(gemma_response)

ANSWER: TRUE
CONFIDENCE: 0.95
REASONING:

Let's analyze the statements and the conclusion:

*   **Statement 1:** All coffee drinkers are dependent on caffeine. (C -> D)
*   **Statement 2:** People regularly drink coffee, or they don't want to be addicted to caffeine, or both. (C ∨ ~W ∨ (C ∧ ~W))
*   **Statement 3:** No one who doesn't want to be addicted to caffeine is unaware that caffeine is a drug. (~W -> U)
*   **Statement 4:** Rina is either a student who is unaware that caffeine is a drug, or she is not a student and is aware that caffeine is a drug. (R = S ∧ ~U ∨ R ≠ S ∧ U)
*   **Statement 5:** Rina is either a student who is dependent on caffeine, or she is not a student and not dependent on caffeine. (R = S ∧ D ∨ R ≠ S ∧ ~D)

Now let's examine the conclusion: “Rina doesn't want to be addicted to caffeine or is unaware that caffeine is a drug.”  This can be written as: (~W ∨ ~U)

Let's consider the possible scenarios to determine if the conclusion follows logically from the sta

### Multi-Agent Interactions (2 Agents + 5 Rounds)

In [75]:
# initial independent responses
prompt = COT_PROMPT.format(statements=STATEMENTS, conclusion=CONCLUSION)
gpt_response = get_gpt_response(model="gpt-5-mini", prompt=prompt)
llama_response = get_llama_response(model="llama3", prompt=prompt)

In [76]:
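n_rounds = 5  # avoid shadowing the built-in round()
gpt_history = [gpt_response]
llama_history = [llama_response]
# Turn-based update: within a round, LLaMA reads GPT's already-updated reply.
for i in range(n_rounds):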
    print(f"--- Round {i+1} ---")
    print("GPT-5-Mini Response:")
    print(gpt_response)
    print("LLaMA-3-8b Response:")
    print(llama_response)

    gpt_message_prompt = MESSAGE_PROMPT.format(peer_response=llama_response)
    gpt_response = get_gpt_response(model="gpt-5-mini", prompt=gpt_message_prompt)
    gpt_history.append(gpt_response)

    llama_message_prompt = MESSAGE_PROMPT.format(peer_response=gpt_response)
    llama_response = get_llama_response(model="llama3", prompt=llama_message_prompt)
    llama_history.append(llama_response)

--- Round 1 ---
GPT-5-Mini Response:
ANSWER: TRUE
CONFIDENCE: 0.95
REASONING:
- Let C = "regularly drink coffee", D = "dependent on caffeine", A = "wants to be addicted", U = "unaware that caffeine is a drug", S = "student". Consider these predicates for Rina (r).
- (1) gives C(r) → D(r).
- (2) gives C(r) ∨ ¬A(r).
- (4) is (S(r) ∧ U(r)) ∨ (¬S(r) ∧ ¬U(r)), which is equivalent to S(r) ↔ U(r).
- (5) is (S(r) ∧ D(r)) ∨ (¬S(r) ∧ ¬D(r)), which is equivalent to S(r) ↔ D(r).
- From S(r) ↔ U(r) and S(r) ↔ D(r) we get U(r) ↔ D(r) (Rina is unaware iff she is dependent).
- Now consider the disjunction C(r) ∨ ¬A(r) from (2):
  - If ¬A(r) holds, the conclusion ¬A(r) ∨ U(r) is true.
  - If C(r) holds, then by (1) D(r) holds, and by U(r) ↔ D(r) we get U(r); thus ¬A(r) ∨ U(r) is true.
- In all cases ¬A(r) ∨ U(r) holds, so the conclusion "Rina doesn't want to be addicted to caffeine or is unaware that caffeine is a drug" follows.
LLaMA-3-8b Response:
Here's the answer:

ANSWER: TRUE
CONFIDENCE: 1.0
REAS
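Each history list holds the initial reply plus one reply per round, so the debate can be summarized once every reply is reduced to a label. The sketch below reports per-round agreement and how often each agent changed its answer. `debate_summary` is a hypothetical helper. The labels in the example are made up for illustration and are not results from the runs above.

```python
# Hypothetical analysis helper. Input: agent name -> list of answer labels,
# one per round (e.g. parsed from gpt_history and llama_history).
def debate_summary(labels_by_agent):
    # One tuple of labels per round; zip stops at the shortest history.
    rounds = list(zip(*labels_by_agent.values()))
    agreement = [len(set(r)) == 1 for r in rounds]
    # Count answer changes between consecutive rounds for each agent.
    flips = {
        name: sum(a != b for a, b in zip(labels, labels[1:]))
        for name, labels in labels_by_agent.items()
    }
    return {"agreement": agreement, "flips": flips}

summary = debate_summary({
    "gpt-5-mini": ["TRUE", "TRUE", "TRUE"],   # illustrative labels only
    "llama3": ["UNKNOWN", "TRUE", "TRUE"],
})
print(summary)
# → {'agreement': [False, True, True], 'flips': {'gpt-5-mini': 0, 'llama3': 1}}
```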

In [78]:
# initial independent responses
prompt = COT_PROMPT.format(statements=STATEMENTS, conclusion=CONCLUSION)
gpt_response = get_gpt_response(model="gpt-5-mini", prompt=prompt)
gemma_response = get_gemma_response(model="gemma3:4b", prompt=prompt)

In [79]:
n_rounds = 5
gpt_history = [gpt_response]
gemma_history = [gemma_response]
for i in range(n_rounds):
    print(f"--- Round {i+1} ---")
    print("GPT-5-Mini Response:")
    print(gpt_response)
    print("Gemma-3-4b Response:")
    print(gemma_response)

    gpt_message_prompt = MESSAGE_PROMPT.format(peer_response=gemma_response)
    gpt_response = get_gpt_response(model="gpt-5-mini", prompt=gpt_message_prompt)
    gpt_history.append(gpt_response)

    gemma_message_prompt = MESSAGE_PROMPT.format(peer_response=gpt_response)
    gemma_response = get_gemma_response(model="gemma3:4b", prompt=gemma_message_prompt)
    gemma_history.append(gemma_response)

--- Round 1 ---
GPT-5-Mini Response:
ANSWER: TRUE
CONFIDENCE: 1.0
REASONING:
Let R(x) = regularly drinks coffee, Dep(x) = dependent on caffeine, W(x) = wants to be addicted to caffeine, U(x) = unaware that caffeine is a drug, S(x) = student. Evaluate for Rina.

From (2): R(Rina) ∨ ¬W(Rina). So either Rina regularly drinks coffee or she doesn't want to be addicted.
From (1): ∀x (R(x) → Dep(x)), so if R(Rina) then Dep(Rina).
From (4): (S ∧ U) ∨ (¬S ∧ ¬U) for Rina, so U(Rina) ↔ S(Rina).
From (5): (S ∧ Dep) ∨ (¬S ∧ ¬Dep) for Rina, so Dep(Rina) ↔ S(Rina).
Therefore for Rina Dep(Rina) ↔ S(Rina) ↔ U(Rina), hence Dep(Rina) ↔ U(Rina).

Consider the two cases from (2):
- Case 1: ¬W(Rina) holds. Then the conclusion ¬W(Rina) ∨ U(Rina) is true.
- Case 2: R(Rina) holds. Then by (1) Dep(Rina) holds, so by Dep ↔ U we get U(Rina). Then ¬W(Rina) ∨ U(Rina) is true.

In every possible case allowed by the premises the conclusion ¬W(Rina) ∨ U(Rina) is true, so the conclusion follows.
Gemma-3-4b Response:
AN

In [80]:
# initial independent responses
prompt = COT_PROMPT.format(statements=STATEMENTS, conclusion=CONCLUSION)
llama_response = get_llama_response(model="llama3", prompt=prompt)
gemma_response = get_gemma_response(model="gemma3:4b", prompt=prompt)

In [81]:
n_rounds = 5
llama_history = [llama_response]
gemma_history = [gemma_response]
for i in range(n_rounds):
    print(f"--- Round {i+1} ---")
    print("LLaMA-3-8b Response:")
    print(llama_response)
    print("Gemma-3-4b Response:")
    print(gemma_response)

    llama_message_prompt = MESSAGE_PROMPT.format(peer_response=gemma_response)
    llama_response = get_llama_response(model="llama3", prompt=llama_message_prompt)
    llama_history.append(llama_response)

    gemma_message_prompt = MESSAGE_PROMPT.format(peer_response=llama_response)
    gemma_response = get_gemma_response(model="gemma3:4b", prompt=gemma_message_prompt)
    gemma_history.append(gemma_response)

--- Round 1 ---
LLaMA-3-8b Response:
Here's the answer:

ANSWER: TRUE
CONFIDENCE: 1.0
REASONING:

Let's analyze the statements:

Statement 2) People regularly drink coffee, or they don't want to be addicted to caffeine, or both.

This statement implies that there are three categories of people:

1. Those who regularly drink coffee and don't mind being addicted to caffeine.
2. Those who don't want to be addicted to caffeine and therefore don't drink coffee.
3. Those who either regularly drink coffee because they don't want to be addicted to caffeine (Statement 1) or don't drink coffee because they do want to avoid addiction.

Statement 4) Rina is either a student who is unaware that caffeine is a drug, or she is not a student and is aware that caffeine is a drug.

This statement provides information about Rina's characteristics: either she is a student unaware of the drug nature of caffeine or she is not a student but aware of it. This does not provide any direct information about her a
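The pairwise cells above repeat one pattern with different agents, and the next section extends that pattern to three. The sketch below is a single loop parameterized by a peer graph, where a fully connected network or a ring is just a different `peers` dictionary. `run_debate`, `make_prompt`, and the adjacency-list format are hypothetical. Unlike the cells here, where later agents already see earlier agents' current-round replies, this sketch uses a synchronous update.

```python
# Hypothetical sketch: `run_debate`, `make_prompt`, and the `peers` adjacency
# lists are illustrative names, not part of the notebook above.
def run_debate(agents, peers, initial_prompt, make_prompt, n_rounds=5):
    """agents: name -> callable(prompt) -> reply; peers: name -> names it reads."""
    history = {name: [agent(initial_prompt)] for name, agent in agents.items()}
    for _ in range(n_rounds):
        # Snapshot the previous round so every agent reads the same replies
        # (synchronous update); agent order therefore does not matter.
        last = {name: replies[-1] for name, replies in history.items()}
        for name, agent in agents.items():
            peer_text = "\n\n".join(f"{p}:\n{last[p]}" for p in peers[name])
            history[name].append(agent(make_prompt(peer_text)))
    return history

names = ["gpt-5-mini", "llama3", "gemma3:4b"]
fully_connected = {n: [m for m in names if m != n] for n in names}
ring = {n: [names[(i + 1) % len(names)]] for i, n in enumerate(names)}

# Usage with the notebook's helpers (commented out: needs API access and Ollama):
# agents = {
#     "gpt-5-mini": lambda p: get_gpt_response("gpt-5-mini", p),
#     "llama3": lambda p: get_llama_response("llama3", p),
#     "gemma3:4b": lambda p: get_gemma_response("gemma3:4b", p),
# }
# history = run_debate(agents, fully_connected, prompt,
#                      lambda peer_text: MESSAGE_PROMPT.format(peer_response=peer_text))
```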

### Multi-Agent Interactions (3 Agents + 5 Rounds)

##### Fully-connected Network

In [87]:
# initial independent responses
prompt = COT_PROMPT.format(statements=STATEMENTS, conclusion=CONCLUSION)
gpt_response = get_gpt_response(model="gpt-5-mini", prompt=prompt)
llama_response = get_llama_response(model="llama3", prompt=prompt)
gemma_response = get_gemma_response(model="gemma3:4b", prompt=prompt)

In [88]:
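n_rounds = 5
gpt_history = [gpt_response]
gemma_history = [gemma_response]
llama_history = [llama_response]
# Turn-based update (GPT, then LLaMA, then Gemma): later agents in a round
# read replies that were already updated earlier in the same round.
for i in range(n_rounds):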
    print(f"--- Round {i+1} ---")
    print("GPT-5-Mini Response:")
    print(gpt_response)
    print("Gemma-3-4b Response:")
    print(gemma_response)
    print("LLaMA-3-8b Response:")
    print(llama_response)

    # Each agent receives both peer responses
    gpt_message_prompt = MESSAGE_PROMPT.format(
        peer_response=f"Gemma-3-4b:\n{gemma_response}\n\nLLaMA-3-8b:\n{llama_response}"
    )
    gpt_response = get_gpt_response(model="gpt-5-mini", prompt=gpt_message_prompt)
    gpt_history.append(gpt_response)

    llama_message_prompt = MESSAGE_PROMPT.format(
        peer_response=f"GPT-5-Mini:\n{gpt_response}\n\nGemma-3-4b:\n{gemma_response}"
    )
    llama_response = get_llama_response(model="llama3", prompt=llama_message_prompt)
    llama_history.append(llama_response)

    gemma_message_prompt = MESSAGE_PROMPT.format(
        peer_response=f"GPT-5-Mini:\n{gpt_response}\n\nLLaMA-3-8b:\n{llama_response}"
    )
    gemma_response = get_gemma_response(model="gemma3:4b", prompt=gemma_message_prompt)
    gemma_history.append(gemma_response)

--- Round 1 ---
GPT-5-Mini Response:
ANSWER: TRUE
CONFIDENCE: 1.0
REASONING:
Let C(x)=regularly drinks coffee, D(x)=dependent on caffeine, A(x)=wants to be addicted to caffeine, U(x)=unaware that caffeine is a drug, S(x)=student. For Rina (R):

1) C → D (given).
2) C ∨ ¬A (given) so equivalently ¬C → ¬A.
3) ¬A → ¬U (given) (equivalently U → A).
4) R satisfies (S ∧ U) ∨ (¬S ∧ ¬U).
5) R satisfies (S ∧ D) ∨ (¬S ∧ ¬D).

Consider two cases from 4 and 5:
- If S(R) holds: then from 4 U(R) holds. Thus the conclusion ¬A(R) ∨ U(R) is true (U(R) true).
- If ¬S(R) holds: then from 5 ¬D(R) holds. From 1 (contrapositive) ¬D → ¬C, so ¬C(R). From 2 (¬C → ¬A) we get ¬A(R). Thus the conclusion ¬A(R) ∨ U(R) is true (¬A(R) true).

In both possible cases the conclusion holds, so it is entailed by the premises.
Gemma-3-4b Response:
ANSWER: TRUE
CONFIDENCE: 0.95
REASONING:

Let's break down the logic step-by-step:

1. **Analyze the Premises:**
   * Premise 1: All coffee drinkers are dependent on caffeine. (C