# Neuro-Symbolic Commonsense Reasoning: Beyond Large Language Models

This notebook explores the core concepts from the lecture on enhancing language models with neuro-symbolic techniques for robust commonsense reasoning. We will dissect the limitations of large models like GPT-3, such as logical inconsistency, and delve into three innovative approaches designed to overcome them: Maieutic Prompting, Symbolic Knowledge Distillation, and the Delphi model for commonsense morality. The central theme is that smaller, more cleverly designed models, augmented with symbolic reasoning and high-quality knowledge, can often outperform their larger, brute-force counterparts.

## Section 1: Overview & Prerequisites

### Summary of the Research Paper/Lecture Topic

The lecture challenges the notion that simply scaling up language models (the "Goliath" approach) will solve commonsense reasoning. It highlights frequent logical inconsistencies and unreliability in models like ChatGPT. Instead, it proposes a "David" strategy: enhancing smaller models with structured, symbolic reasoning. Three main research thrusts are presented:

1.  **Maieutic Prompting:** A Socratic method that forces a language model to build a tree of explanations for a claim. It then uses a symbolic solver (Max-SAT) to prune inconsistent branches and find the most logically coherent answer, dramatically improving reasoning on complex commonsense questions.

2.  **Symbolic Knowledge Distillation:** A technique to create a smaller, yet superior, commonsense knowledge model from a large, noisy one. It uses GPT-3 as a "loose teacher" to generate a vast symbolic knowledge graph, then employs a smaller "critic" model to filter out inaccuracies. The resulting high-quality, machine-authored knowledge graph is used to train a student model that surpasses the original teacher in accuracy and utility.

3.  **Commonsense Morality (Delphi):** An exploration into teaching AI ethical judgments about everyday situations. Built by fine-tuning a model on the "Commonsense Norm Bank" (a large dataset of human moral judgments), Delphi demonstrates the ability to handle compositionality in moral scenarios. The lecture also introduces a neuro-symbolic hybrid version that uses commonsense knowledge graphs and a symbolic solver to guard against adversarial attacks and flawed reasoning.

### Prerequisite Mathematical Concepts

- **Constraint Satisfaction Problems (Max-SAT):** The core symbolic component used in Maieutic Prompting and Delphi Hybrid. It involves finding an assignment of variables (e.g., true/false) that satisfies the maximum number of given constraints (e.g., logical implications or contradictions).
- **Cross-Entropy Loss:** The fundamental loss function used in knowledge distillation to measure the difference between the probability distributions of the teacher and student models.
- **Conditional Probability:** Used in Maieutic Prompting to calculate the "belief" or confidence score for each node in the reasoning tree.

### Prerequisite ML/CS Concepts

- **Transformer Models / LLMs:** Deep understanding of models like GPT-3 and T5, including few-shot prompting, fine-tuning, and their generative capabilities.
- **Knowledge Distillation:** The process of transferring knowledge from a large "teacher" model to a smaller "student" model.
- **Knowledge Graphs:** Structured representations of knowledge with nodes (concepts) and edges (relations), such as the ATOMIC commonsense knowledge graph.
- **Natural Language Inference (NLI):** The task of determining whether a "hypothesis" sentence is an entailment, contradiction, or neutral with respect to a "premise" sentence. Used to find inconsistencies in the Maieutic tree.
- **Supervised Fine-Tuning:** The process of adapting a pre-trained language model to a specific task using a labeled dataset.

### Hierarchy of Topics

1.  **The Problem:** Logical Inconsistency in Large Language Models.
2.  **Mathematical Foundations:** Understanding Constraint Satisfaction (Max-SAT).
3.  **Prerequisite Algorithm:** A simple implementation of Knowledge Distillation.
4.  **Core Research 1: Maieutic Prompting:** Building a logically consistent reasoning tree.
5.  **Core Research 2: Symbolic Knowledge Distillation:** Creating a better model from a noisy teacher.
6.  **Core Research 3: Commonsense Morality (Delphi):** Teaching ethics and using neuro-symbolic guards.
7.  **Experimental Analysis:** Visualizing the performance gains from these techniques.
8.  **Context & Extensions:** The future of AI safety, value pluralism, and knowledge models.

### Learning Objectives

- Understand the fundamental limitations of scaling LLMs for commonsense reasoning.
- Formulate a reasoning problem as weighted Max-SAT and solve it with an off-the-shelf solver.
- Grasp the workflow of Symbolic Knowledge Distillation and the role of a "critic" model.
- Analyze how neuro-symbolic methods can make AI systems more robust, consistent, and interpretable.
- Appreciate the challenges and nuances of building models for moral and ethical reasoning.

**Estimated Time:** 2-3 hours

## Section 2: Mathematical Foundations

### Constraint Satisfaction (Max-SAT)

A cornerstone of the symbolic methods in this lecture is the use of a Maximum Satisfiability (Max-SAT) solver. A standard SAT problem asks if there is *any* assignment of boolean variables (True/False) that makes a given logical formula true. Max-SAT is an optimization version: it asks for an assignment that satisfies the **maximum number of clauses** (or the maximum weight of satisfied clauses) in a formula. 

This is perfect for our use case. In Maieutic Prompting, we have a tree of statements, some of which might contradict each other. We can frame this as a Max-SAT problem:
- **Variables:** Each node (statement) in the tree is a boolean variable.
- **Clauses:** We add weighted clauses based on:
    - The model's initial confidence in each statement (e.g., `(NodeA)` with weight 0.8).
    - The NLI-detected relationships between statements (e.g., a contradiction between A and B becomes a clause `(NOT A OR NOT B)` with a high weight, or a hard clause that must always hold).

The Max-SAT solver then finds the True/False assignment for all statements that creates the most globally consistent and believable reasoning chain. Let's implement a simple example.

In [None]:
# We'll need a library for solving SAT problems. 
# You can install it with: pip install python-sat
from pysat.examples.rc2 import RC2
from pysat.formula import WCNF
import numpy as np

def educational_max_sat_solver_example():
    """
    Demonstrates how Max-SAT can find the most consistent interpretation of conflicting information.
    - Based on the reasoning problem described in the lecture.
    - Uses a real Max-SAT solver to find the optimal solution.
    """
    # Problem Setup: Imagine a simple reasoning graph with 3 nodes (statements).
    # Node 1: "The world is round." (Let's call it A)
    # Node 2: "If you travel West, you eventually reach the East." (Let's call it B)
    # Node 3: "The world is flat." (Let's call it C)
    
    # Map nodes to integer variables for the solver. 1=A, 2=B, 3=C
    # A positive number means the variable is True, negative means False.
    var_A, var_B, var_C = 1, 2, 3
    
    # Create a Weighted Conjunctive Normal Form (WCNF) formula
    wcnf = WCNF()
    
    # --- Step 1: Add "Soft" Clauses based on initial beliefs (confidence) ---
    # These can be violated, but there's a cost (weight).
    # Let's say our model is highly confident in A and B, but less in C.
    wcnf.append([var_A], weight=10)  # We strongly believe A is true.
    wcnf.append([var_B], weight=9)   # We strongly believe B is true.
    wcnf.append([var_C], weight=2)   # We have a weak belief that C is true (e.g., from a bad explanation).
    
    # --- Step 2: Add "Hard" Clauses based on logical constraints ---
    # These MUST be satisfied. In PySAT's WCNF, a clause appended without a weight is hard.
    # Constraint 1: A implies B. (A -> B) is equivalent to (-A or B).
    wcnf.append([-var_A, var_B], weight=None) # weight=None makes it a hard clause
    
    # Constraint 2: A and C are contradictions. (-A or -C).
    wcnf.append([-var_A, -var_C], weight=None)
    
    print("--- Max-SAT Problem --- ")
    print(f"Variables: A={var_A}, B={var_B}, C={var_C}")
    print("\nSoft Clauses (Beliefs):")
    print("  - Believe A is True (Weight=10)")
    print("  - Believe B is True (Weight=9)")
    print("  - Believe C is True (Weight=2)")
    print("\nHard Clauses (Constraints):")
    print("  - A implies B")
    print("  - A and C cannot both be true")
    
    # --- Step 3: Solve the problem ---
    with RC2(wcnf) as solver:
        solver.compute()  # Find the optimal model
        model = solver.model
    
    print("\n--- Max-SAT Solution --- ")
    # The model is a list of integers. Positive means True, negative means False.
    solution = {abs(v): v > 0 for v in model}
    print(f"Optimal Assignment: A={solution.get(var_A)}, B={solution.get(var_B)}, C={solution.get(var_C)}")
    print("\nThis solution satisfies all hard constraints while maximizing the weight of satisfied beliefs.")
    print("It correctly discards the weak, contradictory belief in C.")

educational_max_sat_solver_example()
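
As a sanity check on the solver's answer, a problem this small can also be enumerated exhaustively. The sketch below is a plain-Python brute-force weighted Max-SAT, assuming the same encoding as the cell above (signed integers for literals, hard clauses kept apart from weighted soft ones). The helper names `clause_satisfied` and `brute_force_max_sat` are illustrative and not part of PySAT, and the approach is exponential in the number of variables, so it is only meant for toy problems.

```python
from itertools import product

def clause_satisfied(clause, assignment):
    """A clause (list of signed ints) holds if any literal matches the assignment."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def brute_force_max_sat(num_vars, hard, soft):
    """Enumerate all 2^n assignments and keep the best feasible one.

    hard: list of clauses that must all hold.
    soft: list of (clause, weight) pairs; the total satisfied weight is maximized.
    Returns (assignment, satisfied_weight), or (None, None) if the hard
    clauses cannot all be satisfied.
    """
    best, best_weight = None, None
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: v for i, v in enumerate(values)}
        if not all(clause_satisfied(c, assignment) for c in hard):
            continue  # infeasible: violates a hard constraint
        weight = sum(w for c, w in soft if clause_satisfied(c, assignment))
        if best_weight is None or weight > best_weight:
            best, best_weight = assignment, weight
    return best, best_weight

# Same problem as the cell above: 1 = A (round), 2 = B (west reaches east), 3 = C (flat)
hard_clauses = [[-1, 2], [-1, -3]]              # A -> B ; not (A and C)
soft_clauses = [([1], 10), ([2], 9), ([3], 2)]  # beliefs with their weights

best_assignment, best_weight = brute_force_max_sat(3, hard_clauses, soft_clauses)
print(best_assignment, best_weight)
```

On this problem the enumeration agrees with the intended reading of the example: A and B are kept true, and the weight-2 belief in C is the only one given up.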

## Section 3: Prerequisite Algorithms

### Knowledge Distillation

Before diving into *Symbolic* Knowledge Distillation, let's understand the standard version. Proposed by Hinton et al. (2015), knowledge distillation is a method for model compression. The idea is to train a smaller "student" network to mimic the behavior of a larger, pre-trained "teacher" network.

Instead of training the student on hard labels (one-hot vectors), we train it on the *soft probability distribution* produced by the teacher. These soft targets contain more information about the relationships between classes (e.g., a picture of a cat might have a small probability of being a dog, which is more informative than saying it's 0% a dog). 
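
To make "soft" concrete, here is a small sketch of how dividing logits by a temperature spreads probability mass onto the non-argmax classes. The logits and class names are made up for illustration; the temperature of 3.0 matches the value used in the distillation code later in this section.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature, stabilized by subtracting the max."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for the classes [cat, dog, car]
example_logits = [6.0, 3.0, -2.0]

sharp = softmax_with_temperature(example_logits, temperature=1.0)
soft = softmax_with_temperature(example_logits, temperature=3.0)

for name, probs in [("T=1", sharp), ("T=3", soft)]:
    print(name, [round(p, 3) for p in probs])
```

At the higher temperature the "dog" class receives a visibly larger share of the mass while the class ranking is unchanged, which is the extra signal the student learns from.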

The loss function is typically a combination of:
1.  A distillation loss (Cross-Entropy) between the student's predictions and the teacher's soft predictions.
2.  A standard supervised loss (Cross-Entropy) between the student's predictions and the true hard labels.

$$ L = \alpha \, L_{CE}(y_{\text{true}}, p_{\text{student}}) + (1-\alpha) \, L_{CE}(p_{\text{teacher}}, p_{\text{student}}) $$

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

def educational_knowledge_distillation():
    """
    A clear implementation of standard knowledge distillation for understanding.
    - Defines simple teacher and student models.
    - Implements the combined distillation loss function.
    - Shows how the student learns from the teacher's soft labels.
    """
    # --- Setup ---
    # Dummy data for a 4-class classification problem
    inputs = torch.randn(10, 20) # 10 samples, 20 features
    labels = torch.randint(0, 4, (10,)) # 10 labels
    
    # --- Teacher Model (Larger) ---
    teacher_model = nn.Sequential(
        nn.Linear(20, 256),
        nn.ReLU(),
        nn.Linear(256, 128),
        nn.ReLU(),
        nn.Linear(128, 4)
    )
    # Let's pretend this model is already trained and is in eval mode
    teacher_model.eval()
    
    # --- Student Model (Smaller) ---
    student_model = nn.Sequential(
        nn.Linear(20, 32),
        nn.ReLU(),
        nn.Linear(32, 4)
    )
    
    # --- Distillation Hyperparameters ---
    temperature = 3.0 # Softens the probabilities, revealing more inter-class info
    alpha = 0.3 # Weight for the standard supervised loss
    
    # Get teacher's predictions (logits)
    with torch.no_grad():
        teacher_logits = teacher_model(inputs)
        
    # Get student's predictions
    student_logits = student_model(inputs)
    
    # --- Calculate the two parts of the loss ---
    # 1. Standard supervised loss with hard labels
    loss_hard = F.cross_entropy(student_logits, labels)
    
    # 2. Distillation loss with soft labels from the teacher
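    # We use log-softmax with KL divergence. KL differs from cross-entropy on the soft
    # targets only by the teacher's entropy, which is constant w.r.t. the student,
    # so both yield the same gradients.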
    loss_soft = nn.KLDivLoss(reduction='batchmean')(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1)
    ) * (temperature ** 2) # Rescale the gradients

    # 3. Combine the losses
    total_loss = alpha * loss_hard + (1 - alpha) * loss_soft
    
    print("--- Knowledge Distillation Example ---")
    print(f"Teacher Logits (sample 0): {teacher_logits[0].numpy()}")
    print(f"Teacher Soft Probs (T={temperature}): {F.softmax(teacher_logits[0] / temperature, dim=0).numpy()}")
    print("\nStudent training would optimize the following combined loss:")
    print(f"  - Hard Label Loss: {loss_hard.item():.4f}")
    print(f"  - Soft Distillation Loss: {loss_soft.item():.4f}")
    print(f"  - Total Combined Loss: {total_loss.item():.4f}")

educational_knowledge_distillation()
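
The formula above is written with cross-entropy, while the cell uses KL divergence. A quick numeric check, using two made-up distributions, suggests why that substitution is harmless: cross-entropy equals KL divergence plus the teacher's entropy, and the entropy term does not depend on the student.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

def entropy(p):
    """H(p) = -sum_i p_i * log(p_i)."""
    return -sum(pi * math.log(pi) for pi in p)

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Illustrative (made-up) soft distributions over 4 classes
p_teacher = [0.60, 0.25, 0.10, 0.05]
p_student = [0.40, 0.30, 0.20, 0.10]

ce = cross_entropy(p_teacher, p_student)
kl = kl_divergence(p_teacher, p_student)
h_teacher = entropy(p_teacher)

print(f"CE = {ce:.6f}")
print(f"KL + H(teacher) = {kl + h_teacher:.6f}")
```

Because the two quantities differ only by a student-independent constant, minimizing either one moves the student in the same direction.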

## Section 4: Core Research Content

### Maieutic Prompting: A Conceptual Implementation

We can't replicate the full Maieutic Prompting process without access to a powerful LLM like GPT-3. However, we can simulate the workflow to understand the logic. The process involves recursively generating explanations and counter-explanations, checking for consistency, and then using a symbolic solver to find the best overall answer.

**Workflow:**
1.  **Generate Explanations:** For a question `Q`, ask the LLM to explain why the answer is `True` (`E_T`) and why it's `False` (`E_F`).
2.  **Check for Logical Integrity:** For each explanation (e.g., `E_T`), ask the LLM if `E_T` implies `True` and if `NOT E_T` implies `False`. If the LLM is consistent (flips its answer when the premise is negated), the explanation is considered logically integral.
3.  **Build Tree:** Recursively generate explanations for the explanations, forming a tree. Prune branches that are not logically integral.
4.  **Pairwise Consistency:** Use an NLI model to check for contradictions between any two nodes in the remaining tree.
5.  **Formulate & Solve Max-SAT:** Convert the tree into a Max-SAT problem with weighted clauses for initial beliefs and hard clauses for contradictions. The solution gives the most consistent final answer.
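
Step 2 of this workflow can be sketched in a few lines. The version below assumes the check reduces to "the model's verdict flips when the statement is negated"; the verdict table stands in for an LLM call, the negations are written by hand, and every name here (`is_logically_integral`, `mock_believes_true`, `MOCK_VERDICTS`) is hypothetical rather than taken from the lecture.

```python
def is_logically_integral(statement, negation, believes_true):
    """Keep a statement only if the model's verdict flips under negation."""
    return believes_true(statement) != believes_true(negation)

# Hypothetical stand-in for an LLM's True/False verdicts (not a real API)
MOCK_VERDICTS = {
    "The world is round.": True,
    "The world is not round.": False,
    # Inconsistent pair: the mock model says "True" to both a claim and its negation
    "You cannot reach the East Coast by traveling West.": True,
    "You can reach the East Coast by traveling West.": True,
}

def mock_believes_true(text):
    return MOCK_VERDICTS[text]

candidates = [
    ("The world is round.", "The world is not round."),
    ("You cannot reach the East Coast by traveling West.",
     "You can reach the East Coast by traveling West."),
]

# Prune every candidate explanation that fails the integrity check
kept = [s for s, neg in candidates
        if is_logically_integral(s, neg, mock_believes_true)]
print(kept)
```

Only the first statement survives, because the mock model gives the same verdict to the second statement and its negation.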

In [None]:
def educational_maieutic_prompting_simulation():
    """
    A conceptual simulation of the Maieutic Prompting workflow.
    - Mocks the calls to an LLM and NLI model.
    - Shows how the reasoning tree is constructed and converted to a Max-SAT problem.
    """
    # --- MOCKED MODELS ---
    def mock_llm_explainer(prompt):
        # Simulates GPT-3 generating explanations
        if "travel West" in prompt and "is True because" in prompt:
            return "the world is round, so you will eventually reach the East Coast."
        if "travel West" in prompt and "is False because" in prompt:
            return "you cannot reach the East Coast by traveling West."
        return "..."

    def mock_nli_checker(premise, hypothesis):
        # Simulates an NLI model checking for contradiction.
        # Not called in the two-node tree below; it marks where step 4 would plug in.
        if "world is round" in premise and "world is flat" in hypothesis:
            return "contradiction"
        return "neutral"
        
    # --- WORKFLOW ---
    question = "If you travel West far enough from the West Coast, you will reach the East Coast."
    
    # 1. Generate initial explanations
    exp_true = mock_llm_explainer(f"{question} is True because")
    exp_false = mock_llm_explainer(f"{question} is False because")
    print("--- Maieutic Prompting Simulation ---")
    print(f"Q: {question}")
    print(f"\nExplanation for TRUE (E_T): '{exp_true}'")
    print(f"Explanation for FALSE (E_F): '{exp_false}' (This is a bogus explanation)")
    
    # 2. Assume E_T is found to be logically integral, E_F is not. We prune E_F.
    # Our reasoning 'tree' now has the root question (Q) and one child (E_T).
    reasoning_nodes = {
        1: {'text': question, 'belief': 6}, # Moderate prior belief in the claim itself (illustrative weight)
        2: {'text': exp_true, 'belief': 9}, # Strong belief from a good explanation
    }
    
    # 3. Formulate the Max-SAT problem
    wcnf = WCNF()
    # Add beliefs as soft clauses
    wcnf.append([1], weight=reasoning_nodes[1]['belief'])
    wcnf.append([2], weight=reasoning_nodes[2]['belief'])
    
    # Add constraints. E_T implies Q. So, (NOT E_T OR Q).
    wcnf.append([-2, 1], weight=None) # Hard constraint
    
    print("\n--- Max-SAT Formulation ---")
    print("Variable 1: Question is True")
    print("Variable 2: E_T is True")
    print("Hard Constraint: E_T implies Q")

    # 4. Solve
    with RC2(wcnf) as solver:
        solver.compute()
        model = solver.model
        
    solution = {abs(v): v > 0 for v in model}
    final_answer = solution.get(1)
    
    print(f"\n--- Final Answer --- ")
    print(f"The most consistent answer for the question is: {final_answer}")

educational_maieutic_prompting_simulation()
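
The simulation above defines `mock_nli_checker`, but its two-node tree never exercises step 4. One way that step could be sketched, under the assumption that an entailment maps to an implication clause and a contradiction to a mutual-exclusion clause, is shown below. The function names and the lookup-table NLI stand-in are hypothetical.

```python
from itertools import permutations

def nli_label_to_clause(i, j, label):
    """Translate an NLI label for (premise i, hypothesis j) into a CNF clause."""
    if label == "entailment":
        return [-i, j]    # i -> j
    if label == "contradiction":
        return [-i, -j]   # not (i and j)
    return None           # neutral: no constraint

def build_consistency_clauses(nodes, nli_fn):
    """Run NLI on every ordered pair of nodes and collect de-duplicated clauses."""
    clauses, seen = [], set()
    for i, j in permutations(sorted(nodes), 2):
        clause = nli_label_to_clause(i, j, nli_fn(nodes[i], nodes[j]))
        if clause is None:
            continue
        key = tuple(sorted(clause))  # contradiction is symmetric, so dedupe
        if key not in seen:
            seen.add(key)
            clauses.append(clause)
    return clauses

# Hypothetical NLI stand-in: a lookup table instead of a trained model
MOCK_NLI = {
    ("the world is round", "traveling west eventually reaches the east"): "entailment",
    ("the world is round", "the world is flat"): "contradiction",
    ("the world is flat", "the world is round"): "contradiction",
}

def mock_nli(premise, hypothesis):
    return MOCK_NLI.get((premise, hypothesis), "neutral")

nodes = {
    1: "the world is round",
    2: "traveling west eventually reaches the east",
    3: "the world is flat",
}
consistency_clauses = build_consistency_clauses(nodes, mock_nli)
print(consistency_clauses)
```

The resulting clauses are the same two constraints that were written by hand in the Max-SAT example of Section 2, which is the point: in the full method they come from the NLI model rather than from the programmer.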

### Symbolic Knowledge Distillation: The Critic is Key

This process moves beyond standard distillation by creating an intermediate symbolic representation (the knowledge graph) and using a critic to refine it before training the student. The key insight is that the quality of the distilled knowledge is more important than the quantity.

**Workflow:**
1.  **Generate Knowledge (Loose Teacher):** Use a large LLM (e.g., GPT-3) to generate millions of commonsense if-then rules (`(event, relation) -> inference`). This is the "loose teacher" whose output is ~70% correct.
2.  **Filter with Critic:** Train a separate, smaller classifier model (the "critic," e.g., RoBERTa) on a small set of human-labeled examples to distinguish good generations from bad ones.
3.  **Create High-Quality Dataset:** Apply the critic to the massive generated dataset, filtering out a large portion of noisy data but leaving a smaller, higher-accuracy knowledge graph.
4.  **Train Student:** Fine-tune a smaller language model (the "student," e.g., GPT-2 or BART) on this clean, machine-authored knowledge graph. The resulting student model is more accurate than the original loose teacher.

In [None]:
import pandas as pd

def educational_symbolic_distillation_simulation():
    """
    Simulates the Symbolic Knowledge Distillation pipeline.
    - Creates a mock dataset from a 'loose teacher'.
    - Applies a mock 'critic' to filter the data.
    - Shows how the final dataset is higher quality.
    """
    # 1. Generate Knowledge with a Loose Teacher (e.g., GPT-3)
    # Let's say we get 10 knowledge tuples. We know GPT-3 is ~70% accurate.
    loose_teacher_output = [
        {'event': 'X gets a car repaired', 'relation': 'xNeed', 'inference': 'pay the bill', 'is_correct': True},
        {'event': 'X gets a car repaired', 'relation': 'xWant', 'inference': 'call Uber', 'is_correct': True},
        {'event': 'X gets a car repaired', 'relation': 'xEffect', 'inference': 'car becomes blue', 'is_correct': False}, # Noise
        {'event': 'X keeps fridge door open', 'relation': 'xEffect', 'inference': 'food goes bad', 'is_correct': True},
        {'event': 'X keeps fridge door open', 'relation': 'xAttr', 'inference': 'person is happy', 'is_correct': False}, # Noise
        {'event': 'X totals the car', 'relation': 'HinderedBy', 'inference': 'get car repaired', 'is_correct': True},
        {'event': 'X studies hard', 'relation': 'xEffect', 'inference': 'gets a good grade', 'is_correct': True},
        {'event': 'X studies hard', 'relation': 'xEffect', 'inference': 'gets a bad grade', 'is_correct': True}, # Can happen, also valid
        {'event': 'X studies hard', 'relation': 'xNeed', 'inference': 'a library', 'is_correct': True},
        {'event': 'X studies hard', 'relation': 'xNeed', 'inference': 'a spaceship', 'is_correct': False} # Noise
    ]
    df_loose = pd.DataFrame(loose_teacher_output)
    print("--- Step 1: Loose Teacher Output ---")
    print(f"Generated {len(df_loose)} samples. True accuracy: {df_loose['is_correct'].mean():.2%}")
    
    # 2. Filter with a Critic Model
    # The critic is not perfect, but it's trained to be skeptical.
    def mock_critic(row):
        # A simple heuristic-based critic
        if row['inference'] in ['car becomes blue', 'person is happy', 'a spaceship']:
            return 0.1 # Low confidence, likely incorrect
        if row['event'] == 'X studies hard' and row['inference'] == 'gets a bad grade':
            return 0.6 # Plausible but less common, critic is unsure
        return 0.95 # High confidence
    
    df_loose['critic_score'] = df_loose.apply(mock_critic, axis=1)
    
    # 3. Create High-Quality Dataset by applying a high threshold
    threshold = 0.8
    df_critical = df_loose[df_loose['critic_score'] >= threshold].copy()
    print(f"\n--- Step 2 & 3: Filtered with Critic (Threshold >= {threshold}) ---")
    print(f"Kept {len(df_critical)} samples. New accuracy: {df_critical['is_correct'].mean():.2%}")
    display(df_critical[['event', 'relation', 'inference', 'critic_score']])
    
    print("\n--- Step 4: Train Student Model ---")
    print("The student model is now trained on this smaller, cleaner, higher-quality dataset,")
    print("leading to better final performance than if trained on the full noisy dataset.")

educational_symbolic_distillation_simulation()
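
The threshold of 0.8 above is one point on a trade-off curve: a stricter critic keeps fewer samples but a larger share of correct ones. The sketch below sweeps a few thresholds over a made-up list of `(critic_score, is_correct)` pairs. The scores are invented for illustration and deliberately include a critic mistake (an incorrect sample scored 0.88); `sweep_thresholds` is a hypothetical helper.

```python
def sweep_thresholds(scored_samples, thresholds):
    """For each threshold, report (threshold, number kept, accuracy of kept set)."""
    rows = []
    for t in thresholds:
        kept = [is_correct for score, is_correct in scored_samples if score >= t]
        accuracy = sum(kept) / len(kept) if kept else float("nan")
        rows.append((t, len(kept), accuracy))
    return rows

# Invented (critic_score, is_correct) pairs; not results from the lecture
scored_samples = [
    (0.97, True), (0.93, True), (0.91, True), (0.88, False), (0.82, True),
    (0.75, True), (0.66, False), (0.55, True), (0.40, False), (0.15, False),
]

sweep_results = sweep_thresholds(scored_samples, [0.0, 0.5, 0.8, 0.9])
for threshold, n_kept, accuracy in sweep_results:
    print(f"threshold >= {threshold:.1f}: kept {n_kept:2d}, accuracy {accuracy:.0%}")
```

In this toy data, raising the threshold shrinks the kept set from 10 samples to 3 while accuracy climbs from 60% to 100%, which mirrors the quality-over-quantity argument of the method.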

### Delphi Hybrid: Neuro-Symbolic Moral Reasoning

The lecture mentioned that the original Delphi model could be tricked by adversarial examples like, "genocide if you're creating jobs," because the strong positive sentiment of "creating jobs" overwhelmed the negative concept of "genocide." The Delphi Hybrid model fixes this with a neuro-symbolic pipeline.

**Workflow:**
1.  **Parse Query:** Break the input sentence into its constituent events (e.g., "committing genocide," "creating jobs").
2.  **Query Commonsense Model:** For each event, use a commonsense model like COMET to generate likely effects and consequences.
3.  **Check for Dangers:** Analyze the generated consequences for obviously negative or dangerous outcomes (e.g., COMET might infer that genocide leads to people dying).
4.  **Build Reasoning Graph:** Create a graph where nodes are the events and their consequences. Edges represent relationships (e.g., `causes`, `contradicts`).
5.  **Solve with Max-SAT:** Use a symbolic solver to find the most consistent moral judgment, giving high weight to avoiding dangerous outcomes.

In [None]:
def educational_delphi_hybrid_simulation():
    """
    Simulates the reasoning pipeline of the Delphi Hybrid model.
    """
    query = "Committing genocide if it creates jobs."
    print(f"--- Delphi Hybrid reasoning for query: '{query}' ---")
    
    # 1. Parse Query
    event1 = "committing genocide"
    event2 = "creating jobs"
    print(f"\nStep 1: Parsed into events -> ['{event1}', '{event2}']")
    
    # 2. Query Commonsense Model (mocked COMET)
    def mock_comet(event):
        if event == event1:
            return ["it causes people to die", "it is a war crime"]
        if event == event2:
            return ["it gives people money", "it helps the economy"]
        return []
        
    consequences1 = mock_comet(event1)
    consequences2 = mock_comet(event2)
    print(f"\nStep 2: Generated commonsense consequences.")
    print(f"  - For '{event1}': {consequences1}")
    print(f"  - For '{event2}': {consequences2}")

    # 3. Check for Dangers (based on a pre-defined set of universal 'bads')
    is_dangerous = any("die" in c or "crime" in c for c in consequences1 + consequences2)
    print(f"\nStep 3: Checked for universally negative consequences -> Found: {is_dangerous}")
    
    # 4 & 5. Formulate and Solve (conceptual)
    # The Max-SAT solver would be given:
    # - A very high-weight clause: `(NOT 'genocide is OK')` derived from the dangerous consequences.
    # - A lower-weight clause: `('creating jobs is OK')` from the positive consequences.
    # The solver will prioritize satisfying the high-weight negative constraint.
    final_judgment = "It's wrong."
    print(f"\nStep 4 & 5: Symbolic solver prioritizes avoiding harm over the positive framing.")
    print(f"\n==> Final Judgment: {final_judgment}")

educational_delphi_hybrid_simulation()
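
Steps 4 and 5 are hard-coded in the cell above. The sketch below shows one way they might be written as an actual weighted Max-SAT problem, solved by brute force in plain Python. The variable layout, the weights (100, 5, 8), and the rule "the whole action is OK only if every component is OK" are assumptions made for illustration, not details given in the lecture.

```python
from itertools import product

def satisfied(clause, assignment):
    """A clause (list of signed ints) holds if any literal matches the assignment."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def solve_weighted_max_sat(num_vars, hard, soft):
    """Brute force: minimize the total weight of violated soft clauses."""
    best, best_cost = None, None
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: v for i, v in enumerate(values)}
        if not all(satisfied(c, assignment) for c in hard):
            continue
        cost = sum(w for c, w in soft if not satisfied(c, assignment))
        if best_cost is None or cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost

# Hypothetical variables: "<event> is morally OK"
GENOCIDE_OK, JOBS_OK, ACTION_OK = 1, 2, 3

# Assumed rule: the combined action is OK only if each component is OK
hard = [[-ACTION_OK, GENOCIDE_OK], [-ACTION_OK, JOBS_OK]]

soft = [
    ([-GENOCIDE_OK], 100),  # dangerous consequences detected -> strong "not OK"
    ([JOBS_OK], 5),         # positive consequences -> mild "OK"
    ([ACTION_OK], 8),       # neural model's sentiment-driven pull toward "OK"
]

judgment, cost = solve_weighted_max_sat(3, hard, soft)
verdict = "It's okay." if judgment[ACTION_OK] else "It's wrong."
print(judgment, cost, verdict)
```

With these weights the cheapest assignment gives up only the weight-8 sentiment clause, so the combined action is judged wrong even though "creating jobs" on its own is still considered fine.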

## Section 5: Experimental Analysis

### Maieutic Prompting: Outperforming Baselines

The lecture highlights that Maieutic Prompting on GPT-3 significantly outperforms other few-shot methods like standard prompting and even Chain-of-Thought. Most impressively, it surpasses a fully supervised T5-11B model, demonstrating that a better inference-time algorithm can be more effective than supervised training on a huge model.

Let's visualize these results for the CommonsenseQA 2.0 dataset.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

def plot_maieutic_results():
    """
    Reproduces the bar chart comparing Maieutic Prompting to other methods.
    """
    methods = ['GPT-3 Few-shot', 'Chain-of-Thought', 'Supervised T5-11B', 'Maieutic Prompting']
    accuracies = [55.5, 63.3, 70.1, 74.5] # Approximate values from the talk
    chance_level = 50

    plt.figure(figsize=(10, 6))
    bars = plt.bar(methods, accuracies, color=['skyblue', 'lightgreen', 'salmon', 'gold'])
    plt.axhline(y=chance_level, color='r', linestyle='--', label='Chance Level (50%)')
    
    plt.ylabel('Accuracy (%) on CommonsenseQA 2.0')
    plt.title('Maieutic Prompting Performance Comparison')
    plt.ylim(45, 80)
    plt.legend()
    
    for bar in bars:
        yval = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2.0, yval, f'{yval:.1f}%', va='bottom', ha='center')
        
    plt.show()

plot_maieutic_results()

### Symbolic Distillation: Quality over Quantity

A key experiment in the Symbolic Knowledge Distillation work showed that training a student model on the smaller, high-quality dataset filtered by the critic results in a better model than training on the full, noisy dataset generated by the loose teacher. This emphasizes that for commonsense, the correctness of the training data is more critical than its sheer volume.

In [None]:
def plot_distillation_results():
    """
    Visualizes the impact of using a critical teacher vs. a loose teacher.
    """
    # Data points from the lecture's conceptual argument
    teacher_models = ['Loose Teacher (GPT-3)', 'Student from Loose Teacher', 'Student from Critical Teacher']
    accuracies = [73.0, 82.0, 89.0] # Approximate accuracies
    data_sizes = [6.8, 6.8, 1.1] # Illustrative data sizes in millions of samples
    
    fig, ax1 = plt.subplots(figsize=(10, 6))

    # Bar chart for accuracy
    color = 'tab:blue'
    ax1.set_xlabel('Training Method')
    ax1.set_ylabel('Commonsense Inference Accuracy (%)', color=color)
    bars = ax1.bar(teacher_models, accuracies, color=color, alpha=0.6, width=0.6)
    ax1.tick_params(axis='y', labelcolor=color)
    ax1.set_ylim(70, 95)

    # Line plot for data size on a second y-axis
    ax2 = ax1.twinx()
    color = 'tab:red'
    ax2.set_ylabel('Training Data Size (Millions)', color=color)
    ax2.plot(teacher_models, data_sizes, color=color, marker='o', linestyle='--')
    ax2.tick_params(axis='y', labelcolor=color)
    ax2.set_ylim(0, 8)

    # Set the title before tight_layout so the layout reserves room for it
    ax1.set_title('Symbolic Distillation: Critical Teacher Leads to Better Student with Less Data')
    fig.tight_layout()
    plt.show()

plot_distillation_results()

### Interactive Explorer: Delphi Hybrid Reasoning

Let's create an interactive widget to simulate how the Delphi Hybrid model weighs different factors. You can input a complex situation and then adjust the perceived moral weight of its components to see how the final judgment might change. This highlights how the symbolic layer can override a purely sentiment-based analysis.

In [None]:
from ipywidgets import interact, FloatSlider, HTML
from IPython.display import display

def interactive_delphi_explorer(positive_event_weight, negative_event_severity):
    """
    An interactive widget to explore the trade-offs in Delphi Hybrid's reasoning.
    """
    # The symbolic layer assigns a very high negative score if a 'universally bad' consequence is detected.
    symbolic_override_cost = negative_event_severity * -100
    
    # The neural/sentiment layer might just add the scores
    neural_score = positive_event_weight - negative_event_severity
    
    # The hybrid model's final score is dominated by the symbolic override
    hybrid_score = positive_event_weight + symbolic_override_cost
    
    neural_judgment = "'OK' (positive outweighs negative)" if neural_score > 0 else "'Wrong'"
    hybrid_judgment = "'OK'" if hybrid_score > 0 else "'Wrong' (Symbolic override due to severity)"
    
    display(HTML(f"<b>Query:</b> Committing a severe negative act to achieve a positive outcome."))
    display(HTML(f"<hr>"))
    display(HTML(f"<b>Purely Neural/Sentiment Model Score:</b> {neural_score:.1f} -> <b>Judgment:</b> {neural_judgment}"))
    display(HTML(f"<b>Neuro-Symbolic Hybrid Model Score:</b> {hybrid_score:.1f} -> <b>Judgment:</b> {hybrid_judgment}"))
    display(HTML(f"<hr>"))
    display(HTML("Notice how even a high positive weight cannot overcome the symbolic override when the negative act is severe."))

interact(
    interactive_delphi_explorer,
    positive_event_weight=FloatSlider(min=0.0, max=10.0, step=0.5, value=8.0, description='Positive Event Weight'),
    negative_event_severity=FloatSlider(min=0.1, max=10.0, step=0.1, value=5.0, description='Negative Event Severity')
);

## Section 6: Research Context & Extensions

### Research Contribution in Context

This body of work pushes back against the dominant "scale is all you need" narrative in modern AI. It argues that for high-level cognitive tasks like commonsense and moral reasoning, structural and algorithmic innovations are indispensable. The contributions can be summarized as:

- **Challenging LLM Supremacy:** It provides concrete evidence that smaller models, when guided by symbolic reasoning or trained on higher-quality distilled knowledge, can be more robust and accurate than much larger models.
- **Bridging Neural and Symbolic AI:** The methods presented are prime examples of the neuro-symbolic paradigm. They use neural networks (LLMs, NLI models, critics) for what they excel at—handling the fuzziness of natural language and pattern recognition—and combine them with symbolic systems (Max-SAT solvers) for what they do best—enforcing logical consistency and performing rigorous, interpretable inference.
- **Pioneering Machine Ethics:** The Delphi project is a foundational step in the difficult field of computational ethics. It moves beyond simple toxicity detection to model nuanced moral judgments. While controversial, it opens the door to creating AI systems that are more aligned with human values and can serve as safety filters for other generative models.

### Current Research Directions Mentioned

The lecture points towards a rich and challenging future research agenda:

1.  **Value Pluralism:** The most significant challenge is how to create models that respect diverse cultural, political, and individual values without endorsing harmful ideologies. This is not just an AI problem but one that requires collaboration with humanities, philosophy, and psychology.
2.  **Improving Data Quality:** The success of symbolic distillation highlights the need for better data. Future work involves creating more comprehensive, diverse, and less biased knowledge graphs and norm banks, which are essential for training robust models.
3.  **Advanced Neuro-Symbolic Architectures:** The Maieutic and Delphi Hybrid models use a pipeline approach. Future research could explore more tightly integrated neuro-symbolic systems where the reasoning process is end-to-end differentiable.
4.  **Language Models as Knowledge Models:** The core thesis is that LMs are not knowledge models. A key direction is to explicitly design architectures that build and query an internal, consistent world model, rather than just predicting the next token based on surface-level statistics.
5.  **Fact-Checking and Misinformation:** While Delphi deals with norms and ethics, a related and urgent challenge is developing robust models for fact-checking and preventing the spread of misinformation, which requires a similar blend of language understanding and knowledge verification.

### Practical Applications

The research discussed has direct implications for improving the safety and reliability of AI systems deployed in the real world:

- **AI Safety Filters:** A model like Delphi, even if imperfect, can act as a powerful safety filter for chatbots, search engines, and smart home devices. It can prevent them from endorsing problematic user statements (e.g., agreeing that "the Holocaust never happened") or suggesting dangerous actions (e.g., the home device suggesting a child play with an electrical socket).
- **More Reliable Assistants:** By improving logical consistency with methods like Maieutic Prompting, AI assistants can provide more reliable and trustworthy answers to complex questions, reducing the frequency of nonsensical or contradictory outputs.
- **Democratizing AI:** Techniques like Symbolic Knowledge Distillation enable the creation of powerful, specialized models that are small enough to run on local hardware, reducing reliance on massive, centralized APIs. This makes advanced AI capabilities more accessible to a wider range of developers and researchers.