# 🔬 Research Extensions: Next-Level AI
**UofT AI Agents Club - Advanced Track**

Ready to explore cutting-edge AI research? Let's implement some **real research ideas**!

## What's Here
1. **Meta-Reflection** - Agents that critique their own critique process
2. **Smart Evaluation** - Advanced ways to measure response quality
3. **Research Ideas** - Concepts from recent AI papers
4. **Your Research** - Design your own experiments

**Prerequisites**: Complete both previous notebooks first

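**Note**: This is experimental. The agents below simulate model behavior with templated responses, keyword-based scores, and random confidence values, so treat outputs as illustrations of the ideas, not real results.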

Let's push boundaries! 🚀

In [None]:
# 1. Research-Grade Self-Reflecting Agent Setup

import random
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import display, Markdown

class ResearchAgent:
    def __init__(self):
        self.trace_history = []
        self.meta_insights = []
    
    def generate_with_confidence(self, problem, strategy='balanced'):
        """Generate response with confidence scoring"""
        responses = {
            'conservative': f"Careful analysis of '{problem}': Start with proven approaches, validate each step, minimize risks.",
            'aggressive': f"Bold approach to '{problem}': Push boundaries, try novel solutions, accept calculated risks.",
            'balanced': f"Balanced solution for '{problem}': Combine proven methods with innovative elements, validate thoroughly."
        }
        
        response = responses.get(strategy, responses['balanced'])
        confidence = random.uniform(0.7, 0.95)  # Simulate confidence scoring
        
        return response, confidence
    
    def multi_perspective_evaluate(self, response):
        """Evaluate from multiple perspectives"""
        perspectives = {
            'technical': self._evaluate_technical(response),
            'practical': self._evaluate_practical(response), 
            'innovative': self._evaluate_innovative(response)
        }
        return perspectives
    
    def _evaluate_technical(self, response):
        score = 0.8 if any(word in response.lower() for word in ['algorithm', 'complexity', 'data', 'system']) else 0.6
        return {'score': score, 'feedback': 'Technical depth assessment'}
    
    def _evaluate_practical(self, response):
        score = 0.9 if any(word in response.lower() for word in ['step', 'implement', 'validate', 'test']) else 0.5
        return {'score': score, 'feedback': 'Practical applicability check'}
    
    def _evaluate_innovative(self, response):
        score = 0.8 if any(word in response.lower() for word in ['novel', 'innovative', 'creative', 'bold']) else 0.7
        return {'score': score, 'feedback': 'Innovation level analysis'}
    
    def meta_reflect(self, problem, solution_trace):
        """Reflect on the reflection process itself"""
        analysis = f"""
        Meta-Analysis for "{problem}":
        
        Process Quality: {'Excellent' if len(solution_trace) <= 3 else 'Could be more efficient'}
        Convergence: {'Fast' if len(solution_trace) <= 2 else 'Gradual'}
        Strategy Effectiveness: {random.choice(['High', 'Medium', 'Variable'])}
        
        Recommendations:
        - {'Continue current approach' if len(solution_trace) <= 2 else 'Consider strategy adjustment'}
        - Focus on {random.choice(['technical depth', 'practical steps', 'innovative thinking'])}
        - {random.choice(['Increase', 'Maintain', 'Optimize'])} iteration count
        """
        
        self.meta_insights.append(analysis)
        return analysis
    
    def research_solve(self, problem, max_iterations=3):
        """Research-grade solving with advanced features"""
        print(f"🔬 Research Problem: {problem}")
        print("-" * 60)
        
        trace = []
        
        for i in range(max_iterations):
            print(f"\n🔄 Research Iteration {i+1}")
            
            # Generate with confidence
            response, confidence = self.generate_with_confidence(problem)
            print(f"💡 Response (confidence: {confidence:.2f}): {response[:80]}...")
            
            # Multi-perspective evaluation
            evaluations = self.multi_perspective_evaluate(response)
            avg_score = np.mean([eval_data['score'] for eval_data in evaluations.values()])
            
            print(f"🔍 Multi-Perspective Score: {avg_score:.2f}")
            for perspective, eval_data in evaluations.items():
                print(f"  • {perspective}: {eval_data['score']:.2f}")
            
            trace.append({
                'iteration': i+1,
                'response': response,
                'confidence': confidence,
                'avg_score': avg_score,
                'evaluations': evaluations
            })
            
            # Early stopping if high quality
            if avg_score > 0.85 and confidence > 0.9:
                print("✅ High quality achieved - stopping early")
                break
                
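            # Adaptive refinement: the next iteration regenerates from scratch,
            # so record the enhanced response in the trace for it to take effect
            if avg_score < 0.7:
                trace[-1]['response'] = f"[ENHANCED] {response} [Added depth and validation based on multi-perspective analysis]"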
        
        # Meta-reflection
        meta_analysis = self.meta_reflect(problem, trace)
        print(f"\n🧠 Meta-Reflection:")
        print(meta_analysis)
        
        self.trace_history.append(trace)
        return trace[-1]['response'] if trace else "No solution generated"

# Initialize research agent
research_agent = ResearchAgent()
print("🔬 Research Agent initialized!")
print("🧠 Advanced features: confidence scoring, multi-perspective evaluation, meta-reflection")

## 2. 🧠 Experiment 1: Meta-Reflection

**Research Question:** Can AI improve by analyzing its own thinking process?

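This experiment runs two problems through both a meta-reflective agent (a simulated draft, critique, and refine loop) and the research agent. For each problem it prints the research agent's final result and shows the meta-reflective agent's trace in a collapsible block. The meta-reflection analysis of both traces appears in a final collapsible block.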
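The research agent defined above stops early only on fixed thresholds (`avg_score > 0.85 and confidence > 0.9`). A meta-level rule can instead look at how the score trace evolves and stop once reflection stops paying off. The following is a minimal sketch, assuming one quality score in [0, 1] per iteration; `should_stop`, `target`, `min_gain`, and `patience` are hypothetical names that do not appear elsewhere in this notebook.

```python
from typing import List

def should_stop(scores: List[float], target: float = 0.85,
                min_gain: float = 0.01, patience: int = 1) -> bool:
    """Hypothetical meta-level stopping rule for a reflection loop.

    Stop when the latest score reaches `target`, or when each of the last
    `patience` iterations improved the score by less than `min_gain`
    (a drop in score also counts as no progress).
    """
    if not scores:
        return False
    if scores[-1] >= target:
        return True
    if len(scores) <= patience:
        return False  # not enough history yet to judge convergence
    recent_gains = [scores[i] - scores[i - 1]
                    for i in range(len(scores) - patience, len(scores))]
    return all(gain < min_gain for gain in recent_gains)

# Made-up score trace: a big jump, then a plateau
history = [0.60, 0.72, 0.725]
for n in range(1, len(history) + 1):
    print(n, should_stop(history[:n]))
# 1 False
# 2 False
# 3 True
```

Raising `patience` makes the rule more tolerant of a single flat iteration before giving up.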

In [None]:
# Meta-Reflection Implementation (Streamlined)
from typing import List, Dict
import numpy as np

class MetaReflectiveAgent:
    """Agent that can reflect on its own reflection process"""
    
    def __init__(self):
        self.meta_trace = []  # Track reflection strategy evolution
        self.strategies = ['technical', 'creative', 'systematic']
        self.evaluations = ['critical', 'constructive', 'comprehensive']
    
    def simple_solve(self, problem: str, max_iterations: int = 2) -> List[Dict]:
        """Simple solve method that returns a trace for meta-analysis"""
        trace = []
        
        # Simulate a solving process
        response = f"Initial approach to '{problem}': Start with basic solution."
        for i in range(max_iterations):
            critique = f"Iteration {i+1} critique: Could be more detailed and specific."
            refined = f"Refined iteration {i+1}: {response} Added more technical depth and examples."
            
            trace.append({
                'iteration': i+1,
                'response': response,
                'critique': critique,
                'refined': refined
            })
            response = refined
        
        return trace
    
    def meta_reflect(self, problem: str, reflection_trace: List[Dict]) -> str:
        """Analyze the quality of the reflection process itself"""
        
        # Simulate meta-reflection analysis
        meta_analysis = self._simulate_meta_reflection(problem, reflection_trace)
        
        self.meta_trace.append({
            'problem': problem,
            'reflection_trace': reflection_trace,
            'meta_analysis': meta_analysis
        })
        
        return meta_analysis
    
    def _simulate_meta_reflection(self, problem: str, trace: List[Dict]) -> str:
        """Simulate meta-reflection analysis"""
        effectiveness = 'good' if len(trace) <= 3 else 'excessive'
        strategy_fit = 'appropriate' if 'technical' in problem.lower() else 'could be improved'
        progression = 'improved' if len(trace) > 1 else 'unclear'
        focus = 'clear' if len(trace) >= 2 else 'limited'
        confidence = np.random.randint(70, 95)
        
        return f"""
        **Meta-Reflection Analysis:**
        
        **Effectiveness**: The reflection process showed {effectiveness} convergence
        for this problem type. The critique strategy was {strategy_fit}
        for the domain.
        
        **Quality Progression**: Response quality {progression} 
        across iterations with {focus} refinement focus.
        
        **Recommendations**: 
        1. Consider domain-specific evaluation criteria
        2. Implement early stopping for convergence
        3. Add confidence scoring to responses
        4. Explore alternative refinement strategies
        
        **Confidence**: {confidence}% confident in this meta-analysis
        """

# Initialize meta-reflective agent
meta_agent = MetaReflectiveAgent()
print("🔄 Meta-Reflective Agent initialized!")
print("🧠 Ready for second-order reflection experiments!")

def show_trace(trace, label):
    """Render a solving trace as a collapsible HTML block in the notebook."""
    trace_md = f"<details><summary>🔎 Show Full Reflection Trace ({label})</summary>\n"
    for step in trace:
        trace_md += f"\n<b>Iteration {step['iteration']}</b><br>"
        trace_md += f"<b>Response:</b> {step['response']}<br>"
        if 'critique' in step:
            trace_md += f"<b>Critique:</b> {step['critique']}<br>"
        if 'refined' in step:
            trace_md += f"<b>Refined:</b> {step['refined']}<br>"
        trace_md += "<hr>"
    trace_md += "</details>"
    display(Markdown(trace_md))

# --- Problem 1 ---
research_problem_1 = "How do I design a fault-tolerant distributed system?"
trace_1 = meta_agent.simple_solve(research_problem_1, max_iterations=3)
result_1 = research_agent.research_solve(research_problem_1, max_iterations=3)

print("\n🎯 Final Result (Problem 1):")
print(result_1)
show_trace(trace_1, "Problem 1")

# --- Problem 2 ---
research_problem_2 = "What's the best approach for real-time machine learning?"
trace_2 = meta_agent.simple_solve(research_problem_2, max_iterations=2)
result_2 = research_agent.research_solve(research_problem_2, max_iterations=2)

print("\n🎯 Final Result (Problem 2):")
print(result_2)
show_trace(trace_2, "Problem 2")

# --- Meta-Reflection Analysis ---
meta_analysis_1 = meta_agent.meta_reflect(research_problem_1, trace_1)
meta_analysis_2 = meta_agent.meta_reflect(research_problem_2, trace_2)

meta_md = """<details><summary>📊 Show Meta-Reflection Analysis</summary>\n"""
meta_md += f"<pre>{meta_analysis_1}</pre>"
meta_md += "<hr>"
meta_md += f"<pre>{meta_analysis_2}</pre>"
meta_md += "</details>"
display(Markdown(meta_md))

print("\n📊 Meta-Reflection Summary:")
print(f"Research-agent runs so far: {len(research_agent.trace_history)}")
print(f"Research-agent meta-insights: {len(research_agent.meta_insights)}")
print(f"Meta-reflective agent analyses: {len(meta_agent.meta_trace)}")
print("\n✨ Key Idea: Meta-reflection can help decide when to stop iterating (the analyses here are simulated).")

## 3. 📈 Experiment 2: Performance Analysis

**Research Question:** How do different strategies perform across problem types?

This experiment compares strategies on several problems, showing concise metrics and a performance plot.
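The plot draws one line per strategy, but a numeric summary makes the comparison easier to read. Here is a minimal sketch, assuming the nested results dictionary that `analyze_strategy_performance` returns in the cell below; `summarize_strategies` is a hypothetical helper, and the toy numbers are made up only to show the input and output shapes.

```python
import statistics

def summarize_strategies(results):
    """Summarize quality scores per strategy across problems.

    Assumes the nested layout that analyze_strategy_performance returns:
    {problem: {strategy: {'quality_score': float, 'confidence': float, ...}}}
    """
    strategies = list(next(iter(results.values())).keys())
    summary = {}
    for strategy in strategies:
        scores = [results[problem][strategy]['quality_score'] for problem in results]
        summary[strategy] = {
            'mean': statistics.mean(scores),
            # Population std dev: spread over the problems actually tested
            'std': statistics.pstdev(scores),
        }
    # Best strategy per problem; on ties, max() keeps the first strategy listed
    best = {
        problem: max(scores_by_strategy, key=lambda s: scores_by_strategy[s]['quality_score'])
        for problem, scores_by_strategy in results.items()
    }
    return summary, best

# Made-up numbers, only to show the input and output shapes
toy_results = {
    'Problem A': {'conservative': {'quality_score': 0.80, 'confidence': 0.90},
                  'aggressive':   {'quality_score': 0.60, 'confidence': 0.80}},
    'Problem B': {'conservative': {'quality_score': 0.70, 'confidence': 0.70},
                  'aggressive':   {'quality_score': 0.90, 'confidence': 0.90}},
}
summary, best = summarize_strategies(toy_results)
print(best)  # {'Problem A': 'conservative', 'Problem B': 'aggressive'}
```

After the analysis cell runs, the same helper could be applied to `performance_results`. It summarizes only quality scores, since confidence in this notebook is drawn at random.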

In [None]:
# Performance Analysis (Streamlined)
class PerformanceAnalyzer:
    """Analyze and compare the performance of different strategies in problem-solving"""
    
    def __init__(self):
        self.results = []
    
    def analyze_strategy_performance(self, problems, strategies):
        """Compare strategy performance across problems"""
        results = {}
        
        for problem in problems:
            results[problem] = {}
            for strategy in strategies:
                response, confidence = research_agent.generate_with_confidence(problem, strategy)
                evaluations = research_agent.multi_perspective_evaluate(response)
                avg_score = np.mean([eval_data['score'] for eval_data in evaluations.values()])
                
                results[problem][strategy] = {
                    'confidence': confidence,
                    'quality_score': avg_score,
                    'response_length': len(response)
                }
        
        return results
    
    def plot_performance(self, results):
        """Visualize performance comparison"""
        strategies = list(next(iter(results.values())).keys())
        problems = list(results.keys())
        
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
        
        # Quality scores
        quality_data = []
        for strategy in strategies:
            scores = [results[problem][strategy]['quality_score'] for problem in problems]
            quality_data.append(scores)
            ax1.plot(range(len(problems)), scores, marker='o', label=strategy)
        
        ax1.set_title('Strategy Quality Comparison')
        ax1.set_xlabel('Problem Index')
        ax1.set_ylabel('Quality Score')
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        
        # Confidence levels
        for strategy in strategies:
            confidence = [results[problem][strategy]['confidence'] for problem in problems]
            ax2.plot(range(len(problems)), confidence, marker='s', label=strategy)
        
        ax2.set_title('Strategy Confidence Comparison')
        ax2.set_xlabel('Problem Index')
        ax2.set_ylabel('Confidence Level')
        ax2.legend()
        ax2.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
        
        return quality_data

# Run performance analysis
analyzer = PerformanceAnalyzer()

test_problems = [
    "Optimize database query performance",
    "Implement secure authentication system", 
    "Design scalable microservices architecture",
    "Build real-time data processing pipeline"
]

test_strategies = ['conservative', 'aggressive', 'balanced']

print("📈 Running Performance Analysis...")
performance_results = analyzer.analyze_strategy_performance(test_problems, test_strategies)

print("\n📉 Performance Summary:")
for i, problem in enumerate(test_problems):
    print(f"\nProblem {i+1}: {problem[:40]}...")
    for strategy in test_strategies:
        data = performance_results[problem][strategy]
        print(f"  {strategy}: Quality={data['quality_score']:.2f}, Confidence={data['confidence']:.2f}")

quality_data = analyzer.plot_performance(performance_results)

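print("\n✨ Analysis Insights (hypotheses to check against the plot):")
print("- Different strategies may excel at different problem types")
print("- The balanced approach is expected to perform consistently")
print("- The conservative strategy may be reliable but lack innovation")
print("- The aggressive strategy may show variable results but high potential")
print("- Caveat: confidence comes from random.uniform and does not depend on strategy")
print("- Caveat: quality scores reflect keyword matches, not solution correctness")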

## 4. ⚖️ Experiment 3: Research Applications

**Research Question:** What are the real-world applications of self-reflecting AI?

This section demonstrates four practical applications with concise, clear output.
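Each application in the cell below ends with a hardcoded "reflection" line. The underlying pattern is a loop: produce a draft, critique it, and fold the critique back in until nothing is left to fix. The sketch below makes that loop explicit, assuming critique and revision are plain functions (in practice each would likely be a model call). `reflect_and_revise`, `toy_critique`, `toy_revise`, and `REQUIRED_TOPICS` are hypothetical names, and the toy critic only checks whether a review mentions a fixed list of topics.

```python
from typing import Callable, List, Tuple

def reflect_and_revise(draft: str,
                       critique: Callable[[str], List[str]],
                       revise: Callable[[str, List[str]], str],
                       max_rounds: int = 3) -> Tuple[str, int]:
    """Draft -> self-critique -> revise, until the critique finds nothing.

    Returns the final draft and how many revision rounds were used.
    Stops after max_rounds revisions even if issues might remain.
    """
    for round_num in range(max_rounds):
        issues = critique(draft)
        if not issues:
            return draft, round_num
        draft = revise(draft, issues)
    return draft, max_rounds

# Toy critic for a code review: every review should cover these topics
REQUIRED_TOPICS = ['error handling', 'performance', 'security']

def toy_critique(review: str) -> List[str]:
    return [topic for topic in REQUIRED_TOPICS if topic not in review.lower()]

def toy_revise(review: str, issues: List[str]) -> str:
    # Address one open issue per round so the loop visibly iterates
    return review + f"\nAlso checked: {issues[0]}."

final_review, rounds = reflect_and_revise(
    "Issues found: missing error handling.", toy_critique, toy_revise)
print(rounds)  # 2
print(final_review)
```

Keeping `critique` and `revise` as parameters means the same loop could drive the code-review, design, or debugging helpers by swapping in different functions.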

In [None]:
# Research Applications (Streamlined)
class ApplicationDemo:
    def __init__(self):
        self.applications = {
            'code_review': self._code_review_agent,
            'research_assistant': self._research_assistant,
            'system_design': self._system_design_helper,
            'debugging_helper': self._debugging_assistant
        }
    
    def _code_review_agent(self, code_snippet):
        """AI code reviewer with self-reflection"""
        review = f"Code Review for: {code_snippet[:50]}...\n"
        review += "Issues found: Style inconsistencies, missing error handling\n"
        review += "Suggestions: Add comments, implement try/except blocks\n"
        review += "Reflection: This review covers basic issues but should also check performance and security"
        return review
    
    def _research_assistant(self, research_topic):
        """AI research assistant with critical thinking"""
        analysis = f"Research Analysis: {research_topic}\n"
        analysis += "Key areas to explore: Literature review, methodology, current gaps\n"
        analysis += "Critical questions: What assumptions are being made? What evidence supports this?\n"
        analysis += "Self-critique: Need to consider bias in sources and alternative perspectives"
        return analysis
    
    def _system_design_helper(self, system_requirements):
        """AI system architect with validation"""
        design = f"System Design for: {system_requirements}\n"
        design += "Architecture: Microservices with API gateway, database clustering\n"
        design += "Validation: Load testing, security audit, scalability analysis\n"
        design += "Reflection: Design looks solid but should consider disaster recovery scenarios"
        return design
    
    def _debugging_assistant(self, bug_description):
        """AI debugging helper with systematic approach"""
        debug_plan = f"Debug Strategy for: {bug_description}\n"
        debug_plan += "Steps: 1) Reproduce issue, 2) Check logs, 3) Test hypotheses\n"
        debug_plan += "Tools: Debugger, profiler, logging framework\n"
        debug_plan += "Self-check: Am I testing the right scenarios? Are there edge cases missed?"
        return debug_plan
    
    def demonstrate_application(self, app_type, input_data):
        """Demonstrate a specific application"""
        if app_type in self.applications:
            result = self.applications[app_type](input_data)
            return result
        return "Application not available"

# Demo the applications
demo = ApplicationDemo()

print("🛠️ Real-World Applications Demo:")
print("="*50)

# Code Review Demo
print("\n1. 📝 Code Review Assistant:")
code_sample = "def calculate_fibonacci(n): return n if n <= 1 else calculate_fibonacci(n-1) + calculate_fibonacci(n-2)"
code_review_result = demo.demonstrate_application('code_review', code_sample)
print(code_review_result)

# Research Assistant Demo
print("\n2. 🔬 Research Assistant:")
research_topic = "The effectiveness of self-reflecting AI agents in educational software"
research_result = demo.demonstrate_application('research_assistant', research_topic)
print(research_result)

# System Design Demo
print("\n3. 🏢 System Design Helper:")
system_req = "E-commerce platform handling 1M+ users with real-time inventory"
design_result = demo.demonstrate_application('system_design', system_req)
print(design_result)

# Debugging Demo
print("\n4. 🐛 Debugging Assistant:")
bug_desc = "API response time increased 300% after latest deployment"
debug_result = demo.demonstrate_application('debugging_helper', bug_desc)
print(debug_result)

print("\n✨ Applications Summary:")
print("• Code Review: Can flag issues a human reviewer might miss")
print("• Research: Provides a systematic analysis framework")
print("• System Design: Helps validate architecture decisions")
print("• Debugging: Offers structured problem-solving approach")

## 5. 🎯 Your Turn: Custom Research Experiment

Try your own research experiment! Edit the variables below, run the cell, and inspect the results and trace. The trace is shown in a collapsible block for clarity.
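Confidence scores in this notebook come from `random.uniform(0.7, 0.95)`, so rerunning the same experiment changes the numbers, and sometimes whether early stopping fires. To compare settings fairly, fix the random seed first. A minimal sketch using only the standard `random` module; `simulated_confidences` is a hypothetical helper that mimics how the notebook draws confidence.

```python
import random

def simulated_confidences(n, seed=None):
    """Draw n confidence values like the notebook does (uniform in [0.7, 0.95]).

    Uses a private random.Random instance, so seeding here does not
    change the global random module's state.
    """
    rng = random.Random(seed)
    return [rng.uniform(0.7, 0.95) for _ in range(n)]

run_a = simulated_confidences(3, seed=42)
run_b = simulated_confidences(3, seed=42)
print(run_a == run_b)  # True: same seed, same sequence

# ResearchAgent.generate_with_confidence calls the global random.uniform,
# so seeding the global module at the top of the experiment cell works too.
random.seed(42)
first = random.uniform(0.7, 0.95)
random.seed(42)
second = random.uniform(0.7, 0.95)
print(first == second)  # True
```

The meta-reflective agent draws its confidence with `np.random.randint`, so calling `np.random.seed(...)` as well would make that part repeatable too.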

In [None]:
# Custom Research Experiment (Streamlined)
# Edit these variables to try your own experiment!
custom_problem = "How can AI help with climate change?"  # Change this!
custom_strategy = 'balanced'  # Try: 'conservative', 'aggressive', 'balanced'
custom_iterations = 3  # Adjust 1-5 (must be at least 1)

print("🎯 Your Custom Research Experiment:")
print(f"Problem: {custom_problem}")
print(f"Strategy: {custom_strategy}")

def capture_trace(problem, strategy, max_iterations):
    """A lighter version of research_solve (no refinement or meta-reflection)
    that passes `strategy` through to generate_with_confidence.
    Returns (trace, final_response)."""
    responses = []
    for i in range(max_iterations):
        response, confidence = research_agent.generate_with_confidence(problem, strategy)
        evaluations = research_agent.multi_perspective_evaluate(response)
        avg_score = np.mean([eval_data['score'] for eval_data in evaluations.values()])
        responses.append({
            'iteration': i+1,
            'response': response,
            'confidence': confidence,
            'avg_score': avg_score,
            'evaluations': evaluations
        })
        # Same early-stopping rule as research_solve
        if avg_score > 0.85 and confidence > 0.9:
            break
    return responses, responses[-1]['response'] if responses else "No solution generated"

trace, custom_result = capture_trace(custom_problem, custom_strategy, custom_iterations)

print("\n🎯 Final Result:")
print(custom_result)

print(f"\n📊 Custom Experiment Analysis:")
print(f"• Iterations: {len(trace)}")
print(f"• Final confidence: {trace[-1]['confidence']:.2f}")
print(f"• Final avg score: {trace[-1]['avg_score']:.2f}")

trace_md = """<details><summary>🔎 Show Full Reflection Trace</summary>\n"""
for step in trace:
    trace_md += f"\n<b>Iteration {step['iteration']}</b><br>"
    trace_md += f"<b>Response:</b> {step['response']}<br>"
    trace_md += f"<b>Confidence:</b> {step['confidence']:.2f}<br>"
    trace_md += f"<b>Avg Score:</b> {step['avg_score']:.2f}<br>"
    trace_md += "<b>Evaluations:</b><ul>"
    for k, v in step['evaluations'].items():
        trace_md += f"<li>{k}: {v['score']:.2f} ({v['feedback']})</li>"
    trace_md += "</ul><hr>"
trace_md += "</details>"
display(Markdown(trace_md))

print("\n✨ Try These Modifications:")
print("1. Change the custom_problem to something you're curious about")
print("2. Try each strategy: 'conservative', 'aggressive', 'balanced'")
print("3. Adjust custom_iterations (1-5)")

## 6. 🎆 Research Conclusions & Your Next Steps

### What We've Explored
- **Meta-reflection**: An agent can analyze its own reflection process, for example to decide when to stop iterating
- **Strategy matters**: Different approaches may suit different problems
- **Multi-perspective evaluation**: Scores a response from several angles (technical, practical, innovative) instead of a single criterion
- **Real applications**: Code review, research, system design, debugging
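`ResearchAgent` combines its three perspective scores with an equal-weight `np.mean`. One way to act on the meta-reflection recommendation to "consider domain-specific evaluation criteria" is to weight perspectives by domain. A minimal sketch; `weighted_score` and the example weights are hypothetical.

```python
def weighted_score(perspective_scores, weights):
    """Weighted average of per-perspective scores.

    perspective_scores: e.g. {'technical': 0.8, 'practical': 0.9, 'innovative': 0.7}
    weights: relative importance of each perspective (need not sum to 1).
    """
    total_weight = sum(weights[p] for p in perspective_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(score * weights[p] for p, score in perspective_scores.items()) / total_weight

scores = {'technical': 0.8, 'practical': 0.9, 'innovative': 0.7}
equal = weighted_score(scores, {'technical': 1, 'practical': 1, 'innovative': 1})
# Hypothetical weights for a domain where correctness matters more than novelty
correctness_first = weighted_score(scores, {'technical': 2, 'practical': 1, 'innovative': 0})
print(round(equal, 3), round(correctness_first, 3))  # 0.8 0.833
```

With equal weights this reduces to the plain mean the agent already uses, so it is a drop-in generalization rather than a different scoring scheme.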

### Your Research Journey
- **Join AI research projects**: Work on real problems with faculty
- **Read cutting-edge papers**: arXiv preprints and NeurIPS/ICML proceedings
- **Build your own applications**: Implement these ideas in real projects
- **Contribute to open source**: LangChain, AutoGPT, research frameworks

### Recommended Next Reading
- "Constitutional AI" (Anthropic) - AI with embedded principles
- "Self-Refine" (Madaan et al.) - Iterative improvement techniques
- "Reflexion" (Shinn et al.) - Learning from failures
- "Chain-of-Thought" (Wei et al.) - Reasoning in language models
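Reflexion has an agent turn failed attempts into short verbal reflections that it keeps in memory for later trials. The toy loop below only mimics that structure: `reflexion_style_loop` and `check_fix` are hypothetical names, the "policy" simply skips anything memory says already failed, and in Reflexion itself the reflections are written by a language model and added to the agent's context rather than read by code.

```python
from typing import Callable, List, Optional, Tuple

def reflexion_style_loop(candidates: List[str],
                         evaluate: Callable[[str], Optional[str]],
                         max_trials: int = 5):
    """Toy trial -> feedback -> verbal reflection -> retry loop.

    `evaluate` returns None on success, or a short failure message.
    Returns (solution or None, trials used, reflection memory).
    """
    memory: List[Tuple[str, str]] = []  # (attempt, verbal reflection)
    trials = 0
    for _ in range(max_trials):
        # Toy "policy": read memory and skip anything it says already failed
        failed = {past for past, _reflection in memory}
        attempt = next((c for c in candidates if c not in failed), None)
        if attempt is None:
            break  # every candidate has been ruled out
        trials += 1
        feedback = evaluate(attempt)
        if feedback is None:
            return attempt, trials, memory
        memory.append((attempt, f"Tried '{attempt}' but {feedback}; try something else."))
    return None, trials, memory

# Toy task based on the debugging demo: latency regressed after a deployment
def check_fix(action: str) -> Optional[str]:
    return None if 'roll back' in action else 'latency is still high'

fixes = ['restart service', 'add caching', 'roll back deployment']
solution, trials, memory = reflexion_style_loop(fixes, check_fix)
print(solution, trials)  # roll back deployment 3
for _attempt, reflection in memory:
    print(reflection)
```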

### Research Project Ideas
- **Domain-specific agents**: Build agents for specific CS fields
- **Evaluation metrics**: Develop better ways to measure reflection quality
- **Human-AI collaboration**: Combine human insight with AI reflection
- **Efficiency optimization**: Reduce computational cost of reflection
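For the evaluation-metrics and efficiency ideas, one simple starting point is to measure how much quality each extra reflection round buys. A sketch, assuming one quality score per iteration (like `avg_score` in the traces above) and a uniform cost per iteration; `reflection_efficiency` is a hypothetical metric, not an established one, and the score traces are made up.

```python
def reflection_efficiency(scores, cost_per_iteration=1.0):
    """Hypothetical metric: quality gained per unit of extra reflection cost.

    scores: quality after each iteration; scores[0] is the initial draft.
    Returns (total_gain, gain_per_cost). A single iteration has no
    reflection cost, so its gain_per_cost is reported as 0.0.
    """
    if not scores:
        raise ValueError("need at least one score")
    total_gain = scores[-1] - scores[0]
    extra_cost = (len(scores) - 1) * cost_per_iteration
    gain_per_cost = total_gain / extra_cost if extra_cost > 0 else 0.0
    return total_gain, gain_per_cost

# Made-up score traces: a long run that plateaus vs. a short run
long_run = reflection_efficiency([0.60, 0.75, 0.78, 0.79])
short_run = reflection_efficiency([0.60, 0.75])
print([round(x, 3) for x in long_run], [round(x, 3) for x in short_run])
# [0.19, 0.063] [0.15, 0.15]
```

The short run gains less in total but far more per iteration, which is the trade-off an efficiency-focused project would try to optimize.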

### 🌟 Join the UofT AI Agents Club!
This workshop is just the beginning. The club works on:
- Research projects with industry partners
- AI competitions and hackathons
- Study groups for cutting-edge papers
- Startup opportunities in the AI agents space

**Ready to push the boundaries of AI? The future needs researchers like you!** 🧠✨

---

*"The best way to predict the future is to invent it." - Alan Kay*