# 🔬 Advanced Experiments
**UofT AI Agents Club - Advanced Track**

Now that you've built a basic self-reflecting agent, let's push it further!

## 🎯 What We'll Do
1. **Smarter Agents** - More sophisticated reflection strategies
2. **Domain Tests** - CS, ML, Systems problems
3. **Performance** - Measure improvement quality
4. **Your Turn** - Experiment with your own ideas

**Prerequisites**: Complete [`workshop_tutorial.ipynb`](workshop_tutorial.ipynb) first

Let's experiment! 🧪

In [6]:
import sys
import os
import random

# Add the parent directory to sys.path so 'src' can be imported
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))

print("🔬 Advanced Experiments Setup")
print("✅ All modules imported successfully!")
print("🧪 Ready for advanced experimentation!")

# Advanced Self-Reflecting Agent
class AdvancedReflectiveAgent:
    def __init__(self):
        self.strategies = {
            'technical': self._technical_generate,
            'creative': self._creative_generate,
            'systematic': self._systematic_generate
        }
        self.evaluation_modes = {
            'critical': self._critical_evaluate,
            'constructive': self._constructive_evaluate,
            'comprehensive': self._comprehensive_evaluate
        }
    
    def _technical_generate(self, problem):
        """Generate technical, detailed responses"""
        if 'recursive' in problem.lower():
            return "For recursive optimization: 1) Add memoization to cache results, 2) Consider iterative alternatives, 3) Analyze time complexity O(n) vs O(2^n), 4) Use dynamic programming if overlapping subproblems exist."
        elif 'algorithm' in problem.lower():
            return "Algorithm optimization approach: 1) Profile bottlenecks, 2) Choose optimal data structures, 3) Reduce algorithmic complexity, 4) Consider space-time tradeoffs."
        elif 'system' in problem.lower():
            return "System design considerations: 1) Scalability patterns, 2) Database optimization, 3) Caching strategies, 4) Load balancing, 5) Monitoring and metrics."
        return "Technical analysis: Break down the problem, identify core constraints, design solution architecture, implement incrementally."
    
    def _creative_generate(self, problem):
        """Generate creative, out-of-the-box responses"""
        return f"Creative approach to '{problem}': Think beyond conventional solutions. What if we approached this from a completely different angle? Consider analogies from other domains, unconventional data structures, or novel algorithmic paradigms."
    
    def _systematic_generate(self, problem):
        """Generate systematic, methodical responses"""
        return f"Systematic solution for '{problem}': 1) Problem analysis and requirements, 2) Research existing solutions, 3) Design multiple approaches, 4) Prototype and test, 5) Iterate and refine based on metrics."
    
    def _critical_evaluate(self, response):
        """Critical evaluation - finds flaws and weaknesses"""
        issues = []
        if len(response) < 50:
            issues.append("Response lacks sufficient detail and depth")
        if 'step' not in response.lower() and '1)' not in response:
            issues.append("Missing clear step-by-step breakdown")
        if not any(word in response.lower() for word in ['complexity', 'performance', 'efficiency', 'optimize']):
            issues.append("Lacks discussion of performance implications")
        return issues if issues else ["Solid technical response"]
    
    def _constructive_evaluate(self, response):
        """Constructive evaluation - suggests improvements"""
        suggestions = []
        if 'example' not in response.lower():
            suggestions.append("Could benefit from concrete examples")
        if 'trade' not in response.lower():
            suggestions.append("Should discuss trade-offs and alternatives")
        if len(response.split('.')) < 3:
            suggestions.append("Could expand with more detailed explanation")
        return suggestions if suggestions else ["Well-structured response"]
    
    def _comprehensive_evaluate(self, response):
        """Comprehensive evaluation - checks multiple aspects"""
        return self._critical_evaluate(response) + self._constructive_evaluate(response)
    
    def refine(self, response, critiques):
        """Refine response based on critiques"""
        if not critiques or 'Solid' in str(critiques) or 'Well-structured' in str(critiques):
            return response
        
        refined = response
        
        if 'lacks sufficient detail' in str(critiques):
            refined += " Additionally, consider implementation details, edge cases, and validation strategies."
        
        if 'step-by-step' in str(critiques):
            refined = f"Step-by-step approach: {refined}"
        
        if 'performance implications' in str(critiques):
            refined += " Important: Analyze time and space complexity, benchmark performance, and optimize bottlenecks."
        
        if 'concrete examples' in str(critiques):
            refined += " For example, in a sorting algorithm, this could mean choosing quicksort O(n log n) over bubble sort O(n²)."
        
        if 'trade-offs' in str(critiques):
            refined += " Key trade-offs include memory vs speed, simplicity vs performance, and development time vs optimization."
        
        return refined
    
    def solve(self, problem, generation_strategy='technical', evaluation_mode='critical', max_iterations=3):
        """Advanced solving with configurable strategies"""
        print(f"🔬 Problem: {problem}")
        print(f"⚙️ Strategy: {generation_strategy} | Evaluation: {evaluation_mode}")
        print("-" * 60)

        # Generate initial response
        response = self.strategies[generation_strategy](problem)

        for i in range(max_iterations):
            print(f"\n🔄 Iteration {i+1}")
            print(f"💡 Response: {response}")

            # Evaluate
            critiques = self.evaluation_modes[evaluation_mode](response)
            print(f"🔍 Critique: {critiques[0] if critiques else 'No issues found'}")

            # Check if we should stop
            if not critiques or 'Solid' in str(critiques) or 'Well-structured' in str(critiques):
                print("✅ Response satisfactory!")
                break

            # Refine
            response = self.refine(response, critiques)
            print(f"✨ Refined: {response[:100]}..." if len(response) > 100 else f"✨ Refined: {response}")
        
        return response

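# Simple metrics function
def calculate_metrics(response):
    """Calculate basic metrics for the response"""
    if not response:
        # Return the same keys as the non-empty case so callers can index safely
        return {"length": 0, "words": 0, "sentences": 0, "complexity_score": 0}

    words = response.split()
    return {
        "length": len(response),
        "words": len(words),
        # Counts '.', '!' and '?' characters, so abbreviations like "e.g." inflate it
        "sentences": response.count('.') + response.count('!') + response.count('?'),
        # Share of words longer than six characters (a rough vocabulary proxy)
        "complexity_score": len([w for w in words if len(w) > 6]) / len(words) if words else 0
    }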

# Create advanced agent
advanced_agent = AdvancedReflectiveAgent()
print("🔬 Advanced Self-Reflecting Agent ready!")
print("🎡 Multiple strategies and evaluation modes available!")

🔬 Advanced Experiments Setup
✅ All modules imported successfully!
🧪 Ready for advanced experimentation!
🔬 Advanced Self-Reflecting Agent ready!
🎡 Multiple strategies and evaluation modes available!
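
One detail of the stopping rule in `solve` is worth a closer look: the loop ends as soon as the text `'Solid'` or `'Well-structured'` appears anywhere in the critique list. In `'comprehensive'` mode the critical and constructive lists are concatenated, so a passing critical check can end the loop while constructive suggestions are still open. The sketch below shows one stricter check. The names `APPROVALS`, `pending_critiques`, and `is_satisfactory` are hypothetical and not part of the agent above.

```python
# Hypothetical helper names; the approval strings mirror the evaluators above.
APPROVALS = {"Solid technical response", "Well-structured response"}

def pending_critiques(critiques):
    """Return only the critiques that still ask for a change."""
    return [c for c in critiques if c not in APPROVALS]

def is_satisfactory(critiques):
    """True only when every evaluator approved (or nothing was raised)."""
    return not pending_critiques(critiques)

# Comprehensive mode can mix an approval with open suggestions.
mixed = ["Solid technical response", "Could benefit from concrete examples"]
print(is_satisfactory(mixed))
print(pending_critiques(mixed))
```

With a check like this, `refine` could be handed only the pending critiques, so an approval from one evaluator no longer masks suggestions from another.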


## 🧪 Experiment 1: Technical Problem Solving

In [7]:
# 🧪 Experiment 1: Technical Problem Solving (Streamlined)
from IPython.display import display, Markdown
tech_problem = "How can I optimize the performance of a recursive algorithm?"

# Run the agent and capture the trace
trace = []
def capture_trace(problem, strategy, evaluation, max_iterations):
    responses = []
    response = advanced_agent.strategies[strategy](problem)
    for i in range(max_iterations):
        critiques = advanced_agent.evaluation_modes[evaluation](response)
        responses.append({
            'iteration': i+1,
            'response': response,
            'critiques': critiques
        })
        if not critiques or 'Solid' in str(critiques) or 'Well-structured' in str(critiques):
            break
        response = advanced_agent.refine(response, critiques)
    return responses, response

trace, result1 = capture_trace(tech_problem, 'technical', 'critical', 2)

# Show final result
print("\n🎯 Final Technical Result:")
print(result1)

# Show metrics
metrics = calculate_metrics(result1)
print(f"\n📊 Technical Strategy Metrics: {metrics}")

# Show full trace in collapsible Markdown
trace_md = """<details><summary>🔎 Show Full Reflection Trace</summary>\n"""
for step in trace:
    trace_md += f"\n<b>Iteration {step['iteration']}</b><br>"
    trace_md += f"<b>Response:</b> {step['response']}<br>"
    if step['critiques']:
        trace_md += "<b>Critiques:</b><ul>"
        for c in step['critiques']:
            trace_md += f"<li>{c}</li>"
        trace_md += "</ul>"
    trace_md += "<hr>"
trace_md += "</details>"
display(Markdown(trace_md))


🎯 Final Technical Result:
For recursive optimization: 1) Add memoization to cache results, 2) Consider iterative alternatives, 3) Analyze time complexity O(n) vs O(2^n), 4) Use dynamic programming if overlapping subproblems exist.

📊 Technical Strategy Metrics: {'length': 204, 'words': 28, 'sentences': 1, 'complexity_score': 0.5}


<details><summary>🔎 Show Full Reflection Trace</summary>

<b>Iteration 1</b><br><b>Response:</b> For recursive optimization: 1) Add memoization to cache results, 2) Consider iterative alternatives, 3) Analyze time complexity O(n) vs O(2^n), 4) Use dynamic programming if overlapping subproblems exist.<br><b>Critiques:</b><ul><li>Solid technical response</li></ul><hr></details>
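
The trace cell interpolates response and critique text straight into HTML. That works for the canned strings in this notebook, but a response containing `<`, `>`, or `&` (a code snippet, say) could disappear or break the rendered `<details>` block. Here is a minimal sketch of an escaped version, assuming the same list-of-dicts trace structure that `capture_trace` returns. `render_trace` is a hypothetical helper name, `html.escape` comes from Python's standard library, and the sample trace entry is made up for illustration.

```python
from html import escape

def render_trace(trace, title="Show Full Reflection Trace"):
    """Build a collapsible HTML trace, escaping all dynamic text."""
    parts = [f"<details><summary>{escape(title)}</summary>\n"]
    for step in trace:
        parts.append(f"\n<b>Iteration {step['iteration']}</b><br>")
        parts.append(f"<b>Response:</b> {escape(step['response'])}<br>")
        if step['critiques']:
            items = "".join(f"<li>{escape(c)}</li>" for c in step['critiques'])
            parts.append(f"<b>Critiques:</b><ul>{items}</ul>")
        parts.append("<hr>")
    parts.append("</details>")
    return "".join(parts)

# Illustrative trace entry (not real agent output).
sample_trace = [{
    'iteration': 1,
    'response': "Use a dict<str, int> cache when n < 1000 && depth > 3.",
    'critiques': ["Could benefit from concrete examples"],
}]
trace_html = render_trace(sample_trace)
print(trace_html)
```

In the notebook, the returned string could be passed to `display(Markdown(...))` exactly as the cells here do with `trace_md`.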

## 🌍 Experiment 2: Domain Transfer

In [8]:
# 🧪 Experiment 2: Domain Transfer (Streamlined)
from IPython.display import display, Markdown

domains = {
    "Algorithms": "What's the best sorting algorithm for large datasets?",
    "Systems": "How do I handle race conditions in concurrent programming?", 
    "Machine Learning": "How do I prevent overfitting in my neural network?",
    "Software Engineering": "What design patterns improve code maintainability?"
}

print("🌍 Domain Transfer Experiment:")

# Reuses capture_trace() defined in the Experiment 1 cell
results = {}
traces = {}
for domain, problem in domains.items():
    trace, result = capture_trace(problem, 'systematic', 'constructive', 2)
    results[domain] = len(result)
    traces[domain] = trace
    print(f"\n🔬 {domain} Final Result:")
    print(result)
    metrics = calculate_metrics(result)
    print(f"📊 {domain} Metrics: {metrics}")
    # Show full trace in collapsible Markdown
    trace_md = f"<details><summary>Show Full Trace ({domain})</summary>\n"
    for step in trace:
        trace_md += f"\n<b>Iteration {step['iteration']}</b><br>"
        trace_md += f"<b>Response:</b> {step['response']}<br>"
        if step['critiques']:
            trace_md += "<b>Critiques:</b><ul>"
            for c in step['critiques']:
                trace_md += f"<li>{c}</li>"
            trace_md += "</ul>"
        trace_md += "<hr>"
    trace_md += "</details>"
    display(Markdown(trace_md))

print("\n📈 Domain Performance Summary:")
for domain, length in results.items():
    print(f"• {domain}: {length} chars")

🌍 Domain Transfer Experiment:

🔬 Algorithms Final Result:
Systematic solution for 'What's the best sorting algorithm for large datasets?': 1) Problem analysis and requirements, 2) Research existing solutions, 3) Design multiple approaches, 4) Prototype and test, 5) Iterate and refine based on metrics. For example, in a sorting algorithm, this could mean choosing quicksort O(n log n) over bubble sort O(n²). Key trade-offs include memory vs speed, simplicity vs performance, and development time vs optimization.
📊 Algorithms Metrics: {'length': 456, 'words': 67, 'sentences': 4, 'complexity_score': 0.417910447761194}


<details><summary>Show Full Trace (Algorithms)</summary>

<b>Iteration 1</b><br><b>Response:</b> Systematic solution for 'What's the best sorting algorithm for large datasets?': 1) Problem analysis and requirements, 2) Research existing solutions, 3) Design multiple approaches, 4) Prototype and test, 5) Iterate and refine based on metrics.<br><b>Critiques:</b><ul><li>Could benefit from concrete examples</li><li>Should discuss trade-offs and alternatives</li><li>Could expand with more detailed explanation</li></ul><hr>
<b>Iteration 2</b><br><b>Response:</b> Systematic solution for 'What's the best sorting algorithm for large datasets?': 1) Problem analysis and requirements, 2) Research existing solutions, 3) Design multiple approaches, 4) Prototype and test, 5) Iterate and refine based on metrics. For example, in a sorting algorithm, this could mean choosing quicksort O(n log n) over bubble sort O(n²). Key trade-offs include memory vs speed, simplicity vs performance, and development time vs optimization.<br><b>Critiques:</b><ul><li>Well-structured response</li></ul><hr></details>


🔬 Systems Final Result:
Systematic solution for 'How do I handle race conditions in concurrent programming?': 1) Problem analysis and requirements, 2) Research existing solutions, 3) Design multiple approaches, 4) Prototype and test, 5) Iterate and refine based on metrics. For example, in a sorting algorithm, this could mean choosing quicksort O(n log n) over bubble sort O(n²). Key trade-offs include memory vs speed, simplicity vs performance, and development time vs optimization.
📊 Systems Metrics: {'length': 461, 'words': 68, 'sentences': 4, 'complexity_score': 0.39705882352941174}


<details><summary>Show Full Trace (Systems)</summary>

<b>Iteration 1</b><br><b>Response:</b> Systematic solution for 'How do I handle race conditions in concurrent programming?': 1) Problem analysis and requirements, 2) Research existing solutions, 3) Design multiple approaches, 4) Prototype and test, 5) Iterate and refine based on metrics.<br><b>Critiques:</b><ul><li>Could benefit from concrete examples</li><li>Should discuss trade-offs and alternatives</li><li>Could expand with more detailed explanation</li></ul><hr>
<b>Iteration 2</b><br><b>Response:</b> Systematic solution for 'How do I handle race conditions in concurrent programming?': 1) Problem analysis and requirements, 2) Research existing solutions, 3) Design multiple approaches, 4) Prototype and test, 5) Iterate and refine based on metrics. For example, in a sorting algorithm, this could mean choosing quicksort O(n log n) over bubble sort O(n²). Key trade-offs include memory vs speed, simplicity vs performance, and development time vs optimization.<br><b>Critiques:</b><ul><li>Well-structured response</li></ul><hr></details>


🔬 Machine Learning Final Result:
Systematic solution for 'How do I prevent overfitting in my neural network?': 1) Problem analysis and requirements, 2) Research existing solutions, 3) Design multiple approaches, 4) Prototype and test, 5) Iterate and refine based on metrics. For example, in a sorting algorithm, this could mean choosing quicksort O(n log n) over bubble sort O(n²). Key trade-offs include memory vs speed, simplicity vs performance, and development time vs optimization.
📊 Machine Learning Metrics: {'length': 453, 'words': 68, 'sentences': 4, 'complexity_score': 0.39705882352941174}


<details><summary>Show Full Trace (Machine Learning)</summary>

<b>Iteration 1</b><br><b>Response:</b> Systematic solution for 'How do I prevent overfitting in my neural network?': 1) Problem analysis and requirements, 2) Research existing solutions, 3) Design multiple approaches, 4) Prototype and test, 5) Iterate and refine based on metrics.<br><b>Critiques:</b><ul><li>Could benefit from concrete examples</li><li>Should discuss trade-offs and alternatives</li><li>Could expand with more detailed explanation</li></ul><hr>
<b>Iteration 2</b><br><b>Response:</b> Systematic solution for 'How do I prevent overfitting in my neural network?': 1) Problem analysis and requirements, 2) Research existing solutions, 3) Design multiple approaches, 4) Prototype and test, 5) Iterate and refine based on metrics. For example, in a sorting algorithm, this could mean choosing quicksort O(n log n) over bubble sort O(n²). Key trade-offs include memory vs speed, simplicity vs performance, and development time vs optimization.<br><b>Critiques:</b><ul><li>Well-structured response</li></ul><hr></details>


🔬 Software Engineering Final Result:
Systematic solution for 'What design patterns improve code maintainability?': 1) Problem analysis and requirements, 2) Research existing solutions, 3) Design multiple approaches, 4) Prototype and test, 5) Iterate and refine based on metrics. For example, in a sorting algorithm, this could mean choosing quicksort O(n log n) over bubble sort O(n²). Key trade-offs include memory vs speed, simplicity vs performance, and development time vs optimization.
📊 Software Engineering Metrics: {'length': 453, 'words': 65, 'sentences': 4, 'complexity_score': 0.4153846153846154}


<details><summary>Show Full Trace (Software Engineering)</summary>

<b>Iteration 1</b><br><b>Response:</b> Systematic solution for 'What design patterns improve code maintainability?': 1) Problem analysis and requirements, 2) Research existing solutions, 3) Design multiple approaches, 4) Prototype and test, 5) Iterate and refine based on metrics.<br><b>Critiques:</b><ul><li>Could benefit from concrete examples</li><li>Should discuss trade-offs and alternatives</li><li>Could expand with more detailed explanation</li></ul><hr>
<b>Iteration 2</b><br><b>Response:</b> Systematic solution for 'What design patterns improve code maintainability?': 1) Problem analysis and requirements, 2) Research existing solutions, 3) Design multiple approaches, 4) Prototype and test, 5) Iterate and refine based on metrics. For example, in a sorting algorithm, this could mean choosing quicksort O(n log n) over bubble sort O(n²). Key trade-offs include memory vs speed, simplicity vs performance, and development time vs optimization.<br><b>Critiques:</b><ul><li>Well-structured response</li></ul><hr></details>


📈 Domain Performance Summary:
• Algorithms: 456 chars
• Systems: 461 chars
• Machine Learning: 453 chars
• Software Engineering: 453 chars
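
Notice that every domain's refined answer ends with the same sorting example. This happens because `refine` appends one fixed sentence whenever the "concrete examples" critique appears, regardless of the problem. One way to make that step domain-aware is a keyword lookup, sketched below. `DOMAIN_EXAMPLES`, `FALLBACK_EXAMPLE`, `pick_example`, and the example sentences themselves are illustrative assumptions, not part of the workshop code.

```python
# Hypothetical keyword -> example table; extend or replace as needed.
DOMAIN_EXAMPLES = {
    "sort": "For example, merge sort keeps O(n log n) worst-case time, while bubble sort is O(n²).",
    "race condition": "For example, guarding a shared counter with a lock prevents lost updates.",
    "overfitting": "For example, early stopping on a validation set limits overfitting.",
    "design pattern": "For example, the Strategy pattern isolates interchangeable algorithms behind one interface.",
}
FALLBACK_EXAMPLE = "For example, work through one small, concrete input by hand."

def pick_example(problem):
    """Return the first example whose keyword occurs in the problem text."""
    text = problem.lower()
    for keyword, example in DOMAIN_EXAMPLES.items():
        if keyword in text:
            return example
    return FALLBACK_EXAMPLE

print(pick_example("How do I prevent overfitting in my neural network?"))
```

Passing the original problem into `refine` and calling a helper like this in the "concrete examples" branch would be one way to wire it in. Keyword matching is still brittle, so treat it as a starting point.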


## ⚔️ Experiment 3: Strategy Showdown

In [9]:
# 🧪 Experiment 3: Strategy Showdown (Streamlined)
from IPython.display import display, Markdown
test_problem = "How do I debug a complex software system?"
strategies = ['technical', 'creative', 'systematic']

print("⚔️ Strategy Comparison:")

# Reuses capture_trace() defined in the Experiment 1 cell
performance = {}
traces = {}
for strategy in strategies:
    trace, result = capture_trace(test_problem, strategy, 'comprehensive', 1)
    performance[strategy] = len(result)
    traces[strategy] = trace
    print(f"\n🛠️ {strategy.title()} Final Result:")
    print(result)
    print(f"→ {strategy}: {len(result)} characters")
    # Show full trace in collapsible Markdown
    trace_md = f"<details><summary>Show Full Trace ({strategy.title()})</summary>\n"
    for step in trace:
        trace_md += f"\n<b>Iteration {step['iteration']}</b><br>"
        trace_md += f"<b>Response:</b> {step['response']}<br>"
        if step['critiques']:
            trace_md += "<b>Critiques:</b><ul>"
            for c in step['critiques']:
                trace_md += f"<li>{c}</li>"
            trace_md += "</ul>"
        trace_md += "<hr>"
    trace_md += "</details>"
    display(Markdown(trace_md))

print("\n🏆 Strategy Performance:")
sorted_results = sorted(performance.items(), key=lambda x: x[1], reverse=True)
for i, (strategy, score) in enumerate(sorted_results, 1):
    medal = "🥇" if i == 1 else "🥈" if i == 2 else "🥉"
    print(f"{medal} {strategy}: {score} chars")

⚔️ Strategy Comparison:

🛠️ Technical Final Result:
System design considerations: 1) Scalability patterns, 2) Database optimization, 3) Caching strategies, 4) Load balancing, 5) Monitoring and metrics. Important: Analyze time and space complexity, benchmark performance, and optimize bottlenecks. For example, in a sorting algorithm, this could mean choosing quicksort O(n log n) over bubble sort O(n²). Key trade-offs include memory vs speed, simplicity vs performance, and development time vs optimization.
→ technical: 456 characters


<details><summary>Show Full Trace (Technical)</summary>

<b>Iteration 1</b><br><b>Response:</b> System design considerations: 1) Scalability patterns, 2) Database optimization, 3) Caching strategies, 4) Load balancing, 5) Monitoring and metrics.<br><b>Critiques:</b><ul><li>Lacks discussion of performance implications</li><li>Could benefit from concrete examples</li><li>Should discuss trade-offs and alternatives</li><li>Could expand with more detailed explanation</li></ul><hr></details>


🛠️ Creative Final Result:
Step-by-step approach: Creative approach to 'How do I debug a complex software system?': Think beyond conventional solutions. What if we approached this from a completely different angle? Consider analogies from other domains, unconventional data structures, or novel algorithmic paradigms. Important: Analyze time and space complexity, benchmark performance, and optimize bottlenecks. For example, in a sorting algorithm, this could mean choosing quicksort O(n log n) over bubble sort O(n²). Key trade-offs include memory vs speed, simplicity vs performance, and development time vs optimization.
→ creative: 597 characters


<details><summary>Show Full Trace (Creative)</summary>

<b>Iteration 1</b><br><b>Response:</b> Creative approach to 'How do I debug a complex software system?': Think beyond conventional solutions. What if we approached this from a completely different angle? Consider analogies from other domains, unconventional data structures, or novel algorithmic paradigms.<br><b>Critiques:</b><ul><li>Missing clear step-by-step breakdown</li><li>Lacks discussion of performance implications</li><li>Could benefit from concrete examples</li><li>Should discuss trade-offs and alternatives</li></ul><hr></details>


🛠️ Systematic Final Result:
Systematic solution for 'How do I debug a complex software system?': 1) Problem analysis and requirements, 2) Research existing solutions, 3) Design multiple approaches, 4) Prototype and test, 5) Iterate and refine based on metrics. Important: Analyze time and space complexity, benchmark performance, and optimize bottlenecks. For example, in a sorting algorithm, this could mean choosing quicksort O(n log n) over bubble sort O(n²). Key trade-offs include memory vs speed, simplicity vs performance, and development time vs optimization.
→ systematic: 539 characters


<details><summary>Show Full Trace (Systematic)</summary>

<b>Iteration 1</b><br><b>Response:</b> Systematic solution for 'How do I debug a complex software system?': 1) Problem analysis and requirements, 2) Research existing solutions, 3) Design multiple approaches, 4) Prototype and test, 5) Iterate and refine based on metrics.<br><b>Critiques:</b><ul><li>Lacks discussion of performance implications</li><li>Could benefit from concrete examples</li><li>Should discuss trade-offs and alternatives</li><li>Could expand with more detailed explanation</li></ul><hr></details>


🏆 Strategy Performance:
🥇 creative: 597 chars
🥈 systematic: 539 chars
🥉 technical: 456 chars
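
The ranking above is by character count, which mostly rewards whichever strategy had the most text appended during refinement. A different proxy, sketched here under stated assumptions, is how many critiques are still open after refinement. The `evaluate` function is a simplified stand-in for the notebook's evaluators, `rank_by_open_critiques` is a hypothetical name, and the candidate strings are made up for illustration.

```python
def evaluate(response):
    """Simplified stand-in evaluator: returns the list of open critiques."""
    text = response.lower()
    critiques = []
    if '1)' not in response and 'step' not in text:
        critiques.append("Missing clear step-by-step breakdown")
    if 'example' not in text:
        critiques.append("Could benefit from concrete examples")
    if 'trade' not in text:
        critiques.append("Should discuss trade-offs and alternatives")
    return critiques

def rank_by_open_critiques(candidates):
    """Order (name, open_count, length) tuples: fewest open critiques first,
    with shorter text breaking ties."""
    scored = [(name, len(evaluate(resp)), len(resp)) for name, resp in candidates.items()]
    return sorted(scored, key=lambda item: (item[1], item[2]))

candidates = {
    "vague": "Think broadly about the problem from many different angles and perspectives.",
    "structured": "1) Reproduce the bug, for example with a failing test. 2) Weigh trade-offs of each fix.",
}
ranking = rank_by_open_critiques(candidates)
for name, open_count, length in ranking:
    print(f"{name}: {open_count} open critiques, {length} chars")
```

This is still a keyword-based score, so it inherits the evaluator's blind spots. It does at least stop longer output from winning automatically.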


## 🎯 Your Turn: Custom Experiment

In [10]:
# 🎯 Your Turn: Custom Experiment (Streamlined)
from IPython.display import display, Markdown

# Edit these variables to try your own experiment!
custom_problem = "How do I learn machine learning effectively?"  # Change this!
custom_strategy = 'creative'  # Try: 'technical', 'creative', 'systematic'
custom_evaluation = 'constructive'  # Try: 'critical', 'constructive', 'comprehensive'
custom_iterations = 3  # Adjust 1-5

print("🎯 Your Custom Experiment:")
print(f"Problem: {custom_problem}")
print(f"Strategy: {custom_strategy} | Evaluation: {custom_evaluation}")

# Run the agent and capture the trace
trace = []
def capture_trace(problem, strategy, evaluation, max_iterations):
    responses = []
    response = advanced_agent.strategies[strategy](problem)
    for i in range(max_iterations):
        critiques = advanced_agent.evaluation_modes[evaluation](response)
        responses.append({
            'iteration': i+1,
            'response': response,
            'critiques': critiques
        })
        if not critiques or 'Solid' in str(critiques) or 'Well-structured' in str(critiques):
            break
        response = advanced_agent.refine(response, critiques)
    return responses, response

trace, custom_result = capture_trace(custom_problem, custom_strategy, custom_evaluation, custom_iterations)

# Show final result
print("\n🎯 Final Result:")
print(custom_result)

# Show metrics
custom_metrics = calculate_metrics(custom_result)
print("\n📊 Your Result Analysis:")
print(f"• Length: {len(custom_result)} characters")
print(f"• Words: {len(custom_result.split())} words")
print(f"• Strategy used: {custom_strategy}")
print(f"• Evaluation mode: {custom_evaluation}")
print(f"• Complexity score: {custom_metrics['complexity_score']:.2f}")

# Show full trace in collapsible Markdown
trace_md = """<details><summary>🔎 Show Full Reflection Trace</summary>\n"""
for step in trace:
    trace_md += f"\n<b>Iteration {step['iteration']}</b><br>"
    trace_md += f"<b>Response:</b> {step['response']}<br>"
    if step['critiques']:
        trace_md += "<b>Critiques:</b><ul>"
        for c in step['critiques']:
            trace_md += f"<li>{c}</li>"
        trace_md += "</ul>"
    trace_md += "<hr>"
trace_md += "</details>"
display(Markdown(trace_md))

print("\n✨ Try These Modifications:")
print("1. Change the custom_problem to something you're curious about")
print("2. Try different strategy combinations")
print("3. Adjust custom_iterations (1-5)")
print("4. Compare results with different evaluation modes")

🎯 Your Custom Experiment:
Problem: How do I learn machine learning effectively?
Strategy: creative | Evaluation: constructive

🎯 Final Result:
Creative approach to 'How do I learn machine learning effectively?': Think beyond conventional solutions. What if we approached this from a completely different angle? Consider analogies from other domains, unconventional data structures, or novel algorithmic paradigms. For example, in a sorting algorithm, this could mean choosing quicksort O(n log n) over bubble sort O(n²). Key trade-offs include memory vs speed, simplicity vs performance, and development time vs optimization.

📊 Your Result Analysis:
• Length: 482 characters
• Words: 68 words
• Strategy used: creative
• Evaluation mode: constructive
• Complexity score: 0.41


<details><summary>🔎 Show Full Reflection Trace</summary>

<b>Iteration 1</b><br><b>Response:</b> Creative approach to 'How do I learn machine learning effectively?': Think beyond conventional solutions. What if we approached this from a completely different angle? Consider analogies from other domains, unconventional data structures, or novel algorithmic paradigms.<br><b>Critiques:</b><ul><li>Could benefit from concrete examples</li><li>Should discuss trade-offs and alternatives</li></ul><hr>
<b>Iteration 2</b><br><b>Response:</b> Creative approach to 'How do I learn machine learning effectively?': Think beyond conventional solutions. What if we approached this from a completely different angle? Consider analogies from other domains, unconventional data structures, or novel algorithmic paradigms. For example, in a sorting algorithm, this could mean choosing quicksort O(n log n) over bubble sort O(n²). Key trade-offs include memory vs speed, simplicity vs performance, and development time vs optimization.<br><b>Critiques:</b><ul><li>Well-structured response</li></ul><hr></details>


✨ Try These Modifications:
1. Change the custom_problem to something you're curious about
2. Try different strategy combinations
3. Adjust custom_iterations (1-5)
4. Compare results with different evaluation modes
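
To put a number on what reflection changed, one option is to compare the first and last responses in a trace. The sketch below assumes the trace format produced by `capture_trace` (a list of dicts with a `'response'` key). `basic_metrics` is a trimmed stand-in for `calculate_metrics`, `metric_deltas` is a hypothetical helper, and the two-step trace is made up for illustration.

```python
def basic_metrics(response):
    """Trimmed stand-in for calculate_metrics: length and word count only."""
    return {"length": len(response), "words": len(response.split())}

def metric_deltas(trace):
    """Return final-minus-initial metric values for a reflection trace."""
    if not trace:
        return {}
    first = basic_metrics(trace[0]['response'])
    last = basic_metrics(trace[-1]['response'])
    return {key: last[key] - first[key] for key in first}

# Made-up two-step trace for illustration.
demo_trace = [
    {'iteration': 1, 'response': "Cache results."},
    {'iteration': 2, 'response': "Cache results. For example, memoize fib(n)."},
]
deltas = metric_deltas(demo_trace)
print(deltas)
```

A positive delta only shows that text was added. It says nothing about whether the added text is relevant to the problem, so it is best read alongside the trace itself.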


## 🎉 Experiment Results & Next Steps

Congratulations! You've completed advanced experiments with self-reflecting AI agents! 🧠✨

### 📊 What You've Discovered:
- How agents handle different problem domains
- The impact of different strategies on response quality
- Performance differences across problem types
- The agent's step-by-step improvement process

### 🚀 What's Next?

1. **Deep Dive**: Explore [`research_extensions.ipynb`](research_extensions.ipynb)
   - Meta-reflection (agents critiquing their own critique process)
   - Multi-agent collaboration
   - Advanced research techniques

2. **Build Your Own**: Create specialized agents
   - Code review assistants
   - Writing improvement tools
   - Domain-specific problem solvers

3. **Join the Club**: Continue learning with UofT AI Agents Club
   - Weekly workshops
   - Research projects
   - Industry connections

### 🔍 Key Insights from Your Experiments:
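- **Strategy Selection**: Different problems benefit from different approaches
- **Evaluation Modes**: Critical evaluation flags missing detail, structure, and performance discussion; constructive evaluation asks for examples and trade-offs
- **Domain Adaptation**: The same generate-evaluate-refine loop runs unchanged across CS fields, although this demo's refinements reuse one sorting example in every domain
- **Self-Improvement**: Each reflection pass changes the response only where a critique was raised, so the outcome is only as good as the evaluator's checks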

**Keep experimenting and pushing the boundaries! 🌟**