# Syllabus Generation with Pre-Knowledge Assessment

This notebook implements an intelligent syllabus generation system with:
- **Pre-knowledge assessment**: MCQ-based evaluation of the user's existing knowledge
- **Adaptive customization**: Syllabus tailored to the user's assessed knowledge level
- User preference collection (6 learning style options)
- Detailed syllabus generation with modules and topics
- Two-layer validation system (Generator + Reviewer)
- Iterative improvement with feedback loops
- Hallucination prevention through detailed descriptions

## Workflow:
1. Load or create a course
2. Generate MCQ questions to assess pre-knowledge
3. User takes the assessment (with "I don't know" option)
4. System validates answers locally
5. Syllabus is generated based on knowledge level + learning preference

## Install Required Packages

In [1]:
%pip install langchain langchain-ollama pydantic python-dotenv -q

Note: you may need to restart the kernel to use updated packages.





## Import Dependencies

In [2]:
from typing import List
from pydantic import BaseModel, Field, field_validator
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.exceptions import OutputParserException
from langchain_core.output_parsers import PydanticOutputParser
import os
import pickle
import json



## Define Course Model

We need the Course model to work with course data. This is the same as in the course planning notebook.

In [3]:
class Course(BaseModel):
    """Represents a single course in the learning path."""
    course_name: str = Field(..., description="Name of the course")
    description: str = Field(..., description="Detailed description of the course")
    difficulty: str = Field(..., description="Difficulty level: Beginner, Intermediate, or Advanced")
    prerequisites: List[str] = Field(default_factory=list, description="List of prerequisite courses")
    
    @field_validator('difficulty')
    @classmethod
    def validate_difficulty(cls, v: str) -> str:
        """Ensure difficulty is one of the allowed values."""
        allowed = ['Beginner', 'Intermediate', 'Advanced']
        if v not in allowed:
            raise ValueError(f'Difficulty must be one of {allowed}')
        return v

## Define Syllabus Models

These models represent the detailed structure of a course syllabus.

In [4]:
class Topic(BaseModel):
    """Represents a single topic within a module."""
    topic_name: str = Field(..., description="Name of the topic")
    short_description: str = Field(..., description="Short description for users (1-2 sentences)")
    detailed_description: str = Field(..., description="Detailed description for AI to understand the scope and content")
    estimated_duration_minutes: int = Field(..., description="Estimated time to complete this topic in minutes", gt=0)


class Module(BaseModel):
    """Represents a module containing multiple topics."""
    module_name: str = Field(..., description="Name of the module")
    module_description: str = Field(..., description="Overview of what this module covers")
    topics: List[Topic] = Field(..., description="List of topics in this module")
    
    @field_validator('topics')
    @classmethod
    def validate_topics(cls, v: List[Topic]) -> List[Topic]:
        """Ensure at least one topic per module."""
        if len(v) < 1:
            raise ValueError('Each module must have at least 1 topic')
        return v


class Syllabus(BaseModel):
    """Represents a complete course syllabus."""
    course_name: str = Field(..., description="Name of the course")
    course_objective: str = Field(..., description="Main objective of the course")
    learning_style: str = Field(..., description="Learning style preferences applied")
    total_modules: int = Field(..., description="Total number of modules")
    modules: List[Module] = Field(..., description="List of modules in the syllabus")
    total_estimated_hours: float = Field(..., description="Total estimated hours to complete the course")
    
    @field_validator('modules')
    @classmethod
    def validate_modules(cls, v: List[Module], info) -> List[Module]:
        """Ensure at least one module and that total_modules matches the list length."""
        if len(v) < 1:
            raise ValueError('Syllabus must have at least 1 module')
        # Pydantic validates fields in declaration order, so total_modules is
        # already in info.data here. A validator on total_modules would run
        # before modules exists in info.data and would never compare anything.
        declared = info.data.get('total_modules')
        if declared is not None and declared != len(v):
            raise ValueError(f'total_modules ({declared}) must equal the number of modules ({len(v)})')
        return v


class SyllabusReview(BaseModel):
    """Represents a review of a syllabus by the validation layer."""
    approved: bool = Field(..., description="Whether the syllabus is approved")
    feedback: str = Field(..., description="Feedback or reasons for rejection/approval")
    issues: List[str] = Field(default_factory=list, description="List of specific issues found (if rejected)")

## Pre-Knowledge Assessment Models

These models handle the MCQ-based assessment of the user's existing knowledge.

In [5]:
class Question(BaseModel):
    """Represents a single MCQ question with answer."""
    question_text: str = Field(..., description="The question to ask")
    topic: str = Field(..., description="The specific topic this question tests (e.g., 'Python basics', 'Data cleaning', 'Statistics')")
    options: List[str] = Field(..., description="List of 4 answer options (including 'I don't know about this course')")
    correct_answer_index: int = Field(..., description="Index of the correct answer (0-2; index 3 is always 'I don't know')", ge=0, le=2)
    explanation: str = Field(..., description="Explanation of the correct answer")
    difficulty_hint: str = Field(..., description="What level this question tests: Beginner, Intermediate, or Advanced")
    
    @field_validator('options')
    @classmethod
    def validate_options(cls, v: List[str]) -> List[str]:
        """Normalize options so that 'I don't know' is always the last option."""
        dont_know_text = "I don't know about this course"
        
        # Accept 3 or 4 raw options (the "I don't know" entry may be missing or misplaced)
        if len(v) < 3 or len(v) > 4:
            raise ValueError(f'Must have 3-4 options, got {len(v)}')
        
        # Find and remove "I don't know" if it exists
        options_without_dont_know = [opt for opt in v if opt != dont_know_text]
        
        # Ensure we have exactly 3 regular options
        if len(options_without_dont_know) > 3:
            options_without_dont_know = options_without_dont_know[:3]
        
        # Add "I don't know" as the last option
        return options_without_dont_know + [dont_know_text]


class QuestionSet(BaseModel):
    """Represents a set of assessment questions for a course."""
    course_name: str = Field(..., description="Name of the course being assessed")
    questions: List[Question] = Field(..., description="List of assessment questions")
    
    @field_validator('questions')
    @classmethod
    def validate_questions(cls, v: List[Question]) -> List[Question]:
        """Ensure at least 5 questions."""
        if len(v) < 5:
            raise ValueError('Must have at least 5 questions')
        return v


class AssessmentResult(BaseModel):
    """Represents the result of a pre-knowledge assessment."""
    course_name: str = Field(..., description="Name of the course")
    total_questions: int = Field(..., description="Total number of questions")
    correct_answers: int = Field(..., description="Number of correct answers")
    incorrect_answers: int = Field(..., description="Number of incorrect answers")
    dont_know_answers: int = Field(..., description="Number of 'I don't know' responses")
    knowledge_percentage: float = Field(..., description="Percentage of correct answers (0-100)")
    knowledge_level: str = Field(..., description="Assessed knowledge level: None, Basic, Intermediate, or Advanced")
    known_topics: List[str] = Field(default_factory=list, description="Topics the user answered correctly")
    weak_topics: List[str] = Field(default_factory=list, description="Topics the user answered incorrectly")
    unknown_topics: List[str] = Field(default_factory=list, description="Topics the user selected 'I don't know' for")
    
    @field_validator('knowledge_level')
    @classmethod
    def validate_knowledge_level(cls, v: str) -> str:
        """Ensure knowledge level is valid."""
        allowed = ['None', 'Basic', 'Intermediate', 'Advanced']
        if v not in allowed:
            raise ValueError(f'Knowledge level must be one of {allowed}')
        return v
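
The option handling in `Question.validate_options` can be tried in isolation. The following is a minimal sketch that mirrors the validator's logic as a plain function; the name `normalize_options` is hypothetical and not part of the notebook.

```python
from typing import List

DONT_KNOW = "I don't know about this course"


def normalize_options(options: List[str]) -> List[str]:
    """Keep at most 3 regular options, then append the opt-out entry last."""
    if len(options) < 3 or len(options) > 4:
        raise ValueError(f"Must have 3-4 options, got {len(options)}")
    # Drop any opt-out entry, wherever it appears, and truncate to 3 options.
    regular = [opt for opt in options if opt != DONT_KNOW][:3]
    return regular + [DONT_KNOW]


# Opt-out missing: it is appended as the fourth option.
print(normalize_options(["A programming language", "A type of snake", "A web browser"]))

# Opt-out in the wrong position: it is moved to the end.
print(normalize_options([DONT_KNOW, "A", "B", "C"]))
```

Note that the reordering does not touch `correct_answer_index`. If the model places the opt-out entry before the correct option, the remaining options shift left by one and the stored index points at a different option, so it may be worth checking generated questions before administering them.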

## User Preference System

Handles learning style preferences that customize the syllabus generation.

In [6]:
class UserPreferences:
    """Handles user preference collection and system prompt generation."""
    
    PREFERENCE_OPTIONS = {
        "1": {
            "name": "Real-world Examples",
            "description": "Course filled with practical, real-world examples and case studies",
            "prompt_addition": "Include extensive real-world examples, case studies, and practical applications for every concept. Show how each topic is used in industry."
        },
        "2": {
            "name": "Theory-Focused",
            "description": "Deep theoretical understanding with mathematical foundations",
            "prompt_addition": "Focus on theoretical foundations, mathematical principles, and academic rigor. Include proofs and formal definitions where applicable."
        },
        "3": {
            "name": "Project-Based",
            "description": "Learning through building projects and practical exercises",
            "prompt_addition": "Structure each module around hands-on projects and exercises. Ensure learners build something tangible in each module."
        },
        "4": {
            "name": "Quick & Practical",
            "description": "Fast-paced, focused on immediate practical skills",
            "prompt_addition": "Keep content concise and practical. Focus on essential skills needed to get started quickly. Minimize theory, maximize practical application."
        },
        "5": {
            "name": "Comprehensive & Deep",
            "description": "Thorough coverage with both theory and practice",
            "prompt_addition": "Provide comprehensive coverage balancing theory and practice. Include edge cases, best practices, and deep dives into complex topics."
        },
        "6": {
            "name": "Beginner-Friendly",
            "description": "Slow-paced with lots of explanations and support",
            "prompt_addition": "Use simple language, provide detailed explanations, include many examples, and assume no prior knowledge. Break down complex concepts into digestible parts."
        }
    }
    
    @staticmethod
    def display_options():
        """Display available learning style options."""
        print("=" * 80)
        print("SELECT YOUR LEARNING STYLE PREFERENCE")
        print("=" * 80)
        for key, value in UserPreferences.PREFERENCE_OPTIONS.items():
            print(f"{key}. {value['name']}")
            print(f"   {value['description']}")
            print()
    
    @staticmethod
    def get_user_preference(auto_select: str = None) -> dict:
        """
        Get user preference either interactively or auto-select.
        
        Args:
            auto_select: If provided, auto-select this option (for notebook automation)
            
        Returns:
            Dictionary containing preference details
        """
        if auto_select:
            choice = auto_select
        else:
            UserPreferences.display_options()
            choice = input("Enter your choice (1-6): ").strip()
        
        if choice in UserPreferences.PREFERENCE_OPTIONS:
            selected = UserPreferences.PREFERENCE_OPTIONS[choice]
            print(f"\n✓ Selected: {selected['name']}")
            return selected
        else:
            print(f"\n⚠ Invalid choice. Defaulting to 'Real-world Examples'")
            return UserPreferences.PREFERENCE_OPTIONS["1"]
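
The selected preference matters mainly through its `prompt_addition`, which is later placed into the generator's system prompt. The sketch below shows that flow with a trimmed, hypothetical copy of the options table; `OPTIONS`, `resolve_preference`, and `build_style_section` are illustrative names, and unlike `get_user_preference` this version never prompts interactively.

```python
from typing import Dict, Optional

# Trimmed stand-in for PREFERENCE_OPTIONS (two of the six entries, shortened text).
OPTIONS: Dict[str, Dict[str, str]] = {
    "1": {"name": "Real-world Examples",
          "prompt_addition": "Include extensive real-world examples."},
    "3": {"name": "Project-Based",
          "prompt_addition": "Structure each module around hands-on projects."},
}


def resolve_preference(choice: Optional[str]) -> Dict[str, str]:
    """Return the selected option, falling back to option "1" for unknown keys."""
    return OPTIONS.get(choice or "", OPTIONS["1"])


def build_style_section(choice: Optional[str]) -> str:
    """Render the learning-style part of a system prompt."""
    pref = resolve_preference(choice)
    return f"LEARNING STYLE PREFERENCE:\n{pref['prompt_addition']}"


print(build_style_section("3"))
print(resolve_preference("9")["name"])  # unknown key falls back to option "1"
```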

## Pre-Knowledge Assessment System

This system generates MCQ questions to evaluate the user's existing knowledge of the course topic.

In [7]:
class PreKnowledgeAssessment:
    """Generates and administers MCQ-based pre-knowledge assessments."""
    
    def __init__(self, model_name: str = "llama3:8b", temperature: float = 0.7):
        """Initialize the assessment generator."""
        self.model_name = model_name
        self.llm = ChatOllama(model=model_name, temperature=temperature)
        self.structured_llm = self.llm.with_structured_output(QuestionSet)
    
    def generate_questions(self, course: Course, num_questions: int = 8) -> QuestionSet:
        """
        Generate MCQ questions to assess pre-knowledge of a course.
        
        Args:
            course: Course object to generate questions for
            num_questions: Number of questions to generate (default 8)
            
        Returns:
            QuestionSet with generated questions
        """
        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are an expert educator creating pre-assessment questions.

TASK: Generate {num_questions} multiple-choice questions to evaluate a student's existing knowledge about this course BEFORE they start learning.

CRITICAL REQUIREMENTS FOR OPTIONS:
1. Each question MUST have EXACTLY 4 options in this order:
   - Option at index 0: A plausible answer (may be correct or incorrect)
   - Option at index 1: A plausible answer (may be correct or incorrect)  
   - Option at index 2: A plausible answer (may be correct or incorrect)
   - Option at index 3: MUST BE EXACTLY this text: "I don't know about this course"

2. ONE of the first 3 options (index 0, 1, or 2) must be the correct answer
3. The correct_answer_index must be 0, 1, or 2 (NOT 3, since option 3 is always "I don't know")
4. DO NOT put "I don't know about this course" anywhere except as the 4th option

TOPIC FIELD REQUIREMENTS:
- Each question must have a "topic" field identifying the specific concept being tested
- Topics should be concise (2-5 words) and specific (e.g., "Python data types", "Data cleaning methods", "Statistical distributions")
- Topics help identify knowledge gaps so the syllabus can focus on areas needing attention
- Different questions can test different aspects of the same topic

QUESTION DISTRIBUTION:
- 3 Beginner-level questions (basic concepts, terminology)
- 3 Intermediate-level questions (understanding, application)
- 2 Advanced-level questions (analysis, complex scenarios)

QUESTION DESIGN:
- Questions should test EXISTING knowledge, not learning ability
- Include a mix of:
  * Conceptual understanding
  * Terminology definitions
  * Practical applications
  * Problem-solving scenarios
- Ensure incorrect options are plausible but clearly wrong to someone who knows the topic
- Provide clear, educational explanations for correct answers

DIFFICULTY HINTS:
- Beginner: Tests basic awareness and terminology
- Intermediate: Tests understanding and application
- Advanced: Tests deep knowledge and complex scenarios

EXAMPLE FORMAT:
{{
  "question_text": "What is Python?",
  "topic": "Programming languages",
  "options": [
    "A programming language",
    "A type of snake",
    "A web browser",
    "I don't know about this course"
  ],
  "correct_answer_index": 0,
  "explanation": "Python is a high-level programming language...",
  "difficulty_hint": "Beginner"
}}

Generate questions that will accurately assess if the student already knows this material."""),
            ("user", """Generate {num_questions} pre-assessment questions for:

Course: {course_name}
Description: {course_description}
Difficulty Level: {difficulty}

Remember: 
- EXACTLY 4 options per question
- Last option (index 3) MUST be "I don't know about this course"
- Cover Beginner, Intermediate, and Advanced levels
- Questions test EXISTING knowledge before the course starts""")
        ])
        
        chain = prompt | self.structured_llm
        
        print(f"🎯 Generating {num_questions} pre-assessment questions...")
        question_set = chain.invoke({
            "num_questions": num_questions,
            "course_name": course.course_name,
            "course_description": course.description,
            "difficulty": course.difficulty
        })
        
        print(f"✓ Generated {len(question_set.questions)} questions")
        return question_set
    
    def administer_assessment(self, question_set: QuestionSet, auto_answers: List[int] = None) -> AssessmentResult:
        """
        Administer the assessment and collect user responses.
        
        Args:
            question_set: Set of questions to ask
            auto_answers: If provided, use these answers (for automation)
            
        Returns:
            AssessmentResult with performance metrics
        """
        print("\n" + "=" * 80)
        print(f"PRE-KNOWLEDGE ASSESSMENT: {question_set.course_name}")
        print("=" * 80)
        print("This assessment evaluates your existing knowledge of the course topic.")
        print("Answer honestly - selecting 'I don't know' helps us customize your learning path!")
        print("=" * 80 + "\n")
        
        correct = 0
        incorrect = 0
        dont_know = 0
        user_answers = []
        known_topics = []
        weak_topics = []
        unknown_topics = []
        
        for idx, question in enumerate(question_set.questions, 1):
            print(f"\nQuestion {idx}/{len(question_set.questions)}")
            print(f"Topic: {question.topic}")
            print(f"Difficulty: {question.difficulty_hint}")
            print("-" * 80)
            print(f"{question.question_text}\n")
            
            for opt_idx, option in enumerate(question.options):
                print(f"  {opt_idx + 1}. {option}")
            
            if auto_answers:
                answer = auto_answers[idx - 1]
                print(f"\n[Auto-selected: {answer}]")
            else:
                while True:
                    try:
                        answer = int(input("\nYour answer (1-4): ").strip())
                        if 1 <= answer <= 4:
                            break
                        print("Please enter a number between 1 and 4")
                    except ValueError:
                        print("Please enter a valid number")
            
            user_answer_idx = answer - 1
            user_answers.append(user_answer_idx)
            
            # Check if it's "I don't know"
            if user_answer_idx == 3:
                dont_know += 1
                unknown_topics.append(question.topic)
                print("✓ Recorded: I don't know")
            elif user_answer_idx == question.correct_answer_index:
                correct += 1
                known_topics.append(question.topic)
                print(f"✓ Correct! {question.explanation}")
            else:
                incorrect += 1
                weak_topics.append(question.topic)
                print(f"✗ Incorrect. {question.explanation}")
            
            print("-" * 80)
        
        # Calculate knowledge metrics
        total = len(question_set.questions)
        # Calculate percentage based on answered questions only (excluding "don't know")
        answered_questions = total - dont_know
        if answered_questions > 0:
            knowledge_pct = (correct / answered_questions) * 100
        else:
            knowledge_pct = 0.0
        
        # Determine knowledge level
        if dont_know >= total * 0.6 or knowledge_pct < 30:
            knowledge_level = "None"
        elif knowledge_pct < 50:
            knowledge_level = "Basic"
        elif knowledge_pct < 80:
            knowledge_level = "Intermediate"
        else:
            knowledge_level = "Advanced"
        
        result = AssessmentResult(
            course_name=question_set.course_name,
            total_questions=total,
            correct_answers=correct,
            incorrect_answers=incorrect,
            dont_know_answers=dont_know,
            knowledge_percentage=knowledge_pct,
            knowledge_level=knowledge_level,
            known_topics=known_topics,
            weak_topics=weak_topics,
            unknown_topics=unknown_topics
        )
        
        self.print_results(result)
        return result
    
    def print_results(self, result: AssessmentResult) -> None:
        """Pretty print assessment results."""
        print("\n" + "=" * 80)
        print("ASSESSMENT RESULTS")
        print("=" * 80)
        print(f"Course: {result.course_name}")
        print(f"Total Questions: {result.total_questions}")
        print(f"Correct Answers: {result.correct_answers}")
        print(f"Incorrect Answers: {result.incorrect_answers}")
        print(f"'I don't know' Responses: {result.dont_know_answers}")
        print(f"Knowledge Score: {result.knowledge_percentage:.1f}%")
        print(f"Assessed Knowledge Level: {result.knowledge_level}")
        print("=" * 80)
        
        # Show topic breakdown
        print("\nTOPIC BREAKDOWN:")
        if result.known_topics:
            print(f"✓ Known Topics ({len(result.known_topics)}): {', '.join(result.known_topics)}")
        if result.weak_topics:
            print(f"⚠ Weak Topics ({len(result.weak_topics)}): {', '.join(result.weak_topics)}")
        if result.unknown_topics:
            print(f"✗ Unknown Topics ({len(result.unknown_topics)}): {', '.join(result.unknown_topics)}")
        print("=" * 80)
        
        # Provide interpretation
        print("\nINTERPRETATION:")
        if result.knowledge_level == "None":
            print("You have little to no existing knowledge of this topic.")
            print("→ The syllabus will start from absolute basics with detailed explanations.")
        elif result.knowledge_level == "Basic":
            print("You have basic familiarity with some concepts.")
            print("→ The syllabus will review fundamentals briefly and focus on building deeper understanding.")
        elif result.knowledge_level == "Intermediate":
            print("You have solid foundational knowledge.")
            print("→ The syllabus will move quickly through basics and focus on intermediate/advanced topics.")
        else:
            print("You have advanced knowledge of this topic!")
            print("→ The syllabus will focus on advanced concepts, edge cases, and mastery.")
        print("=" * 80)
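
The scoring rule inside `administer_assessment` is easy to check by hand once it is pulled out of the interactive loop. The sketch below reproduces the same thresholds in a standalone function; `classify_knowledge` is a hypothetical helper, not something the notebook defines.

```python
from typing import Tuple


def classify_knowledge(correct: int, incorrect: int, dont_know: int) -> Tuple[float, str]:
    """Return (knowledge percentage, knowledge level) using the notebook's thresholds."""
    total = correct + incorrect + dont_know
    answered = total - dont_know
    # The percentage is computed over answered questions only.
    pct = (correct / answered) * 100 if answered > 0 else 0.0

    if dont_know >= total * 0.6 or pct < 30:
        level = "None"
    elif pct < 50:
        level = "Basic"
    elif pct < 80:
        level = "Intermediate"
    else:
        level = "Advanced"
    return pct, level


print(classify_knowledge(6, 1, 1))  # mostly correct answers
print(classify_knowledge(2, 0, 6))  # 6 of 8 opted out: level is "None" despite 100%
print(classify_knowledge(4, 0, 4))  # half opted out, rest correct: level is "Advanced"
```

The last call shows a consequence of excluding "I don't know" responses from the denominator: a learner who opts out of half the questions but answers the rest correctly is still rated "Advanced". Whether that is desirable depends on how the assessment is meant to be used.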

## Custom Syllabus Parser

This custom parser filters out empty modules before Pydantic validation, preventing errors when the LLM generates incomplete modules.
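
The cleaning step can be illustrated without LangChain or Pydantic. The following sketch uses only the standard library and a made-up model response; `extract_and_clean` is a hypothetical name, and the `or ""` guard against `null` fields is an addition that the notebook's parser does not have.

```python
import json
from typing import Any, Dict


def extract_and_clean(text: str) -> Dict[str, Any]:
    """Pull the outermost JSON object out of free text and drop empty modules."""
    start = text.find("{")
    end = text.rfind("}") + 1
    if start == -1 or end == 0:
        raise ValueError("No JSON object found in response")
    data = json.loads(text[start:end])

    if "modules" in data:
        data["modules"] = [
            m for m in data["modules"]
            if (m.get("module_name") or "").strip()          # has a name
            and (m.get("module_description") or "").strip()  # has a description
            and m.get("topics")                               # has at least one topic
        ]
        # Keep the declared count in sync with the filtered list.
        data["total_modules"] = len(data["modules"])
    return data


raw = """Here is your syllabus:
{"course_name": "Intro", "total_modules": 2, "modules": [
  {"module_name": "Basics", "module_description": "Core ideas", "topics": [{"topic_name": "Variables"}]},
  {"module_name": "", "module_description": "", "topics": []}
]}
Hope this helps!"""

cleaned = extract_and_clean(raw)
print(cleaned["total_modules"], [m["module_name"] for m in cleaned["modules"]])
# prints: 1 ['Basics']
```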

## Two-Layer Syllabus Generation System

This is the core system that generates and validates syllabi using two independent LLM layers.
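
As a rough mental model, the interaction between the two layers can be pictured as a bounded generate-then-review loop in which rejected drafts feed their feedback into the next attempt. The sketch below uses stub functions in place of the LLM calls; `generate_with_review`, `Review`, and the stubs are hypothetical, and the loop only illustrates the idea rather than reproducing the notebook's `generate_syllabus` method.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Review:
    """Simplified stand-in for SyllabusReview."""
    approved: bool
    feedback: str
    issues: List[str] = field(default_factory=list)


def generate_with_review(
    generate: Callable[[Optional[str]], str],
    review: Callable[[str], Review],
    max_attempts: int = 3,
) -> Tuple[Optional[str], List[Review]]:
    """Generate drafts until the reviewer approves or attempts run out."""
    history: List[Review] = []
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        draft = generate(feedback)   # Layer 1: produce a draft, using prior feedback
        verdict = review(draft)      # Layer 2: independent review
        history.append(verdict)
        if verdict.approved:
            return draft, history
        feedback = verdict.feedback  # carry the critique into the next attempt
    return None, history


# Stub layers: the first draft is rejected, the revised one is approved.
def fake_generate(feedback: Optional[str]) -> str:
    return "draft v2" if feedback else "draft v1"


def fake_review(draft: str) -> Review:
    return Review(approved=draft.endswith("v2"), feedback="Remove overlapping topics")


draft, history = generate_with_review(fake_generate, fake_review)
print(draft, len(history))
# prints: draft v2 2
```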

In [8]:
class CustomSyllabusParser(PydanticOutputParser):
    """Custom parser that filters empty modules before validation."""
    
    def parse(self, text: str) -> Syllabus:
        """Parse and clean the LLM output before Pydantic validation."""
        try:
            # Extract JSON from text (handles cases where LLM adds explanatory text)
            json_start = text.find('{')
            json_end = text.rfind('}') + 1
            if json_start == -1 or json_end == 0:
                raise ValueError("No JSON object found in response")
            
            json_text = text[json_start:json_end]
            data = json.loads(json_text)
            
            # Filter out empty modules
            if 'modules' in data:
                original_count = len(data['modules'])
                data['modules'] = [
                    m for m in data['modules']
                    if m.get('module_name', '').strip() != ''  # Has name
                    and m.get('module_description', '').strip() != ''  # Has description
                    and len(m.get('topics', [])) > 0  # Has topics
                ]
                filtered_count = len(data['modules'])
                
                if filtered_count < original_count:
                    print(f"  ⚙️  Filtered {original_count - filtered_count} empty module(s)")
                
                # Update total_modules to match filtered count
                data['total_modules'] = filtered_count
            
            # Now validate with Pydantic
            return Syllabus.model_validate(data)
            
        except json.JSONDecodeError as e:
            raise OutputParserException(f"Failed to parse JSON: {e}") from e
        except Exception as e:
            raise OutputParserException(f"Failed to parse syllabus: {e}") from e


class SyllabusGenerator:
    """Two-layer system for generating and validating course syllabi."""
    
    def __init__(self, model_name: str = "llama3:8b", temperature: float = 0.7):
        """Initialize the syllabus generator with two layers."""
        self.model_name = model_name
        self.temperature = temperature
        
        # Layer 1: Syllabus Generator with custom parser
        self.generator_llm = ChatOllama(model=model_name, temperature=temperature)
        self.syllabus_parser = CustomSyllabusParser(pydantic_object=Syllabus)
        
        # Layer 2: Syllabus Reviewer (lower temperature for consistency)
        self.reviewer_llm = ChatOllama(model=model_name, temperature=0.3)
        self.structured_reviewer = self.reviewer_llm.with_structured_output(SyllabusReview)
        
    def _create_generator_prompt(self, user_preference: dict, assessment_result: AssessmentResult = None) -> ChatPromptTemplate:
        """Create the prompt for syllabus generation (Layer 1)."""
        
        # Add knowledge level adaptation if assessment was taken
        knowledge_adaptation = ""
        if assessment_result:
            # Base knowledge level guidance
            if assessment_result.knowledge_level == "None":
                knowledge_adaptation = f"""\n\nSTUDENT PRE-KNOWLEDGE ASSESSMENT:
The student has MINIMAL existing knowledge of this topic (scored {assessment_result.knowledge_percentage:.1f}%).

ADAPTATION REQUIREMENTS:
- Start from absolute fundamentals - assume NO prior knowledge
- Include detailed explanations of basic terminology and concepts
- Use simple language and avoid jargon (or explain it when used)
- Provide more examples and practice exercises for foundational topics
- Allocate more time to introductory modules
- Build confidence gradually with achievable early milestones"""
            elif assessment_result.knowledge_level == "Basic":
                knowledge_adaptation = f"""\n\nSTUDENT PRE-KNOWLEDGE ASSESSMENT:
The student has BASIC familiarity with some concepts (scored {assessment_result.knowledge_percentage:.1f}%).

ADAPTATION REQUIREMENTS:
- Provide a brief review of fundamentals (don't spend too much time)
- Focus on filling knowledge gaps and building deeper understanding
- Move to intermediate concepts more quickly
- Include connections between concepts they know and new material
- Balance between review and new content (30% review, 70% new)"""
            elif assessment_result.knowledge_level == "Intermediate":
                knowledge_adaptation = f"""\n\nSTUDENT PRE-KNOWLEDGE ASSESSMENT:
The student has SOLID foundational knowledge (scored {assessment_result.knowledge_percentage:.1f}%).

ADAPTATION REQUIREMENTS:
- Skip or minimize basic concepts (brief refresher only if essential)
- Focus primarily on intermediate and advanced topics
- Include more complex scenarios and real-world applications
- Emphasize depth over breadth in advanced areas
- Balance: 10-20% fundamentals review, 80-90% advanced content"""
            else:  # Advanced
                knowledge_adaptation = f"""\n\nSTUDENT PRE-KNOWLEDGE ASSESSMENT:
The student has ADVANCED knowledge of this topic (scored {assessment_result.knowledge_percentage:.1f}%).

ADAPTATION REQUIREMENTS:
- Skip basic and intermediate topics entirely (unless essential for context)
- Focus on advanced concepts, edge cases, and mastery-level content
- Include cutting-edge developments and best practices
- Incorporate challenging projects and real-world problem-solving
- Emphasize expertise development and specialized applications
- Consider adding bonus/optional modules for extremely advanced topics"""
            
            # Add topic-level granularity
            topic_breakdown = ""
            if assessment_result.known_topics:
                topic_breakdown += f"\n\n✓ KNOWN TOPICS - The student already understands these:\n  {', '.join(assessment_result.known_topics)}\n  → Move quickly through these or skip detailed explanations"
            
            if assessment_result.weak_topics:
                topic_breakdown += f"\n\n⚠ WEAK TOPICS - The student has partial knowledge but needs reinforcement:\n  {', '.join(assessment_result.weak_topics)}\n  → Provide focused practice, examples, and review for these areas"
            
            if assessment_result.unknown_topics:
                topic_breakdown += f"\n\n✗ UNKNOWN TOPICS - The student has no knowledge of these:\n  {', '.join(assessment_result.unknown_topics)}\n  → Allocate extra time and detailed explanations for these topics"
            
            if topic_breakdown:
                knowledge_adaptation += f"\n\nTOPIC-LEVEL KNOWLEDGE GAPS:{topic_breakdown}\n\nIMPORTANT: Tailor the syllabus to spend appropriate time on each topic category. Prioritize unknown and weak topics in your content allocation."
        
        system_message = f"""You are an expert curriculum designer creating a detailed course syllabus.

LEARNING STYLE PREFERENCE:
{user_preference['prompt_addition']}{knowledge_adaptation}

SYLLABUS STRUCTURE REQUIREMENTS:
1. Create between 3 and 8 modules that comprehensively cover the course (choose appropriate number based on course scope)
2. Each module MUST have 3-10 topics
3. CRITICAL: DO NOT generate empty modules. Every module MUST have a non-empty name, description, and at least 3 topics
4. Modules should progress logically from foundational to advanced concepts
5. Topics within a module should be related and build upon each other

TOPIC REQUIREMENTS:
- short_description: 1-2 sentences explaining what the user will learn (user-facing)
- detailed_description: Comprehensive description of the topic scope, key concepts, learning outcomes, and teaching approach (AI-facing, 3-5 sentences)
- estimated_duration_minutes: Realistic time estimate (typically 15-120 minutes per topic)

CONTENT QUALITY RULES:
- No overlapping content between topics or modules
- Each topic must be substantial and well-defined
- Ensure logical flow and prerequisites are respected
- Total course should be comprehensive but not overwhelming
- Calculate total_estimated_hours accurately based on all topic durations

AVOID:
- Vague or generic topic descriptions
- Redundant content
- Unrealistic time estimates
- Missing key concepts for the course level
- Empty or incomplete modules (will be rejected during validation)"""

        user_message = """Create a detailed syllabus for the following course:

Course Name: {course_name}
Course Description: {course_description}
Difficulty Level: {difficulty}

Generate a complete, well-structured syllabus following all requirements."""

        return ChatPromptTemplate.from_messages([
            ("system", system_message),
            ("user", user_message)
        ])
    
    def _create_reviewer_prompt(self) -> ChatPromptTemplate:
        """Create the prompt for syllabus review (Layer 2)."""
        system_message = """You are a senior educational quality assurance expert reviewing course syllabi.

Your task is to critically evaluate syllabi for quality, coherence, and educational value.

EVALUATION CRITERIA:

1. STRUCTURE (Critical):
   - Are modules logically organized and progressive?
   - Do topics within modules relate to each other?
   - Is there a clear learning path from beginner to advanced?

2. CONTENT QUALITY (Critical):
   - Are topic descriptions clear and specific?
   - Do detailed descriptions provide enough context for content creation?
   - Is content appropriate for the stated difficulty level?

3. NO REDUNDANCY (Critical):
   - Are there overlapping topics?
   - Is any content repeated across modules?
   - Are topics distinct and well-defined?

4. COMPLETENESS (Critical):
   - Does the syllabus cover all essential aspects of the course?
   - Are there any obvious gaps in coverage?
   - Are prerequisites and dependencies clear?

5. REALISM (Important):
   - Are time estimates realistic?
   - Is the total course length appropriate?
   - Can topics be reasonably covered in the estimated time?

DECISION RULES:
- APPROVE if: All critical criteria are met and the syllabus is high quality
- REJECT if: Any critical criteria fail or there are major quality issues

When REJECTING:
- Provide specific, actionable feedback
- List all issues found
- Explain what needs to be fixed

When APPROVING:
- Provide positive feedback
- May suggest minor improvements (but still approve)"""

        user_message = """Review the following syllabus:

{syllabus_json}

Evaluate based on all criteria and decide whether to approve or reject."""

        return ChatPromptTemplate.from_messages([
            ("system", system_message),
            ("user", user_message)
        ])
    
    def generate_syllabus(self, course: Course, user_preference: dict, assessment_result: AssessmentResult = None, max_attempts: int = 3) -> tuple[Syllabus, list]:
        """
        Generate a syllabus with two-layer validation and error recovery.
        
        Args:
            course: Course object to create syllabus for
            user_preference: User's learning style preference
            assessment_result: Optional pre-knowledge assessment result for customization
            max_attempts: Maximum number of generation attempts
            
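        Returns:
            Tuple of (syllabus, attempt_history). The syllabus is the approved one,
            or the last parseable one if no attempt was approved. The history
            only contains attempts that reached the review step.

        Raises:
            Exception: If the final attempt still fails to parse.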
        """
        generator_prompt = self._create_generator_prompt(user_preference, assessment_result)
        # Create chain with custom parser
        generator_chain = generator_prompt | self.generator_llm | self.syllabus_parser
        
        reviewer_prompt = self._create_reviewer_prompt()
        reviewer_chain = reviewer_prompt | self.structured_reviewer
        
        attempt_history = []
        
        print("=" * 80)
        print(f"GENERATING SYLLABUS FOR: {course.course_name}")
        print(f"Learning Style: {user_preference['name']}")
        if assessment_result:
            print(f"Pre-Knowledge Level: {assessment_result.knowledge_level} ({assessment_result.knowledge_percentage:.1f}%)")
        print("=" * 80)
        
        for attempt in range(1, max_attempts + 1):
            print(f"\n📝 Attempt {attempt}/{max_attempts}: Generating syllabus...")
            
            # Layer 1: Generate syllabus with error handling
            try:
                syllabus = generator_chain.invoke({
                    "course_name": course.course_name,
                    "course_description": course.description,
                    "difficulty": course.difficulty
                })
            except OutputParserException as e:
                print(f"⚠️  Parser error on attempt {attempt}: {str(e)[:100]}...")
                if attempt < max_attempts:
                    print("🔄 Retrying generation...")
                    continue
                else:
                    raise Exception(f"Failed to generate valid syllabus after {max_attempts} attempts") from e
            
            # Layer 2: Review syllabus with error handling
            print(f"🔍 Attempt {attempt}/{max_attempts}: Reviewing syllabus...")
            try:
                review = reviewer_chain.invoke({
                    "syllabus_json": syllabus.model_dump_json(indent=2)
                })
            except Exception as e:
                print(f"⚠️  Review error on attempt {attempt}: {str(e)[:100]}...")
                if attempt < max_attempts:
                    continue
                else:
                    # If review fails but syllabus is valid, use it anyway
                    print("⚠️  Review failed but syllabus is structurally valid. Proceeding...")
                    return syllabus, attempt_history
            
            attempt_info = {
                "attempt": attempt,
                "syllabus": syllabus,
                "review": review
            }
            attempt_history.append(attempt_info)
            
            if review.approved:
                print(f"\n✅ APPROVED on attempt {attempt}!")
                print(f"Feedback: {review.feedback}")
                return syllabus, attempt_history
            else:
                print(f"\n❌ REJECTED on attempt {attempt}")
                print(f"Feedback: {review.feedback}")
                if review.issues:
                    print(f"Issues found:")
                    for issue in review.issues:
                        print(f"  - {issue}")
                
                if attempt < max_attempts:
                    print("\n🔄 Regenerating with feedback...")
        
        # If we get here, all attempts failed review but last one was parseable
        print(f"\n⚠️  WARNING: Syllabus not approved after {max_attempts} attempts.")
        print(f"Returning the last generated syllabus (with issues).")
        return syllabus, attempt_history
    
    def print_syllabus(self, syllabus: Syllabus) -> None:
        """Pretty print a syllabus."""
        print("\n" + "=" * 80)
        print(f"COURSE SYLLABUS: {syllabus.course_name}")
        print("=" * 80)
        print(f"Objective: {syllabus.course_objective}")
        print(f"Learning Style: {syllabus.learning_style}")
        print(f"Total Modules: {syllabus.total_modules}")
        print(f"Total Estimated Hours: {syllabus.total_estimated_hours:.1f}")
        print("=" * 80)
        
        for mod_idx, module in enumerate(syllabus.modules, 1):
            print(f"\n📚 MODULE {mod_idx}: {module.module_name}")
            print(f"   {module.module_description}")
            print(f"   Topics: {len(module.topics)}")
            print()
            
            for topic_idx, topic in enumerate(module.topics, 1):
                print(f"   {mod_idx}.{topic_idx} {topic.topic_name}")
                print(f"       📖 User Description: {topic.short_description}")
                print(f"       🤖 AI Description: {topic.detailed_description}")
                print(f"       ⏱️  Duration: {topic.estimated_duration_minutes} minutes")
                print()
        
        print("=" * 80)
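
Note that `generate_syllabus` invokes the generator with the same three inputs on every attempt, so the reviewer's feedback is printed but does not reach the next generation. A minimal sketch of one way to close that loop follows, using only the standard library. `ReviewOutcome`, `format_feedback`, and `generate_with_feedback` are hypothetical names, and the `generate` and `review` callables stand in for the generator and reviewer chains. Wiring this into the class would also need a feedback placeholder in the generator prompt, which is an assumption and not something the current prompt has.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ReviewOutcome:
    """Hypothetical stand-in for the notebook's review object."""
    approved: bool
    feedback: str
    issues: list[str] = field(default_factory=list)


def format_feedback(review: ReviewOutcome) -> str:
    """Turn a rejected review into text a generator prompt could include."""
    lines = [f"Reviewer feedback: {review.feedback}"]
    lines += [f"- {issue}" for issue in review.issues]
    return "\n".join(lines)


def generate_with_feedback(
    generate: Callable[[str], str],
    review: Callable[[str], ReviewOutcome],
    max_attempts: int = 3,
) -> tuple[str, int]:
    """Regenerate until approved, passing the previous feedback each time.

    Returns the last draft and the attempt number it came from.
    """
    feedback = ""  # empty on the first attempt
    draft = ""
    for attempt in range(1, max_attempts + 1):
        draft = generate(feedback)
        outcome = review(draft)
        if outcome.approved:
            return draft, attempt
        feedback = format_feedback(outcome)
    return draft, max_attempts
```

The design choice here is to keep the feedback as plain text, so it can be dropped into any prompt variable without changing the output schema.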

## Load a Course (Option 1: From Saved File)

Load a course that was saved from the course planning notebook.

In [9]:
# Try to load a saved course
try:
    with open("output/selected_course.pkl", "rb") as f:
        loaded_course = pickle.load(f)
    
    print("✓ Successfully loaded course from file:")
    print(f"  - Course: {loaded_course.course_name}")
    print(f"  - Difficulty: {loaded_course.difficulty}")
    print(f"  - Description: {loaded_course.description}")
except FileNotFoundError:
    print("⚠ No saved course found. Please run the course planning notebook first,")
    print("  or manually create a course in the next cell.")
    loaded_course = None

✓ Successfully loaded course from file:
  - Course: Introduction to Data Science
  - Difficulty: Beginner
  - Description: Learn the fundamentals of data science including data types, cleaning, and visualization techniques.
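
Loading with `pickle` is convenient between two notebooks you control, but `pickle.load` can execute arbitrary code, so it should only be used on files you trust. A possible alternative is to store the course as JSON. The sketch below uses a plain dataclass so it runs on its own; `CourseRecord`, `save_course`, and `load_course` are hypothetical names that mirror the fields of the `Course` model. With the Pydantic model itself, `model_dump_json` and `model_validate_json` should give a similar round trip.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class CourseRecord:
    """Hypothetical stand-in with the same fields as the Course model."""
    course_name: str
    description: str
    difficulty: str
    prerequisites: list[str] = field(default_factory=list)


def save_course(course: CourseRecord, path: Path) -> None:
    """Write the course as JSON, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(asdict(course), indent=2), encoding="utf-8")


def load_course(path: Path) -> Optional[CourseRecord]:
    """Return the saved course, or None when the file does not exist."""
    if not path.exists():
        return None
    data = json.loads(path.read_text(encoding="utf-8"))
    return CourseRecord(**data)


# Round-trip demo in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "output" / "selected_course.json"
    original = CourseRecord(
        course_name="Introduction to Data Science",
        description="Learn the fundamentals of data science.",
        difficulty="Beginner",
    )
    save_course(original, demo_path)
    restored = load_course(demo_path)
    missing = load_course(Path(tmp) / "does_not_exist.json")
```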


## Create a Course Manually (Option 2: Create Course Directly)

If you don't have a saved course, create one manually here.

In [10]:
# Option: Create a course manually for testing
manual_course = Course(
    course_name="Introduction to Data Science",
    description="Learn the fundamentals of data science including Python programming, statistics, data manipulation with pandas, and basic data visualization techniques.",
    difficulty="Beginner",
    prerequisites=[]
)

print("✓ Manually created course:")
print(f"  - Course: {manual_course.course_name}")
print(f"  - Difficulty: {manual_course.difficulty}")
print(f"  - Description: {manual_course.description}")

# Use whichever course is available
selected_course = loaded_course if loaded_course else manual_course
print(f"\n🎯 Using course: {selected_course.course_name}")

✓ Manually created course:
  - Course: Introduction to Data Science
  - Difficulty: Beginner
  - Description: Learn the fundamentals of data science including Python programming, statistics, data manipulation with pandas, and basic data visualization techniques.

🎯 Using course: Introduction to Data Science


## Select Learning Style Preference

In [11]:
# Display options and get user preference
# For notebook automation, we auto-select option "1" (Real-world Examples)
# In interactive mode, set auto_select=None

user_pref = UserPreferences.get_user_preference(auto_select="1")  # Change to None for interactive


✓ Selected: Real-world Examples


## Pre-Knowledge Assessment

Now we'll assess your existing knowledge of the course topic to customize the syllabus.

In [None]:
# Initialize the assessment system
assessment_system = PreKnowledgeAssessment(model_name="llama3:8b", temperature=0.7)

# Generate assessment questions
question_set = assessment_system.generate_questions(
    course=selected_course,
    num_questions=8  # 3 beginner, 3 intermediate, 2 advanced
)

🎯 Generating 8 pre-assessment questions...
✓ Generated 8 questions


## Take the Assessment

Answer the questions honestly. The "I don't know" option is encouraged - it helps us customize your learning path!

In [13]:
# Administer the assessment
# For automation, we can provide auto_answers (0-3 for each question)
# For interactive mode, set auto_answers=None

# Example auto_answers for testing (simulating a beginner with some basic knowledge):
# auto_answers = [3, 3, 0, 3, 1, 3, 3, 3]  # Mix of correct, incorrect, and "don't know"

assessment_result = assessment_system.administer_assessment(
    question_set=question_set,
    auto_answers=None  # Set to None for interactive mode
)


PRE-KNOWLEDGE ASSESSMENT: Introduction to Data Science
This assessment evaluates your existing knowledge of the course topic.
Answer honestly - selecting 'I don't know' helps us customize your learning path!


Question 1/8
Topic: Data science basics
Difficulty: Beginner
--------------------------------------------------------------------------------
What is data science?

  1. The study of data analysis and interpretation
  2. A type of machine learning algorithm
  3. A field that combines statistics and computer programming
  4. I don't know about this course
✓ Recorded: I don't know
--------------------------------------------------------------------------------

Question 2/8
Topic: Data types
Difficulty: Beginner
--------------------------------------------------------------------------------
What are some common data types?

  1. Numerical, categorical, and text data
  2. Only numerical data exists
  3. Categorical data is the only type
  4. I don't know about this course
✓ Re

## Generate Syllabus with Two-Layer Validation

The syllabus will now be generated based on your assessment results and learning preferences.

In [14]:
# Initialize the syllabus generator
syllabus_gen = SyllabusGenerator(model_name="llama3:8b", temperature=0.7)

# Generate syllabus with validation, incorporating assessment results
approved_syllabus, history = syllabus_gen.generate_syllabus(
    course=selected_course,
    user_preference=user_pref,
    assessment_result=assessment_result,  # Pass assessment results for customization
    max_attempts=3
)

# Display the final syllabus
syllabus_gen.print_syllabus(approved_syllabus)

GENERATING SYLLABUS FOR: Introduction to Data Science
Learning Style: Real-world Examples
Pre-Knowledge Level: None (0.0%)

📝 Attempt 1/3: Generating syllabus...
⚠️  Parser error on attempt 1: Invalid json output: **Introduction to Data Science Syllabus**

**Module 1: Fundamentals of Data Sci...
🔄 Retrying generation...

📝 Attempt 2/3: Generating syllabus...
⚠️  Parser error on attempt 2: Invalid json output: **Introduction to Data Science Syllabus**

**Module 1: Fundamentals of Data Sci...
🔄 Retrying generation...

📝 Attempt 3/3: Generating syllabus...
⚠️  Parser error on attempt 3: Invalid json output: **Syllabus: Introduction to Data Science**

**Course Overview:**
This course is...


Exception: Failed to generate valid syllabus after 3 attempts
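
All three attempts above failed because the model replied in Markdown instead of JSON. One cheap mitigation is to try to salvage a JSON object from the raw reply before counting an attempt as failed, since models often wrap valid JSON in a sentence of prose. The sketch below shows that idea with the standard library only; `extract_json_object` is a hypothetical helper, not part of LangChain. It would not have rescued the replies shown above, which contain no JSON at all. For those, tightening the format instructions in the prompt or enabling Ollama's JSON output mode may be worth trying.

```python
import json
from typing import Any, Optional


def extract_json_object(text: str) -> Optional[dict[str, Any]]:
    """Return the first JSON object embedded in text, or None if there is none.

    Scans for each opening brace and asks the JSON decoder to parse from
    that position, so leading or trailing prose is ignored.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            candidate, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            candidate = None
        if isinstance(candidate, dict):
            return candidate
        start = text.find("{", start + 1)
    return None


# A reply like the ones in the failed run: Markdown only, nothing to salvage.
markdown_reply = "**Introduction to Data Science Syllabus**\n\n**Module 1: Fundamentals**"

# A reply where the JSON is present but wrapped in prose.
wrapped_reply = (
    "Sure, here is the syllabus:\n"
    '{"course_name": "Introduction to Data Science", "modules": []}\n'
    "Hope this helps!"
)

salvaged = extract_json_object(wrapped_reply)
nothing = extract_json_object(markdown_reply)
```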

## Review Attempt History

In [None]:
# Display the history of all attempts
print("=" * 80)
print("SYLLABUS GENERATION ATTEMPT HISTORY")
print("=" * 80)

for attempt_info in history:
    attempt = attempt_info['attempt']
    review = attempt_info['review']
    syllabus = attempt_info['syllabus']
    
    status = "✅ APPROVED" if review.approved else "❌ REJECTED"
    print(f"\nAttempt {attempt}: {status}")
    print(f"Modules Generated: {syllabus.total_modules}")
    print(f"Total Topics: {sum(len(m.topics) for m in syllabus.modules)}")
    print(f"Estimated Hours: {syllabus.total_estimated_hours:.1f}")
    print(f"Review Feedback: {review.feedback}")
    
    if review.issues:
        print(f"Issues Found:")
        for issue in review.issues:
            print(f"  - {issue}")
    print("-" * 80)

## Export Syllabus to JSON

In [None]:
# Export the approved syllabus to JSON
syllabus_json = approved_syllabus.model_dump_json(indent=2)

print("SYLLABUS JSON OUTPUT:")
print("=" * 80)
print(syllabus_json)
print("=" * 80)

# Save to file
output_dir = "output"
os.makedirs(output_dir, exist_ok=True)

filename = f"{output_dir}/{approved_syllabus.course_name.replace(' ', '_')}_syllabus.json"
with open(filename, 'w', encoding='utf-8') as f:
    f.write(syllabus_json)

print(f"\n✓ Syllabus saved to: {filename}")
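
The generator prompt asks the model to calculate `total_estimated_hours` from the topic durations, but the export cell itself does not check that the two agree. A minimal sketch of such a check on the exported JSON follows. The key names match the attributes used in `print_syllabus`; `hours_from_topics`, `hours_are_consistent`, and the 0.25-hour tolerance are assumptions made for illustration.

```python
def hours_from_topics(syllabus: dict) -> float:
    """Sum every topic's duration in minutes and convert to hours."""
    minutes = sum(
        topic["estimated_duration_minutes"]
        for module in syllabus["modules"]
        for topic in module["topics"]
    )
    return minutes / 60


def hours_are_consistent(syllabus: dict, tolerance_hours: float = 0.25) -> bool:
    """Compare the stated total with the total derived from the topics."""
    stated = syllabus["total_estimated_hours"]
    return abs(stated - hours_from_topics(syllabus)) <= tolerance_hours


# Tiny hand-written example: 30 + 60 + 90 minutes is 3 hours.
sample = {
    "total_estimated_hours": 3.0,
    "modules": [
        {"topics": [
            {"estimated_duration_minutes": 30},
            {"estimated_duration_minutes": 60},
        ]},
        {"topics": [{"estimated_duration_minutes": 90}]},
    ],
}
print(hours_from_topics(sample), hours_are_consistent(sample))
```

To run it on the saved file, load the file with `json.load` and pass the resulting dictionary to `hours_are_consistent`.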

## Try Different Learning Styles

Generate syllabi with different learning preferences to see how they differ.

In [None]:
# Try with a different learning style - Project-Based
project_pref = UserPreferences.get_user_preference(auto_select="3")

project_syllabus, project_history = syllabus_gen.generate_syllabus(
    course=selected_course,
    user_preference=project_pref,
    assessment_result=assessment_result,  # Use same assessment results
    max_attempts=3
)

print("\n" + "=" * 80)
print("COMPARISON: Real-world Examples vs Project-Based")
print("=" * 80)
print(f"\nReal-world Examples Style:")
print(f"  - Modules: {approved_syllabus.total_modules}")
print(f"  - Topics: {sum(len(m.topics) for m in approved_syllabus.modules)}")
print(f"  - Hours: {approved_syllabus.total_estimated_hours:.1f}")

print(f"\nProject-Based Style:")
print(f"  - Modules: {project_syllabus.total_modules}")
print(f"  - Topics: {sum(len(m.topics) for m in project_syllabus.modules)}")
print(f"  - Hours: {project_syllabus.total_estimated_hours:.1f}")

print(f"\nNote: Both syllabi are customized for {assessment_result.knowledge_level} knowledge level")

## Summary: Enhanced Syllabus Generation System

This system combines pre-knowledge assessment with two-layer validation to provide:

1. **Personalized Learning**: 
   - MCQ-based pre-knowledge assessment (8 questions covering beginner to advanced)
   - "I don't know" option for honest self-assessment
   - Automatic knowledge level classification (None, Basic, Intermediate, Advanced)
   - Syllabus customized to student's existing knowledge

2. **Quality Assurance**: 
   - Second layer catches issues like:
     * Overlapping content
     * Incomplete coverage
     * Vague descriptions
     * Unrealistic time estimates

3. **Hallucination Prevention**: 
   - Detailed AI-facing descriptions give later content generation a defined scope to work from
   - Review layer checks structure, clarity, redundancy, completeness, and time estimates
   - Local answer validation (no LLM needed for grading)

4. **Iterative Improvement**:
   - Automatic regeneration when parsing fails or the reviewer rejects a draft
   - Up to `max_attempts` attempts (3 in this notebook)
   - Clear tracking of issues and improvements

5. **Flexibility**:
   - Multiple learning style options (6 choices)
   - Customizable prompts per preference
   - Variable module/topic structure (3-8 modules, 3-10 topics each)
   - Adaptive difficulty based on assessment results

6. **Production Ready**:
   - Error handling for parser and reviewer failures
   - JSON export for API integration
   - Detailed logging and history tracking
   - Dual descriptions (user-friendly + AI-detailed)
   - Assessment results stored for analytics
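
The module and topic limits listed under Flexibility can also be checked locally, without an LLM call, before a draft is sent to the reviewer. The sketch below works on the exported dictionary form of a syllabus. `structure_issues` is a hypothetical name, the key names follow the attributes used in `print_syllabus`, and the `Syllabus` model defined earlier may already enforce some of these rules through its own validators.

```python
def structure_issues(syllabus: dict) -> list[str]:
    """Return a list of structural problems; an empty list means none found."""
    issues: list[str] = []
    modules = syllabus.get("modules", [])

    # Rule from the generator prompt: between 3 and 8 modules.
    if not 3 <= len(modules) <= 8:
        issues.append(f"Expected 3-8 modules, found {len(modules)}")

    for index, module in enumerate(modules, 1):
        # Rule: every module needs a non-empty name and description.
        if not str(module.get("module_name", "")).strip():
            issues.append(f"Module {index} has an empty name")
        if not str(module.get("module_description", "")).strip():
            issues.append(f"Module {index} has an empty description")

        # Rule: each module has 3-10 topics.
        topic_count = len(module.get("topics", []))
        if not 3 <= topic_count <= 10:
            issues.append(f"Module {index} has {topic_count} topics, expected 3-10")

    return issues


def make_module(topic_count: int) -> dict:
    """Build a placeholder module for the demo below."""
    return {
        "module_name": "Placeholder module",
        "module_description": "Placeholder description",
        "topics": [{"topic_name": f"Topic {i}"} for i in range(topic_count)],
    }


valid_draft = {"modules": [make_module(3), make_module(5), make_module(10)]}
short_draft = {"modules": [make_module(3), make_module(2)]}
print(structure_issues(valid_draft))
print(structure_issues(short_draft))
```

Issues found this way could be appended to the reviewer's list, which keeps the cheap deterministic checks separate from the judgment calls left to the LLM.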

## Workflow Summary:
1. Load/create course → 2. Select learning style → 3. Take pre-knowledge assessment → 4. Generate customized syllabus → 5. Review and export