# Group-wise Isolation Analysis in Timetabling System

## Problem Statement
You've identified a critical issue: **"Different groups should be scheduled separately, not at the same time, even for the same course."**

No group has more than 48 students and no room holds fewer than 48, yet the system reports capacity conflicts. This suggests that group-wise isolation is not implemented correctly.

## Investigation Goals
1. Verify actual group sizes and room capacities
2. Identify where group isolation logic fails
3. Find root cause of capacity conflicts
4. Propose fixes for proper group-wise scheduling

---

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sys
import os
from pathlib import Path

# Add the src directory to Python path
sys.path.append(str(Path('../src').resolve()))

# Import timetabling modules
from entities import Course, Instructor, Room, Group
from data.ingestion import DataIngestion
from ga.chromosome import Chromosome, Gene
from constraints.hard_constraints import HardConstraintChecker

# Setup display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
plt.style.use('default')

print("✅ Libraries loaded successfully")
print(f"📁 Working directory: {os.getcwd()}")
print(f"📁 Parent directory: {os.path.dirname(os.getcwd())}")

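# Stay in the notebook directory: later cells use paths relative to it
# ('../data/...', '../src'), so changing into the parent directory here
# would break those paths.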

In [1]:
# Load and analyze room and group data
import pandas as pd
import numpy as np

# Load data files
rooms_df = pd.read_csv('../data/sample_rooms.csv')
groups_df = pd.read_csv('../data/sample_groups.csv')
courses_df = pd.read_csv('../data/sample_courses.csv')

print("=== ROOM DATA SUMMARY ===")
print(f"Number of rooms: {len(rooms_df)}")
print(f"Room capacities: {rooms_df['capacity'].describe()}")
print(f"Room capacity range: {rooms_df['capacity'].min()} - {rooms_df['capacity'].max()}")

print("\n=== GROUP DATA SUMMARY ===")
print(f"Number of groups: {len(groups_df)}")
print(f"Group sizes: {groups_df['student_count'].describe()}")
print(f"Group size range: {groups_df['student_count'].min()} - {groups_df['student_count'].max()}")

print("\n=== COURSE DATA SUMMARY ===")
print(f"Number of courses: {len(courses_df)}")
print(f"Groups per course: {courses_df['group_count'].describe()}")

# Check for violations
print("\n=== CHECKING FOR VIOLATIONS ===")
groups_over_48 = groups_df[groups_df['student_count'] > 48]
rooms_under_48 = rooms_df[rooms_df['capacity'] < 48]

print(f"Groups with more than 48 students: {len(groups_over_48)}")
if len(groups_over_48) > 0:
    print(groups_over_48[['group_id', 'student_count']])

print(f"Rooms with less than 48 capacity: {len(rooms_under_48)}")
if len(rooms_under_48) > 0:
    print(rooms_under_48[['room_id', 'capacity']])

# Display sample data
print("\n=== SAMPLE DATA ===")
print("Sample rooms:")
print(rooms_df.head())
print("\nSample groups:")
print(groups_df.head())

=== ROOM DATA SUMMARY ===
Number of rooms: 68
Room capacities: count     68.000000
mean      56.529412
std       33.956280
min       48.000000
25%       48.000000
50%       48.000000
75%       48.000000
max      250.000000
Name: capacity, dtype: float64
Room capacity range: 48 - 250

=== GROUP DATA SUMMARY ===
Number of groups: 14
Group sizes: count    14.0
mean     48.0
std       0.0
min      48.0
25%      48.0
50%      48.0
75%      48.0
max      48.0
Name: student_count, dtype: float64
Group size range: 48 - 48

=== COURSE DATA SUMMARY ===
Number of courses: 113


KeyError: 'group_count'

In [2]:
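# 'group_count' is not a column of courses_df (see the KeyError above),
# so list the columns that are actually available
print("Courses columns:", courses_df.columns.tolist())
print("Rooms columns:", rooms_df.columns.tolist())
print("Groups columns:", groups_df.columns.tolist())

# Corrected analysis: summarize courses without the missing column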
print("\n=== COURSE DATA SUMMARY ===")
print(f"Number of courses: {len(courses_df)}")
print(courses_df.head())

# Check for violations
print("\n=== CHECKING FOR VIOLATIONS ===")
groups_over_48 = groups_df[groups_df['student_count'] > 48]
rooms_under_48 = rooms_df[rooms_df['capacity'] < 48]

print(f"Groups with more than 48 students: {len(groups_over_48)}")
if len(groups_over_48) > 0:
    print(groups_over_48[['group_id', 'student_count']])

print(f"Rooms with less than 48 capacity: {len(rooms_under_48)}")
if len(rooms_under_48) > 0:
    print(rooms_under_48[['room_id', 'capacity']])

# Display sample data
print("\n=== SAMPLE DATA ===")
print("Sample rooms:")
print(rooms_df.head())
print("\nSample groups:")
print(groups_df.head())

Courses columns: ['course_id', 'name', 'sessions_per_week', 'duration', 'required_room_type', 'group_ids', 'qualified_instructor_ids']
Rooms columns: ['room_id', 'name', 'capacity', 'type', 'available_slots']
Groups columns: ['group_id', 'name', 'student_count', 'enrolled_courses', 'preferred_break_duration']

=== COURSE DATA SUMMARY ===
Number of courses: 113
  course_id                       name  sessions_per_week  duration  \
0  ENAR 151  Architectural Graphics II                2.0        50   
1  ENAR 152           Design Studio II                2.0        50   
2  ENAR 153    Building Construction I                2.0        50   
3  ENAR 155     Free Hand Sketching II                2.0        50   
4  ENCE 152     Engineering Geology II                2.0        50   

  required_room_type group_ids qualified_instructor_ids  
0            lecture        13                       64  
1            lecture        13                       61  
2            lecture        13      
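
The `KeyError` above occurs because `courses_df` has no `group_count` column; the column listing shows `group_ids` instead. Below is a minimal sketch of deriving a per-course group count from such a cell. It assumes that a `group_ids` cell holds either a single ID or a comma-separated string such as `"5, 9, 11"`. The helper `count_groups` and the sample cells are hypothetical and not part of the project.

```python
import math

def count_groups(group_ids) -> int:
    """Count the group IDs in one `group_ids` cell (hypothetical helper)."""
    # Missing cells may arrive as None or as a float NaN
    if group_ids is None:
        return 0
    if isinstance(group_ids, float) and math.isnan(group_ids):
        return 0
    # Cells may be parsed as numbers (13) or as strings ("5, 9, 11")
    return len([part for part in str(group_ids).split(",") if part.strip()])

# Illustrative cells only, not taken from the real CSV
sample_cells = [13, "5, 9, 11", None, float("nan")]
group_counts = [count_groups(cell) for cell in sample_cells]
print(group_counts)
```

If the column does follow that format, the helper could be applied in the notebook as `courses_df['group_ids'].map(count_groups)`.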

## Problem Analysis: Room Capacity Checking Issue

Based on the error logs and code analysis, I've identified the core issue:

### Current Logic (INCORRECT)
The system only checks if individual groups exceed room capacity:
```python
if group.student_count > room.capacity:
    violations += 1
```

### The Problem
**Multiple groups can be assigned to the same room at the same time**, and the system doesn't check if their combined student count exceeds the room capacity.

For example:
- Room A has capacity 48
- Group 1 has 48 students → Individual check: OK ✓
- Group 2 has 48 students → Individual check: OK ✓  
- BUT: Both groups scheduled in Room A at same time → Total: 96 students in 48-capacity room ❌

### The Fix
We need to check the **total student count** across all groups assigned to the same room in the same time slot, and compare that sum against the room capacity.

In [3]:
# Corrected room capacity checking logic
from collections import defaultdict

def check_room_capacity_violations_corrected(chromosome, courses, rooms, groups):
    """
    CORRECTED: Check for room capacity violations by considering ALL groups
    assigned to the same room at the same time slot.

    Returns the number of (room, time slot) pairs whose combined student
    count exceeds the room capacity.
    """
    violations = 0

    # Bucket genes by (room, time slot); genes with unknown IDs are skipped
    room_time_assignments = defaultdict(list)

    for gene in chromosome.genes:
        if (gene.course_id in courses and
            gene.room_id in rooms and
            gene.group_id in groups):

            time_key = f"{gene.day}_{gene.time_slot}"
            room_time_assignments[(gene.room_id, time_key)].append(gene)

    # Check each room-time combination
    for (room_id, time_key), genes in room_time_assignments.items():
        room = rooms[room_id]

        # Every gene in a bucket already passed the membership checks above
        student_counts = [groups[gene.group_id].student_count for gene in genes]
        total_students = sum(student_counts)

        # Check if the combined total exceeds room capacity
        if total_students > room.capacity:
            violations += 1
            print(f"VIOLATION: Room {room_id} (capacity: {room.capacity}) "
                  f"has {total_students} students at {time_key}")
            print(f"  Groups: {[gene.group_id for gene in genes]}")
            print(f"  Student counts: {student_counts}")

    return violations

# The original (incorrect) logic for comparison
def check_room_capacity_violations_original(chromosome, courses, rooms, groups):
    """
    ORIGINAL (INCORRECT): Only checks individual group vs room capacity
    """
    violations = 0
    
    for gene in chromosome.genes:
        if (gene.course_id in courses and 
            gene.room_id in rooms and
            gene.group_id in groups):
            
            room = rooms[gene.room_id]
            group = groups[gene.group_id]
            
            # Check if room capacity is insufficient for the specific group
            if group.student_count > room.capacity:
                violations += 1
                print(f"Individual violation: Room {gene.room_id} (capacity: {room.capacity}) "
                      f"cannot accommodate group {gene.group_id} (students: {group.student_count})")
    
    return violations

print("✅ Corrected room capacity checking logic implemented!")
print("Key difference: Now checks TOTAL students in room at same time, not just individual groups")

✅ Corrected room capacity checking logic implemented!
Key difference: Now checks TOTAL students in room at same time, not just individual groups


In [4]:
# Test the corrected logic with a simulation.
# Only mock objects are used, so no project modules need to be imported.

# Create mock data for testing
class MockGene:
    def __init__(self, course_id, room_id, group_id, day, time_slot):
        self.course_id = course_id
        self.room_id = room_id
        self.group_id = group_id
        self.day = day
        self.time_slot = time_slot

class MockChromosome:
    def __init__(self, genes):
        self.genes = genes

class MockRoom:
    def __init__(self, capacity):
        self.capacity = capacity

class MockGroup:
    def __init__(self, student_count):
        self.student_count = student_count

class MockCourse:
    def __init__(self, name):
        self.name = name

# Create test scenario: two groups of 48 students each, one 48-seat room (R001) and one 96-seat room (R002)
test_rooms = {
    "R001": MockRoom(48),
    "R002": MockRoom(96)
}

test_groups = {
    "G001": MockGroup(48),
    "G002": MockGroup(48)
}

test_courses = {
    "C001": MockCourse("Math"),
    "C002": MockCourse("Physics")
}

# Test scenario 1: Both groups in same room at same time (SHOULD VIOLATE)
print("=== TEST SCENARIO 1: Both groups in same small room at same time ===")
genes1 = [
    MockGene("C001", "R001", "G001", "Monday", "09:00"),
    MockGene("C002", "R001", "G002", "Monday", "09:00")  # Same room, same time!
]
chromosome1 = MockChromosome(genes1)

violations1 = check_room_capacity_violations_corrected(chromosome1, test_courses, test_rooms, test_groups)
print(f"Violations found: {violations1}")

print("\n=== TEST SCENARIO 2: Both groups in different rooms ===")
genes2 = [
    MockGene("C001", "R001", "G001", "Monday", "09:00"),
    MockGene("C002", "R002", "G002", "Monday", "09:00")  # Different rooms
]
chromosome2 = MockChromosome(genes2)

violations2 = check_room_capacity_violations_corrected(chromosome2, test_courses, test_rooms, test_groups)
print(f"Violations found: {violations2}")

print("\n=== TEST SCENARIO 3: Both groups in large room ===")
genes3 = [
    MockGene("C001", "R002", "G001", "Monday", "09:00"),
    MockGene("C002", "R002", "G002", "Monday", "09:00")  # Same large room
]
chromosome3 = MockChromosome(genes3)

violations3 = check_room_capacity_violations_corrected(chromosome3, test_courses, test_rooms, test_groups)
print(f"Violations found: {violations3}")

print("\n=== COMPARISON WITH ORIGINAL LOGIC ===")
print("Original logic on scenario 1:")
violations_orig = check_room_capacity_violations_original(chromosome1, test_courses, test_rooms, test_groups)
print(f"Original violations: {violations_orig} (WRONG - should be 1)")
print(f"Corrected violations: {violations1} (CORRECT)")

print("\n✅ The corrected logic properly detects when multiple groups exceed room capacity!")

=== TEST SCENARIO 1: Both groups in same small room at same time ===
VIOLATION: Room R001 (capacity: 48) has 96 students at Monday_09:00
  Groups: ['G001', 'G002']
  Student counts: [48, 48]
Violations found: 1

=== TEST SCENARIO 2: Both groups in different rooms ===
Violations found: 0

=== TEST SCENARIO 3: Both groups in large room ===
Violations found: 0

=== COMPARISON WITH ORIGINAL LOGIC ===
Original logic on scenario 1:
Original violations: 0 (WRONG - should be 1)
Corrected violations: 1 (CORRECT)

✅ The corrected logic properly detects when multiple groups exceed room capacity!
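
Scenario 3 reports zero violations because the 96-seat room can hold both groups, yet the two groups still share one room and time slot, which the problem statement rules out. Below is a minimal sketch of a stricter isolation check. It assumes genes expose `room_id`, `group_id`, `day`, and `time_slot` as the mocks above do. `SketchGene` and `find_shared_room_slots` are hypothetical names, not part of the project.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a gene; mirrors the fields of MockGene above
SketchGene = namedtuple("SketchGene", "course_id room_id group_id day time_slot")

def find_shared_room_slots(genes):
    """Return {(room_id, day, time_slot): [group_ids]} for every slot
    that holds more than one distinct group, regardless of capacity."""
    groups_by_slot = defaultdict(set)
    for gene in genes:
        groups_by_slot[(gene.room_id, gene.day, gene.time_slot)].add(gene.group_id)
    return {
        slot: sorted(group_ids)
        for slot, group_ids in groups_by_slot.items()
        if len(group_ids) > 1
    }

# Same layout as scenario 3: two groups in the large room at the same time
shared = find_shared_room_slots([
    SketchGene("C001", "R002", "G001", "Monday", "09:00"),
    SketchGene("C002", "R002", "G002", "Monday", "09:00"),
])
print(shared)
```

Using a set per slot means a group that appears twice in the same slot is counted once, so only slots with two or more distinct groups are reported.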


## 🚨 CRITICAL ISSUE IDENTIFIED: Multiple Groups in Same Session

### The Real Problem
The scheduler output contains the following row:
```
ENME 155,Workshop Technology,51,Kamal Pokharel,A303,Classroom A-303,Monday,08:00-09:30,50,"5, 9, 11"
```

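**This means three different groups (5, 9, and 11) are scheduled for the same course, in the same room, at the same time.** With 48 students per group (as in the sample data), that is 144 students in a single session.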
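
Rows like this can be flagged automatically. Below is a minimal sketch, assuming the last field of each output row is the quoted, comma-separated group list, as in the row above. The helper `groups_in_row` is a hypothetical name.

```python
import csv

# The row quoted above, copied verbatim
row_text = ('ENME 155,Workshop Technology,51,Kamal Pokharel,A303,'
            'Classroom A-303,Monday,08:00-09:30,50,"5, 9, 11"')

def groups_in_row(line):
    """Return the group IDs from the last field of one schedule row."""
    # csv.reader keeps the quoted "5, 9, 11" together as a single field
    fields = next(csv.reader([line]))
    return [g.strip() for g in fields[-1].split(",") if g.strip()]

groups = groups_in_row(row_text)
label = "merged session" if len(groups) > 1 else "isolated session"
print(groups, "->", label)
```

A row that lists more than one group ID is a merged session and violates group-wise isolation.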

### What Should Happen Instead
**Each group should get its own separate session, for example:**
```
ENME 155,Workshop Technology,51,Kamal Pokharel,A303,Classroom A-303,Monday,08:00-09:30,50,"5"
ENME 155,Workshop Technology,51,Kamal Pokharel,A303,Classroom A-303,Monday,09:45-11:15,50,"9" 
ENME 155,Workshop Technology,51,Kamal Pokharel,A303,Classroom A-303,Monday,11:30-13:00,50,"11"
```

### Root Cause
The genetic algorithm appears to create **one gene per course** (carrying all of that course's groups) instead of **one gene per course per group**.

Let's investigate and fix this fundamental flaw...
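
As a starting point, here is a minimal sketch of what "one gene per course per group" could look like when the initial chromosome is built. `SessionGene` and `expand_course` are hypothetical names, the real `Gene` class in `ga.chromosome` may be structured differently, and the `sessions_per_week` value is illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SessionGene:
    """Hypothetical gene: one session of one course for exactly one group."""
    course_id: str
    group_id: str
    session_index: int

def expand_course(course_id, group_ids, sessions_per_week):
    """Create one gene per (group, weekly session) pair for a course."""
    return [
        SessionGene(course_id, group_id, session)
        for group_id in group_ids
        for session in range(sessions_per_week)
    ]

# Three groups, two sessions per week -> six genes, each with a single group
genes = expand_course("ENME 155", ["5", "9", "11"], sessions_per_week=2)
for gene in genes:
    print(gene)
```

Room, day, and time slot are left out on purpose: the genetic algorithm would assign them per gene, so each group can end up in its own session.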