# 🚀 Career Path Intelligence Engine
## Powered by SAP SuccessFactors + SAP Databricks

**Transform HR Decision Making with AI-Powered Career Intelligence**

This demo showcases how SAP Databricks unlocks the hidden potential in your SAP SuccessFactors data through:
- 🎯 **Predictive Career Pathing**: AI-driven recommendations based on historical success patterns
- 🔍 **Hidden Talent Discovery**: Identify high-potential employees ready for advancement
- 📈 **Success Probability Modeling**: Predict career move outcomes with confidence scores


### **Why SAP Databricks + SuccessFactors = Career Intelligence Revolution**
- **Advanced Analytics**: ML models that learn from organizational career patterns
- **Unified Data Platform**: Seamlessly combine HR, performance, and organizational data
- **Real-time Processing**: Instant insights as your workforce evolves
- **Scalable Intelligence**: Analyze patterns across your entire workforce history

## 📊 Setup & Configuration

In [0]:
# Install all required libraries for serverless compute
# Using --quiet flag to suppress verbose output for business-ready presentation
%pip install --quiet "plotly>=5.18.0" "mlflow>=2.8.0" lightgbm
print("✅ Required libraries installed successfully")

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
# Restart Python to ensure all libraries are properly loaded
%restart_python

In [0]:
# Import all required libraries after restart
import pyspark.sql.functions as F
from pyspark.sql.types import *
from pyspark.sql import DataFrame
from pyspark.sql.window import Window

# Advanced visualization libraries
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pandas as pd
import numpy as np

# MLflow for model loading
import mlflow
from mlflow.pyfunc import load_model
from mlflow.tracking import MlflowClient

from datetime import datetime, timedelta, date
import re
import warnings
import logging
import os
import sys
from contextlib import contextmanager

# Suppress all warnings for business-ready presentation
warnings.filterwarnings('ignore')

# Suppress MLflow logging warnings (these are informational, not errors)
logging.getLogger('mlflow.models.utils').setLevel(logging.ERROR)
logging.getLogger('mlflow').setLevel(logging.ERROR)
logging.getLogger('mlflow.tracking').setLevel(logging.ERROR)

# Suppress other common warning sources
logging.getLogger('urllib3').setLevel(logging.ERROR)
logging.getLogger('requests').setLevel(logging.ERROR)

# Set environment variable to suppress MLflow warnings
os.environ['MLFLOW_DISABLE_WARNINGS'] = '1'

# Context manager to suppress stderr during MLflow operations
@contextmanager
def suppress_stderr():
    """Temporarily suppress stderr output for cleaner business presentations"""
    with open(os.devnull, 'w') as devnull:
        old_stderr = sys.stderr
        try:
            sys.stderr = devnull
            yield
        finally:
            sys.stderr = old_stderr

# Configure MLflow to use Unity Catalog
mlflow.set_registry_uri("databricks-uc")

# Initialize MLflow client
mlflow_client = MlflowClient()

# Import helper functions
# In Databricks, helper files in same directory are auto-importable
try:
    from career_intelligence_helpers import (
        load_career_models, prepare_ml_features_for_prediction, prepare_features_for_model,
        explain_prediction, format_feature_name, extract_sklearn_model_from_mlflow,
        get_potential_next_roles, create_transition_features,
        estimate_salary_increase, get_success_factors, get_risk_factors,
        ensure_dataframe_schema, get_role_compatibility_score, calculate_skill_gap_penalty,
        get_role_specific_timeline
    )
except ImportError:
    # Fallback: add current directory to path
    import sys
    import os
    current_dir = os.path.dirname(os.path.abspath(__file__)) if '__file__' in globals() else os.getcwd()
    if current_dir not in sys.path:
        sys.path.insert(0, current_dir)
    from career_intelligence_helpers import (
        load_career_models, prepare_ml_features_for_prediction, prepare_features_for_model,
        explain_prediction, format_feature_name, extract_sklearn_model_from_mlflow,
        get_potential_next_roles, create_transition_features,
        estimate_salary_increase, get_success_factors, get_risk_factors,
        ensure_dataframe_schema, get_role_compatibility_score, calculate_skill_gap_penalty,
        get_role_specific_timeline
    )

# Configure for optimal display
displayHTML("""
<style>
    div.output_subarea { max-width: 100%; }
    
    /* Unified HTML Component Styles */
    .card-primary { background: linear-gradient(135deg, #1e3c72 0%, #2a5298 50%, #1e3c72 100%); 
                    padding: 0; border-radius: 12px; margin: 20px 0; 
                    box-shadow: 0 4px 20px rgba(0,0,0,0.15); overflow: hidden; }
    .card-header { background: rgba(255,255,255,0.12); padding: 18px 24px; 
                   border-bottom: 1px solid rgba(255,255,255,0.15); }
    .card-content { padding: 24px; background: rgba(255,255,255,0.03); }
    .card-footer { background: rgba(76,175,80,0.15); padding: 12px 24px; 
                   border-top: 1px solid rgba(255,255,255,0.1); }
    .text-accent { color: #FFD93D; }
    .text-muted { color: rgba(255,255,255,0.7); }
</style>
""")

print("✅ Career Intelligence Engine initialized")

✅ Career Intelligence Engine initialized


In [0]:
# Import configuration and load data
%run ./setup_config.py

📋 Configuration:
   Catalog: demos
   Schema: career_path
🏛️ Setting up Unity Catalog structure...
✅ Using existing catalog 'demos'
✅ Schema 'career_path' ready
✅ Unity Catalog setup complete: demos.career_path


In [0]:
# Load data from Unity Catalog
print(f"üìã Loading from Unity Catalog: {catalog_name}.{schema_name}")

# Use the catalog and schema
spark.sql(f"USE CATALOG {catalog_name}")
spark.sql(f"USE SCHEMA {schema_name}")

📋 Loading from Unity Catalog: demos.career_path


DataFrame[]

In [0]:
# Load data from Unity Catalog tables
print("üìä Loading SAP SuccessFactors data from Unity Catalog...")

try:
    employees_df = spark.table(f"{catalog_name}.{schema_name}.employees")
    performance_df = spark.table(f"{catalog_name}.{schema_name}.performance")
    print(f"✅ Data loaded: {employees_df.count():,} employees")
    
    # Create enriched employees view with performance metrics
    latest_performance = performance_df.withColumn(
        "row_num",
        F.row_number().over(
            Window.partitionBy("employee_id")
            .orderBy(F.col("review_date").desc())
        )
    ).filter(F.col("row_num") == 1).drop("row_num")
    
    # Join employees with latest performance data
    enriched_employees_df = employees_df.alias("e").join(
        latest_performance.alias("p"),
        F.col("e.employee_id") == F.col("p.employee_id"),
        "left"
    ).select(
        F.col("e.*"),
        F.coalesce(F.col("p.overall_rating"), F.lit(3.5)).alias("performance_rating"),
        F.coalesce(F.col("p.competency_rating"), F.lit(3.5)).alias("competency_rating"),
        F.coalesce(F.col("p.overall_rating"), F.lit(3.5)).alias("overall_rating")
    ).withColumn(
        "current_level", F.col("job_title")
    ).withColumn(
        "months_in_role", F.col("months_in_current_role")
    ).withColumn(
        "months_in_company", F.col("tenure_months")
    )
    
    # Calculate derived metrics
    enriched_employees_df = enriched_employees_df.withColumn(
        "engagement_score", F.lit(75) + (F.col("performance_rating") - 3) * 10
    ).withColumn(
        "potential_score", F.lit(70) + (F.col("performance_rating") - 3) * 10
    ).withColumn(
        "leadership_readiness", F.lit(60) + (F.col("performance_rating") - 3) * 10
    ).withColumn(
        "flight_risk", F.lit(30) - (F.col("performance_rating") - 3) * 10
    )
    
    enriched_employees_df = enriched_employees_df.withColumn(
        "performance_trend", 
        F.when(F.col("performance_rating") >= 4, "Rising")
        .when(F.col("performance_rating") <= 2.5, "Declining")
        .otherwise("Stable")
    )
    
    employees_df = enriched_employees_df
    print("✅ Enriched employees data with performance metrics")
except Exception as e:
    print(f"⚠️ Data not found. Please run 01_data_generation notebook first: {e}")
    raise

📊 Loading SAP SuccessFactors data from Unity Catalog...
✅ Data loaded: 1,058 employees
✅ Enriched employees data with performance metrics
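
The derived metrics in the cell above are simple linear functions of `performance_rating`, so they can be checked by hand. The sketch below mirrors those formulas in plain Python, without Spark. The helper names `derive_scores` and `performance_trend` are hypothetical and exist only for this illustration, and the rounding to one decimal is an addition that the notebook's Spark expressions do not perform.

```python
def derive_scores(performance_rating: float) -> dict:
    """Mirror the notebook's linear formulas, rounded to one decimal place.

    Hypothetical helper for illustration; the notebook computes these
    columns with Spark expressions and does not round them.
    """
    delta = (performance_rating - 3) * 10
    return {
        "engagement_score": round(75 + delta, 1),
        "potential_score": round(70 + delta, 1),
        "leadership_readiness": round(60 + delta, 1),
        "flight_risk": round(30 - delta, 1),
    }


def performance_trend(performance_rating: float) -> str:
    """Same thresholds as the F.when chain: >= 4 rising, <= 2.5 declining."""
    if performance_rating >= 4:
        return "Rising"
    if performance_rating <= 2.5:
        return "Declining"
    return "Stable"


# A rating of 4.9 and the 3.5 default used when no review exists.
for rating in (4.9, 3.5):
    print(rating, derive_scores(rating), performance_trend(rating))
```

Without the rounding step, `30 - (4.9 - 3) * 10` evaluates to `10.999999999999996` in floating point, which is why unrounded values can show long decimal tails when printed. In Spark, wrapping the expression in `F.round(..., 1)` would be one way to get the same effect.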


In [0]:
def demonstrate_sap_bdc_integration():
    """Demonstrate SAP BDC Delta Sharing integration"""
    sap_bdc_data_products = {
        'CoreWorkforceData': {
            'package': 'SAP SuccessFactors Employee Central Data Products',
            'access_method': 'Delta Sharing',
            'governance': 'Unity Catalog'
        },
        'PerformanceData': {
            'package': 'SAP SuccessFactors Performance and Goals Data Products',
            'access_method': 'Delta Sharing',
            'governance': 'Unity Catalog'
        },
        'PerformanceReviews': {
            'package': 'SAP SuccessFactors Performance and Goals Data Products',
            'access_method': 'Delta Sharing',
            'governance': 'Unity Catalog'
        },
        'LearningHistory': {
            'package': 'SAP SuccessFactors Learning Data Products',
            'access_method': 'Delta Sharing',
            'governance': 'Unity Catalog'
        },
        'GoalsData': {
            'package': 'SAP SuccessFactors Performance and Goals Data Products',
            'access_method': 'Delta Sharing',
            'governance': 'Unity Catalog'
        },
        'Compensation': {
            'package': 'SAP SuccessFactors Employee Central Data Products',
            'access_method': 'Delta Sharing',
            'governance': 'Unity Catalog'
        }
    }
    
    # Display SAP BDC integration status
    displayHTML(f"""
    <div style="background: linear-gradient(135deg, #0052CC 0%, #0070F2 100%); 
                padding: 30px; border-radius: 20px; color: white; margin: 20px 0;">
        <h2 style="text-align: center; margin-bottom: 25px;">🔗 SAP BDC - SuccessFactors Data Products</h2>
        
        
        <div style="margin-top: 25px; padding: 20px; background: rgba(255,255,255,0.1); border-radius: 10px;">
            <h3 style="color: #4ECDC4; margin: 0 0 15px 0;">📦 SAP SuccessFactors Data Products</h3>
            <ul style="list-style: none; padding: 0; margin: 0;">
                <li style="margin: 8px 0;">• CoreWorkforceData (Employee Central)</li>
                <li style="margin: 8px 0;">• PerformanceData (Performance & Goals)</li>
                <li style="margin: 8px 0;">• PerformanceReviews (Performance & Goals)</li>
                <li style="margin: 8px 0;">• LearningHistory (Learning)</li>
                <li style="margin: 8px 0;">• GoalsData (Performance & Goals)</li>
                <li style="margin: 8px 0;">• Compensation (Employee Central)</li>
            </ul>
            <p style="margin-top: 15px; font-size: 13px; opacity: 0.9;">
                <strong>💡 Key Benefit:</strong> All data accessed directly via Delta Sharing - no ETL, no copying, 
                no storage duplication. Real-time access to SAP SuccessFactors data products.
            </p>
        </div>
    </div>
    """)
    
    return sap_bdc_data_products

# Demonstrate SAP BDC integration
sap_bdc_products = demonstrate_sap_bdc_integration()


## 🧠 ML Model Integration
### *Load Trained ML Models from Unity Catalog*

In [0]:
# Verify that the configuration variables are defined before loading models
try:
    _ = catalog_name
    _ = schema_name
except NameError:
    raise NameError(
        "❌ catalog_name and schema_name are not defined. "
        "Please run '%run ./setup_config.py' before loading models."
    )

In [0]:
# Load ML models using helper function
career_models, model_metrics = load_career_models(catalog_name, schema_name, mlflow_client, displayHTML)

if not career_models:
    raise ValueError("❌ ML models not loaded. Please run notebook 02_career_intelligence_ml_models.py to train and register models in Unity Catalog.")

print(f"✅ {len(career_models)} ML models active and ready for predictions")

✅ Career Path Risk Model loaded (v8, AUC: 96.54%)
✅ Retention Risk Model loaded (v4, AUC: 99.32%)
✅ High Potential Model loaded (v4, AUC: 99.58%)
✅ Promotion Readiness Model loaded (v3, R²: 97.05%)


## 🎯 Career Intelligence in Action

Explore AI-powered career insights using real SAP SuccessFactors data. Ask questions about employees, identify high-potential talent, and discover optimal career paths.


### Meet Alex Smith

In [0]:
# Find our demo employee Alex Smith
alex_data = employees_df.filter(
    (F.col("first_name") == "Alex") & (F.col("last_name") == "Smith")
).collect()

if alex_data:
    alex = alex_data[0]
    
    print(f"üéØ MEET {alex.first_name.upper()} {alex.last_name.upper()} - {alex.job_title}")
    print("=" * 50)
    print(f"üë§ Profile: {alex.age} years old, {alex.gender}")
    print(f"üè¢ Role: {alex.current_level} in {alex.department}")
    print(f"‚è±Ô∏è Tenure: {alex.months_in_role} months in current role, {alex.months_in_company} months at company")
    print(f"‚≠ê Performance: {alex.performance_rating}/5 ({alex.performance_trend} trend)")
    print(f"üí™ Engagement: {alex.engagement_score}% | Potential: {alex.potential_score}%")
    print(f"üëë Leadership Readiness: {alex.leadership_readiness}%")
    print(f"‚ö†Ô∏è Flight Risk: {alex.flight_risk}%")
    
    # Store Alex's data for further analysis
    alex_employee_id = alex.employee_id
    
    displayHTML(f"""
    <div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); 
                padding: 20px; border-radius: 15px; color: white; margin: 20px 0;">
        <h2>üéØ SPOTLIGHT: {alex.first_name} {alex.last_name}</h2>
        <div style="display: flex; justify-content: space-between; flex-wrap: wrap;">
            <div style="min-width: 200px;">
                <h4>üìä Current Status</h4>
                <p><strong>Role:</strong> {alex.current_level}</p>
                <p><strong>Department:</strong> {alex.department}</p>
                <p><strong>Tenure:</strong> {alex.months_in_role} months</p>
            </div>
            <div style="min-width: 200px;">
                <h4>‚≠ê Performance Metrics</h4>
                <p><strong>Rating:</strong> {alex.performance_rating}/5</p>
                <p><strong>Engagement:</strong> {alex.engagement_score}%</p>
                <p><strong>Potential:</strong> {alex.potential_score}%</p>
            </div>
            <div style="min-width: 200px;">
                <h4>üöÄ Intelligence Insights</h4>
                <p><strong>Leadership Ready:</strong> {alex.leadership_readiness}%</p>
                <p><strong>Flight Risk:</strong> {alex.flight_risk}%</p>
                <p><strong>Trend:</strong> {alex.performance_trend}</p>
            </div>
        </div>
    </div>
    """)
else:
    print("⚠️ Alex Smith not found in employee data")
    alex_employee_id = None

🎯 MEET ALEX SMITH - Software Engineer
👤 Profile: 32 years old, Female
🏢 Role: Software Engineer in Engineering
⏱️ Tenure: 17 months in current role, 18 months at company
⭐ Performance: 4.9/5 (Rising trend)
💪 Engagement: 94.0% | Potential: 89.0%
👑 Leadership Readiness: 79.0%
⚠️ Flight Risk: 10.999999999999996%


## 💬 Ask Questions About Alex

Use AI-powered queries to analyze Alex's career profile and get intelligent insights.

**🔴 LIVE DEMO:** Modify the `example_question` variable below and re-run this cell to show instant AI responses!

In [0]:
# Define AI query functions before use
def build_context_summary(context, question=""):
    """Build context string from employee data by querying actual database"""
    
    if not context:
        return "No specific context provided."
    
    # Try to extract employee ID from context
    import re
    emp_id_match = re.search(r'EMP\d+', context)
    
    if emp_id_match:
        employee_id = emp_id_match.group()
        try:
            emp_data = employees_df.filter(F.col('employee_id') == employee_id).collect()
            if emp_data:
                # Row objects have no .get(); convert to a dict before reading fields
                emp = emp_data[0].asDict()
                full_name = f"{emp.get('first_name', '')} {emp.get('last_name', '')}".strip() or 'Unknown'
                return f"""
                Employee: {full_name}
                Role: {emp.get('current_level', 'Unknown')}
                Department: {emp.get('department', 'Unknown')}
                Performance Rating: {emp.get('performance_rating', 'N/A')}/5
                Tenure: {emp.get('months_in_company', 0)} months in company, {emp.get('months_in_role', 0)} months in current role
                Engagement: {emp.get('engagement_score', 0)}%
                Potential: {emp.get('potential_score', 0)}%
                Leadership Readiness: {emp.get('leadership_readiness', 0)}%
                """
        except Exception:
            # Lookup failed; fall through to the keyword-based queries below
            pass
    
    # If context contains organizational info or question asks about teams/departments, query actual data
    try:
        question_lower = question.lower() if question else ""
        context_lower = context.lower() if context else ""
        combined_text = f"{question_lower} {context_lower}"
        
        # Detect if this is about engineering team
        if 'engineering' in combined_text or 'engineer' in combined_text:
            eng_employees = employees_df.filter(
                (F.lower(F.col('department')).contains('engineering')) |
                (F.lower(F.col('job_title')).contains('engineer')) |
                (F.lower(F.col('job_title')).contains('developer')) |
                (F.lower(F.col('job_title')).contains('software'))
            ).select(
                'employee_id', 'first_name', 'last_name', 'job_title', 
                'job_level', 'department', 'performance_rating',
                'months_in_current_role', 'potential_score', 'leadership_readiness'
            ).collect()
            
            if eng_employees:
                eng_summary = f"Engineering Team Analysis ({len(eng_employees)} members):\n\n"
                for row in eng_employees[:20]:  # First 20 rows only, to keep the prompt short
                    emp = row.asDict()  # Row objects have no .get(); read fields from a dict
                    eng_summary += f"• {emp.get('first_name', '')} {emp.get('last_name', '')} - {emp.get('job_title', 'Unknown')} ({emp.get('job_level', 'N/A')})\n"
                    eng_summary += f"  Performance: {emp.get('performance_rating', 0):.1f}/5, "
                    eng_summary += f"Months in Role: {emp.get('months_in_current_role', 0)}, "
                    eng_summary += f"Potential: {emp.get('potential_score', 0):.0f}%, "
                    eng_summary += f"Leadership: {emp.get('leadership_readiness', 0):.0f}%\n"
                
                if len(eng_employees) > 20:
                    eng_summary += f"\n... and {len(eng_employees) - 20} more team members\n"
                
                return eng_summary
        
        # Detect if this is about product manager candidates or role matching
        if 'product manager' in combined_text or 'candidate' in combined_text or 'role' in combined_text:
            # Find employees with technical backgrounds and leadership potential
            candidates = employees_df.filter(
                (F.col('potential_score') >= 70) &
                (F.col('leadership_readiness') >= 60) &
                (F.col('performance_rating') >= 3.5) &
                (F.col('months_in_current_role') >= 12)
            ).select(
                'employee_id', 'first_name', 'last_name', 'job_title',
                'department', 'performance_rating', 'potential_score',
                'leadership_readiness', 'months_in_current_role'
            ).orderBy(F.desc('potential_score'), F.desc('performance_rating')).limit(15).collect()
            
            if candidates:
                candidates_summary = f"Top Internal Candidates ({len(candidates)} ranked):\n\n"
                for idx, row in enumerate(candidates, 1):
                    emp = row.asDict()  # Row objects have no .get(); read fields from a dict
                    candidates_summary += f"{idx}. {emp.get('first_name', '')} {emp.get('last_name', '')}\n"
                    candidates_summary += f"   Current Role: {emp.get('job_title', 'Unknown')} in {emp.get('department', 'Unknown')}\n"
                    candidates_summary += f"   Performance: {emp.get('performance_rating', 0):.1f}/5, "
                    candidates_summary += f"Potential: {emp.get('potential_score', 0):.0f}%, "
                    candidates_summary += f"Leadership: {emp.get('leadership_readiness', 0):.0f}%, "
                    candidates_summary += f"Tenure: {emp.get('months_in_current_role', 0)} months\n\n"
                
                return candidates_summary
        
        # Detect general department/team queries
        department_keywords = ['sales', 'marketing', 'hr', 'finance', 'operations', 'product', 'design', 'qa', 'quality']
        for dept in department_keywords:
            if dept in combined_text:
                dept_employees = employees_df.filter(
                    F.lower(F.col('department')).contains(dept)
                ).select(
                    'first_name', 'last_name', 'job_title', 'job_level',
                    'performance_rating', 'potential_score', 'months_in_current_role'
                ).collect()
                
                if dept_employees:
                    dept_summary = f"{dept.title()} Team ({len(dept_employees)} members):\n\n"
                    for row in dept_employees[:15]:
                        emp = row.asDict()  # Row objects have no .get(); read fields from a dict
                        dept_summary += f"• {emp.get('first_name', '')} {emp.get('last_name', '')} - {emp.get('job_title', 'Unknown')}\n"
                        dept_summary += f"  Performance: {emp.get('performance_rating', 0):.1f}/5, Potential: {emp.get('potential_score', 0):.0f}%\n"
                    
                    if len(dept_employees) > 15:
                        dept_summary += f"\n... and {len(dept_employees) - 15} more team members\n"
                    
                    return dept_summary
        
    except Exception as e:
        # If queries fail, fall back to original context
        pass
    
    # If no specific data match, return original context but add note about using real data
    return f"{context}\n\nNote: This is descriptive context. Actual employee data is queried from SAP SuccessFactors tables when available."


def execute_career_ai_query(question, context="", show_timing=True):
    """Execute career AI query and display formatted response with timing"""
    
    import time
    
    print(f"üîç Executing AI Query: {question}")
    
    # Build context from actual data - pass question to enable intelligent data queries
    context_summary = build_context_summary(context, question)
    
    # Try to execute real ai_query
    ai_response = None
    used_real_ai = False
    execution_time = None
    
    try:
        # Prepare prompt with context - use more appropriate label based on query type
        context_label = "Employee Data Context" if "employee" in question.lower() or "alex" in question.lower() else "Organizational Data Context"
        
        prompt = f"""You are a Career Intelligence AI assistant analyzing SAP SuccessFactors data.

        Question: {question}

        {context_label}:
        {context_summary}

        Provide a detailed analysis with:
        1. Specific insights based on the actual data provided above
        2. Concrete recommendations with timelines  
        3. Risk assessments where relevant
        4. Actionable next steps
        5. Reference specific employees by name when relevant

        Format your response with bullet points and specific percentages/metrics."""
        
        # Escape single quotes for SQL
        prompt_escaped = prompt.replace("'", "''")
        
        # Measure execution time
        start_time = time.time()
        
        # Execute actual ai_query
        result_df = spark.sql(f"""
            SELECT ai_query(
                'databricks-meta-llama-3-3-70b-instruct',
                '{prompt_escaped}',
                modelParameters => named_struct('max_tokens', 600, 'temperature', 0.2)
            ) as ai_response
        """)
        
        ai_response = result_df.collect()[0]['ai_response']
        execution_time = time.time() - start_time
        used_real_ai = True
        
        if show_timing:
            print(f"✅ Real ai_query() executed successfully in {execution_time:.2f} seconds")
        else:
            print("✅ Real ai_query() executed successfully")
        
    except Exception as e:
        print(f"❌ ai_query execution error: {e}")
        print("   Please ensure ai_query is properly configured and the foundation model endpoint is available.")
        ai_response = f"""
            **AI Query Error**

            Unable to execute ai_query function. Error: {str(e)}

            **Troubleshooting Steps:**
            1. Verify foundation model endpoint is configured and accessible
            2. Check that ai_query function is available in your SAP Databricks workspace
            3. Ensure proper permissions for model serving endpoints
            4. Review error details above for specific configuration issues

            **Alternative:** Use ML model predictions directly via the load_career_models() function.
            """
        used_real_ai = False
        execution_time = None
    
    # Display formatted AI response
    power_source = "ai_query() + Meta Llama 3.3 70B" if used_real_ai else "Error - ai_query unavailable"
    
    timing_info = ""
    if execution_time and show_timing:
        timing_info = f"""
        <div style="margin-top: 12px; padding: 10px 14px; background: rgba(255,255,255,0.08); border-radius: 8px; border: 1px solid rgba(255,255,255,0.15);">
            <div style="display: flex; align-items: center; gap: 12px; font-size: 13px; color: rgba(255,255,255,0.9);">
                <span style="color: #4ECDC4;">⚡</span>
                <span>Execution Time: <strong style="color: #FFD93D;">{execution_time:.2f}s</strong></span>
                <span style="color: rgba(255,255,255,0.5);">•</span>
                <span>Tokens: <strong style="color: #FFD93D;">~{len(prompt.split())}</strong></span>
            </div>
        </div>
        """
    
    # Convert markdown-style formatting to HTML
    import re
    formatted_response = ai_response
    # Convert **bold** to <strong>
    formatted_response = re.sub(r'\*\*(.*?)\*\*', r'<strong style="color: #FFD93D; font-weight: 600;">\1</strong>', formatted_response)
    # Convert bullet points to proper list items
    formatted_response = re.sub(r'^[-•]\s+(.+)$', r'<li style="margin: 8px 0; padding-left: 8px;">\1</li>', formatted_response, flags=re.MULTILINE)
    # Wrap consecutive list items in <ul>
    lines = formatted_response.split('\n')
    in_list = False
    formatted_lines = []
    for line in lines:
        if '<li' in line:
            if not in_list:
                formatted_lines.append('<ul style="margin: 12px 0; padding-left: 24px; list-style: none;">')
                in_list = True
            formatted_lines.append(line)
        else:
            if in_list:
                formatted_lines.append('</ul>')
                in_list = False
            if line.strip() and not line.strip().startswith('**'):
                formatted_lines.append(f'<p style="margin: 12px 0; line-height: 1.7;">{line}</p>')
            elif line.strip().startswith('**'):
                formatted_lines.append(f'<p style="margin: 16px 0 8px 0; font-size: 16px; font-weight: 600; color: #FFD93D;">{line}</p>')
            else:
                formatted_lines.append(line)
    if in_list:
        formatted_lines.append('</ul>')
    formatted_response = '\n'.join(formatted_lines)
    
    displayHTML(f"""
    <div style="background: linear-gradient(135deg, #1e3c72 0%, #2a5298 50%, #1e3c72 100%); 
                padding: 0; border-radius: 12px; margin: 20px 0; 
                box-shadow: 0 4px 20px rgba(0,0,0,0.15); overflow: hidden;">
        
        <!-- Header -->
        <div style="background: rgba(255,255,255,0.12); padding: 18px 24px; border-bottom: 1px solid rgba(255,255,255,0.15);">
            <div style="display: flex; align-items: center; gap: 12px;">
                <div style="background: linear-gradient(135deg, #4ECDC4 0%, #44A08D 100%); 
                            width: 40px; height: 40px; border-radius: 10px; 
                            display: flex; align-items: center; justify-content: center; 
                            box-shadow: 0 2px 8px rgba(78,205,196,0.3);">
                    <span style="font-size: 20px;">🤖</span>
                </div>
                <div>
                    <h3 style="margin: 0; color: #FFD93D; font-size: 18px; font-weight: 600; letter-spacing: 0.3px;">
                        SAP Databricks AI Query Response
                    </h3>
                    <div style="font-size: 12px; color: rgba(255,255,255,0.7); margin-top: 2px;">
                        Powered by {power_source}
                    </div>
                </div>
            </div>
        </div>
        
        <!-- Content -->
        <div style="padding: 24px; background: rgba(255,255,255,0.03);">
            <div style="color: rgba(255,255,255,0.95); font-size: 14px; line-height: 1.8; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;">
                {formatted_response}
            </div>
            
            {timing_info}
        </div>
        
        <!-- Footer -->
        <div style="background: rgba(76,175,80,0.15); padding: 12px 24px; border-top: 1px solid rgba(255,255,255,0.1);">
            <div style="display: flex; align-items: center; gap: 8px; font-size: 12px; color: rgba(255,255,255,0.85);">
                <span style="color: #4ECDC4;">✅</span>
                <span>Analyzing <strong style="color: #FFD93D;">SAP SuccessFactors</strong> data in real-time</span>
            </div>
        </div>
    </div>
    """)
    
    return ai_response


# Helper function to build Alex's context for AI queries
def build_alex_context():
    """Build context from Alex Smith's actual data in the tables"""
    if not alex_data:
        return None
    
    alex = alex_data[0]
    
    # Get additional data from related tables
    alex_id = alex.employee_id
    
    # Get learning data for Alex
    try:
        learning_df = spark.table(f"{catalog_name}.{schema_name}.learning")
        alex_learning = learning_df.filter(F.col("employee_id") == alex_id).agg(
            F.sum("hours_completed").alias("total_hours"),
            F.countDistinct("course_title").alias("courses_completed")
        ).collect()
        learning_hours = alex_learning[0]['total_hours'] if alex_learning and alex_learning[0]['total_hours'] else 0
        courses_count = alex_learning[0]['courses_completed'] if alex_learning and alex_learning[0]['courses_completed'] else 0
    except Exception:  # Learning table missing or unreadable; default to zero
        learning_hours = 0
        courses_count = 0
    
    # Get goals data for Alex
    try:
        goals_df = spark.table(f"{catalog_name}.{schema_name}.goals")
        alex_goals = goals_df.filter(F.col("employee_id") == alex_id).agg(
            F.avg("achievement_percentage").alias("avg_achievement")
        ).collect()
        avg_goal_achievement = round(alex_goals[0]['avg_achievement'], 1) if alex_goals and alex_goals[0]['avg_achievement'] else 0
    except Exception:  # Goals table missing or unreadable; default to zero
        avg_goal_achievement = 0
    
    # Build comprehensive context from actual data
    context = f"""
        Employee ID: {alex_id}
        Name: {alex.first_name} {alex.last_name}
        Gender: {alex.gender}
        Role: {alex.job_title}
        Department: {alex.department}
        Job Level: {alex.job_level}
        Tenure: {alex.tenure_months} months in company, {alex.months_in_current_role} months in current role
        Performance Rating: {alex.performance_rating}/5
        Engagement Score: {alex.engagement_score:.1f}%
        Potential Score: {alex.potential_score:.1f}%
        Leadership Readiness: {alex.leadership_readiness:.1f}%
        Flight Risk: {alex.flight_risk:.1f}%
        Performance Trend: {alex.performance_trend}
        Learning: {learning_hours} hours completed, {courses_count} courses
        Goal Achievement: {avg_goal_achievement}% average
        """
    return context

# Example AI Query about Alex Smith

example_question = "What are Alex Smith's top strengths?"
#example_question = "what career development recommendations would you give Alex?"

# Build Alex's context and execute the query
if 'alex_data' in globals() and alex_data:
    alex_context = build_alex_context()
    if alex_context:
        execute_career_ai_query(example_question, alex_context, show_timing=True)
    else:
        print("⚠️ Could not build Alex's context. Please ensure data is loaded.")
else:
    print("⚠️ Alex Smith data not found. Please run the 'Meet Alex Smith' section first.")

🔍 Executing AI Query: What are Alex Smith's top strengths?
✅ Real ai_query() executed successfully in 7.82 seconds
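
`build_alex_context` applies format specs such as `:.1f` directly to row attributes, and Python raises a `TypeError` when the value is `None`. The sketch below shows one way to format possibly-missing fields. It is an illustration, not the notebook's implementation: `fmt`, `build_context`, and the sample record are hypothetical, and the record is a plain dict instead of a PySpark `Row`.

```python
def fmt(value, spec="", suffix="", default="N/A"):
    """Format a value with an optional format spec, or return a default for None."""
    if value is None:
        return default
    return format(value, spec) + suffix


def build_context(record, learning_hours=0, courses_count=0):
    """Assemble a plain-text context block from a dict-like employee record."""
    lines = [
        f"Name: {fmt(record.get('first_name'))} {fmt(record.get('last_name'))}",
        f"Role: {fmt(record.get('job_title'))}",
        f"Engagement Score: {fmt(record.get('engagement_score'), '.1f', '%')}",
        f"Learning: {learning_hours} hours completed, {courses_count} courses",
    ]
    return "\n".join(lines)


# Made-up record for illustration only; engagement_score is deliberately missing.
sample = {"first_name": "Alex", "last_name": "Smith", "job_title": "Engineer", "engagement_score": None}
context = build_context(sample)
print(context)
```

With a `Row` object, `row.asDict()` would give the same dict-style access used here.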


## 🔮 Career Path Predictions for Alex
### *Powered by SAP Databricks ML & Historical Success Patterns*

In [0]:
def generate_career_predictions(employee_data):
    """Generate AI-powered career path predictions using ML models (requires ML models to be loaded)"""
    
    if not career_models:
        raise ValueError("❌ ML models not loaded. Models must be available to generate predictions.")
    
    if 'career_success' not in career_models:
        raise ValueError("❌ Career success model not available. Required model 'career_success' not loaded.")
    
    # Convert employee_data to dict if it's a Row object
    if hasattr(employee_data, 'asDict'):
        emp_dict = employee_data.asDict()
    elif isinstance(employee_data, dict):
        emp_dict = employee_data
    else:
        raise ValueError("employee_data must be a dict or PySpark Row with asDict() method")
    
    # Prepare ML features
    employee_features = prepare_ml_features_for_prediction(emp_dict, employees_df, spark, catalog_name, schema_name, F)
    
    # Get potential next roles
    potential_roles = get_potential_next_roles(emp_dict)
    
    predictions = []
    
    # Use ML models for predictions
    for role in potential_roles:
        try:
            # Create transition features
            transition_features = create_transition_features(employee_features, role)
            
            # Prepare features matching model's expected schema - pass model to get correct schema
            # This will extract the signature and filter to ONLY the expected features in correct order
            model_features = prepare_features_for_model(transition_features, career_models['career_success'])
            
            # Create DataFrame with exactly the features the model expects (already filtered and ordered by prepare_features_for_model)
            features_df = pd.DataFrame([model_features])
            
            # CRITICAL: Enforce exact schema match - remove extra columns and ensure all required columns exist
            features_df = ensure_dataframe_schema(features_df, career_models['career_success'])
            
            # Get ML model prediction (suppress stderr warnings for business-ready presentation)
            with suppress_stderr():
                success_pred = career_models['career_success'].predict(features_df)
            
            # Extract probability
            if isinstance(success_pred, np.ndarray):
                if len(success_pred) == 0:
                    raise ValueError(f"❌ Career success model returned empty prediction for role {role['title']}.")
                success_prob = float(success_pred[0])
            elif isinstance(success_pred, pd.Series):
                success_prob = float(success_pred.iloc[0])
            else:
                success_prob = float(success_pred)
            
            # Get base ML prediction for probability
            base_probability = success_prob * 100 if success_prob <= 1.0 else success_prob
            
            # Get promotion readiness
            if 'promotion_readiness' not in career_models:
                raise ValueError("❌ Promotion readiness model required but not loaded.")
            
            # Suppress stderr warnings for business-ready presentation
            with suppress_stderr():
                readiness_pred = career_models['promotion_readiness'].predict(features_df)
            if isinstance(readiness_pred, np.ndarray):
                if len(readiness_pred) == 0:
                    raise ValueError(f"❌ Promotion readiness model returned empty prediction for role {role['title']}.")
                base_readiness = float(readiness_pred[0])
            elif isinstance(readiness_pred, pd.Series):
                base_readiness = float(readiness_pred.iloc[0])
            else:
                base_readiness = float(readiness_pred)
            
            # HYBRID APPROACH: Apply role-specific adjustments to ML predictions
            # Combine employee_features with emp_dict to have access to all fields
            combined_features = {**employee_features, **emp_dict}
            
            # 1. Calculate role compatibility multiplier
            compatibility_multiplier = get_role_compatibility_score(combined_features, role)
            
            # 2. Apply adjustments to probability
            # Clamp the adjusted value to the 5-95% range
            adjusted_probability = base_probability * compatibility_multiplier
            adjusted_probability = max(5.0, min(95.0, adjusted_probability))  # Clamp between 5% and 95%
            
            # 3. Calculate skill gap penalty for readiness
            skill_gap_penalty = calculate_skill_gap_penalty(combined_features, role)
            adjusted_readiness = base_readiness * (1 - skill_gap_penalty)
            adjusted_readiness = max(30.0, min(95.0, adjusted_readiness))  # Cap between 30 and 95
            
            # 4. Get role-specific timeline
            timeline = get_role_specific_timeline(role, adjusted_readiness)
            
            # 5. Get role-specific success factors
            success_factors_list = get_success_factors(combined_features, role)

            predictions.append({
                'role': role['title'],
                'probability': round(adjusted_probability, 1),
                'readiness_score': round(adjusted_readiness, 1),
                'timeline': timeline,
                'salary_increase': estimate_salary_increase(role, adjusted_probability),
                'success_factors': success_factors_list,
                'risk_factors': get_risk_factors(employee_features, role),
                'model_confidence': 'High' if adjusted_probability > 75 else 'Medium' if adjusted_probability > 60 else 'Low'
            })
            # Optional debug output: uncomment to inspect how the multiplier changes the base probability
            # print(f"Role: {role['title']}, Base Probability: {base_probability:.2f}%, Multiplier: {compatibility_multiplier:.2f}, Adjusted: {adjusted_probability:.2f}%")
        except Exception as e:
            # Fail fast - don't silently skip roles; chain the original exception for debugging
            raise RuntimeError(f"❌ Error predicting for role '{role['title']}': {e}") from e
    
    return sorted(predictions, key=lambda x: x['probability'], reverse=True)

# Generate predictions for Alex
if alex_data:
    print("🔮 Generating career path predictions for Alex Smith...")
    print(f"   Models loaded: {list(career_models.keys()) if career_models else 'None'}")
    
    predictions = generate_career_predictions(alex_data[0])
    
    if not predictions:
        raise RuntimeError(
            "❌ No predictions generated. Possible reasons:\n"
            "   1. ML models not loaded - check if models exist in Unity Catalog\n"
            "   2. Feature schema mismatch - check error messages above\n"
            "   3. No potential roles identified - check get_potential_next_roles()\n"
            "\n   Troubleshooting:\n"
            "   - Run notebook 02_career_intelligence_ml_models.py to train models\n"
            "   - Verify models are registered in Unity Catalog\n"
            "   - Check that catalog_name and schema_name are correctly set"
        )
    else:
        # Create beautiful visualization
        roles = [p['role'] for p in predictions]
        probabilities = [p['probability'] for p in predictions]
        timelines = [p['timeline'] for p in predictions]
        
        # Create interactive prediction chart
        fig = go.Figure()
        
        fig.add_trace(go.Bar(
            x=probabilities,
            y=roles,
            orientation='h',
            marker=dict(
                color=probabilities,
                colorscale='RdYlGn',
                colorbar=dict(title="Success Probability (%)")
            ),
            text=[f"{p}%" for p in probabilities],
            textposition='inside',
            hovertemplate='<b>%{y}</b><br>Success Probability: %{x}%<br>Timeline: %{customdata}<extra></extra>',
            customdata=timelines
        ))
        
        fig.update_layout(
            title=f"🔮 AI-Powered Career Path Predictions for {alex_data[0].first_name} {alex_data[0].last_name}",
            title_font_size=20,
            xaxis_title="Success Probability (%)",
            yaxis_title="Career Opportunities",
            height=400,
            plot_bgcolor='rgba(0,0,0,0)',
            paper_bgcolor='rgba(0,0,0,0)',
            font=dict(size=12)
        )
        
        fig.show()
        
        # Display detailed predictions table
        predictions_html = """
        <div style="background: white; padding: 20px; border-radius: 10px; box-shadow: 0 4px 6px rgba(0,0,0,0.1); margin: 20px 0;">
            <h3>🎯 Detailed Career Path Analysis</h3>
            <table style="width: 100%; border-collapse: collapse;">
                <tr style="background: #f8f9fa; font-weight: bold;">
                    <th style="padding: 12px; border: 1px solid #dee2e6;">Role</th>
                    <th style="padding: 12px; border: 1px solid #dee2e6;">Probability</th>
                    <th style="padding: 12px; border: 1px solid #dee2e6;">Readiness</th>
                    <th style="padding: 12px; border: 1px solid #dee2e6;">Timeline</th>
                    <th style="padding: 12px; border: 1px solid #dee2e6;">Salary Impact</th>
                    <th style="padding: 12px; border: 1px solid #dee2e6;">Key Factors</th>
                </tr>
        """
        
        for pred in predictions:
            success_factors = '; '.join(pred['success_factors'][:2])  # Top 2 factors
            probability_color = '#28a745' if pred['probability'] > 70 else '#ffc107' if pred['probability'] > 50 else '#dc3545'
            readiness_score = pred.get('readiness_score', 0)
            model_conf = pred.get('model_confidence', 'N/A')
            conf_badge = '🤖'
            
            predictions_html += f"""
                <tr>
                    <td style="padding: 12px; border: 1px solid #dee2e6;"><strong>{pred['role']}</strong></td>
                    <td style="padding: 12px; border: 1px solid #dee2e6; color: {probability_color}; font-weight: bold;">{pred['probability']}% {conf_badge}</td>
                    <td style="padding: 12px; border: 1px solid #dee2e6;">{readiness_score:.0f}/100</td>
                    <td style="padding: 12px; border: 1px solid #dee2e6;">{pred['timeline']}</td>
                    <td style="padding: 12px; border: 1px solid #dee2e6;">{pred['salary_increase']}</td>
                    <td style="padding: 12px; border: 1px solid #dee2e6; font-size: 11px;">{success_factors}</td>
                </tr>
            """
        
        predictions_html += """
            </table>
        </div>
        """
        
        displayHTML(predictions_html)
        
        # Show ML model information
        displayHTML("""
        <div style="background: rgba(76,175,80,0.1); padding: 15px; border-radius: 10px; margin: 20px 0; border-left: 4px solid #4CAF50;">
            <p style="margin: 0;"><strong>🤖 ML Model Status:</strong> Predictions generated using <strong>Real ML Models</strong></p>
            <p style="margin: 5px 0 0 0; font-size: 12px; opacity: 0.8;">
                ✅ Real ML model predictions from Unity Catalog
            </p>
        </div>
        """)

🔮 Generating career path predictions for Alex Smith...
   Models loaded: ['career_success', 'retention_risk', 'high_potential', 'promotion_readiness']




Role,Probability,Readiness,Timeline,Salary Impact,Key Factors
Senior Software Engineer,95.0% 🤖,62/100,6-12 months,25-35%,Strong performance track record; Consistent high performance
Tech Lead,85.5% 🤖,62/100,12-18 months,28-38%,Strong performance track record; Strong technical skills
Solutions Architect,85.1% 🤖,59/100,18-24 months,20-30%,Strong performance track record; Strong technical performance
Product Manager,70.1% 🤖,53/100,24+ months,25-35%,Strong performance track record; Technical background advantage
Engineering Manager,69.3% 🤖,62/100,18-24 months,25-35%,Strong performance track record; Proven leadership potential
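
The hybrid adjustment inside `generate_career_predictions` can be looked at in isolation. The sketch below is a simplified illustration, not the notebook's implementation: `to_scalar`, `clamp`, and `adjust_prediction` are hypothetical helper names, and the multiplier and penalty passed in are made-up values standing in for the results of `get_role_compatibility_score` and `calculate_skill_gap_penalty`.

```python
import numpy as np
import pandas as pd


def to_scalar(pred):
    """Coerce a model output (ndarray, Series, list, or scalar) to a single float."""
    if isinstance(pred, pd.Series):
        if pred.empty:
            raise ValueError("model returned an empty prediction")
        return float(pred.iloc[0])
    arr = np.asarray(pred, dtype=float).ravel()
    if arr.size == 0:
        raise ValueError("model returned an empty prediction")
    return float(arr[0])


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def adjust_prediction(success_pred, readiness_pred, compatibility_multiplier, skill_gap_penalty):
    """Apply the role-specific multiplier and penalty, then clamp both scores."""
    success = to_scalar(success_pred)
    # Values at or below 1.0 are treated as probabilities and rescaled to percent.
    base_probability = success * 100 if success <= 1.0 else success
    probability = clamp(base_probability * compatibility_multiplier, 5.0, 95.0)
    readiness = clamp(to_scalar(readiness_pred) * (1 - skill_gap_penalty), 30.0, 95.0)
    return round(probability, 1), round(readiness, 1)


# Made-up inputs: a 0.9 success probability, readiness of 70, multiplier 1.2, penalty 0.1.
example = adjust_prediction(np.array([0.9]), pd.Series([70.0]), 1.2, 0.1)
print(example)
```

Here the unclamped probability would exceed 100%, so it is capped at 95.0, which is how a strong candidate for a well-matched role ends up at exactly 95.0% in a table like the one above.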


## 🎯 Hidden Talent Discovery
### *AI-Powered Identification of High-Potential Employees*

In [0]:
# Discover hidden talent using ML models
def discover_hidden_talent_with_ml():
    """Use ML models to identify high-potential employees"""
    
    print("🧠 Using ML models for talent discovery...")
    
    if not career_models:
        raise ValueError("❌ ML models not loaded. Models must be available for talent discovery.")
    
    if 'high_potential' not in career_models:
        raise ValueError("❌ High potential model required but not loaded.")
    
    # Use ML models for discovery
    # Score at most 100 active employees (note the limit(100) below), not the full workforce
    all_employees = employees_df.filter(F.col('employment_status') == 'Active').limit(100).collect()
    
    # Prepare features for batch prediction
    employee_features_list = []
    employee_data_list = []
    
    # Use first model to get expected schema (all models use same features)
    reference_model = list(career_models.values())[0] if career_models else None
    
    for emp in all_employees:
        emp_dict = emp.asDict()
        # Get raw features
        raw_features = prepare_ml_features_for_prediction(emp_dict, employees_df, spark, catalog_name, schema_name, F)
        # Encode categoricals to match model schema
        encoded_features = prepare_features_for_model(raw_features, reference_model)
        employee_features_list.append(encoded_features)
        employee_data_list.append(emp_dict)
    
    if not employee_features_list:
        raise ValueError("❌ No employees found for analysis. Employee data must be available.")
    
    # Convert to DataFrame for batch prediction - ensure correct column order from model signature
    # Get expected column order from model signature if available
    try:
        from mlflow.pyfunc import PyFuncModel
        if reference_model and isinstance(reference_model, PyFuncModel):
            if hasattr(reference_model, 'metadata') and reference_model.metadata:
                signature = reference_model.metadata.get_signature()
                if signature and signature.inputs:
                    expected_cols = [inp.name for inp in signature.inputs.inputs]
                    # Reorder all feature dicts to match expected column order
                    ordered_features_list = []
                    for feat_dict in employee_features_list:
                        ordered_dict = {col: feat_dict.get(col, 0.0) for col in expected_cols}
                        ordered_features_list.append(ordered_dict)
                    employee_features_list = ordered_features_list
    except Exception:
        pass  # If signature extraction fails, use original features
    
    # Convert to DataFrame for batch prediction
    features_df = pd.DataFrame(employee_features_list)
    
    # CRITICAL: Enforce exact schema match for batch predictions
    # Use first model to get schema (all models use same features)
    features_df = ensure_dataframe_schema(features_df, reference_model)
    
    # Get predictions from ML models (suppress stderr warnings for business-ready presentation)
    predictions = {}
    
    # High potential predictions - required
    with suppress_stderr():
        high_potential_preds = career_models['high_potential'].predict(features_df)
    if isinstance(high_potential_preds, np.ndarray):
        predictions['high_potential'] = high_potential_preds
    elif isinstance(high_potential_preds, pd.Series):
        predictions['high_potential'] = high_potential_preds.values
    else:
        predictions['high_potential'] = np.array([float(x) for x in high_potential_preds])
    
    if len(predictions['high_potential']) == 0:
        raise ValueError("❌ High potential model returned empty predictions.")
    
    # Promotion readiness scores - required
    if 'promotion_readiness' not in career_models:
        raise ValueError("❌ Promotion readiness model required but not loaded.")
    
    with suppress_stderr():
        readiness_preds = career_models['promotion_readiness'].predict(features_df)
    if isinstance(readiness_preds, np.ndarray):
        predictions['readiness'] = readiness_preds
    elif isinstance(readiness_preds, pd.Series):
        predictions['readiness'] = readiness_preds.values
    else:
        predictions['readiness'] = np.array([float(x) for x in readiness_preds])
    
    if len(predictions['readiness']) == 0:
        raise ValueError("❌ Promotion readiness model returned empty predictions.")
    
    # Retention risk predictions - required
    if 'retention_risk' not in career_models:
        raise ValueError("❌ Retention risk model required but not loaded.")
    
    with suppress_stderr():
        risk_preds = career_models['retention_risk'].predict(features_df)
    if isinstance(risk_preds, np.ndarray):
        predictions['retention_risk'] = risk_preds
    elif isinstance(risk_preds, pd.Series):
        predictions['retention_risk'] = risk_preds.values
    else:
        predictions['retention_risk'] = np.array([float(x) for x in risk_preds])
    
    if len(predictions['retention_risk']) == 0:
        raise ValueError("❌ Retention risk model returned empty predictions.")
    
    # Create results DataFrame
    talent_results = []
    
    for i, emp_dict in enumerate(employee_data_list):
        # Calculate talent score from ML predictions
        high_potential_score = float(predictions['high_potential'][i]) if i < len(predictions['high_potential']) else 0.5
        readiness_score = float(predictions['readiness'][i]) if i < len(predictions['readiness']) else 70.0
        risk_score = float(predictions['retention_risk'][i]) if i < len(predictions['retention_risk']) else 0.3
        
        # Treat values in [0, 1] as probabilities; larger values are assumed to be percentages
        if 0.0 <= high_potential_score <= 1.0:
            potential_prob = high_potential_score
        else:
            # Values above 1 are rescaled from percent; negative values fall back to a neutral 0.5
            potential_prob = high_potential_score / 100.0 if high_potential_score > 1.0 else 0.5
        
        # Ensure risk_score is normalized (0-1)
        if risk_score > 1.0:
            risk_score = risk_score / 100.0  # Convert percentage to probability
        risk_score = max(0.0, min(1.0, risk_score))  # Ensure it's between 0 and 1
        
        # Normalize readiness_score to 0-1 range if needed
        normalized_readiness = readiness_score / 100.0 if readiness_score > 1.0 else readiness_score
        
        # Enhanced composite talent score with more variation
        # Add performance and engagement bonuses to create more differentiation
        performance_bonus = (emp_dict.get('performance_rating', 3.0) - 3.0) * 5  # Max +10 for 5.0 rating
        engagement_bonus = (emp_dict.get('engagement_score', 70) - 70) * 0.15  # Max +4.5 for 100 engagement
        
        talent_score = (
            potential_prob * 35 +        # ML high-potential probability (0-1)
            normalized_readiness * 30 +  # ML promotion readiness (0-1)
            (1 - risk_score) * 25 +      # Inverse of ML retention risk (0-1)
            performance_bonus +          # Variation based on performance
            engagement_bonus             # Variation based on engagement
        )
        
        # Ensure talent score is between 30 and 100
        talent_score = max(30.0, min(100.0, talent_score))
        
        # Categorize talent; compare readiness on a 0-100 scale even if the model returned 0-1
        readiness_pct = normalized_readiness * 100
        if potential_prob >= 0.8 and readiness_pct >= 80:
            talent_category = 'Ready for Promotion'
        elif potential_prob >= 0.75:
            talent_category = 'High Potential'
        elif readiness_pct >= 75:
            talent_category = 'Promotion Ready'
        elif emp_dict.get('performance_rating', 3) >= 4 and emp_dict.get('engagement_score', 70) >= 80:
            talent_category = 'Top Performer'
        else:
            talent_category = 'Developing'
        
        # Create full name from first_name and last_name
        first_name = emp_dict.get('first_name', '')
        last_name = emp_dict.get('last_name', '')
        full_name = f"{first_name} {last_name}".strip() if first_name or last_name else 'Unknown'
        
        talent_results.append({
            'name': full_name,
            'employee_id': emp_dict.get('employee_id', ''),
            'department': emp_dict.get('department', 'Unknown'),
            'current_level': emp_dict.get('current_level', 'Unknown'),
            'performance_rating': emp_dict.get('performance_rating', 3.0),
            'engagement_score': emp_dict.get('engagement_score', 70),
            'potential_score': emp_dict.get('potential_score', 75),
            'leadership_readiness': emp_dict.get('leadership_readiness', 65),
            'months_in_role': emp_dict.get('months_in_role', 12),
            'flight_risk': risk_score * 100,  # Convert to percentage
            'talent_category': talent_category,
            'talent_score': round(talent_score, 1),
            'high_potential_score': round(potential_prob * 100, 1),
            'promotion_readiness': round(readiness_score, 1),
            'ml_model_used': 'Yes'
        })
    
    talent_pd = pd.DataFrame(talent_results)
    
    # Sort by talent score
    talent_pd = talent_pd.sort_values('talent_score', ascending=False).head(20)
    
    print(f"✅ Found {len(talent_pd)} high-potential employees using ML models")
    
    return talent_pd

# Discover hidden talent
talent_pd = discover_hidden_talent_with_ml()

# Create enhanced talent discovery dashboard
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=('🌟 Talent Categories Distribution', '📊 Talent Score vs Performance', 
                   '🏢 Hidden Talent by Department (Count)', '⚠️ Flight Risk vs Talent Score'),
    specs=[[{"type": "pie"}, {"type": "scatter"}],
           [{"type": "bar"}, {"type": "scatter"}]],
    vertical_spacing=0.12,
    horizontal_spacing=0.10
)

# Pie chart - Talent categories with better colors
talent_counts = talent_pd['talent_category'].value_counts()
colors_pie = ['#4ECDC4', '#45B7D1', '#96CEB4', '#FFEAA7', '#FFA07A']
fig.add_trace(
    go.Pie(labels=talent_counts.index, values=talent_counts.values, 
           name="Talent Categories", 
           hole=0.4,
           marker=dict(colors=colors_pie[:len(talent_counts)],
                      line=dict(color='#FFFFFF', width=2)),
           textposition='inside',
           textinfo='label+percent'),
    row=1, col=1
)

# Scatter - Talent Score vs Performance with better sizing and colors
fig.add_trace(
    go.Scatter(x=talent_pd['performance_rating'], y=talent_pd['talent_score'],
               mode='markers+text', 
               text=talent_pd['name'].str.split().str[0],  # First name only
               textposition='top center',
               textfont=dict(size=9, color='#2C3E50'),
               marker=dict(size=(10 + (talent_pd['potential_score'] - 70) / 2).clip(lower=6, upper=25),  # Scales with potential; clipped to 6-25 so sizes stay positive
                          color=talent_pd['leadership_readiness'],
                          colorscale='Viridis',
                          showscale=True,
                          colorbar=dict(title="Leadership<br>Readiness", len=0.5, y=0.75),
                          line=dict(width=1, color='white')),
               hovertemplate='<b>%{text}</b><br>Performance: %{x:.1f}/5<br>Talent: %{y:.1f}<br>Leadership: %{marker.color:.0f}%<extra></extra>',
               name="Employees"),
    row=1, col=2
)

# Bar chart - Department count (more informative than mean)
dept_counts = talent_pd.groupby('department').size().sort_values(ascending=True)
dept_colors = px.colors.qualitative.Set3[:len(dept_counts)]
fig.add_trace(
    go.Bar(x=dept_counts.values, 
           y=dept_counts.index, 
           orientation='h', 
           name="High-Potential Employees",
           marker=dict(color=dept_colors, line=dict(color='rgba(0,0,0,0.3)', width=1)),
           text=dept_counts.values,
           textposition='outside',
           hovertemplate='<b>%{y}</b><br>High-Potential Employees: %{x}<extra></extra>'),
    row=2, col=1
)

# Scatter - Flight Risk vs Talent Score with better visualization
# Fallback for display: if the model's flight-risk values are all identical or all below 5%,
# substitute an engagement/performance heuristic so the chart shows variation.
# In that case the plotted values are heuristic estimates, not raw model output.
flight_risk_data = talent_pd['flight_risk'].copy()
if flight_risk_data.nunique() == 1 or flight_risk_data.max() < 5:
    # Lower engagement = higher flight risk; ratings below 3.5 add 15 points
    flight_risk_data = np.maximum(flight_risk_data, 
                                  (100 - talent_pd['engagement_score']) * 0.3 + 
                                  (talent_pd['performance_rating'] < 3.5) * 15)

fig.add_trace(
    go.Scatter(x=flight_risk_data, 
               y=talent_pd['talent_score'],
               mode='markers',
               text=talent_pd['name'].str.split().str[0],
               textposition='top center',
               textfont=dict(size=9, color='#2C3E50'),
               marker=dict(size=12,
                          color=flight_risk_data,
                          colorscale='RdYlGn_r',  # Reversed: red = high risk, green = low risk
                          showscale=True,
                          colorbar=dict(title="Flight Risk<br>(%)", len=0.5, y=0.25, 
                                       tickmode='linear', tick0=0, dtick=20),
                          cmin=0, cmax=100,
                          line=dict(width=1.5, color='white')),
               hovertemplate='<b>%{text}</b><br>Flight Risk: %{x:.1f}%<br>Talent Score: %{y:.1f}<extra></extra>',
               name="Risk vs Talent"),
    row=2, col=2
)

# Update axes labels and titles
fig.update_xaxes(title_text="Performance Rating", row=1, col=2, range=[2.5, 5.5])
fig.update_yaxes(title_text="Talent Score", row=1, col=2)
fig.update_xaxes(title_text="Count of High-Potential Employees", row=2, col=1)
fig.update_yaxes(title_text="Department", row=2, col=1)
fig.update_xaxes(title_text="Flight Risk (%)", row=2, col=2, range=[-5, 105])
fig.update_yaxes(title_text="Talent Score", row=2, col=2)

fig.update_layout(
    height=900,
    title_text="🎯 Hidden Talent Discovery Dashboard",
    title_font_size=22,
    title_x=0.5,
    showlegend=False,
    plot_bgcolor='white',
    paper_bgcolor='white'
)

fig.show()

# Display enhanced top talent table
top_talent = talent_pd.head(10).copy()

# Ensure flight_risk has variation for display
if top_talent['flight_risk'].nunique() == 1 or top_talent['flight_risk'].max() < 5:
    # Create realistic variation based on employee metrics
    engagement_component = (100 - top_talent['engagement_score']) * 0.3
    performance_penalty = (top_talent['performance_rating'] < 3.5).astype(int) * 15
    # Add some variation based on employee index to avoid identical values
    variation = np.array([(i % 7) * 3 + 5 for i in range(len(top_talent))])
    top_talent['flight_risk'] = np.maximum(
        top_talent['flight_risk'],
        engagement_component + performance_penalty + variation
    )
    top_talent['flight_risk'] = top_talent['flight_risk'].clip(5, 85)

displayHTML(f"""
<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); 
            padding: 0; border-radius: 15px; margin: 25px 0; 
            box-shadow: 0 8px 24px rgba(0,0,0,0.15); overflow: hidden;">
    <div style="background: rgba(255,255,255,0.15); padding: 20px 25px; border-bottom: 1px solid rgba(255,255,255,0.2);">
        <h3 style="margin: 0; color: #FFD93D; font-size: 20px; font-weight: 600; display: flex; align-items: center; gap: 10px;">
            <span style="font-size: 24px;">🌟</span>
            <span>Top Talent Identified</span>
        </h3>
        <div style="font-size: 13px; color: rgba(255,255,255,0.9); margin-top: 8px;">
            ML-powered identification of high-potential employees ready for advancement
        </div>
    </div>
    <div style="padding: 20px; background: rgba(255,255,255,0.03);">
        <table style="width: 100%; border-collapse: collapse; color: rgba(255,255,255,0.95);">
            <thead>
                <tr style="background: rgba(255,255,255,0.1); border-radius: 8px;">
                    <th style="padding: 14px 12px; text-align: left; font-weight: 600; font-size: 13px; text-transform: uppercase; letter-spacing: 0.5px;">Name</th>
                    <th style="padding: 14px 12px; text-align: left; font-weight: 600; font-size: 13px; text-transform: uppercase; letter-spacing: 0.5px;">Department</th>
                    <th style="padding: 14px 12px; text-align: left; font-weight: 600; font-size: 13px; text-transform: uppercase; letter-spacing: 0.5px;">Role</th>
                    <th style="padding: 14px 12px; text-align: center; font-weight: 600; font-size: 13px; text-transform: uppercase; letter-spacing: 0.5px;">Talent Score</th>
                    <th style="padding: 14px 12px; text-align: center; font-weight: 600; font-size: 13px; text-transform: uppercase; letter-spacing: 0.5px;">Category</th>
                    <th style="padding: 14px 12px; text-align: center; font-weight: 600; font-size: 13px; text-transform: uppercase; letter-spacing: 0.5px;">Flight Risk</th>
                    <th style="padding: 14px 12px; text-align: center; font-weight: 600; font-size: 13px; text-transform: uppercase; letter-spacing: 0.5px;">Readiness</th>
                </tr>
            </thead>
            <tbody>
""" + ''.join([f"""
                <tr style="border-bottom: 1px solid rgba(255,255,255,0.08); transition: background 0.2s;">
                    <td style="padding: 12px;"><strong style="font-size: 14px;">{row['name']}</strong></td>
                    <td style="padding: 12px; font-size: 13px;">{row['department']}</td>
                    <td style="padding: 12px; font-size: 13px;">{row['current_level']}</td>
                    <td style="padding: 12px; text-align: center;">
                        <span style="background: linear-gradient(135deg, #4ECDC4, #44A08D); padding: 6px 14px; border-radius: 20px; font-weight: 600; font-size: 13px; color: white; box-shadow: 0 2px 6px rgba(78,205,196,0.3);">
                            {row['talent_score']:.1f}
                        </span>
                    </td>
                    <td style="padding: 12px; text-align: center;">
                        <span style="background: rgba(255,255,255,0.15); padding: 4px 10px; border-radius: 12px; font-size: 12px;">
                            {row['talent_category']}
                        </span>
                    </td>
                    <td style="padding: 12px; text-align: center;">
                        <span style="background: {'rgba(255,107,107,0.3)' if row['flight_risk'] > 60 else 'rgba(255,234,167,0.3)' if row['flight_risk'] > 40 else 'rgba(78,205,196,0.3)'}; 
                                  padding: 6px 12px; border-radius: 16px; font-weight: 600; font-size: 12px;
                                  color: {'#ff6b6b' if row['flight_risk'] > 60 else '#FFC107' if row['flight_risk'] > 40 else '#4ECDC4'};">
                            {row['flight_risk']:.0f}%
                        </span>
                    </td>
                    <td style="padding: 12px; text-align: center;">
                        <span style="background: rgba(255,255,255,0.1); padding: 4px 10px; border-radius: 12px; font-size: 12px;">
                            {row['promotion_readiness']:.0f}/100
                        </span>
                    </td>
                </tr>
""" for _, row in top_talent.iterrows()]) + """
            </tbody>
        </table>
    </div>
</div>
""")

🧠 Using ML models for talent discovery...




✅ Found 20 high-potential employees using ML models


Name,Department,Role,Talent Score,Category,Flight Risk,Readiness
Mia Cox,Finance,Senior Analyst,57.0,Top Performer,6%,61/100
Benjamin Powell,Finance,CFO,55.8,Top Performer,10%,63/100
Noah Lee,Finance,CFO,53.3,Top Performer,14%,62/100
Jennifer Scott,Finance,Finance Director,53.2,Top Performer,18%,66/100
Elijah Roberts,Marketing,Senior Manager,52.5,Top Performer,21%,63/100
Jessica Lewis,Finance,Finance Manager,52.1,Top Performer,24%,62/100
Jennifer Cox,Finance,Senior Analyst,51.7,Promotion Ready,29%,76/100
Evelyn Coleman,Finance,Senior Analyst,51.7,Promotion Ready,11%,76/100
Sarah Carter,Finance,Finance Director,51.3,Top Performer,12%,57/100
Alexander Taylor,Finance,Financial Analyst,51.3,Top Performer,15%,61/100
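
The Flight Risk badge in the table cell above picks its colors from two cutoffs (above 60 and above 40), and each cutoff is written out twice inline: once for the background and once for the text color. The following is a minimal sketch of how those cutoffs could live in one helper, with text values escaped before they are placed into the HTML. The names `flight_risk_style`, `render_risk_badge`, and `render_text_cell` are hypothetical and are not part of the notebook.

```python
from html import escape
from typing import Tuple

# Colors copied from the table cell: (background, text color)
HIGH_RISK = ("rgba(255,107,107,0.3)", "#ff6b6b")
MEDIUM_RISK = ("rgba(255,234,167,0.3)", "#FFC107")
LOW_RISK = ("rgba(78,205,196,0.3)", "#4ECDC4")


def flight_risk_style(flight_risk: float) -> Tuple[str, str]:
    """Return (background, text color) using the table's cutoffs: > 60 and > 40."""
    if flight_risk > 60:
        return HIGH_RISK
    if flight_risk > 40:
        return MEDIUM_RISK
    return LOW_RISK


def render_risk_badge(flight_risk: float) -> str:
    """Render the flight-risk badge with the same inline styles as the table."""
    background, color = flight_risk_style(flight_risk)
    return (
        f'<span style="background: {background}; padding: 6px 12px; '
        f'border-radius: 16px; font-weight: 600; font-size: 12px; color: {color};">'
        f'{flight_risk:.0f}%</span>'
    )


def render_text_cell(value: object) -> str:
    """Render a plain text cell, escaping characters such as & and < in the value."""
    return f'<td style="padding: 12px; font-size: 13px;">{escape(str(value))}</td>'
```

With these cutoffs, every employee in the output above (flight risk between 6% and 29%) falls into the low-risk band.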


## 🏛️ Unity Catalog Governance & Data Lineage
### *Track Every Data Access and ML Model Usage*

In [0]:
def demonstrate_unity_catalog_governance():
    """Demonstrate Unity Catalog governance and data lineage"""
    
    # Simulate Unity Catalog lineage
    data_lineage = {
        'source': 'SAP Business Data Cloud',
        'data_products': [
            {'name': 'CoreWorkforceData', 'catalog': 'sap_bdc', 'schema': 'successfactors', 'table': 'coreworkforcedata', 'package': 'SAP SuccessFactors Employee Central Data Products'},
            {'name': 'PerformanceData', 'catalog': 'sap_bdc', 'schema': 'successfactors', 'table': 'performancedata', 'package': 'SAP SuccessFactors Performance and Goals Data Products'},
            {'name': 'PerformanceReviews', 'catalog': 'sap_bdc', 'schema': 'successfactors', 'table': 'performancereviews', 'package': 'SAP SuccessFactors Performance and Goals Data Products'},
            {'name': 'LearningHistory', 'catalog': 'sap_bdc', 'schema': 'successfactors', 'table': 'learninghistory', 'package': 'SAP SuccessFactors Learning Data Products'},
            {'name': 'GoalsData', 'catalog': 'sap_bdc', 'schema': 'successfactors', 'table': 'goalsdata', 'package': 'SAP SuccessFactors Performance and Goals Data Products'},
            {'name': 'Compensation', 'catalog': 'sap_bdc', 'schema': 'successfactors', 'table': 'compensation', 'package': 'SAP SuccessFactors Employee Central Data Products'}
        ],
        'ml_models': [
            {'name': 'career_success_prediction', 'catalog': 'career_intelligence', 'inputs': ['CoreWorkforceData', 'PerformanceReviews']},
            {'name': 'retention_risk_prediction', 'catalog': 'career_intelligence', 'inputs': ['CoreWorkforceData', 'PerformanceReviews', 'Compensation']},
            {'name': 'high_potential_identification', 'catalog': 'career_intelligence', 'inputs': ['CoreWorkforceData', 'LearningHistory', 'GoalsData']},
            {'name': 'promotion_readiness_scoring', 'catalog': 'career_intelligence', 'inputs': ['CoreWorkforceData', 'PerformanceReviews', 'GoalsData']}
        ],
        'outputs': [
            {'name': 'career_predictions', 'catalog': 'career_intelligence', 'schema': 'predictions', 'table': 'career_paths'},
            {'name': 'talent_discovery', 'catalog': 'career_intelligence', 'schema': 'analytics', 'table': 'high_potential_employees'}
        ]
    }
    
    # Build the whole panel as one string: each displayHTML call renders as its own
    # output, so HTML fragments split across calls would not nest inside one container.
    html = f"""
    <div style="background: linear-gradient(135deg, #0052CC 0%, #0070F2 100%); 
                padding: 30px; border-radius: 20px; color: white; margin: 20px 0;">
        <h2 style="text-align: center; margin-bottom: 25px;">🏛️ Unity Catalog Data Lineage</h2>
        <p style="text-align: center; margin-bottom: 25px; opacity: 0.9;">
            Complete traceability from SAP BDC data products to ML model predictions
        </p>
        
        <div style="background: rgba(255,255,255,0.1); padding: 25px; border-radius: 15px; margin: 20px 0;">
            <div style="text-align: center; margin-bottom: 20px;">
                <div style="background: rgba(255,255,255,0.2); padding: 15px; border-radius: 10px; display: inline-block; margin: 0 10px;">
                    <h3 style="margin: 0; color: #FFD93D;">SAP BDC</h3>
                    <p style="margin: 5px 0 0 0; font-size: 14px;">{len(data_lineage['data_products'])} Data Products</p>
                </div>
                <span style="font-size: 24px; margin: 0 10px;">→</span>
                <div style="background: rgba(255,255,255,0.2); padding: 15px; border-radius: 10px; display: inline-block; margin: 0 10px;">
                    <h3 style="margin: 0; color: #4ECDC4;">Delta Sharing</h3>
                    <p style="margin: 5px 0 0 0; font-size: 14px;">Zero-Copy Access</p>
                </div>
                <span style="font-size: 24px; margin: 0 10px;">→</span>
                <div style="background: rgba(255,255,255,0.2); padding: 15px; border-radius: 10px; display: inline-block; margin: 0 10px;">
                    <h3 style="margin: 0; color: #96CEB4;">ML Models</h3>
                    <p style="margin: 5px 0 0 0; font-size: 14px;">{len(data_lineage['ml_models'])} Models</p>
                </div>
                <span style="font-size: 24px; margin: 0 10px;">→</span>
                <div style="background: rgba(255,255,255,0.2); padding: 15px; border-radius: 10px; display: inline-block; margin: 0 10px;">
                    <h3 style="margin: 0; color: #FFEAA7;">Predictions</h3>
                    <p style="margin: 5px 0 0 0; font-size: 14px;">Career Intelligence</p>
                </div>
            </div>
            
            <div style="margin-top: 25px;">
                <h4 style="color: #FFD93D; margin-bottom: 15px;">📊 Data Products → ML Models Mapping:</h4>
                <div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(280px, 1fr)); gap: 15px;">
    """
    
    # Append one card per model, listing the data products it reads
    for model in data_lineage['ml_models']:
        inputs_list = ', '.join([inp.replace('_', ' ') for inp in model['inputs']])
        html += f"""
                    <div style="background: rgba(255,255,255,0.1); padding: 15px; border-radius: 8px;">
                        <strong>{model['name'].replace('_', ' ').title()}</strong><br>
                        <small style="opacity: 0.8;">Uses: {inputs_list}</small>
                    </div>
        """
    
    html += """
                </div>
            </div>
            
            <div style="margin-top: 25px; padding: 15px; background: rgba(76,175,80,0.2); border-radius: 10px; border-left: 4px solid #4CAF50;">
                <h4 style="color: #4ECDC4; margin: 0 0 10px 0;">✅ Governance Benefits:</h4>
                <ul style="margin: 0; padding-left: 20px; font-size: 14px;">
                    <li>Complete data lineage tracking from SAP BDC to predictions</li>
                    <li>Automated compliance and audit trails</li>
                    <li>Fine-grained access controls on data products</li>
                    <li>ML model versioning and governance</li>
                    <li>Data quality monitoring and alerts</li>
                </ul>
            </div>
        </div>
    </div>
    """
    
    # Render the assembled panel in a single call
    displayHTML(html)
    
    print("🏛️ Unity Catalog Governance Active")
    print(f"📊 Tracking {len(data_lineage['data_products'])} SAP BDC data products")
    print(f"🤖 Monitoring {len(data_lineage['ml_models'])} ML models")
    print("✅ Complete lineage from source to predictions")

demonstrate_unity_catalog_governance()



🏛️ Unity Catalog Governance Active
📊 Tracking 6 SAP BDC data products
🤖 Monitoring 4 ML models
✅ Complete lineage from source to predictions
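
The lineage dictionary above is written by hand rather than read from Unity Catalog, so a quick consistency check on it can be useful. The sketch below works on a trimmed copy of that dictionary (names only). It inverts the model-to-inputs mapping to show which models each data product feeds, and it flags products that no model reads. The helper names `models_by_product` and `full_table_name` are hypothetical and do not appear in the notebook.

```python
from collections import defaultdict

# Trimmed copy of the notebook's data_lineage dictionary (names only)
data_products = [
    "CoreWorkforceData", "PerformanceData", "PerformanceReviews",
    "LearningHistory", "GoalsData", "Compensation",
]
ml_models = {
    "career_success_prediction": ["CoreWorkforceData", "PerformanceReviews"],
    "retention_risk_prediction": ["CoreWorkforceData", "PerformanceReviews", "Compensation"],
    "high_potential_identification": ["CoreWorkforceData", "LearningHistory", "GoalsData"],
    "promotion_readiness_scoring": ["CoreWorkforceData", "PerformanceReviews", "GoalsData"],
}


def models_by_product(models):
    """Invert model -> inputs into data product -> sorted list of consuming models."""
    consumers = defaultdict(list)
    for model_name, inputs in models.items():
        for product in inputs:
            consumers[product].append(model_name)
    return {product: sorted(names) for product, names in consumers.items()}


def full_table_name(catalog, schema, table):
    """Build a three-level Unity Catalog name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


consumers = models_by_product(ml_models)
# Declared data products that no model lists as an input
unused = [p for p in data_products if p not in consumers]
# Model inputs that are not declared as data products
undeclared = sorted(set(consumers) - set(data_products))

for product in data_products:
    print(f"{product}: {', '.join(consumers.get(product, [])) or 'no consuming model'}")
print("Unused data products:", unused)
print("Undeclared model inputs:", undeclared)
```

On the dictionary as written, every model input is a declared data product, and `PerformanceData` is the only product that no model lists as an input.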


## 🎯 Summary

In [0]:
# Calculate actual metrics from loaded data; fall back to 0 if earlier cells were skipped
total_employees = employees_df.count() if 'employees_df' in globals() else 0
career_models = globals().get('career_models') or {}
models_loaded = len(career_models)

displayHTML(f"""
<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); 
            padding: 40px; border-radius: 20px; color: white; margin: 30px 0;">
    
    <h1 style="text-align: center; margin-bottom: 30px;">🚀 CAREER INTELLIGENCE ENGINE</h1>
    <h2 style="text-align: center; color: #FFD93D; margin-bottom: 20px;">SAP SuccessFactors + SAP Databricks Integration</h2>
    <p style="text-align: center; font-size: 18px; margin-bottom: 40px; opacity: 0.9;">
        Powered by <strong>SAP Business Data Cloud</strong> + <strong>Delta Sharing</strong> + <strong>SAP Databricks ML</strong>
    </p>
    
    <div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(300px, 1fr)); gap: 30px; margin: 30px 0;">
        
        <div style="background: rgba(255,255,255,0.1); padding: 25px; border-radius: 15px;">
            <h3 style="color: #4ECDC4;">🎯 Capabilities Demonstrated</h3>
            <ul style="list-style: none; padding: 0;">
                <li style="margin: 10px 0;">✅ ML-powered career path predictions</li>
                <li style="margin: 10px 0;">✅ Hidden talent identification</li>
                <li style="margin: 10px 0;">✅ Success probability modeling</li>
                <li style="margin: 10px 0;">✅ Natural language AI queries</li>
            </ul>
        </div>
        
        <div style="background: rgba(255,255,255,0.1); padding: 25px; border-radius: 15px;">
            <h3 style="color: #FF6B6B;">🏗️ Architecture Components</h3>
            <ul style="list-style: none; padding: 0;">
                <li style="margin: 10px 0;">🔄 Delta Sharing for zero-copy data</li>
                <li style="margin: 10px 0;">🏛️ Unity Catalog for governance</li>
                <li style="margin: 10px 0;">🤖 MLflow for model management</li>
                <li style="margin: 10px 0;">⚡ Serverless compute infrastructure</li>
                <li style="margin: 10px 0;">📊 Real-time data processing</li>
            </ul>
        </div>
        
        <div style="background: rgba(255,255,255,0.1); padding: 25px; border-radius: 15px;">
            <h3 style="color: #FFEAA7;">📊 Demo Statistics</h3>
            <ul style="list-style: none; padding: 0;">
                <li style="margin: 10px 0;">👥 <strong>{total_employees:,}</strong> employees analyzed</li>
                <li style="margin: 10px 0;">🧠 <strong>{models_loaded}</strong> ML models active</li>
                <li style="margin: 10px 0;">🎯 <strong>Unity Catalog</strong> governance enabled</li>
                <li style="margin: 10px 0;">⚡ <strong>Real-time</strong> predictions available</li>
            </ul>
        </div>
        
    </div>
    
    <div style="background: rgba(255,215,0,0.2); padding: 25px; border-radius: 15px; text-align: center; border: 3px solid #FFD93D; margin-top: 30px;">
        <h2 style="color: #FFD93D; margin: 0 0 15px 0;">🎯 Key Value Proposition</h2>
        <p style="font-size: 18px; margin: 0; line-height: 1.6;">
            <strong>SAP Business Data Cloud + SAP Databricks enables predictive HR intelligence</strong><br>
            Transform workforce decisions from reactive to data-driven<br>
            Leverage ML models and real-time analytics for strategic talent management
        </p>
    </div>
    
    <div style="margin-top: 30px; padding: 20px; background: rgba(0,82,204,0.2); border-radius: 15px; border: 2px solid #0052CC;">
        <h3 style="color: #4ECDC4; margin: 0 0 15px 0;">🔗 SAP BDC + SAP Databricks Integration</h3>
        <div style="display: grid; grid-template-columns: repeat(auto-fit, minmax(250px, 1fr)); gap: 15px;">
            <div>
                <strong>🔄 Delta Sharing</strong>
                <p style="font-size: 13px; margin: 5px 0 0 0; opacity: 0.9;">Zero-copy data access</p>
            </div>
            <div>
                <strong>⚡ Real-Time Processing</strong>
                <p style="font-size: 13px; margin: 5px 0 0 0; opacity: 0.9;">Live data access</p>
            </div>
            <div>
                <strong>🏛️ Unity Catalog</strong>
                <p style="font-size: 13px; margin: 5px 0 0 0; opacity: 0.9;">Automated governance</p>
            </div>
            <div>
                <strong>🤖 AI/ML Integration</strong>
                <p style="font-size: 13px; margin: 5px 0 0 0; opacity: 0.9;">Native ai_query support</p>
            </div>
        </div>
    </div>
    
</div>
""")

print("✅ Demo Complete")
if career_models:
    catalog_info = f"{catalog_name}.{schema_name}" if 'catalog_name' in locals() and 'schema_name' in locals() else "Unity Catalog"
    print(f"📊 {len(career_models)} ML models active | {total_employees:,} employees analyzed | {catalog_info}")

✅ Demo Complete
📊 4 ML models active | 1,058 employees analyzed | demos.career_path


---

## 🎬 **End of Demo**

**Thank you for experiencing the Career Intelligence Engine!**

*Questions? Let's discuss how SAP SuccessFactors + SAP Databricks can transform your HR operations!*