## 📦 Section 1: Setup & Environment Verification

First, let's verify that your environment is properly configured.

In [62]:
# Import required libraries
import sys
import pandas as pd
import numpy as np
from pathlib import Path

# Check Python version
print(f"✓ Python version: {sys.version.split()[0]}")

# Check package versions
import sklearn
import joblib
print(f"✓ pandas: {pd.__version__}")
print(f"✓ scikit-learn: {sklearn.__version__}")
print(f"✓ joblib: {joblib.__version__}")

✓ Python version: 3.12.1
✓ pandas: 2.2.2
✓ scikit-learn: 1.7.2
✓ joblib: 1.4.2


In [63]:
# Verify project structure and files exist
project_root = Path.cwd().parent
models_dir = project_root / "models"
data_dir = project_root / "data"

# Check for required files
required_files = {
    "Vectorizer": models_dir / "vectorizer.joblib",
    "Model": models_dir / "baseline_logreg.joblib",  # Using baseline model (compatible)
    "Dataset": data_dir / "cyber_incidents_simulated.csv.gz"  # .gz compressed format
}

print("\n📂 File Verification:")
all_exist = True
for name, filepath in required_files.items():
    exists = filepath.exists()
    symbol = "✓" if exists else "✗"
    print(f"{symbol} {name}: {filepath.name}")
    if not exists:
        all_exist = False

if all_exist:
    print("\n✅ All required files found! Ready to proceed.")
else:
    print("\n⚠️ Some files are missing. Please run: pytest tests/test_model_artifacts.py")


📂 File Verification:
✓ Vectorizer: vectorizer.joblib
✓ Model: baseline_logreg.joblib
✓ Dataset: cyber_incidents_simulated.csv.gz

✅ All required files found! Ready to proceed.


In [64]:
# Add src directory to path for imports
import sys
from pathlib import Path

project_root = Path.cwd().parent
src_path = project_root / "src"
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

print(f"✓ Project root: {project_root}")
print(f"✓ Source path added: {src_path}")

✓ Project root: c:\Users\Dell\Desktop\bigatti\AlertSage1
✓ Source path added: c:\Users\Dell\Desktop\bigatti\AlertSage1\src


## 🧠 Section 2: Load Pre-trained Models

AlertSage uses two main components:
1. **TF-IDF Vectorizer** - Converts text to numerical features (~5,000 features)
2. **Logistic Regression Classifier** - Predicts incident type from features

Both models are pre-trained on 100,000+ synthetic security incidents.
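
To make the first component more concrete, here is a rough, pure-Python sketch of what a TF-IDF vectorizer computes. It is not AlertSage's vectorizer: the three toy documents and the `tfidf` helper are made up for illustration, and the weighting assumes the smoothed formula that scikit-learn documents as its default (`idf = ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization).

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, rows); each row is an L2-normalized TF-IDF vector."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows

# Toy documents (hypothetical, not taken from the AlertSage dataset).
docs = [
    "suspicious email link",
    "suspicious login attempt",
    "routine backup completed",
]
vocab, rows = tfidf(docs)
for term, weight in zip(vocab, rows[0]):
    if weight > 0:
        print(f"{term:<12} {weight:.3f}")
```

In this toy corpus, `suspicious` appears in two documents and therefore receives a lower weight than `email` or `link`, which appear in only one. The real vectorizer applies the same idea across roughly 5,000 features.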

In [65]:
# Import AlertSage modules
from triage.preprocess import clean_description
import joblib

# Note: We'll load models directly to ensure compatibility
# The model.py uses enhanced_logreg which has different features
print("✓ AlertSage modules imported successfully")

✓ AlertSage modules imported successfully


In [66]:
# Load the pre-trained models directly
# Using baseline_logreg.joblib which matches the vectorizer
print("Loading models...")
vectorizer = joblib.load(project_root / "models" / "vectorizer.joblib")
classifier = joblib.load(project_root / "models" / "baseline_logreg.joblib")
print("✅ Models loaded successfully!\n")

# Display model information
print("📊 Model Information:")
print(f"  - Vectorizer features: {len(vectorizer.get_feature_names_out())}")
print(f"  - Model type: {type(classifier).__name__}")
print(f"  - Number of classes: {len(classifier.classes_)}")
print(f"  - Model expects: {classifier.n_features_in_} features")

# Verify compatibility
vec_features = len(vectorizer.get_feature_names_out())
if vec_features == classifier.n_features_in_:
    print(f"\n✅ Vectorizer and model are compatible ({vec_features} features)")
else:
    print(f"\n⚠️ Feature mismatch: vectorizer has {vec_features}, model expects {classifier.n_features_in_}")

Loading models...
✅ Models loaded successfully!

📊 Model Information:
  - Vectorizer features: 5000
  - Model type: OneVsRestClassifier
  - Number of classes: 10
  - Model expects: 5000 features

✅ Vectorizer and model are compatible (5000 features)


In [67]:
# Show all incident types the model can classify
print("\n🎯 Incident Types (10 classes):\n")
for i, event_type in enumerate(classifier.classes_, 1):
    print(f"  {i:2d}. {event_type}")

print("\n💡 The model predicts which of these 10 categories best describes a security incident.")


🎯 Incident Types (10 classes):

   1. access_abuse
   2. benign_activity
   3. credential_compromise
   4. data_exfiltration
   5. insider_threat
   6. malware
   7. phishing
   8. policy_violation
   9. suspicious_network_activity
  10. web_attack

💡 The model predicts which of these 10 categories best describes a security incident.


## 🔍 Section 3: Your First Prediction

Let's analyze a single security incident step-by-step to understand how the system works.

In [68]:
# Sample incident description
sample_incident = """
User alice.w reported receiving an email from ceo@company-secure.com 
requesting immediate password reset. Email contained a link to 
http://company-login-verify.xyz/reset that appears suspicious.
""".strip()

print("📧 Sample Incident:")
print("-" * 70)
print(sample_incident)
print("-" * 70)

📧 Sample Incident:
----------------------------------------------------------------------
User alice.w reported receiving an email from ceo@company-secure.com 
requesting immediate password reset. Email contained a link to 
http://company-login-verify.xyz/reset that appears suspicious.
----------------------------------------------------------------------


In [69]:
# Step 1: Text Preprocessing
# The model requires text to be cleaned and normalized first

cleaned_text = clean_description(sample_incident)

print("🧹 Text Preprocessing:")
print(f"\nOriginal length: {len(sample_incident)} characters")
print(f"Cleaned length:  {len(cleaned_text)} characters")
print(f"\nCleaned text:\n{cleaned_text}")
print("\n💡 Preprocessing: lowercase, normalize URLs/IPs, remove punctuation, etc.")

🧹 Text Preprocessing:

Original length: 196 characters
Cleaned length:  158 characters

Cleaned text:
user alice w reported receiving an email from ceo company secure com requesting immediate password reset email contained a link to url that appears suspicious

💡 Preprocessing: lowercase, normalize URLs/IPs, remove punctuation, etc.
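
The cleaning steps listed above can be approximated with a few regular expressions. The sketch below is a hypothetical stand-in, not the real `clean_description()` from `triage.preprocess`: the function name `approx_clean` and the `ip` placeholder token are assumptions, and the actual implementation may apply different or additional rules. It is only meant to show the kind of transformation involved.

```python
import re

def approx_clean(text: str) -> str:
    """Hypothetical approximation of the cleaning step (not the real clean_description)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " url ", text)                # collapse URLs into one token
    text = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", " ip ", text)  # assumed placeholder for IPv4 addresses
    text = re.sub(r"[^a-z0-9\s]", " ", text)                     # replace punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()                     # squeeze repeated whitespace

sample = (
    "User alice.w reported receiving an email from ceo@company-secure.com \n"
    "requesting immediate password reset. Email contained a link to \n"
    "http://company-login-verify.xyz/reset that appears suspicious."
)
print(approx_clean(sample))
```

For this particular sample the sketch produces the same cleaned text as shown in the cell output above, but that agreement should not be expected for arbitrary inputs.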


In [70]:
# Step 2: Make Prediction
# Transform the text and predict
X = vectorizer.transform([cleaned_text])
predicted_label = classifier.predict(X)[0]
probabilities = classifier.predict_proba(X)[0]

# Create probability dictionary
proba_dict = dict(zip(classifier.classes_, probabilities))
# Sort and get top 5
proba_dict = dict(sorted(proba_dict.items(), key=lambda x: x[1], reverse=True)[:5])

print("🎯 Prediction Result:")
print(f"\n  Predicted Type: {predicted_label}")
print(f"  Confidence: {max(proba_dict.values()):.1%}")

🎯 Prediction Result:

  Predicted Type: phishing
  Confidence: 50.6%


In [71]:
# Step 3: View Probability Distribution
print("\n📊 Top 5 Probabilities:\n")

probs_df = pd.DataFrame([
    {"Event Type": event, "Probability": prob, "Confidence": f"{prob:.1%}"}
    for event, prob in proba_dict.items()
])

print(probs_df.to_string(index=False))

print("\n💡 Phishing is the top prediction, though the model is only moderately confident. Cues in the report consistent with phishing:")
print("   - Suspicious email sender")
print("   - Password reset request")
print("   - Fake domain link")


📊 Top 5 Probabilities:

                 Event Type  Probability Confidence
                   phishing     0.505994      50.6%
            benign_activity     0.220700      22.1%
               access_abuse     0.202106      20.2%
           policy_violation     0.025727       2.6%
suspicious_network_activity     0.015422       1.5%

💡 Phishing is the top prediction, though the model is only moderately confident. Cues in the report consistent with phishing:
   - Suspicious email sender
   - Password reset request
   - Fake domain link
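
Besides the top probability, the gap between the two most likely classes is a quick way to see how close a call was. The helper below is a hypothetical illustration (the name `top_k_with_margin` is not part of AlertSage) and uses the five probabilities printed in the table above.

```python
def top_k_with_margin(classes, probabilities, k=3):
    """Return the k most likely (label, probability) pairs and the top-two margin."""
    ranked = sorted(zip(classes, probabilities), key=lambda pair: pair[1], reverse=True)
    margin = ranked[0][1] - ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return ranked[:k], margin

# Probabilities copied from the table above (top five classes only).
classes = ["phishing", "benign_activity", "access_abuse",
           "policy_violation", "suspicious_network_activity"]
probabilities = [0.505994, 0.220700, 0.202106, 0.025727, 0.015422]

top3, margin = top_k_with_margin(classes, probabilities)
for label, prob in top3:
    print(f"{label:<18} {prob:.1%}")
print(f"Margin between top two classes: {margin:.1%}")
```

A wide margin suggests the runner-up is not a serious competitor even when the top probability is modest; a narrow one is a reasonable cue to look at the incident more closely.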


## 📚 Section 4: Batch Analysis

Now let's analyze multiple incidents at once - this is how you'd use AlertSage in a real SOC environment.

In [72]:
# Load a small sample from the dataset
# Note: pandas automatically handles .gz compressed files
df = pd.read_csv(project_root / "data" / "cyber_incidents_simulated.csv.gz")

print(f"📊 Full dataset size: {len(df):,} incidents")
print(f"\nSampling 30 incidents for tutorial...")

# Take a diverse sample across different event types
sample_df = df.groupby('event_type', group_keys=False).apply(
    lambda x: x.sample(min(3, len(x)), random_state=42)
).reset_index(drop=True)

print(f"✓ Sample size: {len(sample_df)} incidents")
print(f"✓ Columns: {', '.join(sample_df.columns.tolist())}")

📊 Full dataset size: 500,000 incidents

Sampling 30 incidents for tutorial...
✓ Sample size: 30 incidents
✓ Columns: event_id, timestamp, log_source, event_type, severity, mitre_technique, mitre_clause, user, device, src_ip, dest_ip, src_country, dest_country, src_port, dest_port, protocol, detection_rule, is_true_positive, description, description_short, description_user_report, short_log






In [73]:
# Preview the sample data
print("🔍 Sample Incidents Preview:\n")
print(sample_df[['event_id', 'event_type', 'severity', 'description']].head(5).to_string(index=False))

🔍 Sample Incidents Preview:

 event_id      event_type severity                                                                                                                                                                                                                                                                                                                                                                                                     description
   171216    access_abuse   medium                                                                                         Bsaed on current evidence repeated account lockuots for gina.t associated with sign-in attempts frmo unrecognized locations. auth telemetry shwos unusual IPs including 158.173.238.165 and 59.154.26.63, which do not match historical baselines. This pattern aligns with MITRE ATT&CK technique T1110 (Burte Force).
   420689    access_abuse     info Preliminary analysis indicates repeated failed login attempts f

In [74]:
# Analyze all incidents in the sample
print("⚡ Analyzing incidents...\n")

results = []
for idx, row in sample_df.iterrows():
    # Clean and transform
    cleaned = clean_description(row['description'])
    X = vectorizer.transform([cleaned])
    
    # Predict
    predicted = classifier.predict(X)[0]
    probs = classifier.predict_proba(X)[0]
    probs_dict = dict(zip(classifier.classes_, probs))
    
    results.append({
        'event_id': row['event_id'],
        'true_label': row['event_type'],
        'predicted': predicted,
        'confidence': max(probs),
        'top_3_probs': ', '.join([f"{k}:{v:.2f}" for k, v in sorted(probs_dict.items(), key=lambda x: x[1], reverse=True)[:3]])
    })

results_df = pd.DataFrame(results)
print(f"✅ Analyzed {len(results_df)} incidents")
print(f"📊 Average confidence: {results_df['confidence'].mean():.1%}")

⚡ Analyzing incidents...

✅ Analyzed 30 incidents
📊 Average confidence: 67.7%
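
The loop above vectorizes one incident at a time, which is easy to follow. For larger batches, a common alternative is to transform all cleaned texts in a single call. The sketch below assumes `clean`, `vectorizer` and `classifier` behave like the objects loaded earlier (`clean_description`, a fitted vectorizer with `.transform`, and a classifier exposing `.predict_proba` and `.classes_`); the `predict_batch` name is hypothetical.

```python
def predict_batch(texts, clean, vectorizer, classifier):
    """Clean all texts, vectorize them in one call, and return (label, confidence) pairs."""
    cleaned = [clean(text) for text in texts]
    X = vectorizer.transform(cleaned)            # one feature matrix for the whole batch
    probabilities = classifier.predict_proba(X)  # one row of class probabilities per text
    results = []
    for row in probabilities:
        # Index of the most probable class in this row.
        best = max(range(len(row)), key=lambda i: row[i])
        results.append((classifier.classes_[best], float(row[best])))
    return results
```

In this notebook it could be called as `predict_batch(sample_df['description'], clean_description, vectorizer, classifier)`. Note that it picks the class with the highest probability; that should normally agree with `classifier.predict`, but it is worth verifying on your own data.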


In [75]:
# Display results summary
print("\n📋 Results Preview:\n")
print(results_df.head(10).to_string(index=False))


📋 Results Preview:

 event_id            true_label             predicted  confidence                                                                   top_3_probs
   171216          access_abuse          access_abuse    0.712657                access_abuse:0.71, credential_compromise:0.20, web_attack:0.09
   420689          access_abuse          access_abuse    0.663651                access_abuse:0.66, credential_compromise:0.21, web_attack:0.13
   244999          access_abuse          access_abuse    0.650317                access_abuse:0.65, credential_compromise:0.23, web_attack:0.12
   252969       benign_activity       benign_activity    0.690559 benign_activity:0.69, policy_violation:0.21, suspicious_network_activity:0.09
   162165       benign_activity       benign_activity    0.702247 benign_activity:0.70, policy_violation:0.18, suspicious_network_activity:0.11
   175440       benign_activity       benign_activity    0.737168 benign_activity:0.74, suspicious_network_activ

In [76]:
# Calculate accuracy (since we have true labels)
correct = (results_df['true_label'] == results_df['predicted']).sum()
accuracy = correct / len(results_df)

print("\n🎯 Performance Metrics:")
print(f"  Correct predictions: {correct}/{len(results_df)}")
print(f"  Accuracy: {accuracy:.1%}")
print(f"  Average confidence: {results_df['confidence'].mean():.1%}")
print(f"  Confidence range: {results_df['confidence'].min():.1%} - {results_df['confidence'].max():.1%}")


🎯 Performance Metrics:
  Correct predictions: 29/30
  Accuracy: 96.7%
  Average confidence: 67.7%
  Confidence range: 52.3% - 78.1%
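
With only 30 incidents, a single accuracy figure hides a lot of sampling noise. One way to gauge this is a Wilson score interval for the proportion of correct predictions. The sketch below uses only the standard library; the `wilson_interval` helper is illustrative and not part of AlertSage.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

low, high = wilson_interval(29, 30)
print(f"Accuracy 29/30 = {29/30:.1%}, approx. 95% interval: {low:.1%} to {high:.1%}")
```

For 29 correct out of 30, the interval spans roughly 83% to 99%, so the headline number is best read as "high, but not precisely known" until it is confirmed on a larger held-out sample.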


---

## ✅ Phase 1 Complete!

**What we accomplished:**
- ✓ Verified environment setup
- ✓ Loaded pre-trained models
- ✓ Made our first prediction
- ✓ Analyzed 30 incidents in batch
- ✓ Calculated basic performance metrics (96.7% accuracy on this 30-incident sample)

**Next:** Let's visualize these results with interactive charts!

## 📊 Section 5: Visualizing Results

Data visualization helps us understand patterns and model behavior. Let's create 4 interactive charts using Plotly.

In [77]:
# Import visualization libraries
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

print("✓ Plotly imported for interactive visualizations")

✓ Plotly imported for interactive visualizations


### 📈 Visualization 1: Class Distribution

Shows how predictions are distributed across incident types.

In [78]:
# Count predictions by event type
pred_counts = results_df['predicted'].value_counts().reset_index()
pred_counts.columns = ['Event Type', 'Count']

# Create bar chart
fig = px.bar(
    pred_counts,
    x='Event Type',
    y='Count',
    title='Predicted Incident Types Distribution',
    labels={'Event Type': 'Incident Type', 'Count': 'Number of Incidents'},
    color='Count',
    color_continuous_scale='Blues'
)

fig.update_layout(
    xaxis_tickangle=-45,
    height=500,
    showlegend=False
)

fig.show()

print(f"\n📊 Most common prediction: {pred_counts.iloc[0]['Event Type']} ({pred_counts.iloc[0]['Count']} incidents)")


📊 Most common prediction: policy_violation (4 incidents)


### 📊 Visualization 2: Confidence Score Distribution

Histogram showing how confident the model is across all predictions.

In [79]:
# Create confidence histogram
fig = px.histogram(
    results_df,
    x='confidence',
    nbins=15,
    title='Confidence Score Distribution',
    labels={'confidence': 'Confidence Score', 'count': 'Number of Incidents'},
    color_discrete_sequence=['#667eea']
)

# Add uncertainty threshold line
fig.add_vline(
    x=0.50, 
    line_dash="dash", 
    line_color="red",
    annotation_text="Uncertainty Threshold (50%)",
    annotation_position="top"
)

fig.update_layout(
    xaxis_tickformat='.0%',
    height=500,
    showlegend=False
)

fig.show()

# Stats
below_threshold = (results_df['confidence'] < 0.50).sum()
above_threshold = (results_df['confidence'] >= 0.50).sum()

print(f"\n📊 Confidence Statistics:")
print(f"  Above threshold (≥50%): {above_threshold} incidents ({above_threshold/len(results_df):.1%})")
print(f"  Below threshold (<50%): {below_threshold} incidents ({below_threshold/len(results_df):.1%})")
print(f"\n💡 Low-confidence cases would benefit from an LLM second opinion.")


📊 Confidence Statistics:
  Above threshold (≥50%): 30 incidents (100.0%)
  Below threshold (<50%): 0 incidents (0.0%)

💡 Low-confidence cases would benefit from an LLM second opinion.


### 📦 Visualization 3: Confidence by Event Type

Box plot showing confidence distribution for each incident type.

In [80]:
# Create box plot
fig = px.box(
    results_df,
    x='predicted',
    y='confidence',
    title='Confidence Scores by Event Type',
    labels={'predicted': 'Incident Type', 'confidence': 'Confidence Score'},
    color='predicted',
    points='all'  # Show all data points
)

fig.update_layout(
    xaxis_tickangle=-45,
    yaxis_tickformat='.0%',
    height=500,
    showlegend=False
)

fig.show()

# Find easiest and hardest to classify
avg_conf_by_type = results_df.groupby('predicted')['confidence'].mean().sort_values(ascending=False)

print(f"\n📊 Easiest to classify: {avg_conf_by_type.index[0]} ({avg_conf_by_type.iloc[0]:.1%} avg confidence)")
if len(avg_conf_by_type) > 1:
    print(f"📊 Hardest to classify: {avg_conf_by_type.index[-1]} ({avg_conf_by_type.iloc[-1]:.1%} avg confidence)")


📊 Easiest to classify: data_exfiltration (71.2% avg confidence)
📊 Hardest to classify: suspicious_network_activity (64.5% avg confidence)


### ✅ Visualization 4: Confusion Matrix

Compare predictions vs actual labels to see where the model makes mistakes.

In [81]:
# Create confusion matrix
from sklearn.metrics import confusion_matrix
import numpy as np

# Get all labels seen in either column (sorted) so no predicted class is dropped
labels = sorted(set(results_df['true_label']) | set(results_df['predicted']))

# Compute confusion matrix
cm = confusion_matrix(results_df['true_label'], results_df['predicted'], labels=labels)

# Create heatmap
fig = px.imshow(
    cm,
    labels=dict(x="Predicted", y="Actual", color="Count"),
    x=labels,
    y=labels,
    title='Confusion Matrix (Actual vs Predicted)',
    color_continuous_scale='Blues',
    text_auto=True
)

fig.update_layout(
    xaxis_tickangle=-45,
    height=600,
    width=700
)

fig.show()

# Find misclassifications
misclassified = results_df[results_df['true_label'] != results_df['predicted']]

if len(misclassified) > 0:
    print(f"\n⚠️ Misclassifications ({len(misclassified)}):")
    for idx, row in misclassified.iterrows():
        print(f"  • True: {row['true_label']} → Predicted: {row['predicted']} (confidence: {row['confidence']:.1%})")
else:
    print("\n🎯 Perfect! No misclassifications in this sample.")


⚠️ Misclassifications (1):
  • True: data_exfiltration → Predicted: policy_violation (confidence: 61.6%)
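
A confusion matrix can be summarized per class as precision (how often a predicted label was right) and recall (how many true cases were found). The sketch below computes both with the standard library. The `per_class_precision_recall` helper and the toy label lists are illustrative; the lists only mimic the shape of the result above (three incidents per class, one `data_exfiltration` case predicted as `policy_violation`) and are not the actual sample.

```python
from collections import Counter

def per_class_precision_recall(true_labels, predicted_labels):
    """Return {label: (precision, recall)}; a value is None when it is undefined."""
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    actual = Counter(true_labels)
    predicted = Counter(predicted_labels)
    report = {}
    for label in sorted(set(actual) | set(predicted)):
        precision = hits[label] / predicted[label] if predicted[label] else None
        recall = hits[label] / actual[label] if actual[label] else None
        report[label] = (precision, recall)
    return report

# Toy data: three classes, three incidents each, one misclassification.
true_labels = ["data_exfiltration"] * 3 + ["policy_violation"] * 3 + ["malware"] * 3
predicted_labels = (["data_exfiltration"] * 2 + ["policy_violation"]
                    + ["policy_violation"] * 3 + ["malware"] * 3)

report = per_class_precision_recall(true_labels, predicted_labels)
for label, (precision, recall) in report.items():
    print(f"{label:<20} precision={precision:.2f} recall={recall:.2f}")
```

A single error affects two classes at once: it lowers recall for the true class and precision for the predicted class, which overall accuracy alone does not show.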


---

## ✅ Phase 2 Complete!

**What we visualized:**
- ✓ Class distribution across predictions
- ✓ Confidence score histogram with uncertainty threshold
- ✓ Confidence by event type (box plots)
- ✓ Confusion matrix showing actual vs predicted

**Key Insights:**
- All 30 predictions in this sample are above the 50% confidence threshold
- Some event types are easier to classify than others
- Confusion matrix shows where the model makes mistakes

**Next:** We'll explore uncertainty handling and LLM integration!

---

## 🎲 Section 6: Understanding Confidence & Uncertainty

One of AlertSage's key features is **uncertainty-aware classification**. When the model is unsure, it can flag cases for human review or an LLM second opinion.

In [82]:
# Define uncertainty thresholds (from cli.py)
DIFFICULTY_MODES = {
    "default": {"threshold": 0.50, "max_classes": 5},
    "soc-medium": {"threshold": 0.60, "max_classes": 5},
    "soc-hard": {"threshold": 0.75, "max_classes": 3},
}

print("🎯 Uncertainty Threshold Modes:\n")
for mode, config in DIFFICULTY_MODES.items():
    print(f"  {mode:12s}: {config['threshold']:.0%} confidence required")

print("\n💡 Higher threshold = more cases flagged as 'uncertain' = more LLM consultations")

🎯 Uncertainty Threshold Modes:

  default     : 50% confidence required
  soc-medium  : 60% confidence required
  soc-hard    : 75% confidence required

💡 Higher threshold = more cases flagged as 'uncertain' = more LLM consultations


In [83]:
# Analyze uncertainty at different thresholds
def analyze_uncertainty(results_df, threshold):
    """Calculate how many cases would be flagged as uncertain."""
    uncertain = results_df[results_df['confidence'] < threshold]
    confident = results_df[results_df['confidence'] >= threshold]
    
    return {
        'threshold': threshold,
        'uncertain_count': len(uncertain),
        'confident_count': len(confident),
        'uncertain_pct': len(uncertain) / len(results_df) * 100,
        'avg_confidence_uncertain': uncertain['confidence'].mean() if len(uncertain) > 0 else 0,
        'avg_confidence_confident': confident['confidence'].mean() if len(confident) > 0 else 0
    }

# Test different thresholds
thresholds = [0.50, 0.60, 0.75]
analysis_results = []

for thresh in thresholds:
    result = analyze_uncertainty(results_df, thresh)
    analysis_results.append(result)
    
# Display results
print("📊 Uncertainty Analysis:\n")
print(f"{'Threshold':<12} {'Confident':<12} {'Uncertain':<12} {'% Uncertain':<15}")
print("-" * 55)

for res in analysis_results:
    print(f"{res['threshold']:<12.0%} {res['confident_count']:<12} {res['uncertain_count']:<12} {res['uncertain_pct']:.1f}%")

print("\n💡 Trade-off: Higher threshold → More LLM calls → Better accuracy but slower")

📊 Uncertainty Analysis:

Threshold    Confident    Uncertain    % Uncertain    
-------------------------------------------------------
50%          30           0            0.0%
60%          26           4            13.3%
75%          4            26           86.7%

💡 Trade-off: Higher threshold → More LLM calls → Better accuracy but slower
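
To get a feel for the "slower" side of this trade-off, the back-of-the-envelope sketch below estimates blended throughput when every incident gets an ML pass and only the uncertain fraction also gets an LLM pass. It assumes strictly sequential processing and borrows the "typical" speeds quoted in Section 7's comparison table (100 incidents/sec for ML, 0.15 incidents/sec for the LLM). The `blended_throughput` helper is hypothetical, and real numbers depend on hardware and batching.

```python
def blended_throughput(uncertain_fraction, ml_rate=100.0, llm_rate=0.15):
    """Incidents/sec when all incidents get an ML pass and a fraction also gets an LLM pass.

    Assumes sequential processing; rates are in incidents per second.
    """
    seconds_per_incident = 1.0 / ml_rate + uncertain_fraction / llm_rate
    return 1.0 / seconds_per_incident

# Uncertain fractions taken from the 30-incident table above.
for threshold, fraction in [(0.50, 0 / 30), (0.60, 4 / 30), (0.75, 26 / 30)]:
    print(f"threshold {threshold:.0%}: ~{blended_throughput(fraction):.2f} incidents/sec")
```

Under this simple model the LLM time dominates as soon as more than a few percent of incidents are escalated, so the threshold has a large effect on throughput. Sustaining around 10 incidents/sec would require escalating only about 1% of cases.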


In [84]:
# Examine the most uncertain case
most_uncertain = results_df.loc[results_df['confidence'].idxmin()]

print("🔍 Most Uncertain Incident:\n")
print(f"Event ID: {most_uncertain['event_id']}")
print(f"True Label: {most_uncertain['true_label']}")
print(f"Predicted: {most_uncertain['predicted']}")
print(f"Confidence: {most_uncertain['confidence']:.1%}")
print(f"\nTop 3 Probabilities: {most_uncertain['top_3_probs']}")

# Get the full incident text
uncertain_incident = sample_df[sample_df['event_id'] == most_uncertain['event_id']].iloc[0]
print(f"\nIncident Description (first 200 chars):")
print(f"{uncertain_incident['description'][:200]}...")

print("\n💡 This incident would benefit from an LLM second opinion or human review.")

🔍 Most Uncertain Incident:

Event ID: 253726
True Label: suspicious_network_activity
Predicted: suspicious_network_activity
Confidence: 52.3%

Top 3 Probabilities: suspicious_network_activity:0.52, web_attack:0.33, benign_activity:0.14

Incident Description (first 200 chars):
Perliminary analysis indicates command-and-control-style traffic form WIN10-LAPTOP-01 (147.6.5.4205) to external infratsructure at 1188.4.105.112:3389 (CN). NetFlow analysis revelas DNS queries to sus...

💡 This incident would benefit from an LLM second opinion or human review.


## 🤖 Section 7: LLM Second Opinion (Optional)

AlertSage can consult a local LLM (Llama 3.1 8B) for uncertain cases. This provides:
- More detailed reasoning
- Context-aware classification
- Natural language explanations

**Note:** This requires downloading the Llama model (~5GB). Skip this section if the model is not available.

In [85]:
# Check if LLM model is available
llm_model_path = project_root / "models" / "Meta-Llama-3.1-8B-Instruct-Q6_K.gguf"
llm_available = llm_model_path.exists()

if llm_available:
    print("✅ LLM model found!")
    print(f"   Path: {llm_model_path}")
    print("\n💡 You can use an LLM second opinion for uncertain cases.")
else:
    print("⚠️ LLM model not found")
    print(f"   Expected: {llm_model_path}")
    print("\n📥 To download:")
    print("   python scripts/download_llama3_8b.py")
    print("\n💡 For now, we'll show what LLM integration looks like conceptually.")

⚠️ LLM model not found
   Expected: c:\Users\Dell\Desktop\bigatti\AlertSage1\models\Meta-Llama-3.1-8B-Instruct-Q6_K.gguf

📥 To download:
   python scripts/download_llama3_8b.py

💡 For now, we'll show what LLM integration looks like conceptually.


In [86]:
# Demonstrate the LLM workflow (conceptual)
print("🔄 ML + LLM Workflow:\n")

workflow_steps = [
    "1. ML Model makes initial prediction",
    "2. Check confidence score",
    "3. If confidence < threshold:",
    "   → Send to LLM with context",
    "   → LLM analyzes incident",
    "   → LLM provides reasoning",
    "   → Use LLM's classification",
    "4. If confidence ≥ threshold:",
    "   → Trust ML prediction",
    "   → Skip LLM (faster)"
]

for step in workflow_steps:
    print(f"  {step}")

print("\n📊 Performance Comparison (typical):")
print(f"  {'Method':<20} {'Speed':<15} {'Accuracy':<15} {'Cost'}")
print(f"  {'-'*60}")
print(f"  {'ML only':<20} {'100 inc/sec':<15} {'85-90%':<15} {'Low'}")
print(f"  {'ML + LLM (all)':<20} {'0.15 inc/sec':<15} {'90-95%':<15} {'High'}")
print(f"  {'Smart (selective)':<20} {'10-50 inc/sec':<15} {'88-93%':<15} {'Medium'}")

print("\n💡 Smart mode: Only use LLM for uncertain cases (best trade-off)")

🔄 ML + LLM Workflow:

  1. ML Model makes initial prediction
  2. Check confidence score
  3. If confidence < threshold:
     → Send to LLM with context
     → LLM analyzes incident
     → LLM provides reasoning
     → Use LLM's classification
  4. If confidence ≥ threshold:
     → Trust ML prediction
     → Skip LLM (faster)

📊 Performance Comparison (typical):
  Method               Speed           Accuracy        Cost
  ------------------------------------------------------------
  ML only              100 inc/sec     85-90%          Low
  ML + LLM (all)       0.15 inc/sec    90-95%          High
  Smart (selective)    10-50 inc/sec   88-93%          Medium

💡 Smart mode: Only use LLM for uncertain cases (best trade-off)
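
The workflow above boils down to a single branch on the confidence score. The sketch below shows that routing logic in isolation. It is not AlertSage's implementation: `triage_with_fallback` and `fake_llm` are hypothetical names, and the LLM call is replaced by a stub so the example runs without a model.

```python
from typing import Callable, Dict

def triage_with_fallback(
    description: str,
    ml_label: str,
    ml_confidence: float,
    threshold: float,
    llm_classify: Callable[[str], str],
) -> Dict[str, object]:
    """Keep the ML label when confident enough, otherwise defer to llm_classify."""
    if ml_confidence >= threshold:
        return {"label": ml_label, "source": "ml", "confidence": ml_confidence}
    # Below the threshold: ask the (assumed) LLM classifier for a label instead.
    return {"label": llm_classify(description), "source": "llm", "confidence": ml_confidence}

def fake_llm(description: str) -> str:
    """Stand-in for a real LLM call; always returns the same label."""
    return "web_attack"

print(triage_with_fallback("example incident", "phishing", 0.82, 0.60, fake_llm))
print(triage_with_fallback("example incident", "suspicious_network_activity", 0.52, 0.60, fake_llm))
```

Passing the LLM as a callable keeps the routing logic testable without a model, and returning the `source` field makes it easy to count how many incidents were escalated.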


## 🎓 Section 8: Hands-On Exercises

Now it's your turn! Try these exercises to practice what you've learned.

### 🎯 Exercise 1: Analyze Your Own Incident

Create and analyze a custom security incident of your choice.

**Instructions:**
1. Write an incident description (2-3 sentences)
2. Clean the text using `clean_description()`
3. Transform and predict
4. Examine the top 3 probabilities

**Hint:** Try incidents like:
- Ransomware attack
- Insider data theft
- Web application attack
- Normal business activity

In [87]:
# Exercise 1: Your code here
# TODO: Write your own incident description
my_incident = """
[Replace this with your incident description]
"""

# TODO: Clean, transform, and predict
# cleaned = clean_description(my_incident)
# X = vectorizer.transform([cleaned])
# predicted = classifier.predict(X)[0]
# probs = classifier.predict_proba(X)[0]

# TODO: Display results
# print(f"Predicted: {predicted}")
# print(f"Top 3: ...")

# ============================================================
# SOLUTION (uncomment to see example):
# ============================================================
# my_incident = "Employee downloaded encrypted file from personal Dropbox to company laptop during off-hours"
# cleaned = clean_description(my_incident)
# X = vectorizer.transform([cleaned])
# predicted = classifier.predict(X)[0]
# probs = classifier.predict_proba(X)[0]
# probs_dict = dict(zip(classifier.classes_, probs))
# top_3 = sorted(probs_dict.items(), key=lambda x: x[1], reverse=True)[:3]
# 
# print(f"Predicted: {predicted} ({max(probs):.1%} confidence)")
# print("\nTop 3:")
# for label, prob in top_3:
#     print(f"  {label}: {prob:.1%}")

### 🎯 Exercise 2: Experiment with Thresholds

Change the uncertainty threshold to 70% and see how it affects your results.

**Instructions:**
1. Set threshold to 0.70
2. Count how many incidents fall below this threshold
3. Calculate what percentage would need LLM review
4. Compare to the default 50% threshold

In [88]:
# Exercise 2: Your code here
# TODO: Set new threshold and analyze

# ============================================================
# SOLUTION (uncomment to see):
# ============================================================
# new_threshold = 0.70
# below_threshold = results_df[results_df['confidence'] < new_threshold]
# above_threshold = results_df[results_df['confidence'] >= new_threshold]
# 
# print(f"Threshold: {new_threshold:.0%}")
# print(f"Uncertain cases: {len(below_threshold)} ({len(below_threshold)/len(results_df):.1%})")
# print(f"Confident cases: {len(above_threshold)} ({len(above_threshold)/len(results_df):.1%})")
# print(f"\n💡 With 70% threshold, {len(below_threshold)} incidents need LLM review")
# print(f"   vs {(results_df['confidence'] < 0.50).sum()} with 50% threshold")

### 🎯 Exercise 3: Find Problematic Incidents

Identify the 5 incidents with the lowest confidence scores.

**Instructions:**
1. Sort results by confidence (ascending)
2. Take the first 5 rows (the lowest-confidence incidents)
3. Display their event IDs, predictions, and confidence scores
4. Bonus: Show which were misclassified

In [89]:
# Exercise 3: Your code here
# TODO: Find and display lowest confidence incidents

# ============================================================
# SOLUTION (uncomment to see):
# ============================================================
# lowest_5 = results_df.nsmallest(5, 'confidence')
# 
# print("🔍 5 Lowest Confidence Incidents:\n")
# print(f"{'Event ID':<12} {'True':<20} {'Predicted':<20} {'Confidence':<12} {'Correct?'}")
# print("-" * 80)
# 
# for idx, row in lowest_5.iterrows():
#     correct = "✓" if row['true_label'] == row['predicted'] else "✗"
#     print(f"{row['event_id']:<12} {row['true_label']:<20} {row['predicted']:<20} {row['confidence']:.1%}        {correct}")
# 
# misclassified = lowest_5[lowest_5['true_label'] != lowest_5['predicted']]
# print(f"\n⚠️ {len(misclassified)} of the 5 lowest confidence were misclassified")

## 🚀 Section 9: Next Steps

Congratulations! You've completed the AlertSage tutorial. Here's where to go next:

### 📚 Explore More Notebooks

Continue your learning journey with the advanced notebooks:

1. **[01_explore_dataset.ipynb](01_explore_dataset.ipynb)** - Deep dive into the dataset
2. **[02_prepare_text_and_features.ipynb](02_prepare_text_and_features.ipynb)** - Feature engineering
3. **[03_baseline_model.ipynb](03_baseline_model.ipynb)** - Train your own model
4. **[04_model_interpretability.ipynb](04_model_interpretability.ipynb)** - Understand predictions
5. **[05_inference_and_cli.ipynb](05_inference_and_cli.ipynb)** - CLI integration
6. **[06_model_visualization_and_insights.ipynb](06_model_visualization_and_insights.ipynb)** - Advanced viz
7. **[07_scenario_based_evaluation.ipynb](07_scenario_based_evaluation.ipynb)** - Real-world testing
8. **[08_model_comparison.ipynb](08_model_comparison.ipynb)** - Compare algorithms
9. **[09_operational_decision_support.ipynb](09_operational_decision_support.ipynb)** - SOC workflows
10. **[10_hybrid_model.ipynb](10_hybrid_model.ipynb)** - ML + LLM integration
11. **[11_advanced_training_techniques.ipynb](11_advanced_training_techniques.ipynb)** - Advanced ML

### 🛠️ Try the Command-Line Interface

AlertSage has a powerful CLI tool:

```bash
# Analyze single incident
nlp-triage "User clicked suspicious email attachment"

# Batch analysis from CSV
nlp-triage --batch incidents.csv

# With LLM second opinion
nlp-triage --use-llm "Ambiguous security event"

# Change difficulty mode
nlp-triage --difficulty soc-hard "Potential threat"

# JSON output for automation
nlp-triage --json "Incident description" > result.json
```

**Documentation:** [docs/cli.md](../docs/cli.md)

### 🎨 Launch the Streamlit UI

Try the interactive web dashboard:

```bash
streamlit run ui_premium.py
```

**Features:**
- Real-time incident triage
- Bulk CSV upload and analysis
- Interactive visualizations
- LLM integration
- Historical analysis
- Export results

**Documentation:** [docs/ui-guide.md](../docs/ui-guide.md)

### 📖 Documentation & Resources

**Core Documentation:**
- [Getting Started Guide](../docs/getting-started.md) - Installation and setup
- [Architecture Overview](../docs/architecture.md) - System design
- [Model Information](../docs/model-information.md) - ML details
- [LLM Integration](../docs/llm-integration.md) - Using Llama models
- [MITRE ATT&CK Mapping](../docs/mitre-attribution.md) - Threat intelligence

**Contributing:**
- [Contributing Guide](../CONTRIBUTING.md) - How to contribute
- [Development Workflow](../docs/development.md) - Dev setup
- [Testing Guide](../docs/testing.md) - Run tests

**Project Links:**
- GitHub: [github.com/texasbe2trill/AlertSage](https://github.com/texasbe2trill/AlertSage)
- Docs Site: [texasbe2trill.github.io/AlertSage](https://texasbe2trill.github.io/AlertSage/)
- Issues: [Report bugs or request features](https://github.com/texasbe2trill/AlertSage/issues)

---

## 🎉 Tutorial Complete!

**What you learned:**
- ✅ How to load and use pre-trained models
- ✅ Single incident analysis workflow
- ✅ Batch processing for SOC environments
- ✅ Creating interactive visualizations
- ✅ Understanding confidence and uncertainty
- ✅ When to use LLM second opinions
- ✅ Hands-on practice with real scenarios

**Key Takeaways:**
1. **Preprocessing is critical** - Text must be cleaned consistently
2. **Confidence matters** - Not all predictions are equally trustworthy
3. **Hybrid approach works best** - ML for speed, LLM for accuracy on hard cases
4. **Visualization reveals insights** - Charts show patterns you might miss
5. **Uncertainty is valuable** - Knowing when you don't know is powerful

**Thank you for trying AlertSage!** 🛡️

Questions? Ideas? Contributions?  
👉 [Open an issue](https://github.com/texasbe2trill/AlertSage/issues) or submit a PR!

---

*Last updated: December 26, 2025*