
yashwantzap/Capstone


CTAM - Cyber Threat Analysis and Mitigation System

Overview

CTAM is an end-to-end cybersecurity platform that helps security teams identify, analyze, and respond to software vulnerabilities. It combines U.S. government threat data (the CISA KEV catalog), machine-learning risk predictions, and AI-generated remediation plans.


How It Works

Step 1: Data Collection

  • Fetches real vulnerability data from CISA (Cybersecurity & Infrastructure Security Agency)
  • Source: The Known Exploited Vulnerabilities (KEV) catalog
  • URL: https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json
  • Contains ~1,500 actively exploited vulnerabilities with CVE IDs, vendors, products, and threat indicators
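A minimal sketch of the fetch-and-parse step. The field names read from each entry (`cveID`, `vendorProject`, `product`, `dateAdded`, `shortDescription`, `knownRansomwareCampaignUse`, `cwes`) follow the published KEV JSON feed; the `KevRecord` shape and the `parseKevFeed`/`fetchKevFeed` names are hypothetical and not necessarily what `server/lib/cisaFeed.ts` uses.

```typescript
const KEV_URL =
  'https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json';

// Hypothetical internal record built from one KEV feed entry.
interface KevRecord {
  cveId: string;
  vendor: string;
  product: string;
  dateAdded: string;      // ISO date string as published in the feed
  description: string;
  ransomwareUse: boolean; // feed reports "Known" or "Unknown"
  cwes: string[];
}

// Convert the raw feed object into typed records, skipping entries without a CVE ID.
function parseKevFeed(raw: unknown): KevRecord[] {
  const vulns = (raw as { vulnerabilities?: unknown[] } | null)?.vulnerabilities;
  if (!Array.isArray(vulns)) return [];
  const records: KevRecord[] = [];
  for (const v of vulns) {
    const e = v as Record<string, unknown>;
    const cveId = e.cveID;
    if (typeof cveId !== 'string') continue;
    records.push({
      cveId,
      vendor: String(e.vendorProject ?? ''),
      product: String(e.product ?? ''),
      dateAdded: String(e.dateAdded ?? ''),
      description: String(e.shortDescription ?? ''),
      ransomwareUse: e.knownRansomwareCampaignUse === 'Known',
      cwes: Array.isArray(e.cwes) ? e.cwes.map(String) : [],
    });
  }
  return records;
}

// Download the feed (Node 18+ global fetch) and parse it.
async function fetchKevFeed(): Promise<KevRecord[]> {
  const res = await fetch(KEV_URL);
  if (!res.ok) throw new Error(`KEV fetch failed: HTTP ${res.status}`);
  return parseKevFeed(await res.json());
}
```

Keeping parsing separate from the network call makes it easy to test against a saved copy of the feed, e.g. `fetchKevFeed().then((records) => console.log(records.length))` in development.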

Step 2: ML Risk Prediction

  • Trains a Random Forest classifier on the collected data
  • Learns patterns from features such as exploit availability, ransomware use, and time since disclosure
  • Predicts risk as Low, Medium, or High with confidence scores
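One common way to turn a Random Forest's output into a label plus confidence is a majority vote across trees, reporting the winning share of votes as the confidence. The sketch below assumes classes are encoded 0 = Low, 1 = Medium, 2 = High; that encoding and the `labelFromVotes` name are illustrative, not taken from `mlModel.ts`.

```typescript
type RiskLevel = 'Low' | 'Medium' | 'High';
const LABELS: RiskLevel[] = ['Low', 'Medium', 'High']; // assumed encoding: 0, 1, 2

// Majority vote over per-tree class predictions; confidence = winning share of votes.
// Ties are resolved toward the higher-risk class (a deliberately cautious choice).
function labelFromVotes(treeVotes: number[]): { label: RiskLevel; confidence: number } {
  if (treeVotes.length === 0) throw new Error('no votes');
  const counts = [0, 0, 0];
  for (const v of treeVotes) {
    if (!Number.isInteger(v) || v < 0 || v >= counts.length) throw new Error(`bad class ${v}`);
    counts[v] += 1;
  }
  let best = 0;
  for (let c = 1; c < counts.length; c++) if (counts[c] >= counts[best]) best = c;
  return { label: LABELS[best], confidence: counts[best] / treeVotes.length };
}
```

For example, five trees voting `[2, 2, 1, 2, 0]` yield High with confidence 0.6.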

Step 3: AI Mitigation Planning

  • Uses OpenAI GPT-4 to generate customized remediation strategies
  • Analyzes the specific vulnerability details to suggest actionable steps
  • Falls back to rule-based suggestions if no API key is available
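The fallback path can be as simple as checking for the key and choosing canned steps from the vulnerability's attributes. The rule text, the `MitigationInput` shape, and the `generateMitigation`/`ruleBasedMitigation` names are hypothetical; the OpenAI call is injected as `aiGenerate` because the project's actual prompt is not documented here.

```typescript
type RiskLevel = 'Low' | 'Medium' | 'High';

interface MitigationInput {
  cveId: string;
  risk: RiskLevel;
  hasExploit: boolean;
  ransomwareUse: boolean;
}

// Hypothetical rule-based steps used when no OpenAI key is configured.
function ruleBasedMitigation(v: MitigationInput): string[] {
  const steps: string[] = [`Check vendor advisories for ${v.cveId} and apply the official patch.`];
  if (v.hasExploit) steps.push('Prioritize patching: a known exploit exists.');
  if (v.ransomwareUse) steps.push('Verify offline backups and monitor for ransomware indicators.');
  if (v.risk === 'High') steps.push('If patching is delayed, isolate or restrict access to affected systems.');
  else steps.push('Schedule remediation in the next regular maintenance window.');
  return steps;
}

// Use the AI generator only when a key is present; otherwise fall back to rules.
async function generateMitigation(
  v: MitigationInput,
  apiKey: string | undefined, // e.g. process.env.OPENAI_API_KEY
  aiGenerate: (v: MitigationInput, apiKey: string) => Promise<string[]>,
): Promise<string[]> {
  if (!apiKey) return ruleBasedMitigation(v);
  try {
    return await aiGenerate(v, apiKey);
  } catch {
    // Assumption: also fall back if the API call fails, not only when the key is missing.
    return ruleBasedMitigation(v);
  }
}
```

Injecting the generator keeps the fallback logic testable without network access.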

Project Structure

CTAM/
├── client/                    # Frontend (React + TypeScript)
│   └── src/
│       ├── pages/
│       │   ├── Dashboard.tsx      # Main stats & charts
│       │   ├── Vulnerabilities.tsx # List & filter CVEs
│       │   ├── Analyze.tsx        # CVE & custom analysis
│       │   ├── Model.tsx          # ML metrics & training
│       │   ├── Alerts.tsx         # Alert management
│       │   └── AuditLogs.tsx      # Activity history
│       ├── components/            # Reusable UI components
│       └── lib/                   # API client & utilities
│
├── server/                    # Backend (Express + TypeScript)
│   ├── index.ts               # Server entry point
│   ├── routes.ts              # All API endpoints
│   ├── storage.ts             # File-based data persistence
│   └── lib/
│       ├── cisaFeed.ts        # CISA KEV data fetcher
│       ├── mlModel.ts         # Random Forest classifier
│       └── openai.ts          # AI mitigation generator
│
├── shared/                    # Shared between frontend & backend
│   └── schema.ts              # TypeScript types & validation
│
└── data/                      # Persistent JSON storage
    ├── vulnerabilities.json   # All collected CVE data
    ├── predictions.json       # Risk predictions from ML model
    ├── alerts.json            # High-risk alerts
    ├── auditLogs.json         # Activity history
    ├── modelMetrics.json      # ML model accuracy stats
    └── state.json             # Timestamps for last actions

Features

1. Dashboard

  • Real-time statistics (total vulnerabilities, risk counts, pending alerts)
  • Visual charts showing risk distribution
  • Quick action buttons to collect data and train model
  • Model accuracy display

2. Vulnerability Browser

  • Browse all 1,500+ vulnerabilities from CISA KEV
  • Filter by risk level (High, Medium, Low)
  • Search by CVE ID, vendor, or product
  • View detailed vulnerability information
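Filtering by risk level and searching by CVE ID, vendor, or product can be expressed as one pure function. This is a client-side sketch; the `VulnRow` shape and `filterVulns` name are assumptions, and the real page may filter on the server instead.

```typescript
type RiskLevel = 'Low' | 'Medium' | 'High';

interface VulnRow {
  cveId: string;
  vendor: string;
  product: string;
  risk?: RiskLevel; // undefined until the model has scored the entry
}

// Case-insensitive substring search over CVE ID, vendor, and product, plus optional risk filter.
function filterVulns(rows: VulnRow[], query: string, risk?: RiskLevel): VulnRow[] {
  const q = query.trim().toLowerCase();
  return rows.filter((r) => {
    if (risk && r.risk !== risk) return false;
    if (!q) return true;
    return [r.cveId, r.vendor, r.product].some((s) => s.toLowerCase().includes(q));
  });
}
```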

3. CVE Analysis

  • Enter any CVE ID (e.g., CVE-2021-44228)
  • Get ML-powered risk prediction with confidence score
  • Receive AI-generated mitigation plan with actionable steps

4. Custom Vulnerability Analysis

  • Enter your own vulnerability details:
    • Name and description
    • Vendor and product
    • Days since discovery
    • Known exploit availability
    • Ransomware use indicator
  • Get instant risk prediction and mitigation recommendations

5. Alert System

  • Automatic alerts generated for high-risk vulnerabilities
  • Alert statuses: Pending, Acknowledged, Resolved
  • Manage alerts with workflow tracking
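The Pending, Acknowledged, Resolved workflow can be enforced with a small transition table. Which transitions are allowed (for example, whether Pending may jump straight to Resolved) is an assumption here, as are the `Alert` shape and the `updateAlertStatus` name.

```typescript
type AlertStatus = 'Pending' | 'Acknowledged' | 'Resolved';

// Assumed workflow: alerts only move forward; Pending may also be resolved directly.
const ALLOWED: Record<AlertStatus, AlertStatus[]> = {
  Pending: ['Acknowledged', 'Resolved'],
  Acknowledged: ['Resolved'],
  Resolved: [],
};

interface Alert {
  id: string;
  status: AlertStatus;
  updatedAt: string; // ISO timestamp of the last status change
}

function canTransition(from: AlertStatus, to: AlertStatus): boolean {
  return ALLOWED[from].includes(to);
}

// Return an updated copy of the alert, or throw if the workflow forbids the change.
function updateAlertStatus(alert: Alert, to: AlertStatus, now: Date = new Date()): Alert {
  if (!canTransition(alert.status, to)) {
    throw new Error(`Cannot move alert ${alert.id} from ${alert.status} to ${to}`);
  }
  return { ...alert, status: to, updatedAt: now.toISOString() };
}
```

A `PATCH /api/alerts/:id` handler could call this and translate the thrown error into a 4xx response.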

6. Audit Logging

  • Complete activity tracking for accountability
  • Logs all data collection, model training, and analysis events
  • Timestamped entries with action details

7. Persistent Storage

  • File-based JSON storage
  • Data survives server restarts
  • No database required
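A minimal sketch of file-backed persistence: read a JSON file with a default when it does not exist yet, and write through a temporary file plus `rename` so a crash mid-write cannot leave a truncated file (rename is atomic on POSIX systems when both paths are on the same filesystem). The `readJson`/`writeJson` names are hypothetical; `server/storage.ts` may work differently.

```typescript
import { promises as fs } from 'node:fs';
import * as path from 'node:path';

// Read a JSON file, returning `fallback` if the file does not exist yet.
async function readJson<T>(file: string, fallback: T): Promise<T> {
  try {
    return JSON.parse(await fs.readFile(file, 'utf8')) as T;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === 'ENOENT') return fallback;
    throw err;
  }
}

// Write via a temp file + rename so readers never see a half-written file.
async function writeJson(file: string, data: unknown): Promise<void> {
  await fs.mkdir(path.dirname(file), { recursive: true });
  const tmp = `${file}.${process.pid}.tmp`;
  await fs.writeFile(tmp, JSON.stringify(data, null, 2), 'utf8');
  await fs.rename(tmp, file);
}
```

For example, `readJson('data/alerts.json', [])` returns an empty list on first start, which is what lets the app run without a database.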

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /api/collectdata | POST | Fetch vulnerabilities from the CISA KEV feed |
| /api/trainmodel | POST | Train the ML classifier on collected data |
| /api/analyzevulnerability | POST | Analyze a specific CVE by ID |
| /api/analyzecustomvulnerability | POST | Analyze custom vulnerability details |
| /api/vulnerabilities | GET | List all collected vulnerabilities |
| /api/predictions | GET | Get all risk predictions |
| /api/alerts | GET | Get all system alerts |
| /api/alerts/:id | PATCH | Update alert status (acknowledge/resolve) |
| /api/auditlogs | GET | Get audit log entries |
| /api/model/metrics | GET | Get ML model performance metrics |
| /api/dashboard/stats | GET | Get dashboard statistics |

ML Model Details

Algorithm

Random Forest classifier, from the ml-random-forest library

Features Used for Prediction

| Feature | Description | Values |
|---|---|---|
| hasExploit | Is there a known exploit? | 0 or 1 |
| daysSinceDisclosure | Days since the vulnerability was disclosed | Normalized to 0-1 |
| ransomwareUse | Is it used in ransomware campaigns? | 0 or 1 |
| hasCwe | Does it have a CWE classification? | 0 or 1 |
| vendorPopularity | How common is the vendor? | 0-1 score |
| actionUrgency | How soon is action required? | 0-1 score |
| tfidfFeatures | TF-IDF text features (50 dimensions) | L2-normalized |
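Assembling the features above into a single numeric vector might look like this. Only the feature names and value ranges come from the table; the normalization cap `MAX_DAYS`, the `VulnFeatures` shape, and `buildFeatureVector` are assumptions for illustration.

```typescript
interface VulnFeatures {
  hasExploit: boolean;
  daysSinceDisclosure: number;
  ransomwareUse: boolean;
  hasCwe: boolean;
  vendorPopularity: number; // expected 0-1 score
  actionUrgency: number;    // expected 0-1 score
}

const MAX_DAYS = 3650; // assumed cap (~10 years) for scaling days into 0-1

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Six structured features followed by the TF-IDF vector (50 dimensions in this project).
function buildFeatureVector(f: VulnFeatures, tfidf: number[]): number[] {
  return [
    f.hasExploit ? 1 : 0,
    clamp01(f.daysSinceDisclosure / MAX_DAYS),
    f.ransomwareUse ? 1 : 0,
    f.hasCwe ? 1 : 0,
    clamp01(f.vendorPopularity),
    clamp01(f.actionUrgency),
    ...tfidf,
  ];
}
```

With 50 TF-IDF dimensions, every sample becomes a 56-element row, which is the shape a Random Forest training matrix expects.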

TF-IDF Text Analysis

The model uses term frequency-inverse document frequency (TF-IDF) to extract features from vulnerability descriptions:

  1. Text Preprocessing: Tokenization, stemming (Porter Stemmer), lowercase conversion
  2. Vocabulary Building: Top 50 terms from corpus, prioritizing security keywords
  3. Security Keywords: remote, code, execution, buffer, overflow, sql, injection, etc.
  4. IDF Weights: Computed from training corpus and reused for all predictions
  5. L2 Normalization: Features are normalized for consistent magnitude
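The steps above can be sketched as a fit/transform pair: IDF weights are computed once on the training corpus and reused to vectorize any later description, which is then L2-normalized. For brevity this sketch lowercases and tokenizes but omits Porter stemming and security-keyword prioritization, and it uses the smoothed formula idf(t) = ln((1 + N) / (1 + df(t))) + 1; the project's exact IDF formula and vocabulary selection are not documented here.

```typescript
// Lowercase and split on non-alphanumerics (stemming omitted in this sketch).
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 1);
}

interface TfidfModel {
  vocab: string[];
  idf: number[];
}

// Fit: keep up to `size` most frequent terms and compute smoothed IDF weights.
function fitTfidf(docs: string[], size = 50): TfidfModel {
  const termCounts = new Map<string, number>();
  const docFreq = new Map<string, number>();
  for (const doc of docs) {
    const tokens = tokenize(doc);
    for (const t of tokens) termCounts.set(t, (termCounts.get(t) ?? 0) + 1);
    for (const t of new Set(tokens)) docFreq.set(t, (docFreq.get(t) ?? 0) + 1);
  }
  const vocab = [...termCounts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0])) // frequency, then alphabetical
    .slice(0, size)
    .map(([t]) => t);
  const n = docs.length;
  const idf = vocab.map((t) => Math.log((1 + n) / (1 + (docFreq.get(t) ?? 0))) + 1);
  return { vocab, idf };
}

// Transform: raw term count x stored IDF weight, then scale to unit L2 length.
function transformTfidf(model: TfidfModel, text: string): number[] {
  const tokens = tokenize(text);
  const vec = model.vocab.map((term, i) => tokens.filter((t) => t === term).length * model.idf[i]);
  const norm = Math.hypot(...vec);
  return norm === 0 ? vec : vec.map((x) => x / norm);
}
```

Reusing the fitted `idf` array at prediction time is what keeps training and inference features on the same scale.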

Output Classes

  • Low - Lower priority vulnerabilities
  • Medium - Moderate risk, should be addressed
  • High - Critical, requires immediate attention

Special Rules

  • If both hasExploit AND ransomwareUse are true, automatically classified as High risk with 92% confidence
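The override can sit in front of the classifier so the model is never consulted for that case. Here `modelPredict` is a hypothetical stand-in for the trained Random Forest.

```typescript
type RiskLevel = 'Low' | 'Medium' | 'High';

interface Prediction {
  risk: RiskLevel;
  confidence: number; // 0-1
}

// Documented rule: known exploit + ransomware use => High at 92% confidence, bypassing the model.
function predictRisk(
  f: { hasExploit: boolean; ransomwareUse: boolean },
  modelPredict: () => Prediction,
): Prediction {
  if (f.hasExploit && f.ransomwareUse) return { risk: 'High', confidence: 0.92 };
  return modelPredict();
}
```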

Model Metrics Tracked

  • Accuracy
  • Precision (per class)
  • Recall (per class)
  • F1 Score (per class)
  • Confusion Matrix
  • Training samples count
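Per-class precision, recall, and F1 all derive from the confusion matrix. This sketch assumes the orientation `matrix[actual][predicted]` and reports 0 when a denominator is 0; both are conventions chosen here, not necessarily those used by `mlModel.ts`.

```typescript
interface ClassMetrics {
  precision: number;
  recall: number;
  f1: number;
}

// matrix[i][j] = number of samples whose actual class is i and predicted class is j.
function metricsFromConfusion(matrix: number[][]): { accuracy: number; perClass: ClassMetrics[] } {
  const k = matrix.length;
  let total = 0;
  let correct = 0;
  for (let i = 0; i < k; i++) {
    for (let j = 0; j < k; j++) total += matrix[i][j];
    correct += matrix[i][i];
  }
  const perClass: ClassMetrics[] = [];
  for (let c = 0; c < k; c++) {
    const tp = matrix[c][c];
    let predicted = 0; // column sum: everything predicted as class c
    let actual = 0;    // row sum: everything truly in class c
    for (let i = 0; i < k; i++) {
      predicted += matrix[i][c];
      actual += matrix[c][i];
    }
    const precision = predicted === 0 ? 0 : tp / predicted;
    const recall = actual === 0 ? 0 : tp / actual;
    const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
    perClass.push({ precision, recall, f1 });
  }
  return { accuracy: total === 0 ? 0 : correct / total, perClass };
}
```

For the three output classes this takes a 3x3 matrix ordered Low, Medium, High.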

Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| Frontend framework | React 18 | UI components |
| Language | TypeScript | Type safety |
| Styling | TailwindCSS | Utility-first CSS |
| UI components | shadcn/ui | Pre-built components |
| Backend framework | Express | REST API server |
| ML library | ml-random-forest | Machine learning |
| AI integration | OpenAI SDK | Mitigation generation |
| Charts | Recharts | Data visualization |
| Data fetching | TanStack Query | API state management |
| Routing | Wouter | Client-side routing |
| Validation | Zod | Schema validation |
| Storage | File-based JSON | Data persistence |

Environment Variables

| Variable | Required | Purpose |
|---|---|---|
| SESSION_SECRET | Yes | Session encryption key |
| OPENAI_API_KEY | Optional | Enables AI-powered mitigation plans |
| RESEND_API_KEY | Optional | API key for sending high-risk alert emails |
| ALERT_EMAIL_FROM | Optional | Sender address for alert emails (e.g. ctam@your-domain.com) |
| ALERT_EMAIL_TO | Optional | Comma-separated list of recipients for high-risk alerts |
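A small loader for these variables might fail fast on the one required value and treat the rest as optional. The variable names come from the table above; `loadConfig`, the `AppConfig` shape, and the rule that email alerts need all three email variables are assumptions.

```typescript
interface AppConfig {
  sessionSecret: string;
  openaiApiKey?: string;
  emailAlerts?: { apiKey: string; from: string; to: string[] };
}

// Read settings from an env map (defaults to process.env); throws if SESSION_SECRET is missing.
function loadConfig(env: Record<string, string | undefined> = process.env): AppConfig {
  const sessionSecret = env.SESSION_SECRET;
  if (!sessionSecret) throw new Error('SESSION_SECRET is required');
  // ALERT_EMAIL_TO is a comma-separated list; trim entries and drop empty ones.
  const to = (env.ALERT_EMAIL_TO ?? '')
    .split(',')
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  const apiKey = env.RESEND_API_KEY;
  const from = env.ALERT_EMAIL_FROM;
  // Assumption: email alerts are enabled only when key, sender, and recipients are all set.
  const emailAlerts = apiKey && from && to.length > 0 ? { apiKey, from, to } : undefined;
  return { sessionSecret, openaiApiKey: env.OPENAI_API_KEY || undefined, emailAlerts };
}
```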

Running the Application

Locally (VS Code)

  1. Download and unzip the project, or clone the repository
  2. Install dependencies: npm install
  3. Create a .env file in the project root:
       SESSION_SECRET=your-random-secret-here
       # Optional: enables AI-generated mitigation plans
       OPENAI_API_KEY=your-openai-key-here
  4. Start the application: npm run dev
  5. Open http://localhost:5000 in your browser

Future Improvements

Scheduled Data Collection

Use node-cron to automatically fetch CISA data daily:

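```typescript
import cron from 'node-cron';

// '0 6 * * *' = minute 0, hour 6, every day (server local time unless a timezone is set)
cron.schedule('0 6 * * *', async () => {
  // Reuse the existing endpoint; assumes the app is listening on port 5000
  const res = await fetch('http://localhost:5000/api/collectdata', { method: 'POST' });
  if (!res.ok) console.error(`Scheduled CISA collection failed: HTTP ${res.status}`);
});
```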

Export/Reports

Add PDF and CSV export functionality:

  • Use json2csv for CSV exports
  • Use pdfkit for PDF report generation
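Whichever library is chosen, the core of a CSV export is quoting fields correctly per RFC 4180: wrap any field containing a comma, double quote, or line break in double quotes and double any embedded quotes. A dependency-free sketch; `toCsv` and the example column list are illustrative.

```typescript
// Quote a field per RFC 4180 when it contains a comma, quote, or line break.
function csvField(value: unknown): string {
  const s = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Serialize rows using an explicit column order; CRLF line endings per RFC 4180.
function toCsv<T extends Record<string, unknown>>(rows: T[], columns: (keyof T & string)[]): string {
  const header = columns.map(csvField).join(',');
  const body = rows.map((r) => columns.map((c) => csvField(r[c])).join(','));
  return [header, ...body].join('\r\n');
}
```

Passing columns explicitly keeps the export stable even if new fields are later added to the stored JSON.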

Email Notifications

Send email alerts for high-risk vulnerabilities:

  • Integrate SendGrid, Mailgun, or Resend
  • Trigger on high-risk predictions

Improve ML Model

  • Add more features (CVSS scores, attack vectors)
  • Try other algorithms (XGBoost, Neural Networks)
  • Tune hyperparameters (tree depth, number of estimators)
  • Add cross-validation for better metrics
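Cross-validation could start with a k-fold split over sample indices: shuffle once with a seeded generator so results are reproducible, then hold out each fold in turn. The `kFoldIndices` name, the default seed, and the use of the mulberry32 PRNG are choices made for this sketch; the split is independent of ml-random-forest.

```typescript
// Small seeded PRNG (mulberry32) returning floats in [0, 1), so folds are reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Split indices 0..n-1 into k folds after a seeded Fisher-Yates shuffle.
function kFoldIndices(n: number, k: number, seed = 42): { train: number[]; test: number[] }[] {
  if (k < 2 || k > n) throw new Error('k must be between 2 and n');
  const rand = mulberry32(seed);
  const idx = Array.from({ length: n }, (_, i) => i);
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  const folds: { train: number[]; test: number[] }[] = [];
  for (let f = 0; f < k; f++) {
    // Fold f takes every k-th shuffled index, so fold sizes differ by at most 1.
    const test = idx.filter((_, pos) => pos % k === f);
    const testSet = new Set(test);
    folds.push({ test, train: idx.filter((i) => !testSet.has(i)) });
  }
  return folds;
}
```

Each fold's `train` indices feed the classifier and its `test` indices feed the metrics; averaging metrics across folds gives a less noisy estimate than a single split. A stratified variant, which preserves the Low/Medium/High proportions in every fold, would be a natural next step if the classes are imbalanced.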

Data Sources

CISA KEV Catalog


Security Considerations

  1. No hardcoded secrets - All sensitive data in environment variables
  2. Input validation - All API inputs validated with Zod schemas
  3. Audit logging - All actions tracked for accountability
  4. Optional AI - System works without OpenAI key (uses fallback rules)
