
vertexneuralforge/Machine-Learning-Based-Exploitability-Prediction-for-Penetration-Testing

Machine-Learning-Driven-Exploit-Prediction-

Machine Learning-Based Exploitability Prediction for Penetration Testing: A Data-Driven Approach to Prioritizing Vulnerabilities

IEEE TIFS | Python 3.8+ | License: MIT

## 📜 License

This project is licensed under the MIT License. See LICENSE for details.

## 📌 Overview

This repository contains the code and data pipeline for the IEEE TIFS paper:

"Machine Learning-Based Exploitability Prediction for Penetration Testing: A Data-Driven Approach"

We present a production-ready XGBoost model that predicts the likelihood of a CVE being weaponized, using features from:

- National Vulnerability Database (NVD)
- Exploit Database (ExploitDB)

Key innovations:

- ✅ 25% recall at 6% precision (optimized for high-risk triage)
- ✅ 62.5% reduction in missed exploits vs. random sampling
- ✅ FastAPI microservice for integration with pentesting tools (Metasploit/Burp Suite)

## 🚀 Quick Start

1. Install Dependencies

   ```bash
   pip install -r requirements.txt  # Python 3.8+
   ```

2. Run the Jupyter Notebook

   ```bash
   jupyter notebook exploit_prediction.ipynb  # Full pipeline: EDA → Training → Evaluation
   ```

3. Deploy the FastAPI Service

   ```bash
   # Run from the repository root; the app object lives in api/app.py
   uvicorn api.app:app --reload  # Access docs at http://localhost:8000/docs
   ```

## 📂 Repository Structure

```
├── data/                      # Processed datasets (NVD + ExploitDB)
│   ├── nvd_2024.json          # Sample NVD data
│   └── exploits.csv           # ExploitDB records
├── models/                    # Pretrained XGBoost + SMOTE
│   └── exploit_model.joblib
├── api/                       # FastAPI deployment
│   ├── app.py                 # REST endpoint
│   └── schemas.py             # Pydantic input validation
├── exploit_prediction.ipynb   # Main Colab notebook
├── requirements.txt           # Python dependencies
└── LICENSE                    # MIT License
```

## 🔍 Key Features

### 📊 Feature Engineering

- CVSS Metrics: Base score, attack vector, criticality flags
- Temporal Signals: Days since publication ("golden hour" for exploits)
- Class Imbalance Handling: SMOTE oversampling (1:738 class ratio)
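
As a rough illustration of the CVSS and temporal features listed above, the sketch below derives them from a single CVE record using only the standard library. The function name `build_features` and the record fields `attack_vector` and `published` are hypothetical, not taken from the notebook. The 9.0 cutoff follows the CVSS v3 "Critical" band, which is an assumption about how the criticality flag is defined here.

```python
from datetime import date


def build_features(record, today):
    """Turn one CVE record (hypothetical field names) into numeric features."""
    score = float(record["cvss_score"])
    published = date.fromisoformat(record["published"])
    return {
        "cvss_score": score,
        # Flag for a network-reachable attack vector.
        "av_network": int(record["attack_vector"] == "NETWORK"),
        # Criticality flag; assumes the CVSS v3 Critical band (9.0 to 10.0).
        "is_critical": int(score >= 9.0),
        # Temporal signal: age of the CVE in days, clamped at zero.
        "days_since_published": max((today - published).days, 0),
    }


# Illustrative record only; not real NVD data.
example = {
    "cve_id": "CVE-2024-1234",
    "cvss_score": 9.8,
    "attack_vector": "NETWORK",
    "published": "2024-01-01",
}
features = build_features(example, today=date(2024, 1, 31))
print(features)
```

Passing `today` explicitly keeps the function deterministic, which makes the same code usable for both training on historical snapshots and scoring at request time.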

### ⚙️ Optimized XGBoost Model

```python
from xgboost import XGBClassifier

model = XGBClassifier(
    scale_pos_weight=100,  # Weight exploited (positive) samples 100×, penalizing false negatives
    max_depth=10,
    n_estimators=200,
    eval_metric='logloss',
)
```

### 🚨 Security Thresholding

Recall-Optimized Decision Threshold (θ=0.10):

- 25% exploit detection rate
- <1% false alarms
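
A minimal sketch of how a decision threshold such as θ=0.10 turns predicted probabilities into labels, and how recall and precision follow from those labels. The helper names and the toy data are hypothetical, and the `>=` comparison is an assumption, since the README does not say whether the threshold is inclusive.

```python
def classify(probabilities, threshold=0.10):
    """Label a CVE as likely exploited (1) when its probability reaches the threshold."""
    return [int(p >= threshold) for p in probabilities]


def recall_precision(y_true, y_pred):
    """Compute recall and precision from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Toy data for illustration only; not drawn from the test set.
y_true = [1, 0, 0, 1, 0, 0]
probs = [0.40, 0.12, 0.03, 0.05, 0.01, 0.08]

y_pred = classify(probs)
recall, precision = recall_precision(y_true, y_pred)
print(y_pred, recall, precision)
```

Lowering the threshold below the usual 0.5 trades precision for recall, which suits triage settings where a missed exploit costs more than a false alarm.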

## 🌐 API Endpoints

| Endpoint | Description | Example Request |
|----------|-------------|-----------------|
| `/predict` | Predict exploit probability | `{"cve_id": "CVE-2024-1234", "cvss_score": 9.8, "days_since_published": 30}` |
| `/docs` | Interactive OpenAPI 3.0 docs | - |

## 📈 Performance

Comparison with Baselines (Test Set, n=5,979 CVEs):

| Model | Recall | Precision | F0.7-Score |
|-------|--------|-----------|------------|
| CVSS ≥ 7.0 | 8% | 0.5% | 0.03 |
| Random Forest | 7% | 0.3% | 0.02 |
| Our XGBoost | 25% | 6% | 0.18 |

SHAP Analysis: [Figure: SHAP summary plot]

## 🛠️ Integration with Pentesting Tools

```python
import requests

# Send one CVE to the locally running prediction service.
response = requests.post(
    "http://localhost:8000/predict",
    json={"cve_id": "CVE-2024-1234", "cvss_score": 9.2, "days_since_published": 15},
    timeout=10,
)
response.raise_for_status()  # Fail loudly on HTTP errors
print(response.json())
# Example response: {"risk_level": "HIGH", "probability": 0.87, "threshold_used": 0.10}
```

## 📜 Citation

If you use this work, please cite:

```bibtex
@article{your_tifs_paper,
  title={Machine Learning-Based Exploitability Prediction for Penetration Testing},
  author={Your Name et al.},
  journal={IEEE Transactions on Information Forensics and Security},
  year={2024}
}
```

## 📮 Contact

For questions or collaborations:

- 📧 Email: your.email@example.com
- 💻 GitHub Issues: Open an issue

## 🚨 Disclaimer

This tool is designed for defensive security only. Always comply with ethical hacking guidelines.
