ExeRay 🏥

X-ray Vision for Windows Executables

Detect malicious .exe files using machine learning. Extracts static features (entropy, imports, metadata) and combines ML with heuristic rules for fast, automated classification.

⚙️ Features

Hybrid detection (Random Forest/XGBoost + rule-based checks).
Real-time predictions with confidence scores.
Handles obfuscated/novel malware better than signature-based tools.

🔧 Tech Stack

Core Components:

Language: Python 3.8+
ML Frameworks: scikit-learn, XGBoost
PE Analysis: pefile (for parsing Windows executables)
Data Handling: pandas, numpy
Security: pyzipper (malware sample decryption)

Key Workflows:

Feature Extraction:
- Static analysis of .exe files (entropy, section headers, imports).
- Uses pefile to extract metadata and structural features.
Model Training:
- Hybrid RandomForest + XGBoost ensemble.
- Threshold calibration for precision/recall balance.
Prediction:
- Real-time classification with confidence scoring.

📁 Directory Structure

ExeShield_AI/
├── assets/                      # Repo Images
├── data/                        # Raw Samples  
│   ├── malware/                 # Malicious Executables  
│   └── benign/                  # Clean Executables
├── dependencies/                # Installation Dependencies
├── models/                      # Saved Models/Thresholds  
│   ├── malware_detector.joblib  
│   └── optimal_threshold.npy  
├── output/                      # Processed Data (CSV/features)
│   └── malware_dataset.csv
├── scripts/                     # Core Scripts  
│   ├── download_malware_samples.py  
│   ├── extract_features.py  
│   ├── train_model.py  
│   └── predict.py  
└── README.md

💻 Installation and Usage (Commands & Outputs)

1. Clone the repository:

git clone https://github.com/MohamedMostafa010/ExeRay.git
cd ExeRay

2. Install dependencies:

pip install -r dependencies/requirements.txt

3. Download Samples:

> python download_malware_samples.py
API Response Status: ok
Downloading .exe malware: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [06:07<00:00,  3.67s/it]

? Samples ready!
- Malware samples in: ../data/malware
- Benign samples in:  ../data/benign

4. Extract Features:

> python extract_features.py
Dataset saved to ../output/malware_dataset.csv

5. Train Model:

> python train_model.py
Training models:   0%|                                                                                                                                                 | 0/2 [00:00<?, ?it/s]
New best model: XGBoost with F1=0.953
Training models:  50%|████████████████████████████████████████████████████████████████████▌                                                                    | 1/2 [00:01<00:01,  1.19s/it]
New best model: RandomForest with F1=0.964
Training models: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00,  1.53s/it]

=== Final Evaluation ===
              precision    recall  f1-score   support

           0       0.92      0.96      0.94        24
           1       0.95      0.90      0.93        21

    accuracy                           0.93        45
   macro avg       0.94      0.93      0.93        45
weighted avg       0.93      0.93      0.93        45

ROC AUC Score: 0.951

Optimal threshold: 0.670

Model saved to ../models/malware_detector.joblib

6. Predict Executable:

> python predict.py "path/to/[benign_file]"
Malware Detection Results:
========================================
File: pestudio.exe
Prediction: BENIGN
Malware Probability: 66.98%
Confidence Level: HIGH
Decision Threshold: 67.05%

> python predict.py "path/to/[suspicious_file]"
Malware Detection Results:
========================================
File: e31b997d118cff687de394cd347248efb5fd0f1d2fa6ba6639c42505c28f4a59.exe
Prediction: MALWARE
Malware Probability: 91.60%
Confidence Level: VERY_HIGH
Decision Threshold: 67.05%

🔍 Handling False Positives

While ExeShield AI achieves high accuracy, occasional false positives (legitimate files flagged as malware) may occur. Common causes:
- Legitimate tools with behaviors resembling malware (e.g., putty.exe).
- Packed/obfuscated benign files (high entropy).

- Example False Positive Output:

> python predict.py "C:\Program Files\PuTTY\putty.exe"
Malware Detection Results:
========================================
File: putty.exe
Prediction: MALWARE
Malware Probability: 92.76%
Confidence Level: VERY_HIGH
Decision Threshold: 67.05%

Mitigation Strategies:

Adjust Threshold:
- Lower the decision threshold in predict.py for stricter filtering
Whitelist Trusted Files:
- Manually verify and exclude known-safe executables.
Retrain the Model:
- Add misclassified samples to your dataset and rerun train_model.py.

🤝 Contributing

Pull requests are welcome! If you have ideas for new user profiles, simulation modes, or forensic artifacts, feel free to contribute.

📖 License

This project is released under the MIT License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

ExeRay 🏥

⚙️ Features

🔧 Tech Stack

Core Components:

Key Workflows:

📁 Directory Structure

💻 Installation and Usage (Commands & Outputs)

1. Clone the repository:

2. Install dependencies:

3. Download Samples:

4. Extract Features:

5. Train Model:

6. Predict Executable:

🔍 Handling False Positives

Mitigation Strategies:

🤝 Contributing

📖 License

About

Uh oh!

Releases

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
assets		assets
data		data
dependencies		dependencies
models		models
output		output
scripts		scripts
LICENSE		LICENSE
README.md		README.md

License

MohamedMostafa010/ExeRay

Folders and files

Latest commit

History

Repository files navigation

ExeRay 🏥

⚙️ Features

🔧 Tech Stack

Core Components:

Key Workflows:

📁 Directory Structure

💻 Installation and Usage (Commands & Outputs)

1. Clone the repository:

2. Install dependencies:

3. Download Samples:

4. Extract Features:

5. Train Model:

6. Predict Executable:

🔍 Handling False Positives

Mitigation Strategies:

🤝 Contributing

📖 License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Languages