# Phishing_ML — Model Evaluation Notebook

### Overview:
This notebook demonstrates the training and evaluation of machine learning models 
for phishing website detection using the `phishing_ml` package.

Two approaches are compared:
1. Decision Tree (baseline model, `decision_tree.py`)
2. AutoML (automated model selection and optimization, `automl_model.py`)

Workflow:
- Load and preprocess dataset
- Train and evaluate the Decision Tree model
- Run AutoML to automatically find the best-performing model
- Compare results and analyze model performance

In [None]:
import sys
import os

sys.path.append(os.path.abspath(".."))

import decision_tree
import automl_model

In [None]:
# Train and evaluate the Decision Tree model
dt_results = decision_tree.run_decision_tree("phishing.csv")
print("Test Accuracy:", dt_results["accuracy"])
print("Cross-validation Mean Accuracy:", dt_results["cv_mean"])
print("Classification Report:\n", dt_results["classification_report"])

Test Accuracy: 0.8937132519222072
Cross-validation Mean Accuracy: 0.888919041157847
Classification Report:
               precision    recall  f1-score   support

Not phishing       0.90      0.85      0.87       956
    Phishing       0.89      0.93      0.91      1255

    accuracy                           0.89      2211
   macro avg       0.89      0.89      0.89      2211
weighted avg       0.89      0.89      0.89      2211



In [12]:
# Train AutoML models
aml_results = automl_model.run_automl("phishing.csv")

print("Test Accuracy:", aml_results["test_accuracy"])
print("Test AUC:", aml_results["test_auc"])
print("Best Model ID:", aml_results["best_model"].model_id)


Checking whether there is an H2O instance running at http://localhost:54321. connected.


H2O_cluster_uptime:,1 hour 53 mins
H2O_cluster_timezone:,Europe/Helsinki
H2O_data_parsing_timezone:,UTC
H2O_cluster_version:,3.46.0.8
H2O_cluster_version_age:,"21 days, 22 hours and 3 minutes"
H2O_cluster_name:,H2O_from_python_Käyttäjä_u7vvd8
H2O_cluster_total_nodes:,1
H2O_cluster_free_memory:,3.841 Gb
H2O_cluster_total_cores:,8
H2O_cluster_allowed_cores:,8


Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
13:26:19.639: AutoML: XGBoost is not available; skipping it.
AutoML progress: |███████████████████████████████████████████████████████████████| (done) 100%
Test Accuracy: 0.9723884031293143
Test AUC: 0.9964556353681246
Best Model ID: StackedEnsemble_AllModels_1_AutoML_3_20251030_132619


## Results

### **Decision Tree Model**
**Performance Summary:**
- **Test Accuracy:** `0.8937`
- **Cross-Validation Mean Accuracy:** `0.8889`

**Classification Report:**
| Class         | Precision | Recall | F1-Score | Support |
|----------------|------------|---------|-----------|----------|
| Not phishing   | 0.90       | 0.85    | 0.87      | 956      |
| Phishing       | 0.89       | 0.93    | 0.91      | 1255     |
| **Weighted avg** | **0.89** | **0.89** | **0.89** | **2211** |

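The Decision Tree gives an interpretable baseline with similar F1-scores on both classes (0.87 vs. 0.91). Recall is lower for legitimate sites (0.85) than for phishing sites (0.93): the model misclassifies about 15% of legitimate sites as phishing while missing about 7% of phishing sites.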
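
To make the averaged rows of the classification report easier to read, the sketch below recomputes the macro and weighted averages from the per-class values shown above. The helper names (`macro_avg`, `weighted_avg`) are illustrative and not part of the `phishing_ml` package. Because the inputs are already rounded to two decimals, the recomputed figures may differ from the report in the last digit.

```python
# Per-class values copied from the classification report above (rounded to 2 decimals).
report = {
    "Not phishing": {"precision": 0.90, "recall": 0.85, "f1": 0.87, "support": 956},
    "Phishing": {"precision": 0.89, "recall": 0.93, "f1": 0.91, "support": 1255},
}


def macro_avg(metric):
    """Unweighted mean of a metric over the classes."""
    values = [row[metric] for row in report.values()]
    return sum(values) / len(values)


def weighted_avg(metric):
    """Mean of a metric over the classes, weighted by each class's support."""
    total_support = sum(row["support"] for row in report.values())
    weighted_sum = sum(row[metric] * row["support"] for row in report.values())
    return weighted_sum / total_support


for metric in ("precision", "recall", "f1"):
    print(f"{metric:9s} macro={macro_avg(metric):.3f} weighted={weighted_avg(metric):.3f}")
```

The two averages are close here because the classes are only mildly imbalanced (956 vs. 1255 samples); with a heavily skewed dataset they could diverge noticeably.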

---

### **AutoML Model**
**Performance Summary:**
- **Test Accuracy:** `0.9724`
- **Test AUC:** `0.9965`
- **Best Model:** `StackedEnsemble_AllModels_1_AutoML_3_20251030_132619`

> Note: XGBoost was not available during training and was skipped automatically.

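The best AutoML model, a stacked ensemble, outperformed the Decision Tree by about 7.9 percentage points of test accuracy (0.9724 vs. 0.8937). Its test AUC of 0.9965 indicates that it ranks phishing sites above legitimate ones almost perfectly on the held-out data.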
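
As a reminder of what the AUC figure measures, the sketch below computes AUC as the fraction of (phishing, legitimate) pairs in which the phishing example receives the higher score. The labels and scores are made up for illustration and are not taken from the notebook's data, and `pairwise_auc` is a hypothetical helper rather than the routine H2O uses internally.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up example: 1 = phishing, 0 = not phishing.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.95, 0.80, 0.40, 0.60, 0.20, 0.10]

print("Toy AUC:", pairwise_auc(toy_labels, toy_scores))
```

In the toy data one phishing example (score 0.40) is ranked below one legitimate example (score 0.60), so 8 of the 9 pairs are ordered correctly. An AUC of 0.9965 means such misordered pairs are rare on the test set.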

---

## Conclusion
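- The **Decision Tree** is interpretable, and its test accuracy (0.8937) is close to its cross-validation mean (0.8889), which suggests stable performance across splits.
- The **AutoML Stacked Ensemble** reaches higher test accuracy (0.9724) and is the stronger candidate for deployment when interpretability is not a hard requirement.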
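
The gap between the two models can also be expressed as a reduction in test error. The sketch below uses the accuracies printed earlier in the notebook. It assumes both were measured on comparable held-out data, which the notebook does not confirm, so treat the result as indicative only.

```python
# Test accuracies copied from the notebook output above.
dt_accuracy = 0.8937132519222072      # Decision Tree
automl_accuracy = 0.9723884031293143  # AutoML stacked ensemble

dt_error = 1.0 - dt_accuracy
automl_error = 1.0 - automl_accuracy

# Absolute gain in percentage points and relative reduction in misclassifications.
gain_pp = (automl_accuracy - dt_accuracy) * 100
relative_reduction = (dt_error - automl_error) / dt_error

print(f"Decision Tree error rate: {dt_error:.4f}")
print(f"AutoML error rate:        {automl_error:.4f}")
print(f"Accuracy gain:            {gain_pp:.2f} percentage points")
print(f"Relative error reduction: {relative_reduction:.1%}")
```

Under that assumption, the ensemble makes roughly a quarter as many test errors as the Decision Tree.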

## Possible Next Steps
- Deploy the best model as a prediction API or web service
- Add SHAP or feature importance analysis for explainability
- Extend the dataset and evaluate the models on real-world phishing URLs