<a href="https://colab.research.google.com/github/themannnphil/intrusion-detection-system/blob/main/model_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import gradio as gr
import joblib
import pandas as pd

In [None]:
# Clone the repository (run once; comment this line out if it is already cloned)
!git clone https://github.com/themannnphil/intrusion-detection-system.git
%cd intrusion-detection-system

/content/intrusion-detection-system


In [None]:
%cd models

/content/intrusion-detection-system/models


 **Loading the model**

In [None]:
import joblib
model = joblib.load("xgb_nslkdd_pipeline.joblib")

**Checking feature names**

In [None]:
# Check expected feature names
print(model.feature_names_in_)
print(len(model.feature_names_in_))

['duration' 'protocol_type' 'service' 'flag' 'src_bytes' 'dst_bytes'
 'land' 'wrong_fragment' 'urgent' 'hot' 'num_failed_logins' 'logged_in'
 'num_compromised' 'root_shell' 'su_attempted' 'num_root'
 'num_file_creations' 'num_shells' 'num_access_files' 'num_outbound_cmds'
 'is_host_login' 'is_guest_login' 'count' 'srv_count' 'serror_rate'
 'srv_serror_rate' 'rerror_rate' 'srv_rerror_rate' 'same_srv_rate'
 'diff_srv_rate' 'srv_diff_host_rate' 'dst_host_count'
 'dst_host_srv_count' 'dst_host_same_srv_rate' 'dst_host_diff_srv_rate'
 'dst_host_same_src_port_rate' 'dst_host_srv_diff_host_rate'
 'dst_host_serror_rate' 'dst_host_srv_serror_rate' 'dst_host_rerror_rate'
 'dst_host_srv_rerror_rate']
41
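Because the pipeline was fitted on a DataFrame, it records the training column names in `feature_names_in_`. Recent scikit-learn versions raise an error at prediction time if a DataFrame's column names do not match them. The helper below is a minimal sketch of checking a row against that list before predicting. `align_to_model` is a hypothetical name, not part of the repository, and the three-column schema only stands in for the real 41 columns.

```python
import pandas as pd

def align_to_model(df: pd.DataFrame, expected_cols: list[str]) -> pd.DataFrame:
    """Return df with exactly expected_cols in training order; fail loudly on a mismatch."""
    missing = [c for c in expected_cols if c not in df.columns]
    extra = [c for c in df.columns if c not in expected_cols]
    if missing or extra:
        raise ValueError(f"missing columns: {missing}; unexpected columns: {extra}")
    return df[expected_cols]

# Usage with a toy three-column schema (the real model expects 41 columns)
expected = ["duration", "protocol_type", "service"]
row = pd.DataFrame([{"service": "http", "duration": 0, "protocol_type": "tcp"}])
aligned = align_to_model(row, expected)
print(list(aligned.columns))  # ['duration', 'protocol_type', 'service']
```

With the loaded model, the call would be `align_to_model(df, list(model.feature_names_in_))`.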


**Building a Gradio UI to test the model**

In [27]:
# Get expected features from the pipeline
expected_cols = list(model.feature_names_in_)

# Define a safe default sample (HTTP normal traffic)
sample_data = {col: 0 for col in expected_cols}
sample_data.update({
    "protocol_type": "tcp",
    "service": "http",
    "flag": "SF",
    "src_bytes": 232,
    "dst_bytes": 8153,
    "logged_in": 1,
    "count": 2,
    "srv_count": 2,
    "same_srv_rate": 1.0
})

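# Symbolic NSL-KDD columns (assumed to be the ones the pipeline one-hot encodes)
categorical_cols = {"protocol_type", "service", "flag"}

def predict_attack(*values):
    # Gradio Textboxes return strings; rebuild one row in the training column order
    row = dict(zip(expected_cols, values))
    df = pd.DataFrame([row], columns=expected_cols)

    # Cast every non-categorical field to a number; unparsable input becomes NaN for the imputer
    numeric_cols = [col for col in expected_cols if col not in categorical_cols]
    df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

    try:
        pred = int(model.predict(df)[0])
        # Confidence = probability of the predicted class (labels: 0 = normal, 1 = attack)
        proba = model.predict_proba(df)[0][pred]
        label = "🚨 Attack" if pred == 1 else "✅ Normal"
        return f"{label} (Confidence: {proba:.2f})"
    except Exception as e:
        return f"❌ Prediction error: {e}"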

# --- Gradio UI ---
with gr.Blocks() as demo:
    gr.Markdown("## 🛡️ Intrusion Detection System (NSL-KDD + XGBoost)")
    gr.Markdown("Fill in features or click one of the realistic examples below.")

    inputs = []
    for col in expected_cols:
        inputs.append(gr.Textbox(label=col, value=str(sample_data[col])))

    btn = gr.Button("🔍 Predict")
    output = gr.Label()

    btn.click(
        fn=predict_attack,
        inputs=inputs,
        outputs=output
    )

    # 🔹 10 Realistic Example Cases (Normal + Attacks)
    example_data = [
        # ✅ Normal web browsing
        {"protocol_type": "tcp", "service": "http", "flag": "SF", "src_bytes": 300, "dst_bytes": 8000, "count": 2, "srv_count": 2, "same_srv_rate": 1.0, "logged_in": 1},

        # ✅ Normal DNS query
        {"protocol_type": "udp", "service": "domain_u", "flag": "SF", "src_bytes": 50, "dst_bytes": 200, "count": 1, "srv_count": 1, "same_srv_rate": 1.0},

        # 🚨 DoS: Smurf (ICMP flood)
        {"protocol_type": "icmp", "service": "ecr_i", "flag": "SF", "src_bytes": 0, "dst_bytes": 100000, "count": 200, "srv_count": 200, "same_srv_rate": 0.9},

        # 🚨 DoS: Neptune (SYN flood)
        {"protocol_type": "tcp", "service": "http", "flag": "S0", "src_bytes": 0, "dst_bytes": 0, "count": 300, "srv_count": 300, "same_srv_rate": 0.1},

        # 🚨 Probe: Port Scan
        {"protocol_type": "tcp", "service": "ftp", "flag": "REJ", "src_bytes": 20, "dst_bytes": 0, "count": 50, "srv_count": 50, "same_srv_rate": 0.2},

        # 🚨 Probe: Nmap Scan
        {"protocol_type": "tcp", "service": "telnet", "flag": "REJ", "src_bytes": 10, "dst_bytes": 0, "count": 60, "srv_count": 60, "same_srv_rate": 0.1},

        # 🚨 DoS: Teardrop
        {"protocol_type": "udp", "service": "http", "flag": "SF", "src_bytes": 0, "dst_bytes": 0, "count": 150, "srv_count": 150, "same_srv_rate": 0.05},

        # ✅ Normal FTP session
        {"protocol_type": "tcp", "service": "ftp", "flag": "SF", "src_bytes": 1500, "dst_bytes": 3000, "count": 2, "srv_count": 2, "same_srv_rate": 1.0, "logged_in": 1},

        # ⚠️ R2L: FTP brute force (may misclassify as normal)
        {"protocol_type": "tcp", "service": "ftp", "flag": "SF", "src_bytes": 20, "dst_bytes": 30, "count": 50, "srv_count": 50, "same_srv_rate": 0.95, "logged_in": 0},

        # ⚠️ U2R: Buffer overflow (may misclassify as normal)
        {"protocol_type": "tcp", "service": "telnet", "flag": "SF", "src_bytes": 2000, "dst_bytes": 50, "count": 1, "srv_count": 1, "same_srv_rate": 1.0, "num_shells": 1},
    ]

    # Convert dict → list of strings in correct order
    examples = [[str(row.get(col, 0)) for col in expected_cols] for row in example_data]

    gr.Examples(
        examples=examples,
        inputs=inputs,
        label="🔹 Example Scenarios"
    )

demo.launch()


It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://afa0ca9ed888b5e9c6.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
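Clicking through the ten examples one at a time is slow. A minimal sketch of scoring all scenarios in one pass, outside Gradio, might look like this. It assumes the same conventions as `predict_attack` (label 1 = attack, 0 = normal; `protocol_type`, `service` and `flag` as the only categorical columns). `score_scenarios` is a hypothetical helper, and the small decision-tree pipeline in the usage section only stands in for the real XGBoost model so the snippet runs on its own.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

CATEGORICAL = {"protocol_type", "service", "flag"}

def score_scenarios(model, scenarios, expected_cols):
    """Score (name, feature_dict) pairs; unspecified features default to 0, like the Gradio examples."""
    rows = [{col: feats.get(col, 0) for col in expected_cols} for _, feats in scenarios]
    X = pd.DataFrame(rows, columns=expected_cols)
    numeric = [c for c in expected_cols if c not in CATEGORICAL]
    X[numeric] = X[numeric].astype(float)
    preds = model.predict(X).astype(int)
    probas = model.predict_proba(X)
    return pd.DataFrame({
        "scenario": [name for name, _ in scenarios],
        "prediction": ["Attack" if p == 1 else "Normal" for p in preds],
        # Probability of the predicted class, as shown in the UI
        "confidence": [round(float(probas[i, p]), 2) for i, p in enumerate(preds)],
    })

# Toy stand-in model: only "count" separates the classes
cols = ["protocol_type", "service", "flag", "count"]
train = pd.DataFrame({
    "protocol_type": ["tcp"] * 6,
    "service": ["http"] * 6,
    "flag": ["SF"] * 6,
    "count": [1.0, 2.0, 3.0, 200.0, 250.0, 300.0],
})
y = [0, 0, 0, 1, 1, 1]
toy = Pipeline([
    ("prep", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), ["protocol_type", "service", "flag"])],
        remainder="passthrough")),
    ("clf", DecisionTreeClassifier(random_state=0)),
]).fit(train, y)

scenarios = [
    ("Normal web browsing", {"protocol_type": "tcp", "service": "http", "flag": "SF", "count": 2}),
    ("DoS: Neptune (SYN flood)", {"protocol_type": "tcp", "service": "http", "flag": "S0", "count": 300}),
]
print(score_scenarios(toy, scenarios, cols))
```

In the notebook, the real call would pair each dict in `example_data` with a scenario name and pass `model` and `expected_cols`.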




# 🛡️ **Intrusion Detection System (NSL-KDD + XGBoost)**
## **Project Overview**

* Developed a machine learning pipeline around an XGBoost binary classifier (Normal vs. Attack), trained on the NSL-KDD dataset.

* Applied preprocessing (Imputer, StandardScaler, OneHotEncoder) inside a scikit-learn Pipeline.

* Applied SMOTE to the training data only, to balance the classes while leaving the test data untouched.

* Tuned hyperparameters with RandomizedSearchCV on a GPU (Colab T4).

* Exported the final model as `xgb_nslkdd_pipeline.joblib` and integrated it into a Gradio web app.



## **Demo Setup**

- Built a Gradio web interface where users can enter network traffic features or click pre-filled examples.

- Added 10 hand-crafted example scenarios:

  * ✅ Normal traffic (HTTP browsing, DNS query, FTP session)

  * 🚨 DoS attacks (Smurf, Neptune, Teardrop)

  * 🚨 Probe attacks (port scan, Nmap scan)

  * ⚠️ R2L (FTP brute force)

  * ⚠️ U2R (buffer overflow)

- Displayed each prediction as:

  * ✅ Normal

  * 🚨 Attack

- Showed the model's confidence alongside each prediction, i.e. the predicted probability of the chosen class.
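For a binary classifier, `predict_proba` returns one row `[P(normal), P(attack)]` per sample, and the demo reports the entry for the predicted class. A binary XGBoost classifier with the default logistic objective predicts the class whose probability exceeds 0.5, so this confidence is never below 0.5, and a value such as 0.55 marks a borderline call. A minimal sketch of the formatting logic, assuming the notebook's label convention (0 = normal, 1 = attack); `format_prediction` is a hypothetical name, and the probabilities in the usage lines are chosen to match the Neptune and Teardrop rows of the results table.

```python
def format_prediction(proba_row):
    """proba_row = [P(normal), P(attack)] for one sample, as returned by predict_proba."""
    pred = 1 if proba_row[1] > 0.5 else 0  # binary decision at the default 0.5 threshold
    label = "🚨 Attack" if pred == 1 else "✅ Normal"
    return f"{label} (Confidence: {proba_row[pred]:.2f})"

print(format_prediction([0.45, 0.55]))  # 🚨 Attack (Confidence: 0.55)
print(format_prediction([0.68, 0.32]))  # ✅ Normal (Confidence: 0.68)
```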

## **Results: Model Predictions vs Expected**
| Scenario                         | Expected    | Model Prediction | Confidence (predicted class) |
|----------------------------------|-------------|------------------|------------------------------|
| Normal web browsing              | Normal      | ✅ Normal        | 1.00 |
| Normal DNS query                 | Normal      | ✅ Normal        | 1.00 |
| DoS: Smurf (ICMP flood)          | Attack      | 🚨 Attack        | 0.98 |
| DoS: Neptune (SYN flood)         | Attack      | 🚨 Attack        | 0.55 |
| Probe: Port Scan                 | Attack      | 🚨 Attack        | 0.97 |
| Probe: Nmap Scan                 | Attack      | 🚨 Attack        | 0.99 |
| DoS: Teardrop                    | Attack      | ✅ Normal        | 0.68 |
| Normal FTP session               | Normal      | ✅ Normal        | 1.00 |
| R2L: FTP brute force             | Attack      | ✅ Normal        | 0.57 |
| U2R: Buffer overflow             | Attack      | ✅ Normal        | 0.98 |
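The table can be summarized with a few counts. A short calculation over the ten rows, treating Attack as the positive class (a convention chosen here), gives the accuracy and the per-class recall:

```python
# (scenario, expected, predicted) copied from the results table
rows = [
    ("Normal web browsing", "Normal", "Normal"),
    ("Normal DNS query", "Normal", "Normal"),
    ("DoS: Smurf", "Attack", "Attack"),
    ("DoS: Neptune", "Attack", "Attack"),
    ("Probe: Port Scan", "Attack", "Attack"),
    ("Probe: Nmap Scan", "Attack", "Attack"),
    ("DoS: Teardrop", "Attack", "Normal"),
    ("Normal FTP session", "Normal", "Normal"),
    ("R2L: FTP brute force", "Attack", "Normal"),
    ("U2R: Buffer overflow", "Attack", "Normal"),
]

def recall(label):
    """Fraction of rows expected to be `label` that the model also labeled `label`."""
    hits = [pred == label for _, exp, pred in rows if exp == label]
    return sum(hits) / len(hits)

accuracy = sum(exp == pred for _, exp, pred in rows) / len(rows)
missed = [name for name, exp, pred in rows if exp == "Attack" and pred == "Normal"]

print(f"accuracy: {accuracy:.2f}")               # accuracy: 0.70
print(f"attack recall: {recall('Attack'):.2f}")  # attack recall: 0.57
print(f"normal recall: {recall('Normal'):.2f}")  # normal recall: 1.00
print("missed attacks:", missed)
```

Every error is a missed attack (a false negative); no normal session was flagged. With only ten hand-crafted rows, these figures describe the demo, not the model's performance on the NSL-KDD test set.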


## **Analysis**

**Strengths:**

- The model correctly identified the clear-cut DoS and Probe attacks (Smurf, Port Scan, Nmap Scan) with very high confidence (≥ 0.97). Neptune was also flagged, but only at 0.55, barely above 0.5.

- Normal sessions (web, DNS, FTP) were consistently classified as Normal, with a reported confidence of 1.00 (rounded to two decimals).

**Weaknesses:**

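- DoS: Teardrop was misclassified as Normal (0.68), likely because fragmented-packet attacks are less represented in the dataset. The example row also leaves `wrong_fragment` at 0, even though malformed IP fragments are what define Teardrop, so the model receives no fragmentation signal.

- R2L (FTP brute force) and U2R (buffer overflow) were misclassified as Normal. This is expected: R2L and U2R are the rarest classes in NSL-KDD, so the model has few training examples to learn them from. The U2R miss is the more concerning one, because the model was confidently wrong (0.98).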

**Overall takeaway:**

- The system is effective at detecting DoS and Probe attacks (which make up the large majority of attack records in NSL-KDD): it caught both Probe examples and two of the three DoS examples.

- It struggles with R2L/U2R because of class imbalance, a known limitation of many NSL-KDD models.
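The confidences also show how close some misses were: the Teardrop and FTP brute-force rows still received a noticeable attack probability. The sketch below converts each reported confidence back to P(attack), assuming a binary model whose confidence is the predicted-class probability, and reapplies several decision thresholds. The table's values are rounded to two decimals, and ten hand-picked rows are far too few to choose a threshold, so this only illustrates the mechanism; `p_attack` and `errors_at` are hypothetical helper names.

```python
# (scenario, expected, predicted, confidence) copied from the results table
table = [
    ("Normal web browsing", "Normal", "Normal", 1.00),
    ("Normal DNS query", "Normal", "Normal", 1.00),
    ("DoS: Smurf", "Attack", "Attack", 0.98),
    ("DoS: Neptune", "Attack", "Attack", 0.55),
    ("Probe: Port Scan", "Attack", "Attack", 0.97),
    ("Probe: Nmap Scan", "Attack", "Attack", 0.99),
    ("DoS: Teardrop", "Attack", "Normal", 0.68),
    ("Normal FTP session", "Normal", "Normal", 1.00),
    ("R2L: FTP brute force", "Attack", "Normal", 0.57),
    ("U2R: Buffer overflow", "Attack", "Normal", 0.98),
]

def p_attack(pred, conf):
    """Recover P(attack) from the predicted label and its confidence (binary model assumed)."""
    return conf if pred == "Attack" else 1.0 - conf

def errors_at(threshold):
    """Return (missed attacks, false alarms) when flagging rows with P(attack) >= threshold."""
    missed = false_alarms = 0
    for _, expected, pred, conf in table:
        flagged = p_attack(pred, conf) >= threshold
        if expected == "Attack" and not flagged:
            missed += 1
        elif expected == "Normal" and flagged:
            false_alarms += 1
    return missed, false_alarms

for t in (0.5, 0.4, 0.3):
    print(t, errors_at(t))
```

On these rows, lowering the threshold to 0.3 would recover the Teardrop and FTP brute-force cases without flagging any of the three normal sessions, while the buffer-overflow row (P(attack) ≈ 0.02) stays missed. How many false alarms a lower threshold would add on the real test set would have to be measured there.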


## **Conclusion**
The IDS works well as a proof-of-concept demo. It detects the major attack categories (DoS, Probe), mostly with high confidence, and classifies normal traffic correctly.
Future improvements would involve dedicated handling of the R2L/U2R classes (e.g., anomaly detection, cost-sensitive learning, or ensemble models).
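Cost-sensitive learning, one of the improvements listed above, can be sketched with per-sample weights that grow as a class gets rarer. The snippet uses scikit-learn's `compute_sample_weight` with `class_weight="balanced"` on an illustrative label vector; the class counts are made up, not NSL-KDD's. If the training data keeps each record's attack category, the weights can be computed from that category even though the model itself is trained on the binary Normal/Attack label. XGBoost's scikit-learn wrapper accepts `sample_weight` in `fit`, and a `Pipeline` forwards it to its final step as `<step>__sample_weight`; the step name `clf` below is a hypothetical placeholder.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Illustrative attack categories with NSL-KDD-style imbalance (counts are made up)
y = np.array(["normal"] * 50 + ["dos"] * 40 + ["probe"] * 8 + ["r2l"] * 2)

# "balanced" weights each sample by n_samples / (n_classes * count(its class))
weights = compute_sample_weight(class_weight="balanced", y=y)

for cls in ["normal", "dos", "probe", "r2l"]:
    print(cls, weights[y == cls][0])

# With a real pipeline whose last step is named "clf" (hypothetical):
# pipe.fit(X_train, y_binary_train, clf__sample_weight=weights)
```

Here each R2L record counts 25 times as much as a normal one (12.5 vs. 0.5), so errors on the rare class cost more during training.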