# Network Intrusion Detection using Machine Learning
This notebook addresses **Problem Statement No. 40 – Network Intrusion Detection** (Electronics and Telecommunications Engineering) and is written for IBM Watsonx.ai Studio.

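We will:
- Load the NSL-KDD training set (`KDDTrain+.txt`)
- Encode, label, and scale the features
- Train a random forest classifier
- Evaluate it with a confusion matrix and a classification report
- Save the model and scaler for deployment with IBM Watson Machine Learning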

In [2]:
# 📦 Import libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
import seaborn as sns
import matplotlib.pyplot as plt

## Load Dataset (uploaded to Watsonx project storage)

In [3]:
# Replace with actual project storage path
file_path = '/project_data/data_asset/KDDTrain+.txt'
columns_path = '/project_data/data_asset/Field Names.csv'

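# Load the feature names.
# Assumes Field Names.csv has no header row and one "name,type" pair per line.
col_names = pd.read_csv(columns_path, header=None)[0].tolist()

# Assumes KDDTrain+.txt carries two extra trailing columns after the features:
# the attack label and a difficulty score.
col_names += ['label', 'difficulty']

# Load the dataset (it has no header row) and drop the difficulty score,
# which is not a traffic feature
df = pd.read_csv(file_path, names=col_names).drop(columns='difficulty')
df.head()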

FileNotFoundError: [Errno 2] No such file or directory: '/project_data/data_asset/Field Names.csv'
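
The error above is raised because the placeholder path does not exist in the environment where the cell ran. A minimal sketch of a check that can run before loading is shown below. `find_asset` is a hypothetical helper, not part of Watsonx or pandas, and the list of search directories is an assumption to be replaced with the real project paths.

```python
from pathlib import Path


def find_asset(filename, search_dirs):
    """Return the first existing path to `filename`, or raise with every path tried."""
    tried = []
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
        tried.append(str(candidate))
    raise FileNotFoundError(f"{filename!r} not found; looked in: {', '.join(tried)}")


# Illustrative usage: the directories listed here are assumptions.
try:
    columns_path = find_asset('Field Names.csv', ['/project_data/data_asset', '.'])
    print(f'Using {columns_path}')
except FileNotFoundError as err:
    print(err)
```

Failing here with the full list of attempted paths is easier to diagnose than the cascade of `NameError`s that follows when `df` is never created.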

## Data Preprocessing

In [4]:
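# Encode categorical features, keeping one fitted encoder per column
# so the same mapping can be reused on new data
encoders = {}
for col in ['protocol_type', 'service', 'flag']:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])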

# Map attack types to categories.
# Labels that are not listed go to 'Unknown' instead of being counted as normal traffic.
def map_attack(label):
    if label == 'normal':
        return 'Normal'
    elif label in ['neptune', 'smurf', 'back', 'teardrop', 'pod', 'land']:
        return 'DoS'
    elif label in ['satan', 'ipsweep', 'nmap', 'portsweep']:
        return 'Probe'
    elif label in ['warezclient', 'warezmaster', 'guess_passwd', 'ftp_write',
                   'imap', 'phf', 'multihop', 'spy']:
        return 'R2L'
    elif label in ['buffer_overflow', 'loadmodule', 'rootkit', 'perl']:
        return 'U2R'
    else:
        return 'Unknown'

df['attack_category'] = df['label'].apply(map_attack)
df.drop(['label'], axis=1, inplace=True)
df.head()

NameError: name 'df' is not defined
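
Label encoding assigns an arbitrary integer order to `protocol_type`, `service`, and `flag`. One alternative is one-hot encoding, sketched below with `pd.get_dummies`. The helper `one_hot` and the toy rows are illustrative assumptions, not part of the notebook; reindexing against the training columns is what keeps new data aligned when it contains a category that was not seen during training.

```python
import pandas as pd

CATEGORICAL = ['protocol_type', 'service', 'flag']


def one_hot(frame, columns=CATEGORICAL, expected_columns=None):
    """One-hot encode `columns`; optionally align to a known column layout."""
    encoded = pd.get_dummies(frame, columns=columns, dtype=int)
    if expected_columns is not None:
        # Add missing indicator columns as 0 and drop indicators for unseen categories
        encoded = encoded.reindex(columns=expected_columns, fill_value=0)
    return encoded


# Toy rows for illustration only (not taken from the dataset)
train = pd.DataFrame({
    'duration': [0, 2, 0],
    'protocol_type': ['tcp', 'udp', 'icmp'],
    'service': ['http', 'domain_u', 'ecr_i'],
    'flag': ['SF', 'SF', 'S0'],
})
new = pd.DataFrame({
    'duration': [5],
    'protocol_type': ['tcp'],
    'service': ['ftp'],  # category not present in `train`
    'flag': ['SF'],
})

train_encoded = one_hot(train)
new_encoded = one_hot(new, expected_columns=train_encoded.columns)
print(new_encoded)
```

Tree ensembles often tolerate integer codes well, so whether this helps here is something to measure rather than assume.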

In [5]:
# Split features and target
X = df.drop('attack_category', axis=1)
y = df['attack_category']

# Split train/test first so the scaler never sees the test rows;
# stratify to keep the rare classes represented in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale features: fit on the training split only, then apply to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

NameError: name 'df' is not defined
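
One way to guarantee that scaling statistics come from the training rows only is to bundle the scaler and the classifier in a scikit-learn `Pipeline`. The sketch below uses synthetic data as a stand-in, since the dataset did not load in this run; the feature count, the class names, and the smaller forest are assumptions chosen to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 rows, 5 numeric features, 2 classes
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 5))
y_demo = np.where(X_demo[:, 0] + X_demo[:, 1] > 0, 'DoS', 'Normal')

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# fit() learns the scaling statistics from X_tr only;
# score() and predict() reuse those statistics on unseen rows
pipeline = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=50, random_state=42),
)
pipeline.fit(X_tr, y_tr)
accuracy = pipeline.score(X_te, y_te)
print(f'Held-out accuracy on synthetic data: {accuracy:.3f}')
```

A fitted pipeline is also a single object to persist, which avoids shipping the model and scaler as two files that can drift apart.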

## Model Training & Evaluation

In [6]:
# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluation (confusion matrix rows are true classes, columns are predictions,
# both ordered as in model.classes_)
print(model.classes_)
print(confusion_matrix(y_test, y_pred, labels=model.classes_))
print(classification_report(y_test, y_pred))

NameError: name 'X_train' is not defined
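
For intrusion detection, the per-class detection rate (recall) and false alarm rate are often more telling than overall accuracy. The sketch below derives both from a confusion matrix laid out as scikit-learn produces it, with rows as true classes and columns as predictions. `per_class_rates` is a hypothetical helper, and the 2x2 matrix is made-up data for illustration.

```python
import numpy as np


def per_class_rates(cm):
    """Return (detection_rate, false_alarm_rate) arrays, one entry per class."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                 # correctly predicted, per class
    fn = cm.sum(axis=1) - tp         # true class missed
    fp = cm.sum(axis=0) - tp         # predicted as class but actually another
    tn = cm.sum() - tp - fn - fp     # everything else
    with np.errstate(divide='ignore', invalid='ignore'):
        detection = np.where(tp + fn > 0, tp / (tp + fn), np.nan)
        false_alarm = np.where(fp + tn > 0, fp / (fp + tn), np.nan)
    return detection, false_alarm


# Made-up matrix: class 0 has 10 true samples, class 1 has 10 true samples
toy_cm = [[8, 2],
          [1, 9]]
detection, false_alarm = per_class_rates(toy_cm)
print('Detection rate:  ', detection)
print('False alarm rate:', false_alarm)
```

In the notebook this could be called as `per_class_rates(confusion_matrix(y_test, y_pred))` once the earlier cells run successfully.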

## ☁️ Save Model for IBM Deployment

In [None]:
import joblib
joblib.dump(model, 'nids_model.pkl')
joblib.dump(scaler, 'scaler.pkl')
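
Before uploading the saved files, it can help to confirm that they reload and predict identically. The sketch below does a round trip with `joblib` on a small synthetic model written to a temporary directory; the data, model size, and class names are assumptions, and it does not cover the Watson Machine Learning deployment step itself.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the trained model and scaler
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = np.where(X_demo[:, 0] > 0, 'DoS', 'Normal')

demo_scaler = StandardScaler().fit(X_demo)
demo_model = RandomForestClassifier(n_estimators=20, random_state=0)
demo_model.fit(demo_scaler.transform(X_demo), y_demo)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / 'nids_model.pkl'
    scaler_path = Path(tmp) / 'scaler.pkl'
    joblib.dump(demo_model, model_path)
    joblib.dump(demo_scaler, scaler_path)

    # Reload both artifacts, as a scoring script would
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)

# New rows must pass through the same scaler before prediction
new_rows = rng.normal(size=(3, 4))
original = demo_model.predict(demo_scaler.transform(new_rows))
restored = loaded_model.predict(loaded_scaler.transform(new_rows))
print(restored, (original == restored).all())
```

Pickled estimators generally need to be loaded with a scikit-learn version compatible with the one that saved them, so recording the version alongside the files is a sensible precaution.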