# Navigating the Future of Cybersecurity: Intrusion Detection Datasets
This notebook looks at the role of intrusion detection datasets in modern cybersecurity, using the HIKARI-2021 dataset as a reference point, and walks through a minimal intrusion detection system (IDS) built on synthetic traffic data.

## Setup and Required Libraries
Let's begin by importing the necessary libraries for our analysis:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, chi2

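# Set style for visualizations.
# The 'seaborn' Matplotlib style name was deprecated in Matplotlib 3.6,
# so apply the seaborn theme and palette through seaborn itself.
sns.set_theme(palette='husl')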

## Loading and Preprocessing Sample Network Traffic Data
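We'll create a small synthetic dataset, loosely modeled on HIKARI-2021's structure, to demonstrate IDS concepts. The labels are drawn at random, independently of the features, so the data is suitable for exercising the pipeline but not for judging detection quality: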

In [None]:
# Create sample network traffic data
def generate_sample_data(n_samples=1000):
    np.random.seed(42)
    data = {
        'bytes_sent': np.random.randint(100, 10000, n_samples),
        'protocol_type': np.random.choice(['TCP', 'UDP', 'ICMP'], n_samples),
        'duration': np.random.uniform(0, 100, n_samples),
        'is_attack': np.random.choice([0, 1], n_samples, p=[0.8, 0.2])
    }
    return pd.DataFrame(data)

df = generate_sample_data()
print('Sample dataset shape:', df.shape)
df.head()

## Data Analysis and Visualization
Let's check the class balance. With the sampling probabilities used above, roughly 80% of records should be normal and 20% attacks:

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='is_attack')
plt.title('Distribution of Normal vs Attack Traffic')
plt.xlabel('Traffic class (0 = normal, 1 = attack)')
plt.ylabel('Count')
plt.show()
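
The plot gives a visual impression of the imbalance; the same check can be done numerically. The sketch below is a minimal, self-contained example. It redraws labels with the same 80/20 probabilities using NumPy's `default_rng` (an assumption made here so the cell runs on its own, so exact counts will differ from `df`) and reports counts, shares, and the accuracy a classifier would reach by always predicting the majority class.

```python
import numpy as np
import pandas as pd

# Redraw labels with the same 80/20 probabilities as the sample data.
# Assumption: a fresh generator is used so this cell is self-contained.
rng = np.random.default_rng(42)
is_attack = pd.Series(rng.choice([0, 1], size=1000, p=[0.8, 0.2]), name='is_attack')

# Absolute counts and relative shares per class (0 = normal, 1 = attack).
counts = is_attack.value_counts().sort_index()
shares = is_attack.value_counts(normalize=True).sort_index()
summary = pd.DataFrame({'count': counts, 'share': shares})
print(summary)

# Accuracy of a trivial model that always predicts the majority class.
baseline_acc = shares.max()
print(f'Majority-class baseline accuracy: {baseline_acc:.3f}')
```

Any model evaluated on this data should be compared against that baseline, since plain accuracy can look high on imbalanced traffic without detecting a single attack.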

## Building a Simple IDS Model
Now let's define and compile a small feed-forward neural network for binary intrusion detection:

In [None]:
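# Prepare features: one-hot encode protocol_type.
# dtype=float keeps every column numeric (newer pandas defaults to bool dummies).
X = pd.get_dummies(df.drop('is_attack', axis=1), dtype=float)
y = df['is_attack']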

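# Split (stratified, so both sets keep the same attack ratio), then scale.
# The scaler is fit on the training set only to avoid leaking test statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)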

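# Build model: two hidden layers with dropout, sigmoid output for binary classification.
# An explicit Input layer replaces the input_shape argument, which newer Keras versions warn about.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])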

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
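
The model above is compiled but not yet trained. A minimal training and evaluation sketch follows. It rebuilds the synthetic data so that it runs on its own; the epoch count, batch size, and validation split are illustrative assumptions rather than tuned values. Because the synthetic labels are drawn independently of the features, test accuracy is not expected to meaningfully beat the majority-class baseline.

```python
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild the synthetic data so this cell runs on its own.
rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    'bytes_sent': rng.integers(100, 10000, n),
    'protocol_type': rng.choice(['TCP', 'UDP', 'ICMP'], n),
    'duration': rng.uniform(0, 100, n),
    'is_attack': rng.choice([0, 1], n, p=[0.8, 0.2]),
})
X = pd.get_dummies(df.drop('is_attack', axis=1), dtype=float)
y = df['is_attack'].to_numpy()

# Stratified split, then scale using training-set statistics only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Same architecture as the notebook's model.
tf.random.set_seed(42)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train_scaled.shape[1],)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Assumed hyperparameters, chosen only to keep the example quick.
EPOCHS = 5
history = model.fit(X_train_scaled, y_train, epochs=EPOCHS, batch_size=32,
                    validation_split=0.2, verbose=0)

# Evaluate on the held-out test set and compare with the trivial baseline.
test_loss, test_acc = model.evaluate(X_test_scaled, y_test, verbose=0)
baseline_acc = max(y_test.mean(), 1 - y_test.mean())
print(f'Test accuracy: {test_acc:.3f} (majority-class baseline: {baseline_acc:.3f})')
```

The `history.history` dictionary holds per-epoch training and validation loss, which can be plotted to check for overfitting.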

## Best Practices and Tips
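1. Scale features before training, fitting the scaler on the training split only
2. Validate with stratified splits or cross-validation so each fold preserves the attack ratio
3. Validate inputs and handle errors (missing values, unexpected protocol categories) before data reaches the model
4. Monitor precision, recall, and F1 in addition to accuracy, since attack traffic is usually the minority class
5. Retrain regularly on new data, since traffic patterns and attack techniques change over time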
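
As one way to put the scaling, validation, and monitoring points into practice, the sketch below runs stratified 5-fold cross-validation with scaling wrapped inside a scikit-learn pipeline, so the scaler is refit on each training fold. Logistic regression is used as a lightweight stand-in for the neural network (an assumption made to keep the example fast), and the data is regenerated so the cell runs on its own. ROC AUC and balanced accuracy are reported instead of plain accuracy; on these random labels both should hover around 0.5.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Regenerate synthetic traffic data (labels are independent of the features).
rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    'bytes_sent': rng.integers(100, 10000, n),
    'protocol_type': rng.choice(['TCP', 'UDP', 'ICMP'], n),
    'duration': rng.uniform(0, 100, n),
    'is_attack': rng.choice([0, 1], n, p=[0.8, 0.2]),
})
X = pd.get_dummies(df.drop(columns='is_attack'), dtype=float)
y = df['is_attack']

# Scaling lives inside the pipeline, so each fold fits its own scaler
# on that fold's training portion only (no leakage into validation data).
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds keep the attack ratio roughly constant across folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_validate(pipeline, X, y, cv=cv,
                        scoring=['roc_auc', 'balanced_accuracy'])

for name in ('test_roc_auc', 'test_balanced_accuracy'):
    print(f'{name}: {scores[name].mean():.3f} +/- {scores[name].std():.3f}')
```

The spread across folds gives a rough sense of how stable a metric is, which a single train/test split cannot show.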

## Conclusion
This notebook walked through generating sample traffic data, inspecting its class balance, and defining a basic neural network IDS. The field continues to evolve, with newer datasets such as HIKARI-2021 providing better training data for more sophisticated detection systems.