# Dataset Preprocessing Script

## Overview
This script preprocesses the combined datasets for three IoT devices (Archer, Camera, and Indoor) by label-encoding their categorical columns. It saves each processed dataset together with its fitted label encoders, so that the same category-to-integer mappings can be reapplied to new data in real-time deployment.
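
To make the encoding step concrete, here is a minimal sketch of how scikit-learn's `LabelEncoder` maps string categories to integers and back. The `protocols` values are made up for illustration and are not taken from the datasets.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical values; the real columns come from the CSV files.
protocols = ["tcp", "udp", "icmp", "tcp"]

le = LabelEncoder()
codes = le.fit_transform(protocols).tolist()

# classes_ is sorted alphabetically: icmp -> 0, tcp -> 1, udp -> 2
classes = le.classes_.tolist()

# A fitted encoder can map the integer codes back to the original labels.
decoded = le.inverse_transform(codes).tolist()
```

Because the integer codes only make sense relative to the fitted `classes_`, the encoder object has to be kept alongside the encoded data.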

## Libraries Used
- `pandas`: For data manipulation and analysis.
- `sklearn.preprocessing`: For encoding categorical variables.
- `joblib`: For saving the label encoders.

## File Paths
The paths for the combined datasets are defined as follows:
```python
file_paths = {
    'archer': r'C:\Users\USER\IoT_Network_Traffic_Management\data\processed\combined_archer.csv',
    'camera': r'C:\Users\USER\IoT_Network_Traffic_Management\data\processed\combined_camera.csv',
    'indoor': r'C:\Users\USER\IoT_Network_Traffic_Management\data\processed\combined_indoor.csv'
}
```

## Import necessary libraries

```python
# Import necessary libraries
import pandas as pd
from sklearn.preprocessing import LabelEncoder
import os
import joblib  # To save the label encoders
```


## Define file paths for the combined datasets

```python
# Define file paths for the combined (input) datasets
file_paths = {
    'archer': r'C:\Users\USER\IoT_Network_Traffic_Management\data\processed\combined_archer.csv',
    'camera': r'C:\Users\USER\IoT_Network_Traffic_Management\data\processed\combined_camera.csv',
    'indoor': r'C:\Users\USER\IoT_Network_Traffic_Management\data\processed\combined_indoor.csv'
}
```
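
The three entries differ only in the device name, so the dictionary could also be generated. The sketch below is one possible way to do that with `pathlib`; `BASE_DIR` and `DEVICES` are hypothetical names introduced here, and `PureWindowsPath` is used only so the example produces the same Windows-style strings on any operating system.

```python
from pathlib import PureWindowsPath

# Hypothetical constants; the root is copied from the hard-coded paths above.
BASE_DIR = PureWindowsPath(r'C:\Users\USER\IoT_Network_Traffic_Management')
DEVICES = ['archer', 'camera', 'indoor']

# Build one input path per device from a shared pattern.
file_paths = {
    device: str(BASE_DIR / 'data' / 'processed' / f'combined_{device}.csv')
    for device in DEVICES
}
```

On the Windows machine itself, `pathlib.Path` would work the same way and the resulting objects can be passed straight to `pd.read_csv`.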


## Function to preprocess the dataset

```python
# Function to preprocess the dataset
def preprocess_dataset(device_name):
    """
    Load the dataset for a specific device, preprocess it by encoding categorical variables,
    and save both the processed dataset and the label encoders for future use.

    Parameters:
    device_name (str): The name of the IoT device (e.g., 'archer', 'camera', 'indoor').
    """

    # Load the dataset
    df = pd.read_csv(file_paths[device_name])

    # Identify categorical columns (those stored with the object dtype)
    categorical_cols = df.select_dtypes(include=['object']).columns.tolist()

    # Initialize a dictionary to store one label encoder per column
    label_encoders = {}

    # Encode categorical variables
    for col in categorical_cols:
        le = LabelEncoder()
        df[col] = le.fit_transform(df[col])  # Fit the encoder and replace the column with integer codes
        label_encoders[col] = le  # Store the fitted encoder for future use

    # Save the processed dataset, creating the output directory if it does not exist yet
    processed_path = f'C:\\Users\\USER\\IoT_Network_Traffic_Management\\data\\processed_data\\processed_{device_name}.csv'
    os.makedirs(os.path.dirname(processed_path), exist_ok=True)
    df.to_csv(processed_path, index=False)

    # Save the label encoders (a dict mapping column name to fitted LabelEncoder)
    encoder_path = f'C:\\Users\\USER\\IoT_Network_Traffic_Management\\data\\processed_data\\label\\label_encoder_{device_name}.pkl'
    os.makedirs(os.path.dirname(encoder_path), exist_ok=True)
    joblib.dump(label_encoders, encoder_path)

    print(f"Processed dataset saved to {processed_path} and label encoders saved to {encoder_path}.")

# Preprocess datasets for Archer, Camera, and Indoor
for device in ['archer', 'camera', 'indoor']:
    preprocess_dataset(device)
```


```
Processed dataset saved to C:\Users\USER\IoT_Network_Traffic_Management\data\processed_data\processed_archer.csv and label encoders saved to C:\Users\USER\IoT_Network_Traffic_Management\data\processed_data\label\label_encoder_archer.pkl.
Processed dataset saved to C:\Users\USER\IoT_Network_Traffic_Management\data\processed_data\processed_camera.csv and label encoders saved to C:\Users\USER\IoT_Network_Traffic_Management\data\processed_data\label\label_encoder_camera.pkl.
Processed dataset saved to C:\Users\USER\IoT_Network_Traffic_Management\data\processed_data\processed_indoor.csv and label encoders saved to C:\Users\USER\IoT_Network_Traffic_Management\data\processed_data\label\label_encoder_indoor.pkl.
```
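
The saved `.pkl` files are meant to be reloaded later so that incoming data is encoded with the same mappings. The following is a minimal sketch of that round trip, assuming the file holds a dictionary of column name to fitted `LabelEncoder`, as in the function above. The column name `protocol`, its values, and the temporary file name are hypothetical; a temporary directory stands in for the real `processed_data\label` folder.

```python
import os
import tempfile

import joblib
from sklearn.preprocessing import LabelEncoder

# Training time: one fitted encoder per categorical column (hypothetical column and values).
encoders = {"protocol": LabelEncoder().fit(["tcp", "udp", "icmp"])}

with tempfile.TemporaryDirectory() as tmp_dir:
    encoder_path = os.path.join(tmp_dir, "label_encoder_demo.pkl")
    joblib.dump(encoders, encoder_path)

    # Deployment time: reload the dictionary of encoders from disk.
    loaded = joblib.load(encoder_path)

# Reuse the stored mapping (icmp -> 0, tcp -> 1, udp -> 2) on new records.
new_codes = loaded["protocol"].transform(["udp", "tcp"]).tolist()

# LabelEncoder raises ValueError for categories it did not see during fitting,
# so a deployment pipeline needs its own policy for such values.
try:
    loaded["protocol"].transform(["quic"])
    unseen_accepted = True
except ValueError:
    unseen_accepted = False
```

How unseen categories should be handled (rejecting the record, mapping to a reserved code, or refitting) is a design decision this script does not specify.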
