# Anomaly Detection in Power Grid
## Project Overview
Author: Fatih E. NAR<br>
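This project demonstrates an anomaly detection approach for power grid data: a 1D CNN predicts the next reading of household power consumption, and readings with an unusually large prediction error are flagged as anomalies.<br>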
Dataset: https://archive.ics.uci.edu/dataset/235/individual+household+electric+power+consumption

In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Dense, Flatten
from tensorflow.keras.models import load_model
import matplotlib.pyplot as plt
import os

# Check for available devices.
# TensorFlow has no 'MPS' device type; on Apple Silicon the tensorflow-metal
# plugin exposes the GPU under the 'GPU' device type, so one check covers both.
if tf.config.list_physical_devices('GPU'):
    device = '/GPU:0'
    print("GPU available. Using GPU for training.")
else:
    device = '/CPU:0'
    print("No GPU available. Using CPU for training.")

# Load the enriched dataset with anomalies from local file
csv_file_path = 'data/household_power_consumption_with_anomalies.txt'
# infer_datetime_format is deprecated since pandas 2.0 and has no effect, so it is omitted
data = pd.read_csv(csv_file_path, sep=';', index_col=0, parse_dates=True)

# Preview the loaded data
print("First few rows of data:")
print(data.head())

# Check for NaN or infinite values in the data
print("Checking for NaN values before cleaning:", data.isna().sum().sum())
print("Checking for infinite values before cleaning:", np.isinf(data).sum().sum())

# Treat infinite values as missing, then drop every row with a missing value
data = data.replace([np.inf, -np.inf], np.nan).dropna()

# Check again for NaN or infinite values in the data
print("Checking for NaN values after cleaning:", data.isna().sum().sum())
print("Checking for infinite values after cleaning:", np.isinf(data).sum().sum())

# Normalize the data
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)

# Check the normalized data for NaN or infinite values
print("Checking for NaN values after normalization:", np.isnan(scaled_data).sum())
print("Checking for infinite values after normalization:", np.isinf(scaled_data).sum())

# Prepare the data for CNN
def create_dataset(data, time_steps=1):
    X, y = [], []
    for i in range(len(data) - time_steps):
        X.append(data[i:(i + time_steps), :])
        y.append(data[i + time_steps, :])
    return np.array(X), np.array(y)

time_steps = 90  # Use the previous 90 rows (time steps) to predict the next row
X, y = create_dataset(scaled_data, time_steps)

# X already has shape [samples, time steps, features]; this reshape is a no-op
# kept as an explicit check of the expected layout
X = X.reshape((X.shape[0], time_steps, X.shape[2]))

print("Shapes of X and y:", X.shape, y.shape)
print("First sample of X:", X[:1])
print("First sample of y:", y[:1])
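
The loop in `create_dataset` builds one Python list entry per window, which can be slow and memory-hungry on a long series. As a rough sketch only, the same windows can be produced with NumPy's `sliding_window_view`. The helper name `create_dataset_vectorized` and the toy array are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def create_dataset_vectorized(data, time_steps=1):
    """Hypothetical vectorized counterpart of the loop-based create_dataset."""
    # sliding_window_view appends the window axis last:
    # shape (n_rows - time_steps + 1, n_features, time_steps)
    windows = sliding_window_view(data, window_shape=time_steps, axis=0)
    # The final window has no following row to predict, so drop it,
    # then reorder the axes to [samples, time steps, features]
    X = windows[:-1].transpose(0, 2, 1)
    # The target for each window is the row that immediately follows it
    y = data[time_steps:]
    return X, y

# Toy input: 10 rows, 2 features
toy = np.arange(20, dtype=float).reshape(10, 2)
X_toy, y_toy = create_dataset_vectorized(toy, time_steps=3)
print(X_toy.shape, y_toy.shape)
```

Here `X_toy` is a read-only view into `toy` rather than a copy, so it should be copied (for example with `np.ascontiguousarray`) if a downstream step needs a writable, contiguous array.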

In [None]:
# Define the CNN model
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=(X.shape[1], X.shape[2])))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(X.shape[2]))
model.compile(optimizer='adam', loss='mse')

# Train the model within the appropriate device context
with tf.device(device):
    history = model.fit(X, y, epochs=100, batch_size=32, validation_split=0.2, verbose=1)

# Plot training history
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.xlabel('Epochs')
plt.ylabel('Mean Squared Error')
plt.legend()
plt.title('Training and Validation Loss')
plt.show()

# Create the model directory if it doesn't exist
os.makedirs('models', exist_ok=True)

# Save the trained model
model.save('models/cnn_anomaly_detector.h5')
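
With `validation_split=0.2`, Keras holds out the last 20% of the samples, taken before any shuffling, so the validation set here is the most recent part of the series. The sketch below makes that split explicit with plain NumPy. The function name `chronological_split` and the demo arrays are assumptions for illustration only.

```python
import numpy as np

def chronological_split(X, y, validation_fraction=0.2):
    """Hypothetical helper: earlier windows for training, later ones for validation."""
    # Index of the first validation sample; nothing is shuffled
    split_at = int(len(X) * (1.0 - validation_fraction))
    return (X[:split_at], y[:split_at]), (X[split_at:], y[split_at:])

# Demo arrays shaped like the notebook's X and y: 10 windows, 3 time steps, 2 features
X_demo = np.arange(10 * 3 * 2, dtype=float).reshape(10, 3, 2)
y_demo = np.arange(10 * 2, dtype=float).reshape(10, 2)

(train_X, train_y), (val_X, val_y) = chronological_split(X_demo, y_demo)
print(train_X.shape, val_X.shape)
```

The resulting pair could be passed to `model.fit` through its `validation_data` argument instead of `validation_split`. Because consecutive windows overlap, the first validation windows still share rows with the last training windows, so the validation loss is probably somewhat optimistic near the boundary.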

In [None]:
# Load the trained model
model = load_model('models/cnn_anomaly_detector.h5')

# Make predictions
predictions = model.predict(X)

# Calculate the per-sample mean squared error, averaged over all features
mse = np.mean(np.power(y - predictions, 2), axis=1)

# Set the anomaly threshold at the 95th percentile of the per-sample MSE,
# which flags the 5% of samples with the largest prediction error
threshold = np.percentile(mse, 95)  # Adjust the percentile if necessary

# Identify anomalies
anomalies = mse > threshold

# Print the results
print("Number of anomalies detected:", np.sum(anomalies))
print("Anomalies indices:", np.where(anomalies))

# Create a DataFrame to store the results
results_df = pd.DataFrame(data={
    'datetime': data.index[time_steps:],
    'actual': scaler.inverse_transform(y)[:, 0],  # First feature column only, in original units
    'predicted': scaler.inverse_transform(predictions)[:, 0],  # First feature column only, in original units
    'mse': mse,
    'anomaly': anomalies
})

# Plot actual vs predicted values with anomalies highlighted
plt.figure(figsize=(15, 6))

# Plot actual values
plt.plot(results_df['datetime'], results_df['actual'], label='Actual', color='blue')

# Plot predicted values
plt.plot(results_df['datetime'], results_df['predicted'], label='Predicted', color='green')

# Highlight anomalies (a separate name avoids overwriting the boolean mask `anomalies`)
anomaly_rows = results_df[results_df['anomaly']]
plt.scatter(anomaly_rows['datetime'], anomaly_rows['actual'], color='red', label='Anomalies', s=50)

# Add labels and title
plt.xlabel('Date')
plt.ylabel('Power Consumption')
plt.title('Anomaly Detection in Power Consumption')
plt.legend()
plt.show()

# Save the results to a CSV file
results_df.to_csv('data/anomaly_detection_results.csv', index=False)
print("Anomaly detection results saved to data/anomaly_detection_results.csv")
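
A percentile threshold flags a fixed share of samples (about 5% at the 95th percentile) regardless of how many true anomalies exist. As a hedged sketch, the snippet below compares it with a median-plus-MAD threshold on synthetic errors. The synthetic data, the injected outliers, and the multiplier `k = 10` are all assumptions chosen for illustration, not values from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the per-sample MSE; NOT the notebook's real errors
errors = rng.gamma(shape=2.0, scale=0.01, size=1000)
injected = [100, 500, 900]
errors[injected] = 1.0  # three artificially large errors

# Percentile rule, as used in the notebook: always flags about 5% of samples
pct_threshold = np.percentile(errors, 95)
pct_flags = errors > pct_threshold

# Robust alternative: median + k * scaled MAD (k = 10 is an arbitrary choice)
k = 10
median = np.median(errors)
mad = np.median(np.abs(errors - median))
robust_threshold = median + k * 1.4826 * mad
robust_flags = errors > robust_threshold

print("Flagged by percentile rule:", int(pct_flags.sum()))
print("Flagged by median/MAD rule:", int(robust_flags.sum()))
```

The factor 1.4826 scales the MAD so that it approximates a standard deviation for normally distributed data. Whether such a rule suits the real error distribution would need to be checked against the notebook's actual `mse` values.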