#  Root Cause Analysis 

## Section 1: Data Loading & Preprocessing 

### Install Packages

First, we install the machine learning and data libraries we need, so that our environment supports data loading, ML model training, and deep learning workflows.

- Pandas → for data loading and analysis

- TensorFlow → for deep learning model development

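- scikit-learn → for label encoding and preprocessing

- OS → for file handling (part of the Python standard library, so it needs no install)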
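Before running the install cell, it can help to see which of these packages are already present. The sketch below is a hypothetical helper (`installed_versions` is not part of the notebook) that uses only the standard library's `importlib.metadata`; the distribution names are assumed to match the conda specs used in the install command.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment
            versions[name] = None
    return versions


report = installed_versions(["pandas", "tensorflow", "scikit-learn"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Any package reported as not installed would be picked up by the conda command in the next cell.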

In [None]:

import sys
!conda install --yes --prefix {sys.prefix} pandas tensorflow scikit-learn

Jupyter detected...
2 channel Terms of Service accepted
Retrieving notices: done
Channels:
 - defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/anaconda3/envs/ITops

  added / updated specs:
    - pandas
    - scikit-learn
    - tensorflow


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    absl-py-2.3.1              |  py313hca03da5_0         433 KB
    astunparse-1.6.3           |             py_0          17 KB
    c-ares-1.34.6              |       hfe05a68_0         173 KB
    expat-2.7.3                |       h50f4ffc_4          20 KB
    flatbuffers-24.3.25        |       h313beb8_0         1.3 MB
    freetype-2.14.1            |       h7cdc921_0         589 KB
    fribidi-1.0.16             |       h8859324_0          58 KB
    gast-0


### Loading the Dataset Into Pandas DataFrame

Loading the CSV file gives us structured incident symptom data that the model will learn from.

In [None]:
import pandas as pd
import os
import tensorflow as tf

#Load the data file into a Pandas Dataframe
symptom_data = pd.read_csv("root_cause_analysis.csv")

#Explore the data loaded, print data type of each column
print(symptom_data.dtypes)


#Preview the first 5 rows of the dataset
symptom_data.head()

ID              int64
CPU_LOAD        int64
MEMORY_LOAD     int64
DELAY           int64
ERROR_1000      int64
ERROR_1001      int64
ERROR_1002      int64
ERROR_1003      int64
ROOT_CAUSE     object
dtype: object


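|   | ID | CPU_LOAD | MEMORY_LOAD | DELAY | ERROR_1000 | ERROR_1001 | ERROR_1002 | ERROR_1003 | ROOT_CAUSE |
|---|----|----------|-------------|-------|------------|------------|------------|------------|------------|
| 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | MEMORY |
| 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | MEMORY |
| 2 | 3 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | MEMORY |
| 3 | 4 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | MEMORY |
| 4 | 5 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | NETWORK_DELAY |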


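### Preprocess Data

The data now needs to be converted into a form that ML algorithms can consume: the text labels in ROOT_CAUSE are encoded as integers and then one-hot encoded, and the symptom columns are extracted as a numeric feature array.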

In [None]:
from sklearn import preprocessing

label_encoder = preprocessing.LabelEncoder() ## Used to assign number to text categories

symptom_data['ROOT_CAUSE'] = label_encoder.fit_transform(
                                symptom_data['ROOT_CAUSE'])

#Convert the Pandas DataFrame to a 2-D NumPy array of floats
np_symptom = symptom_data.to_numpy().astype(float)

#Extract the feature variables (X): for all rows (:), take columns 1 to 7.
#Column 0 is the ID; columns 1-7 are the symptoms the model will learn from
X_train = np_symptom[:,1:8]

#Extract the target variable (Y): for all rows, take the last column (index 8),
#which holds the encoded root cause the model will learn to predict
Y_train=np_symptom[:,8]

#One-hot encoding: the model now treats the labels as separate classes, not numbers with an order.
#Eg: MEMORY=1, DATABASE_ISSUE=0 does not mean MEMORY > DATABASE_ISSUE, they are just labels.
#With 3 classes, each label becomes a vector such as [1,0,0], [0,1,0] or [0,0,1]
Y_train = tf.keras.utils.to_categorical(Y_train,3)

print("Shape of feature variables :", X_train.shape)
print("Shape of target variable :",Y_train.shape)

#(1000, 7) means 1000 incidents with 7 symptoms each, and
#(1000, 3) means 1000 root cause labels across 3 possible categories.

Shape of feature variables : (1000, 7)
Shape of target variable : (1000, 3)
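To make the two encoding steps concrete, here is a minimal plain-Python sketch of what `LabelEncoder` and `to_categorical` do conceptually. It assumes the dataset contains exactly the three root causes that appear in this notebook's outputs; the sample `labels` list and the `one_hot` helper are illustrative, not part of the notebook.

```python
# Illustrative labels; the class names are the ones seen in the notebook's outputs
labels = ["MEMORY", "NETWORK_DELAY", "DATABASE_ISSUE", "MEMORY"]

# Like LabelEncoder, sort the unique labels and number them from 0
classes = sorted(set(labels))
class_to_index = {name: i for i, name in enumerate(classes)}
encoded = [class_to_index[name] for name in labels]


def one_hot(index, num_classes):
    """Return a list with 1.0 at `index` and 0.0 everywhere else."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


targets = [one_hot(i, len(classes)) for i in encoded]

print(class_to_index)  # {'DATABASE_ISSUE': 0, 'MEMORY': 1, 'NETWORK_DELAY': 2}
print(encoded)         # [1, 2, 0, 1]
print(targets[0])      # [0.0, 1.0, 0.0]
```

Because the classes are sorted alphabetically, index 0 corresponds to DATABASE_ISSUE, index 1 to MEMORY, and index 2 to NETWORK_DELAY. That ordering matters later when predictions are decoded.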


### Building the Model with Keras

In [7]:
from tensorflow import keras
from tensorflow.keras import optimizers
from tensorflow.keras.regularizers import l2

#Setup Training Parameters
EPOCHS=20
BATCH_SIZE=100
VERBOSE=1
OUTPUT_CLASSES=len(label_encoder.classes_)
N_HIDDEN=128
VALIDATION_SPLIT=0.2

#Create a Keras sequential model
model = tf.keras.models.Sequential()
#Add a Dense Layer
model.add(keras.layers.Dense(N_HIDDEN,
                             input_shape=(7,),
                              name='Dense-Layer-1',
                              activation='relu'))

#Add a second dense layer
model.add(keras.layers.Dense(N_HIDDEN,
                              name='Dense-Layer-2',
                              activation='relu'))

#Add a softmax layer for categorical prediction
model.add(keras.layers.Dense(OUTPUT_CLASSES,
                             name='Final',
                             activation='softmax'))

#Compile the model, using Adam optimizer
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

#Train the model
model.fit(X_train,
          Y_train,
          batch_size=BATCH_SIZE,
          epochs=EPOCHS,
          verbose=VERBOSE,
          validation_split=VALIDATION_SPLIT)

model.summary()

Epoch 1/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.6612 - loss: 0.9868 - val_accuracy: 0.7900 - val_loss: 0.8963
Epoch 2/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.7875 - loss: 0.7971 - val_accuracy: 0.8100 - val_loss: 0.7440
Epoch 3/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.8000 - loss: 0.6439 - val_accuracy: 0.7900 - val_loss: 0.6356
Epoch 4/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.8025 - loss: 0.5377 - val_accuracy: 0.7900 - val_loss: 0.5755
Epoch 5/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8100 - loss: 0.4827 - val_accuracy: 0.8000 - val_loss: 0.5506
Epoch 6/20
8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8238 - loss: 0.4537 - val_accuracy: 0.8000 - val_loss: 0.5421
Epoch 7/20
8/8 ━━━━━━━━━━━━━━━━━━━━

### Understanding How We Built Our Model

1️⃣ Import Libraries

```
from tensorflow import keras
from tensorflow.keras import optimizers
from tensorflow.keras.regularizers import l2
```


What this does:

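Loads Keras, the library for building neural networks.

optimizers = the module of algorithms that control how the model learns (imported here, although the model selects Adam by name instead)

l2 = used for regularization to prevent overfitting (imported, but not used yet here). <br><br>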


2️⃣ Training Settings (Hyperparameters)

```
EPOCHS=20
BATCH_SIZE=100
VERBOSE=1
OUTPUT_CLASSES=len(label_encoder.classes_)
N_HIDDEN=128
VALIDATION_SPLIT=0.2
```

Let’s decode each one: <br><br>

EPOCHS = 20

How many times the model will see the full dataset

1 epoch = model trains once on all data

20 = model learns 20 rounds

Analogy: Studying notes 20 times. <br><br>

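BATCH_SIZE = 100

How many samples the model processes at a time

You have 1000 samples, of which 800 are used for training (the other 200 are held out by VALIDATION_SPLIT)

Batch size = 100

So → 8 batches per epoch (the 8/8 shown in the training output)

Why?

Smaller chunks = more frequent weight updates and less memory used per step. <br><br>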
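The training log reports 8 steps per epoch. A minimal sketch of the arithmetic behind that number, using the notebook's own settings (the lowercase variable names are illustrative):

```python
import math

n_samples = 1000        # rows in the dataset
validation_split = 0.2  # fraction held out for validation
batch_size = 100

# Only the training portion is split into batches
n_train = int(n_samples * (1 - validation_split))
batches_per_epoch = math.ceil(n_train / batch_size)

print(n_train, batches_per_epoch)  # 800 8
```

`math.ceil` is used because a final, partially filled batch still counts as one step.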

VERBOSE = 1

Show training progress on screen

0 = silent

1 = show progress bar

2 = show one line per epoch. <br><br>

OUTPUT_CLASSES = len(label_encoder.classes_)

Meaning:

Count how many root cause categories exist

If root causes are:

DATABASE_ISSUE

MEMORY

NETWORK_DELAY


Then:

OUTPUT_CLASSES = 3


This tells the model how many final outputs it needs. <br><br>

N_HIDDEN = 128

Number of neurons in hidden layers

More neurons = more learning capacity

128 = a moderate size, ample for a small dataset with 7 input features <br><br>

VALIDATION_SPLIT = 0.2

Hold out 20% of the data to evaluate the model while training

If dataset = 1000 rows

→ 800 for training

→ 200 for validation

This helps detect overfitting. <br><br>
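Keras documents that `validation_split` takes the validation rows from the end of the data, before any shuffling. The sketch below mimics that behaviour in plain Python; `split_train_validation` is a hypothetical helper, not the Keras implementation.

```python
def split_train_validation(rows, validation_split):
    """Keep the first part for training and the last part for validation."""
    split_at = int(len(rows) * (1 - validation_split))
    return rows[:split_at], rows[split_at:]


rows = list(range(1, 1001))  # stand-in for the 1000 incident IDs
train_rows, val_rows = split_train_validation(rows, 0.2)

print(len(train_rows), len(val_rows))  # 800 200
print(val_rows[0], val_rows[-1])       # 801 1000
```

One consequence worth keeping in mind: if the CSV happens to be sorted by root cause, the last 200 rows may not be representative, so shuffling the data before training could be a sensible precaution.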

3️⃣ Create Neural Network Model

model = tf.keras.models.Sequential()

Meaning:

Build a layer-by-layer model (stacked like a pipeline) <br><br>

4️⃣ Add First Dense (Hidden) Layer

```
model.add(keras.layers.Dense(
    N_HIDDEN,
    input_shape=(7,),
    name='Dense-Layer-1',
    activation='relu'
))
```

<br><br>
Breakdown:

Dense

A fully connected layer

Every input connects to every neuron. <br><br>

N_HIDDEN = 128

This layer has 128 neurons

input_shape=(7,)

Because your input = 7 symptom features <br><br>

Example:

CPU_LOAD, MEMORY_LOAD, DELAY, ERROR_1000, ERROR_1001, ERROR_1002, ERROR_1003 <br><br>

activation='relu' 

ReLU = Rectified Linear Unit

Turns negative numbers into zero

Keeps positive values

Helps model learn faster<br><br>
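As a quick illustration, ReLU can be written in one line of plain Python. This is only a sketch of the maths; Keras applies the same rule element-wise to every neuron's output.

```python
def relu(x):
    """Return x if it is positive, otherwise 0."""
    return max(0.0, x)


values = [-2.0, -0.5, 0.0, 1.5, 3.0]
print([relu(v) for v in values])  # [0.0, 0.0, 0.0, 1.5, 3.0]
```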

name='Dense-Layer-1'

Just a label for readability <br><br>

5️⃣ Add Second Hidden Layer
```
model.add(keras.layers.Dense(
    N_HIDDEN,
    name='Dense-Layer-2',
    activation='relu'
))
```

Meaning:

Another 128-neuron thinking layer

Why add more layers?

More layers = deeper learning

Learns more complex symptom patterns <br><br>
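A Dense layer has one weight per input-neuron pair plus one bias per neuron. Under that rule, the sketch below estimates how many trainable parameters each layer of this network holds; `dense_params` is an illustrative helper, and the totals should correspond to the trainable parameter counts that `model.summary()` reports.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# 7 input features -> 128 neurons -> 128 neurons -> 3 output classes
layer_sizes = [7, 128, 128, 3]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [1024, 16512, 387]
print(sum(counts))  # 17923
```

Most of the parameters sit in the second hidden layer, because it connects 128 neurons to 128 neurons.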

6️⃣ Add Final Output Layer (Softmax)

```
model.add(keras.layers.Dense(
    OUTPUT_CLASSES,
    name='Final',
    activation='softmax'
))
```

Key points: <br><br>

OUTPUT_CLASSES = 3 

Because there are 3 root causes

activation='softmax'

Softmax converts outputs into probabilities <br><br>

Example output:

DATABASE_ISSUE → 0.80

MEMORY → 0.15

NETWORK_DELAY → 0.05


The class with the highest probability is taken as the prediction <br><br>
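For intuition, here is a minimal softmax in plain Python. The raw scores are made up for illustration; in the real model they come from the final layer before the activation is applied.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable
    shifted = [s - max(scores) for s in scores]
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


raw_scores = [2.0, 0.5, -1.0]  # hypothetical outputs for the 3 classes
probs = softmax(raw_scores)

print([round(p, 3) for p in probs])
print("most likely class index:", probs.index(max(probs)))
```

The largest raw score always receives the largest probability, so softmax changes the scale of the outputs but not their ranking.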

7️⃣ Compile the Model (Prepare for Learning)

```
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
```

optimizer='adam'

Controls how weights update <br><br>

Adam = an optimizer that adapts the step size for each weight as training progresses

Think:

A teacher who adjusts the pace of the lesson for each student

loss='categorical_crossentropy'

Measures how wrong predictions are <br><br>

Used when:

✔ Multiple categories

✔ One-hot encoded labels

metrics=['accuracy']

Show prediction accuracy while training <br><br>
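To see what the loss measures, here is a sketch of categorical cross-entropy for a single sample in plain Python. The `eps` clipping is an assumption added to avoid `log(0)`, and the two prediction vectors are made up for illustration.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log of the probability assigned to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


y_true = [1.0, 0.0, 0.0]        # one-hot label: the first class is correct
confident = [0.80, 0.15, 0.05]  # mostly right
unsure = [0.40, 0.35, 0.25]     # right, but barely

print(round(categorical_crossentropy(y_true, confident), 4))  # 0.2231
print(round(categorical_crossentropy(y_true, unsure), 4))     # 0.9163
```

Both predictions would count as correct for accuracy, yet the confident one has a much lower loss. This is why loss can keep improving while accuracy stays flat.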

8️⃣ Train the Model

```
model.fit(
    X_train,
    Y_train,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    verbose=VERBOSE,
    validation_split=VALIDATION_SPLIT
)
```

What happens here?

The model:

Reads symptoms (X_train)

Compares predictions to root cause (Y_train)

Adjusts neuron weights

Repeats 20 times

Validation means:

After each epoch, the model is evaluated on the held-out 20% of the data, which it never trains on <br><br>

9️⃣ Understanding Training Output

Example:

accuracy: 0.8625

val_accuracy: 0.8300

loss: 0.3716

Meaning:

| Term | Meaning |
|------|---------|
| Accuracy | How correct predictions are on training data |
| Val Accuracy | How correct on unseen validation data |
| Loss | How wrong predictions are (lower = better) |
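One way to read these numbers together is to track the gap between validation loss and training loss. The sketch below uses the values from epochs 1-6 of the training log shown earlier in this notebook; it is an illustrative check, not part of the original code.

```python
# Values copied from epochs 1-6 of the notebook's training output
train_loss = [0.9868, 0.7971, 0.6439, 0.5377, 0.4827, 0.4537]
val_loss = [0.8963, 0.7440, 0.6356, 0.5755, 0.5506, 0.5421]

for epoch, (tr, va) in enumerate(zip(train_loss, val_loss), start=1):
    gap = va - tr  # positive means the model does worse on unseen data
    print(f"epoch {epoch}: loss={tr:.4f} val_loss={va:.4f} gap={gap:+.4f}")

# Epoch (1-based) with the lowest validation loss among those listed
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
print("lowest val_loss at epoch", best_epoch)
```

A gap that keeps widening while validation loss stops falling is the usual sign of overfitting. In these six epochs the validation loss is still decreasing, so there is no such sign yet.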

### The Fun Part: Running Our Model


In [20]:
#Pass individual flags to Predict the root cause
CPU_LOAD=1
MEMORY_LOAD=0
DELAY=0
ERROR_1000=0
ERROR_1001=1
ERROR_1002=1
ERROR_1003=0

import numpy as np

X = np.array([[CPU_LOAD, MEMORY_LOAD, DELAY,
               ERROR_1000, ERROR_1001, ERROR_1002, ERROR_1003]],
             dtype=np.float32)

prediction = np.argmax(model.predict(X), axis=1)
print(label_encoder.inverse_transform(prediction))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step
['DATABASE_ISSUE']


### What Our Code Does and What the Output Means


#### Step 1: Define Incident Symptoms
```
CPU_LOAD=1
MEMORY_LOAD=0
DELAY=0
ERROR_1000=0
ERROR_1001=1
ERROR_1002=1
ERROR_1003=0
```
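Meaning:

You’re describing which symptoms occurred:

| Symptom | Value | Meaning |
|---------|-------|---------|
| CPU_LOAD | 1 | High CPU load present |
| MEMORY_LOAD | 0 | No memory issue |
| DELAY | 0 | No latency |
| ERROR_1000 | 0 | Error code did not occur |
| ERROR_1001 | 1 | Error code occurred |
| ERROR_1002 | 1 | Error code occurred |
| ERROR_1003 | 0 | Error code did not occur |
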
This simulates a real IT incident.

#### Step 2: Import NumPy

```
import numpy as np
```

NumPy = used to create ML-friendly numeric arrays 

#### Step 3: Build Input Array for Model
```
X = np.array([[CPU_LOAD, MEMORY_LOAD, DELAY,
               ERROR_1000, ERROR_1001, ERROR_1002, ERROR_1003]],
             dtype=np.float32)
```
<br> <br>

What this does: 
Creates a single row of symptom data 

Resulting array (values are stored as float32):
[[1., 0., 0., 0., 1., 1., 0.]]


Why double brackets [[...]]?
Model expects batch format, even if only 1 incident
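The difference between one incident and a batch of one incident is easiest to see from the shapes. The sketch below uses plain lists and a hypothetical `shape_of` helper; for the NumPy array in the notebook, `X.shape` reports the same `(1, 7)`.

```python
def shape_of(nested):
    """Return (rows, columns) for a rectangular list of lists."""
    return (len(nested), len(nested[0]))


single_incident = [1, 0, 0, 0, 1, 1, 0]  # 7 symptom flags
batch_of_one = [single_incident]         # 1 row x 7 columns

print(len(single_incident))    # 7
print(shape_of(batch_of_one))  # (1, 7)
```

The model's first layer was declared with `input_shape=(7,)`, so each row must contain exactly 7 values, while the number of rows is free to vary.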

#### Step 4: Predict Root Cause
```
prediction = np.argmax(model.predict(X), axis=1)
```

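What happens here?
model.predict(X)

Returns probabilities like:

[0.75, 0.15, 0.10]


Meaning (columns follow the label encoder's alphabetical class order):

| DATABASE_ISSUE | MEMORY | NETWORK_DELAY |
|----------------|--------|---------------|
| 75% | 15% | 10% |

np.argmax(...)
Chooses the index of the highest probability

Example:
[0.75, 0.15, 0.10] → index 0


So prediction becomes:  [0]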

#### Step 5: Convert Number Back to Label
```
print(label_encoder.inverse_transform(prediction))
```

Why?
Your model predicts numbers like: 0

But humans want: DATABASE_ISSUE


So this maps number → real label

Based on the symptoms, the model predicts that the root cause is a database issue.
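Steps 4 and 5 can be sketched together in plain Python. The `classes` list is assumed to be in the sorted order that `LabelEncoder` stores in `classes_`, the probabilities are illustrative, and `decode` is a hypothetical helper rather than a library function.

```python
# Assumed class order: LabelEncoder sorts the labels alphabetically
classes = ["DATABASE_ISSUE", "MEMORY", "NETWORK_DELAY"]


def decode(probabilities, class_names):
    """Pick the index of the largest probability and look up its label."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return best_index, class_names[best_index]


index, label = decode([0.75, 0.15, 0.10], classes)
print(index, label)  # 0 DATABASE_ISSUE
```

In the notebook, `np.argmax` plays the role of the index search and `label_encoder.inverse_transform` plays the role of the list lookup.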


In [None]:
#Predicting as a Batch
#Note: model.predict_classes() was removed in TensorFlow 2.6,
#so take np.argmax over the model.predict() probabilities instead
import numpy as np

X = np.array([
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 1, 1]
], dtype=np.float32)

predictions = np.argmax(model.predict(X), axis=1)

print(label_encoder.inverse_transform(predictions))


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 34ms/step
['DATABASE_ISSUE' 'NETWORK_DELAY' 'MEMORY' 'DATABASE_ISSUE'
 'DATABASE_ISSUE']


### Prediction For Multiple Incidents!

#### Step 1: Define Multiple Incident Rows
```
X = np.array([
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 1, 1]
], dtype=np.float32)
```

Meaning:
Each row represents one separate IT incident

#### Step 2: Predict All at Once
```
predictions = np.argmax(model.predict(X), axis=1)
```

Output example:
[0, 2, 1, 0, 0]

Meaning:
Model predicted a root cause for each row

#### Step 3: Convert Predictions to Labels
```
print(label_encoder.inverse_transform(predictions))
```

Output:
['DATABASE_ISSUE' 
 'NETWORK_DELAY' 
 'MEMORY' 
 'DATABASE_ISSUE'
 'DATABASE_ISSUE']

Meaning:
Each incident gets an automated RCA result
In the same order as input rows
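Because the results come back in input order, each prediction can be paired with its incident for reporting. The sketch below reuses the five rows and the five labels printed above; the report format and the `Counter` summary are illustrative additions.

```python
from collections import Counter

feature_names = ["CPU_LOAD", "MEMORY_LOAD", "DELAY",
                 "ERROR_1000", "ERROR_1001", "ERROR_1002", "ERROR_1003"]

incidents = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 1, 1],
]

# Labels copied from the notebook's batch prediction output
predicted = ["DATABASE_ISSUE", "NETWORK_DELAY", "MEMORY",
             "DATABASE_ISSUE", "DATABASE_ISSUE"]

for row, label in zip(incidents, predicted):
    # List only the symptoms whose flag is set for this incident
    active = [name for name, flag in zip(feature_names, row) if flag]
    print(f"{label:<15} <- {', '.join(active)}")

summary = Counter(predicted)
print(summary.most_common(1))  # [('DATABASE_ISSUE', 3)]
```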