# Model As a Service with RCA after anomaly detection using RAG<br>
## Project Overview<br>
Authors: Sedat Kaymaz & Fatih E. NAR <br>
Update: Added support for a Model as a Service (MaaS) backend. Ref: https://maas.apps.prod.rhoai.rh-aiservices-bu.com/<br> 
This project provides a root cause analysis (RCA) method: after an anomaly detection ML method is applied to a basic list of telecom metrics, <br>
an LLM with dynamic RAG uses a system log file as reference to explain the detected anomalies.   <br>
Please NOTE: MaaS does not offer a working embedding generation service yet (our attempts so far have ended with HTTP 500 Internal Server Error), so the correlation between metric anomalies and log data can be poor. <br>

In [1]:
# Run once only
#%pip install -r requirements.txt

In [2]:
import os,sys
import pandas as pd
import numpy as np
import faiss
import requests
from sklearn.ensemble import IsolationForest
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import CSVLoader, TextLoader
from langchain.prompts import PromptTemplate
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema import StrOutputParser
from langchain.text_splitter import CharacterTextSplitter

# Load environment variables
load_dotenv()

False

In [None]:
def get_llm():
    """Return the MaaS backend configuration, prompting for the API key if it is not set.

    Reads the key from the API_KEY environment variable and the endpoint from the
    module-level maas_server_url (assigned below, before get_llm() is called).

    Returns:
        dict: Server configuration with the keys 'server_url' and 'api_key'.
    """
    #If you are having issues with api key entry via embedded input, you can uncomment the line below and replace 'put_your_key_here' with your actual key
    #os.environ["API_KEY"] = 'put_your_key_here'
    maas_api_key = os.getenv("API_KEY")
    if not maas_api_key:
        maas_api_key = input("Please enter your MaaS API key: ").strip()
        os.environ["API_KEY"] = maas_api_key
        print("MaaS API key has been set.")
    return {"server_url": maas_server_url, "api_key": maas_api_key}  # Dictionary format for server configuration

MAAS_MAX_CONTEXT_LENGTH = 4000
maas_server_url = 'https://mistral-7b-instruct-v0-3-maas-apicast-production.apps.prod.rhoai.rh-aiservices-bu.com:443/v1/chat/completions'
llm = get_llm()
API_KEY = os.getenv('API_KEY')
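
The key lookup above falls back to `input()`, which echoes the key into the notebook output. A minimal sketch of the same lookup with a non-echoing prompt follows; `resolve_api_key` and its parameters are hypothetical names introduced for illustration and are not part of the notebook.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping


def resolve_api_key(
    env: MutableMapping[str, str] = os.environ,
    var_name: str = "API_KEY",
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the key stored in `env`, prompting without echo only if it is missing."""
    key = env.get(var_name, "").strip()
    if not key:
        # getpass hides the typed key; `prompt` is injectable so the function can be tested
        key = prompt("Please enter your MaaS API key: ").strip()
        if not key:
            raise ValueError(f"{var_name} is empty")
        env[var_name] = key  # cache it for later os.getenv() calls
    return key
```

Calling `resolve_api_key()` with no arguments behaves like the lookup in `get_llm()`, except that the key is never printed and an empty answer raises an error instead of being stored.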


In [4]:

# Process metrics file with minimal MaaS embedding update
# Only applies to the embedding part to avoid OpenAI dependency for MaaS Backend

class CustomEmbeddings:
    """Custom embedding class for FAISS compatibility, with `embed_documents` and `embed_query` methods."""
    def embed_documents(self, texts):
        # Placeholder only: random vectors carry no meaning, so similarity search returns arbitrary documents.
        # Replace with actual embeddings from the MaaS backend once available.
        return [np.random.rand(768) for _ in texts]

    def embed_query(self, text):
        return np.random.rand(768)  # Random placeholder for a single query; replace once available


def process_metrics_file(filename):
    """
    Process the metrics file and return a vector store built with the placeholder CustomEmbeddings.

    Args:
        filename (str): The name of the metrics file inside the data/ directory.

    Returns:
        vectorstore: A FAISS vector store containing the embeddings of the text documents.
    """
    loader = CSVLoader(f"data/{filename}")
    documents = loader.load()
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)

    embeddings = CustomEmbeddings()  # Use CustomEmbeddings for compatibility with FAISS
    vectorstore = FAISS.from_documents(texts, embeddings)
    return vectorstore
    
# Process metrics file
metrics_vectorstore = process_metrics_file("metrics.csv")

# Load metrics data for anomaly detection
df = pd.read_csv("data/metrics.csv")
df['time'] = pd.to_datetime(df['time'])
print(df.head(5))

`embedding_function` is expected to be an Embeddings object, support for passing in a function will soon be removed.


                 time  call_attempt  call_success  call_failure  \
0 2024-09-04 00:00:00           114           110             0   
1 2024-09-04 00:01:00           113           110             0   
2 2024-09-04 00:02:00           114           111             0   
3 2024-09-04 00:03:00           113           111             1   
4 2024-09-04 00:04:00           112           111             1   

   total_registered_subs  call_success_rate  
0                   9031              96.40  
1                   9084              97.34  
2                   9089              97.36  
3                   9035              98.23  
4                   9092              99.10  
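
Because `CustomEmbeddings` returns random vectors, two identical texts receive different embeddings and retrieval is effectively arbitrary. Until a real embedding endpoint is available, one possible stopgap is a deterministic bag-of-words vector built with feature hashing. The sketch below is an assumption, not the authors' method: `HashingEmbeddings` and `cosine` are hypothetical names, and the vectors capture word overlap only, not meaning.

```python
import hashlib
import math
import re
from typing import List


class HashingEmbeddings:
    """Hypothetical stand-in: deterministic bag-of-words vectors via feature hashing."""

    def __init__(self, dim: int = 768):
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for token in re.findall(r"[a-z0-9_]+", text.lower()):
            # md5 is used only as a stable hash, so results are identical across runs
            digest = hashlib.md5(token.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        # Unit-length vectors make Euclidean and cosine rankings agree
        return [v / norm for v in vec] if norm else vec

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)


def cosine(a: List[float], b: List[float]) -> float:
    """Dot product; equals cosine similarity for the unit-length vectors above."""
    return sum(x * y for x, y in zip(a, b))
```

The class exposes the same two methods as `CustomEmbeddings`, so it could be passed in its place; texts that share words with the query then rank above texts that share none.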


In [5]:
# Anomaly detection using Isolation Forest
def detect_anomalies(df):
    """
    Detects anomalies in the given DataFrame using the Isolation Forest algorithm.

    Parameters:
    - df (pandas.DataFrame): The input DataFrame containing the data to be analyzed.

    Returns:
    - pandas.DataFrame: A subset of the input DataFrame containing only the rows that are classified as anomalies.
    """

    features = ['call_attempt', 'call_success', 'call_failure', 'total_registered_subs', 'call_success_rate']
    X = df[features]
    
    iso_forest = IsolationForest(contamination=0.005, random_state=42)
    anomalies = iso_forest.fit_predict(X)
    
    df['is_anomaly'] = anomalies
    return df[df['is_anomaly'] == -1]

# Detect anomalies
anomalies = detect_anomalies(df)

if anomalies.empty:
    print("No anomalies detected in the metrics.")
else:
    print(f"Anomalies found:\n{anomalies}\n")

Anomalies found:
                   time  call_attempt  call_success  call_failure  \
683 2024-09-04 11:23:00           114            27             0   
684 2024-09-04 11:24:00           113            32             0   
685 2024-09-04 11:25:00           112            40             0   
686 2024-09-04 11:26:00           114            70             2   

     total_registered_subs  call_success_rate  is_anomaly  
683                   9029             23.491          -1  
684                   9038             28.368          -1  
685                   9033             35.730          -1  
686                   9157             61.491          -1  
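
The four flagged rows are consecutive minutes, so they most likely describe a single incident rather than four separate ones. A minimal sketch of merging flagged timestamps into incident windows is shown below; `group_into_windows` and the one-minute `max_gap` default are assumptions chosen to match the one-minute sampling visible in the output above.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def group_into_windows(
    timestamps: Iterable[datetime],
    max_gap: timedelta = timedelta(minutes=1),
) -> List[Tuple[datetime, datetime]]:
    """Merge timestamps that are at most `max_gap` apart into (start, end) windows."""
    windows: List[Tuple[datetime, datetime]] = []
    for ts in sorted(timestamps):
        if windows and ts - windows[-1][1] <= max_gap:
            # Close enough to the previous point: extend the current window
            windows[-1] = (windows[-1][0], ts)
        else:
            # Gap too large (or first point): open a new window
            windows.append((ts, ts))
    return windows
```

With the notebook's data this could be called as `group_into_windows(anomalies['time'])`, since pandas timestamps are `datetime` subclasses.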



In [6]:
# RAG for processing log file
def process_log_file(filename):
    """
    Process a log file and return a vector store.

    Args:
        filename (str): The name of the log file to process.

    Returns:
        vectorstore: A vector store containing embeddings of the log file texts.
    """
    loader = TextLoader(f"data/{filename}")
    documents = loader.load()
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)
    embeddings = CustomEmbeddings()  # Use CustomEmbeddings for compatibility with FAISS
    vectorstore = FAISS.from_documents(texts, embeddings)
    return vectorstore

# Process log file
print("Processing log file...")
logs_vectorstore = process_log_file("systemd.log")
print("Log file has been processed.")   

`embedding_function` is expected to be an Embeddings object, support for passing in a function will soon be removed.


Processing log file...
Log file has been processed.
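
While embeddings are random placeholders, the vector store cannot relate log chunks to the anomaly times. A simple alternative is to select log lines by timestamp, for example lines close to the 11:23 to 11:26 anomaly rows. The sketch below assumes that each line of `systemd.log` starts with a syslog-style timestamp such as `Sep 04 11:23:05`; this format is an assumption and has not been verified against the file. `parse_syslog_time` and `lines_in_window` are hypothetical helpers, and the year must be supplied because this timestamp style omits it.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


def parse_syslog_time(line: str, year: int) -> Optional[datetime]:
    """Parse an assumed 'Mon DD HH:MM:SS' prefix; return None if the line does not match."""
    try:
        # %b is locale dependent; this assumes English month abbreviations
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def lines_in_window(
    lines: Iterable[str],
    start: datetime,
    end: datetime,
    year: int,
    margin: timedelta = timedelta(minutes=2),
) -> List[str]:
    """Return the lines whose timestamp lies within [start - margin, end + margin]."""
    lo, hi = start - margin, end + margin
    selected = []
    for line in lines:
        ts = parse_syslog_time(line, year)
        if ts is not None and lo <= ts <= hi:
            selected.append(line.rstrip("\n"))
    return selected
```

The selected lines could be passed to the LLM in place of, or alongside, the chunks returned by the similarity search. The two-minute `margin` is an arbitrary default meant to catch log entries written shortly before or after the metric change.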


In [7]:
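def analyze_root_cause(llm, metrics_vectorstore, logs_vectorstore, anomalies):
    """
    Ask the MaaS chat completion endpoint for a root cause analysis of the detected anomalies.

    Args:
        llm (dict): MaaS configuration from get_llm(); currently unused, the request uses the module-level maas_server_url and API_KEY.
        metrics_vectorstore: Vector store built from the metrics file.
        logs_vectorstore: Vector store built from the log file.
        anomalies (pandas.DataFrame or list): The anomalous rows to explain.

    Returns:
        str: The analysis text returned by the model, or an error message.
    """
    # Prepare the anomalies list
    anomalies_list = anomalies.astype(str).values.tolist() if isinstance(anomalies, pd.DataFrame) else anomalies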

    # Retrieve related metrics and logs information
    metrics_embedding = metrics_vectorstore.embedding_function.embed_query(str(anomalies_list))
    logs_embedding = logs_vectorstore.embedding_function.embed_query(str(anomalies_list))

    metrics_info = [doc.page_content[:500] for doc in metrics_vectorstore.similarity_search_by_vector(metrics_embedding, k=2)]  # Truncate to limit token usage
    logs_info = [doc.page_content[:500] for doc in logs_vectorstore.similarity_search_by_vector(logs_embedding, k=2)]  # Truncate to limit token usage

    # Prepare headers and payload for the MaaS server
    headers = {
        'accept': 'application/json',
        'Content-Type': 'application/json',
        'Authorization': API_KEY,
    }

    # Prepare content with token count calculation
    messages_content = f"Anomalies: {anomalies_list}\nMetrics Info: {metrics_info}\nLogs Info: {logs_info}"
    message_token_count = len(messages_content.split())  # Rough token approximation based on word count
    max_tokens = max(0, 6144 - message_token_count - 550)  # Buffer for completion tokens

    data = {
        "messages": [
            {
                "content": "You are a root cause analysis assistant. Provide a detailed analysis of anomalies using the provided metrics and logs.",
                "role": "system",
                "name": "system"
            },
            {
                "content": messages_content,
                "role": "user",
                "name": "user"
            }
        ],
        "model": "mistral-7b-instruct",
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 1,
        "n": 1,
        "stream": False,
        "presence_penalty": 0,
        "frequency_penalty": 0,
        "response_format": {
            "type": "text"
        }
    }
    
    # Send POST request to the MaaS server
    response = requests.post(maas_server_url, headers=headers, json=data)

    # Process the MaaS response
    try:
        response_data = response.json()
        if 'choices' in response_data:
            result = response_data['choices'][0]['message']['content']
        else:
            print("Unexpected response structure:", response_data)
            result = "Unexpected response format received from server. Check server logs for details."
    except requests.exceptions.JSONDecodeError:
        print("Failed to parse the response from the server:", response.text)
        result = "Failed to parse the response from the server. Check server logs for more information."

    return result

# Perform root cause analysis
analysis = analyze_root_cause(llm, metrics_vectorstore, logs_vectorstore, anomalies)
print("Root Cause Analysis:")
print(analysis)
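
The `max_tokens` calculation inside `analyze_root_cause` counts whitespace-separated words, which usually underestimates the real token count, and it hard-codes a limit of 6144 even though `MAAS_MAX_CONTEXT_LENGTH` is set to 4000 earlier. A minimal sketch of a more conservative budget follows; it takes the limit as a parameter rather than guessing which value is correct. `completion_budget` is a hypothetical helper, and the `tokens_per_word` factor of 1.5 is an assumed safety margin, not a measured property of the Mistral tokenizer.

```python
import math


def completion_budget(
    prompt: str,
    context_limit: int,
    reserve: int = 550,
    tokens_per_word: float = 1.5,
) -> int:
    """Estimate how many completion tokens remain after the prompt and a fixed reserve.

    tokens_per_word is an assumed safety factor; 1.0 reproduces the plain word count.
    """
    estimated_prompt_tokens = math.ceil(len(prompt.split()) * tokens_per_word)
    return max(0, context_limit - estimated_prompt_tokens - reserve)
```

For example, `completion_budget(messages_content, 6144, tokens_per_word=1.0)` matches the calculation in the cell above, while the default factor leaves extra headroom. A return value of 0 signals that the prompt should be shortened before the request is sent.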

Root Cause Analysis:
 To perform a root cause analysis on the provided anomalies, let's first understand the context and the metrics involved.

1. Metrics:
   - Time: The date and time when the metrics were collected.
   - Call Attempt: The total number of calls made during the time period.
   - Call Success: The number of calls that were successfully completed.
   - Call Failure: The number of calls that failed during the time period.
   - Total Registered Subs: The total number of subscribers registered in the system.
   - Call Success Rate: The percentage of successful calls out of the total number of attempts.

2. Anomalies:
   - The anomalies data provides the time, call attempt, call success, call failure, total registered subs, and call success rate for four different time instances. The call success rate seems unusually high (98.24% and 98.23%) for the first two data points, but then it drastically increases (61.49%) for the fourth data point.

3. Logs:
   - The logs provided s