# Background

The purpose of this notebook is to iterate on the prompt for Zeus (GenAI Agent)
- Reference doc here: https://moneylion.atlassian.net/wiki/spaces/MLS/pages/3855679489/WIP+GenAI+Agent+Zeus

# Setup

In [20]:
import os
import re
import sys
import json
import requests
import numpy as np
import pandas as pd
from openai import OpenAI
from dotenv import find_dotenv, load_dotenv
import seaborn as sns
import matplotlib as plt

from pathlib import Path
from collections import Counter

from datetime import datetime, date, timedelta
from dateutil import parser

In [21]:
# set pandas view option
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)
# pd.set_option("display.max_rows", 100)

In [22]:
# Retrieve OPENAI API KEY stored in your .env file
load_dotenv(find_dotenv())
client = OpenAI(
  api_key=os.environ['OPENAI_API_KEY'],  # this is also the default, it can be omitted
)

# Zeus Prompt Testing

## Prompt R&D

In [23]:
from string import Template

class PromptTemplate:
    def __init__(self):
        self.MODULE_SELECTOR = Template(MODULE_SELECTOR)

    def module_selector(self, user_question):
        return self.MODULE_SELECTOR.substitute(user_question=user_question)

In [24]:
# MODULE_SELECTOR =   """You are ModuleGPT, an AI designed to autonomously route user request to the proper module for processing.
# Your decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.

# GOALS:
# 1. Determine if the user's input is comprehensible ebough to determine which module to route the user's request to.  
# 2. Route the user's request to the proper module for processing, depending on the user's request.

# EXPECTED INPUT:
# The user will provide an input in natural language. Note that the user may not always provide complete and comprehensible words.
# The input will determine what the user wants to know about. The input may be a question or a word or a statement.

# EXPECTED OUTPUT:
# Module is a function that takes in a user input and returns a response.
# Using the modules listed below, you must determine the proper module to route the user's request to. You must also provide a reasoning for your decision.

# MODULES:
# 1. bts
#     Purpose: Generate personalized responses based on user's bank transactional data.
#     Output format: 
#     {
#         "answer": {
#             "module": "bts",
#             "reasoning": "<reasons to support the module selection>",
#             "args": {"input": "<user_question>"}
#     } 
# 2. encyclopedic
#     Purpose: Generate general factual responses that can reply to the user on more general input.
#     Output format:
#     {
#         "answer": {
#             "module": "encyclopedic",
#             "reasoning": "<reasons to support the module selection>",
#             "args": {
#                 "input": "<user_question>",
#                 "product_flag": "1 if <user_question> contains references to MoneyLion products like Instacash, Credit Builder Plus (CB+) or RoarMoney, else 0",
#                 "valid": "1 if <user_question> makes sense when put into the following sentence : What is <user_question> ? or How to <user_question>? , else 0"
#             }
#     } 

# STRICT CONSTRAINTS:
# 1. No user assistance. Do not make assumptions about the user's input. Do not ask the user for more information.
# 2. Exclusively use the modules listed in double quotes e.g. "module name". Do not use any other modules.
# 3. Never use any other modules not listed.  
# 4. You should only respond in JSON format as shown in the "EXPECTED OUTPUT" section above based on the module you selected.
# 5. If the user's input is not comprehensible enough or does not provide enough context to determine which module to route the user's request to, use encyclopedic module.
# 6. Use bts module if the intention of the user's input is to ask about a statement of their bank transactions.
# 7. Use encyclopedic module if the intention of the user's input is to ask about general information about <user_question>.
# 8  When using encyclopedic module, check if the user's input contains references to MoneyLion products like Instacash, Credit Builder Plus (CB+) or RoarMoney. If it does, set the "product_flag" argument to 1, else 0.
# 9. When using encyclopedic module, check if the user's input makes sense when put into the following sentence : What is <user_question> ? What is <user_question> ? or How to <user_question>?,  If it does, set the "valid" argument to 1, else 0.
    
# PERFORMANCE EVALUATION:
# 1. Route the user to the appropriate module that can answer the user question.
# 2. Whenever it is possible to be specific, be specific and use specific module. Avoid using module that gives more general response if a more specific module is available.
# 3. Evaluate the correct responses within the module args
# 4. Never use any other modules not listed.  

# User question: $user_question
# """


In [25]:
MODULE_SELECTOR =   """You are ModuleGPT, an AI designed to autonomously route user request to the proper module for processing.
Your decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.

GOALS:
1. Determine if the user's input is comprehensible ebough to determine which module to route the user's request to.  
2. Route the user's request to the proper module for processing, depending on the user's request.

EXPECTED INPUT:
The user will provide an input in natural language. Note that the user may not always provide complete and comprehensible words.
The input will determine what the user wants to know about. The input may be a question or a word or a statement.

EXPECTED OUTPUT:
Module is a function that takes in a user input and returns a response.
Using the modules listed below, you must determine the proper module to route the user's request to. You must also provide a reasoning for your decision.

MODULES:
1. bts
    Purpose: Generate personalized responses based on user's bank transactional data.
    Output format: 
    {
        "answer": {
            "module": "bts",
            "reasoning": "<reasons to support the module selection>",
            "args": {"input": "<user_question>"}
    } 
2. encyclopedic
    Purpose: Generate general factual responses that can reply to the user on more general input.
    Output format:
    {
        "answer": {
            "module": "encyclopedic",
            "reasoning": "<reasons to support the module selection>",
            "args": {
                "input": "<user_question>",
                "product_flag": "1 if <user_question> contains references to MoneyLion products like Instacash, Credit Builder Plus (CB+) or RoarMoney, else 0",
            }
    } 

RULES:
1. No user assistance. Do not make assumptions about the user's input. Do not ask the user for more information.
2. Exclusively use the modules listed in double quotes e.g. "module name". Do not use any other modules.
3. Never use any other modules not listed.  
4. You should only respond in JSON format as shown in the "EXPECTED OUTPUT" section above based on the module you selected.
5. Follow the strict constraints listed below carefully.

STRICT CONSTRAINTS:
1. You must use the encyclopedic module if the user's input is a single word or a single number.
2. Only use bts module if the intention of the user's input is a inquiry to ask about a statement of their bank transactions.
3. Only use encyclopedic module if the intention of the user's input is to ask about general information about <user_question>.
4.  When using encyclopedic module, check if the user's input contains references to MoneyLion products like Instacash, Credit Builder Plus (CB+) or RoarMoney. If it does, set the "product_flag" argument to 1, else 0.
    
PERFORMANCE EVALUATION:
1. Route the user to the appropriate module that can answer the user question.
2. Whenever it is possible to be specific, be specific and use specific module. Avoid using module that gives more general response if a more specific module is available.
3. Evaluate the correct responses within the module args
4. Never use any other modules not listed.  

User question: $user_question
"""


In [26]:

        # "product_flag": "true or false, whether <user_question> is asking about a MoneyLion product - Instacash, Credit Builder Plus (CB+) or RoarMoney",
        # "complete_query": "true if <user_question> is an incomplete phrase or sentence, else false",

In [27]:

    # "thoughts": {
    #     "text": "module"
    #     "reasoning": "reasoning",
    # },

In [28]:
# {
#     "thoughts": {
#         "text": "thought",
#         "reasoning": "reasoning",
#         "plan": "- short bulleted\n- list that conveys\n- long-term plan",
#         "criticism": "constructive self-criticism",
#         "speak": "thoughts summary to say to user",
#     },
#     "command": {"name": "command name", "args": {"arg name": "value"}},
# }

In [29]:
def get_gpt_response(query):

    user_prompt = query
    client = OpenAI()

    response = client.chat.completions.create(
      model="gpt-3.5-turbo",
      messages=[
        #{"role": "system", "content": f"{system_prompt}"},
        {"role": "user", "content": f"{user_prompt}"},
      ]
    )
    msg_output = response.choices[0].message.content
    return msg_output

In [30]:
pt = PromptTemplate()

In [31]:
json.loads(get_gpt_response(pt.module_selector(user_question = 'Direct deposit')))

{'answer': {'module': 'encyclopedic',
  'reasoning': "The user's input 'Direct deposit' is a general question about a financial term. Therefore, it should be routed to the encyclopedic module.",
  'args': {'input': 'Direct deposit', 'product_flag': 0}}}

In [32]:
json.loads(get_gpt_response(pt.module_selector(user_question = 'How much did I spend last week?')))

{'answer': {'module': 'bts',
  'reasoning': "The user's question 'How much did I spend last week?' is asking about a statement of their bank transactions, which matches the purpose of the 'bts' module. Therefore, I will route the user's request to the 'bts' module for processing.",
  'args': {'input': 'How much did I spend last week?'}}}

In [33]:
a = json.loads(get_gpt_response(pt.module_selector(user_question = 'Add')))
a

{'answer': {'module': 'encyclopedic',
  'reasoning': "The user's input is a single word, which falls under the scope of the encyclopedic module.",
  'args': {'input': 'Add', 'product_flag': 0}}}

In [34]:
for key, value in a.items() :
    print (key, value)

answer {'module': 'encyclopedic', 'reasoning': "The user's input is a single word, which falls under the scope of the encyclopedic module.", 'args': {'input': 'Add', 'product_flag': 0}}


In [35]:
b = a.get('answer')
b.get('args')

{'input': 'Add', 'product_flag': 0}

## Prompt Evaluation

### Prepare Test Data

For test data, we'll use the following:
- 50 top questions from BTS
- 50 top searches from Universal Search

We expect the BTS queries to be routed to that module, and Universal Search queries to be routed to the Encyclopedic module

We will also evaluate the availability & correctness of the arguments in the payloads

In [None]:
from snowflake import connector as sf
import pandas as pd

def load_snowflake_data(query):
    connection = sf.connect(
        user= 'xlim@moneylion.com', # replace with your moneylion email.
        account='moneylion.us-east-1',
        authenticator='externalbrowser',
        database='ANALYTICS_DB',
        warehouse='PROD_ANALYST_READ_WH',
        role='PROD_A_USR_XLIM_WRITER' # replace with your SF role 
    ) 

    connection.autocommit = True

    df = pd.read_sql_query(query, connection)

    connection.close()
    
    return df

In [None]:
search_query = """
        select
            input, 
            count(*) as count,
            row_number() over ( order by count desc) as rank
        from prod.srch_query_execute
        where original_timestamp >= '2023-08-17'
        and category = 'TOP'
        and user_id__hashed not in (select distinct email__hashed from user.base_ml_employee)
        and user_id__hashed not in (select distinct email__hashed from dbt_fraud.base_even_employees)
        group by 1
        order by 2 desc
        limit 100
    """

# querying extra to cater for exclusion of suggestions & potential conditions (i.e. min search length, increasing # queries)
top_user_queries_df = load_snowflake_data(query=search_query)

Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...
Going to open: https://sso.jumpcloud.com/saml2/snowflakeai?SAMLRequest=lZJdT9swFIb%2FSuRdJ3YCaK3VFIV2aNH6JVo2yp1x3GLq2JmPTYBfP6cfE7sAaXeW%2Fb5%2BzjnvGVy%2B1Cp6Fhak0TlKE4IiobmppN7m6HZ1HfdQBI7piimjRY5eBaDL4QBYrRpaePeob8RvL8BF4SMNtHvIkbeaGgYSqGa1AOo4XRbTCc0SQhmAsC7g0NFSgQysR%2BcainHbtkl7lhi7xRkhBJM%2BDqpO8gW9QzSfMxprnOFGnSwvoacPECkm5x0iKAJhcTReSX0YwWeUh4MI6PfVahEv5ssViopTdyOjwdfCLoV9llzc3kwOBUCoYDqffVtPyvks8RALBi5OE9Cm3Si2E9zUjXfh3ySc8EZUWJmtDNMqxzlqdrK6G9WyfNhNp%2FNfvt%2BW%2Fbe7omh%2FcLY%2Bf5uxPoMnMuFX623v%2FoJzFP08ZZt12ZYAXpS6S9SFK5KdxWkaZ2RFMnrxlWa9pJf17lE0DolKzdzeeSobwCRPvm64Mr7aV9dFkeG%2FpTOJDqtB9xw7%2FP%2BGB%2Fi9%2F7hnszD6crwwSvLX6NrYmrmPk0mTdH8jq3izl1JRM6mKqrICICSklGlHVjAX1tlZLxAeHqj%2FLvTwDw%3D%3D&RelayState=55207 to authenticate...


  df = pd.read_sql_query(query, connection)


In [None]:
top_user_queries_df.head()

Unnamed: 0,INPUT,COUNT,RANK
0,Roar,346,1
1,Instacash,266,2
2,Credit,206,3
3,Boost,193,4
4,Insta,177,5


In [None]:
top_user_queries_df['source'] = 'Search'
top_user_queries_df.head()

Unnamed: 0,INPUT,COUNT,RANK,source
0,Roar,346,1,Search
1,Instacash,266,2,Search
2,Credit,206,3,Search
3,Boost,193,4,Search
4,Insta,177,5,Search


In [None]:
top_user_queries_df.to_csv('top_user_queries.csv', index=False)

In [None]:
bts_query = """
        select question, count(*)
        from insights.insight_transaction_inquiry_history
        where flow = 'Question Answer'
        and created_at >= '2023-10-01'
        group by 1
        order by 2 desc
        limit 50
    """

top_bts_queries_df = load_snowflake_data(query=bts_query)

Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...
Going to open: https://sso.jumpcloud.com/saml2/snowflakeai?SAMLRequest=lZJbT%2BMwEIX%2FSuR9TuxcpG6ttqhQIrr0kqUFwb55E7c1OHbW4zSFX79O2q7YB5B4s%2Bxz%2FM3MmcHFoZTenhsQWg1RGBDkcZXrQqjtEN2vU%2F878sAyVTCpFR%2BiVw7oYjQAVsqKjmu7U3f8T83Beu4jBbR9GKLaKKoZCKCKlRyozelqPJ%2FRKCCUAXBjHQ6dLAUIx9pZW1GMm6YJmjjQZosjQggmfexUreQbeoeoPmdURluda3m2HFxPHyBCTJIW4RSOkJ2Ml0IdR%2FAZ5fdRBPRmvc78bLlaI2987u5KK6hLblbc7EXO7%2B9mxwLAVTBfLq6fZtPlIqjB5wysHwagdLOR7IXnuqxq6%2F4N3AlveIGl3go3relkiKoXUcj48Pi2z9MnSG8Ot%2FLnJH1%2B%2FCHJvmcPLLu85f1dWsyv%2B70qSXLkPZyzjdpspwA1n6o2UeuuSBT7YehHZE1iGvZokgRRHP9C3sQlKhSznfNcNoAOnuuyyqWui666NooI%2FyudCXRcDdpxzOjrDQ%2Fwe%2F9pzxZu9NNJpqXIX71Um5LZj5MJg7C7EYW%2F6aSUl0zIcVEYDuASklI3V4Yz69bZmpojPDpS%2F1%2Fo0V8%3D&RelayState=55628 to authenticate...


  df = pd.read_sql_query(query, connection)


In [None]:
top_bts_queries_df.head()

Unnamed: 0,QUESTION,COUNT(*)
0,What are all my transactions above $100 in the past 2 weeks?,928
1,What are all my transactions above $50 in the past week?,799
2,How much is my largest spend?,793
3,What are my largest 3 spend categories last week?,759
4,Show me my recurring spend,747


In [None]:
top_bts_queries_df['source'] = 'BTS'
top_bts_queries_df = top_bts_queries_df.rename(columns={'QUESTION': 'INPUT'})
top_bts_queries_df.head()

Unnamed: 0,INPUT,COUNT(*),source
0,What are all my transactions above $100 in the past 2 weeks?,928,BTS
1,What are all my transactions above $50 in the past week?,799,BTS
2,How much is my largest spend?,793,BTS
3,What are my largest 3 spend categories last week?,759,BTS
4,Show me my recurring spend,747,BTS


In [None]:
eval_questions = pd.concat([top_user_queries_df[['INPUT', 'source']], top_bts_queries_df[['INPUT', 'source']]]).reset_index(drop=True)
eval_questions.head()

Unnamed: 0,INPUT,source
0,Roar,Search
1,Instacash,Search
2,Credit,Search
3,Boost,Search
4,Insta,Search


In [None]:
eval_questions.to_csv('../input/eval_questions.csv', index=False)

In [37]:
#csv to df
eval_questions = pd.read_csv('../../input/eval_questions.csv')

#shuffle the df
eval_questions = eval_questions.sample(frac=1).reset_index(drop=True)
eval_questions.head()


Unnamed: 0,INPUT,source
0,What are my reoccurring purchases?,BTS
1,Sneaker heads,Search
2,Customer,Search
3,Where did I spend the most money in last week?,BTS
4,What is my most frequent spend category?,BTS


In [38]:
eval_questions.tail(30)

Unnamed: 0,INPUT,source
85,How does Instacash work?,Search
86,Where is my direct deposit,BTS
87,What are my largest 3 spend categories last week?,BTS
88,Debit,Search
89,Break down my spending last week by merchants,BTS
90,Credit,Search
91,Credit builder plus,Search
92,In,Search
93,Improve credit score,Search
94,Direct deposit,Search


In [39]:
len(eval_questions)

115

In [40]:
import time
output = pd.DataFrame()

for i, j in eval_questions.iterrows():
    while True:
        try:
            t0 = time.time()
            gpt_output = json.loads(get_gpt_response(pt.module_selector(user_question = j.iloc[0])))
            t1 = time.time()
            output = pd.concat([output, pd.concat([pd.json_normalize(gpt_output['answer'],
                                                                     meta=['module', 'reasoning', ['input', 'product_flag', 'valid']]), pd.Series(t1-t0, name='time')], axis=1)], axis=0)
            t1 = time.time()
            print(t1 - t0)
            print(f"Index {i} Complete")
            break

        except:
            print(f"Retry Index {i}")
            pass

output = output.reset_index(drop=True)
output.head()

2.529114007949829
Index 0 Complete
2.7640199661254883
Index 1 Complete
2.495738983154297
Index 2 Complete
2.8557839393615723
Index 3 Complete
2.2845890522003174
Index 4 Complete
2.0723319053649902
Index 5 Complete
2.5064101219177246
Index 6 Complete
2.153704881668091
Index 7 Complete
2.550673246383667
Index 8 Complete
1.972581148147583
Index 9 Complete
2.437551975250244
Index 10 Complete
2.3489601612091064
Index 11 Complete
2.4596750736236572
Index 12 Complete
2.909653902053833
Index 13 Complete
2.105882167816162
Index 14 Complete
2.4594428539276123
Index 15 Complete
2.6232268810272217
Index 16 Complete
2.5984840393066406
Index 17 Complete
2.870447874069214
Index 18 Complete
2.698029041290283
Index 19 Complete
2.5269761085510254
Index 20 Complete
2.2459731101989746
Index 21 Complete
2.4560468196868896
Index 22 Complete
2.409385919570923
Index 23 Complete
2.7116658687591553
Index 24 Complete
2.1027913093566895
Index 25 Complete
3.218040943145752
Index 26 Complete
2.2567570209503174
Inde

Unnamed: 0,module,reasoning,args.input,time,args.product_flag
0,bts,"The user's question is asking about their recurring purchases, which is related to their bank transactions. Therefore, the bts module should be used to provide a personalized response based on the user's bank transactional data.",What are my reoccurring purchases?,2.526612,
1,encyclopedic,"The user's input is a single word. Therefore, it is more likely that the user is seeking general information about 'sneaker heads'. Hence, the encyclopedic module should be used.",Sneaker heads,2.76013,0.0
2,encyclopedic,"The user's input is a single word, indicating that they might be looking for general information about 'customer'. Therefore, the encyclopedic module is the appropriate module to use.",Customer,2.492249,0.0
3,bts,The user's question is related to a specific statement about their bank transactions (spending the most money in the last week). Using the bts module will allow us to generate a personalized response based on the user's transactional data.,Where did I spend the most money in last week?,2.852065,
4,bts,The user is asking about their bank transactions and their most frequent spend category. This falls under the scope of the bts module.,What is my most frequent spend category?,2.281211,


In [42]:
output.tail(30)

Unnamed: 0,module,reasoning,args.input,time,args.product_flag
85,encyclopedic,"The user question contains a reference to a MoneyLion product, which is Instacash. Therefore, the encyclopedic module should be used to provide a general explanation of how Instacash works.",How does Instacash work?,2.618647,1.0
86,bts,"The user's input 'Where is my direct deposit' indicates that they are asking about the location or status of their direct deposit, which is related to their bank transactions. Therefore, the 'bts' module should be used to provide a personalized response based on the user's bank transactional data.",Where is my direct deposit,2.584871,
87,bts,"The user's question is asking about their bank transactions, specifically the largest spend categories. Therefore, the bts module should be used to generate a personalized response based on the user's transactional data.",What are my largest 3 spend categories last week?,2.477003,
88,encyclopedic,"The user's input is a single word, indicating a general question. Therefore, the encyclopedic module should be used to provide a general factual response.",Debit,3.05189,0.0
89,bts,"The user's input is asking for a breakdown of their spending last week by merchants. This is a specific inquiry that can be answered using the bank transactional data. Therefore, the bts module is the appropriate module to route the user's request to.",Break down my spending last week by merchants,2.849718,
90,encyclopedic,"The user's input is a single word, which suggests that they are asking for general information about 'credit'. Therefore, the encyclopedic module is the best choice.",Credit,2.47103,1.0
91,encyclopedic,"The user's input 'Credit builder plus' contains a reference to the MoneyLion product 'Credit Builder Plus (CB+)', so the encyclopedic module should be used. The product_flag should be set to 1.",Credit builder plus,2.754705,1.0
92,encyclopedic,"The user's input is a single word, which indicates a general inquiry about something. Therefore, the encyclopedic module is the most appropriate choice.",In,2.351676,0.0
93,encyclopedic,"The user's input is a statement about improving their credit score, which falls under the category of general information. Therefore, the encyclopedic module is the appropriate choice.",Improve credit score,2.544926,0.0
94,encyclopedic,"The user's input 'Direct deposit' is a two-word phrase, which indicates a general question about direct deposit. Therefore, the encyclopedic module is the best module to provide a general factual response.",Direct deposit,2.446716,0.0


In [43]:
output.to_csv('../../output/zeus_test_data_mvp.csv')

# Evaluating Autocomplete Results

In [45]:
zeus_results = pd.read_csv('../../input/zeus_results_labelled.csv')
zeus_results

Unnamed: 0.1,Unnamed: 0,module,target_module,reasoning,args.input,time,args.product_flag,target_product_flag
0,0,bts,bts,"The user's question is asking about their recurring purchases, which is related to their bank transactions. Therefore, the bts module should be used to provide a personalized response based on the user's bank transactional data.",What are my reoccurring purchases?,2.53,,
1,3,bts,bts,The user's question is related to a specific statement about their bank transactions (spending the most money in the last week). Using the bts module will allow us to generate a personalized response based on the user's transactional data.,Where did I spend the most money in last week?,2.85,,
2,4,bts,bts,The user is asking about their bank transactions and their most frequent spend category. This falls under the scope of the bts module.,What is my most frequent spend category?,2.28,,
3,9,bts,bts,"The user's input is inquiring about a specific aspect of their bank transactions, which is to break down their spending by category.",Break down my spending by category,1.97,,
4,13,bts,bts,"The user's question is asking for the amount spent on a specific category of transactions (Starbucks) within a specific time frame (this week). This indicates that the user wants information about their bank transactions. Hence, the bts module should be used.",How much did I spend on Starbucks this week?,2.91,,
...,...,...,...,...,...,...,...,...
110,107,encyclopedic,encyclopedic,"The user's input is a single word, 'Account', which falls under the category of general information. Therefore, I will route the user's request to the encyclopedic module.",Account,2.76,0.0,0.0
111,108,encyclopedic,encyclopedic,"The user's input is a single word, which falls under the criteria for using the encyclopedic module. The encyclopedic module can provide general information about various topics, including the word 'Car'.",Car,2.86,0.0,0.0
112,110,encyclopedic,encyclopedic,"The user's input 'Virtual card' is a two-word phrase, so it falls under the category of general information. Therefore, the encyclopedic module should be used to answer the user's question.",Virtual card,2.45,0.0,0.0
113,111,encyclopedic,encyclopedic,"The user's input is a single word, which indicates that they are asking for general information about 'defer'. Therefore, the encyclopedic module should be used.",Defer,2.37,0.0,0.0


## Module selection results

In [46]:
from sklearn import metrics

In [47]:
from sklearn.metrics import (
    confusion_matrix,
    accuracy_score,
    precision_score,
    recall_score,
    f1_score
)

true_labels = zeus_results['target_module']
predicted_labels = zeus_results['module']

In [48]:
# Confusion matrix
conf_matrix = confusion_matrix(true_labels, predicted_labels)
print("Confusion Matrix:")
print(conf_matrix)

# Accuracy
accuracy = accuracy_score(true_labels, predicted_labels)
print("Accuracy:", accuracy)

# Precision
precision = precision_score(true_labels, predicted_labels, average='weighted')
print("Precision:", precision)

# Recall
recall = recall_score(true_labels, predicted_labels, average='weighted')
print("Recall:", recall)

# F1 Score
f1 = f1_score(true_labels, predicted_labels, average='weighted')
print("F1 Score:", f1)

Confusion Matrix:
[[39  0]
 [ 1 75]]
Accuracy: 0.991304347826087
Precision: 0.9915217391304348
Recall: 0.991304347826087
F1 Score: 0.9913305900490949


## Encyclopedic args results

In [49]:
# filter zeus_results df to only module = encyclopedic
zeus_results_encyclopedic = zeus_results[zeus_results['module'] == 'encyclopedic']
zeus_results_encyclopedic 

Unnamed: 0.1,Unnamed: 0,module,target_module,reasoning,args.input,time,args.product_flag,target_product_flag
40,17,encyclopedic,encyclopedic,"The user's input is a two-word phrase, which falls under the category of general information. Therefore, I will route the user's request to the encyclopedic module.",Cash advance,2.59,1.0,1.0
41,26,encyclopedic,encyclopedic,"The user's input, 'Credit builder', is a single word which makes it suitable for the encyclopedic module. It is likely that the user wants general information about credit builder, so the encyclopedic module should be used.",Credit builder,3.21,1.0,1.0
42,48,encyclopedic,encyclopedic,"The user's input contains a reference to a MoneyLion product (RoarMoney). Therefore, we can use the encyclopedic module to provide general information about RoarMoney.",RoarMoney,2.53,1.0,1.0
43,55,encyclopedic,encyclopedic,"The user's input 'Credit Builder' is a single word, indicating that they are looking for general information. Therefore, the encyclopedic module should be used.",Credit Builder,2.51,1.0,1.0
44,58,encyclopedic,encyclopedic,"The user's input contains a reference to a MoneyLion product, which is 'RoarMoney'. Therefore, the encyclopedic module should be used to provide general information about RoarMoney.",Maximize your savings with RoarMoney,2.74,1.0,1.0
...,...,...,...,...,...,...,...,...
110,107,encyclopedic,encyclopedic,"The user's input is a single word, 'Account', which falls under the category of general information. Therefore, I will route the user's request to the encyclopedic module.",Account,2.76,0.0,0.0
111,108,encyclopedic,encyclopedic,"The user's input is a single word, which falls under the criteria for using the encyclopedic module. The encyclopedic module can provide general information about various topics, including the word 'Car'.",Car,2.86,0.0,0.0
112,110,encyclopedic,encyclopedic,"The user's input 'Virtual card' is a two-word phrase, so it falls under the category of general information. Therefore, the encyclopedic module should be used to answer the user's question.",Virtual card,2.45,0.0,0.0
113,111,encyclopedic,encyclopedic,"The user's input is a single word, which indicates that they are asking for general information about 'defer'. Therefore, the encyclopedic module should be used.",Defer,2.37,0.0,0.0


### Extra evaluation for encyclopedic args

#### product detection

In [51]:
true_labels = zeus_results_encyclopedic['target_product_flag']
predicted_labels = zeus_results_encyclopedic['args.product_flag']

# Confusion matrix
conf_matrix = confusion_matrix(true_labels, predicted_labels)
print("Confusion Matrix:")
print(conf_matrix)

# Accuracy
accuracy = accuracy_score(true_labels, predicted_labels)
print("Accuracy:", accuracy)

# Precision
precision = precision_score(true_labels, predicted_labels, average='weighted')
print("Precision:", precision)

# Recall
recall = recall_score(true_labels, predicted_labels, average='weighted')
print("Recall:", recall)

# F1 Score
f1 = f1_score(true_labels, predicted_labels, average='weighted')
print("F1 Score:", f1)

Confusion Matrix:
[[62  1]
 [ 1 11]]
Accuracy: 0.9733333333333334
Precision: 0.9733333333333334
Recall: 0.9733333333333334
F1 Score: 0.9733333333333334


#### input validation

In [95]:
# true_labels = zeus_results_encyclopedic['target_valid']
# predicted_labels = zeus_results_encyclopedic['args.valid']

# # Confusion matrix
# conf_matrix = confusion_matrix(true_labels, predicted_labels)
# print("Confusion Matrix:")
# print(conf_matrix)

# # Accuracy
# accuracy = accuracy_score(true_labels, predicted_labels)
# print("Accuracy:", accuracy)

# # Precision
# precision = precision_score(true_labels, predicted_labels, average='weighted')
# print("Precision:", precision)

# # Recall
# recall = recall_score(true_labels, predicted_labels, average='weighted')
# print("Recall:", recall)

# # F1 Score
# f1 = f1_score(true_labels, predicted_labels, average='weighted')
# print("F1 Score:", f1)

Confusion Matrix:
[[11  2]
 [31 28]]
Accuracy: 0.5416666666666666
Precision: 0.8121031746031746
Recall: 0.5416666666666666
F1 Score: 0.5878277153558052
