First Pass Prototype (6/22)

In [1]:
import time, os, glob
from dotenv import load_dotenv

# ibm genai
from genai.credentials import Credentials
from genai.model import Model
from genai.schemas import GenerateParams, ModelType

load_dotenv()
api_key = os.getenv("GENAI_KEY", None)
api_url = os.getenv("GENAI_API", None)
creds = Credentials(api_key, api_endpoint=api_url)

Reusable utilities

In [2]:
def get_completion_from_messages(model, user_input, prompt,
                                delimiter='###', verbose=True):

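    '''
    Fill the prompt template with user_input and delimiter,
    send it to the model, and return the generated text.

    prompt must contain {delimiter} and {user_input} placeholders.
    '''

    message = prompt.format(delimiter=delimiter,
                            user_input=user_input)

    # generate() takes a list of prompts; one prompt is sent, so read result [0]
    response = model.generate([message])[0].generated_text
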
    if verbose:
        print('### Input: ###')
        print(message)
        print('### Output: ###')
        print(response)

    return response
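
The helper above does two things: it fills a template and it reads the first result from `model.generate`. A minimal offline sketch of that flow follows. `StubModel`, `StubResult`, and `build_message` are hypothetical names introduced only for illustration; the stub mimics just the `generate(list) -> list` shape used above and is not part of the genai library.

```python
from dataclasses import dataclass


@dataclass
class StubResult:
    # Mirrors only the attribute read by the helper above.
    generated_text: str


class StubModel:
    """Hypothetical stand-in for a model: returns a canned reply per prompt."""

    def __init__(self, reply):
        self.reply = reply
        self.seen = []  # prompts received, kept for inspection

    def generate(self, prompts):
        self.seen.extend(prompts)
        return [StubResult(self.reply) for _ in prompts]


def build_message(prompt, user_input, delimiter='###'):
    # Same formatting step as in get_completion_from_messages.
    return prompt.format(delimiter=delimiter, user_input=user_input)


template = ("Query is delimited with {delimiter}.\n"
            "{delimiter}{user_input}{delimiter}")
stub = StubModel("Bank Account")
message = build_message(template, "What is Bob's balance?", delimiter="####")
reply = stub.generate([message])[0].generated_text

print(message)
print(reply)
```

Swapping the stub for a real model object should leave the formatting step unchanged, which makes prompt templates easy to check before spending API calls.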

## Create initial model

In [3]:
GenerateParams?

Init signature:
GenerateParams(
    *,
    decoding_method: Optional[Literal['greedy', 'sample']] = None,
    length_penalty: Optional[genai.schemas.generate_params.LengthPenalty] = None,
    max_new_tokens: Optional[int] = None,
    min_new_tokens: Optional[int] = None,
    random_seed: Optional[gena

In [4]:
params = GenerateParams(decoding_method="sample",
    max_new_tokens=50,
    min_new_tokens=1,
    stream=False,
    temperature=0.7,
    top_k=50,
    top_p=1,
    repetition_penalty=2,
    )
model = Model(ModelType.FLAN_UL2, params=params, credentials=creds)

## Determine which categories are relevant for a question

In [5]:
delimiter = "####"
prompt = """You will be provided with queries about people. \
The query will be delimited with \
{delimiter} characters.

Indicate which cagetories the query is relevant to among \
the following categories: Bank Account, Career/Education, Transactions.

Example Question: Tell me Bill's current position and deposit balance.
Example Output: Bank Account, Career/Education

{delimiter}{user_input}{delimiter}

Your response here.
"""

In [6]:
user_input = f"""Tell me whether Bill Drinks coffee"""

response = get_completion_from_messages(model, user_input, prompt,
                                        delimiter, verbose=True)

### Input: ###
You will be provided with queries about people. The query will be delimited with #### characters.

Indicate which cagetories the query is relevant to among the following categories: Bank Account, Career/Education, Transactions.

Example Question: Tell me Bill's current position and deposit balance.
Example Output: Bank Account, Career/Education

####Tell me whether Bill Drinks coffee####

Your response here.

### Output: ###
Transactions


In [7]:
user_input = f"""Tell me whether about Bob W's relationship with the bank."""

response = get_completion_from_messages(model, user_input, prompt,
                                        delimiter, verbose=True)

### Input: ###
You will be provided with queries about people. The query will be delimited with #### characters.

Indicate which cagetories the query is relevant to among the following categories: Bank Account, Career/Education, Transactions.

Example Question: Tell me Bill's current position and deposit balance.
Example Output: Bank Account, Career/Education

####Tell me whether about Bob W's relationship with the bank.####

Your response here.

### Output: ###
Bank Account


In [8]:
user_input = f"""Tell me whether about Alice's eduation. Is she really so smart?"""

response = get_completion_from_messages(model, user_input, prompt,
                                        delimiter, verbose=True)

### Input: ###
You will be provided with queries about people. The query will be delimited with #### characters.

Indicate which cagetories the query is relevant to among the following categories: Bank Account, Career/Education, Transactions.

Example Question: Tell me Bill's current position and deposit balance.
Example Output: Bank Account, Career/Education

####Tell me whether about Alice's eduation. Is she really so smart?####

Your response here.

### Output: ###
Career/Education


In [9]:
user_input = f"""Tell me whether about Alice's career progression and current bank balance."""

response = get_completion_from_messages(model, user_input, prompt,
                                        delimiter, verbose=True)

### Input: ###
You will be provided with queries about people. The query will be delimited with #### characters.

Indicate which cagetories the query is relevant to among the following categories: Bank Account, Career/Education, Transactions.

Example Question: Tell me Bill's current position and deposit balance.
Example Output: Bank Account, Career/Education

####Tell me whether about Alice's career progression and current bank balance.####

Your response here.

### Output: ###
Bank Account, Career/Education


In [10]:
user_input = f"""Summarize the President's job history, banking stats, and spending history."""

response = get_completion_from_messages(model, user_input, prompt,
                                        delimiter, verbose=True)

### Input: ###
You will be provided with queries about people. The query will be delimited with #### characters.

Indicate which cagetories the query is relevant to among the following categories: Bank Account, Career/Education, Transactions.

Example Question: Tell me Bill's current position and deposit balance.
Example Output: Bank Account, Career/Education

####Summarize the President's job history, banking stats, and spending history.####

Your response here.

### Output: ###
Transactions
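
The classifier's reply is free text, so code that consumes it may want to check it against the known category list before using it. The sketch below is one way to do that, assuming a comma-separated reply like the outputs above. `parse_categories` and `ALLOWED` are hypothetical names, not part of the notebook.

```python
ALLOWED = ("Bank Account", "Career/Education", "Transactions")


def parse_categories(response, allowed=ALLOWED):
    """Split a comma-separated reply and keep only known categories."""
    # Case-insensitive lookup back to the canonical spelling.
    lookup = {name.lower(): name for name in allowed}
    found = []
    for part in response.split(','):
        key = part.strip().lower()
        if key in lookup and lookup[key] not in found:
            found.append(lookup[key])
    return found


print(parse_categories("Bank Account, Career/Education"))
print(parse_categories(" transactions ,Shopping"))
```

Unknown labels such as `Shopping` are dropped rather than raising, so a caller can decide what to do when the returned list is empty.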


## Given the relevant categories, read and inject data into the prompt
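
The function in the next cell reads one folder per client under `data/` and maps the text files it finds to categories. The sketch below reproduces that discovery step against a temporary directory so it can be run without the real data. The client name `Jane Doe`, the file `notes.md`, and the helper `available_categories` are assumptions made for illustration.

```python
import glob
import os
import tempfile

# Same file-name-to-category mapping used by client_qa.
FILE_TO_CATEGORY = {
    'bank_facts.txt': 'Bank Account',
    'transactions.txt': 'Transactions',
    'linkedin.txt': 'Career/Education',
}


def available_categories(data_root, client):
    """Return the sorted categories that have a data file for this client."""
    folder = os.path.join(data_root, client.lower().replace(' ', '_'))
    paths = glob.glob(os.path.join(folder, '*.txt'))
    names = [os.path.basename(p) for p in paths]
    return sorted(FILE_TO_CATEGORY[n] for n in names if n in FILE_TO_CATEGORY)


with tempfile.TemporaryDirectory() as root:
    client_dir = os.path.join(root, 'jane_doe')
    os.makedirs(client_dir)
    # Two recognized files plus one that should be ignored.
    for name in ('bank_facts.txt', 'linkedin.txt', 'notes.md'):
        with open(os.path.join(client_dir, name), 'w') as fout:
            fout.write('placeholder\n')
    found = available_categories(root, 'Jane Doe')

print(found)
```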

In [11]:
def client_qa(model, client, user_input, delimiter='###', verbose=True):
    '''
    Given llm, client name, and user question, present the appropriate data
    to the llm in order to garner an answer.
    '''

    # map of data file names to categories - can live outside this function
    master_categories = {'bank_facts.txt': 'Bank Account',
                         'transactions.txt': 'Transactions',
                         'linkedin.txt': 'Career/Education',
                        }

    master_categories_rev = {v:k for k,v in master_categories.items()}

    client = client.lower().replace(' ', '_')
    
    # list the client's data files and keep those with a known category
    categories = glob.glob(f"data/{client}/*.txt")
    categories = [os.path.basename(c) for c in categories]
    categories = [master_categories[c] for c in categories if c in master_categories]
    print(categories)
    
    # Step 1: Determine the categories and apply to template
    # delimiter and user_input added later
    print(f'\nINFO: Step 1: Determine the right categories')

    category_prompt = """You will be provided with queries about people. \
The query will be delimited with \
{{delimiter}} characters.

Indicate which cagetories the query is relevant to among \
the following categories: {categories}.

Example Question: Tell me Bill's current position and deposit balance.
Example Output: Bank Account, Career/Education

{{delimiter}}{{user_input}}{{delimiter}}

Your response here.
    """.format(categories=', '.join(sorted(categories)))

    response = get_completion_from_messages(model, user_input, category_prompt,
                                        delimiter, verbose=verbose)
    # tolerate stray whitespace around the comma-separated category names
    categories = [c.strip() for c in response.split(',')]
    print(f'\nINFO: Using the following categories: {categories}')

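    # Step 2: For each relevant category, load the data
    print('\nINFO: Entering Stage 2')
    reference_data = ''
    for c in categories:
        # skip anything the model returned that is not a known category
        if c not in master_categories_rev:
            print(f'\nWARN: Skipping unknown category: {c!r}')
            continue
        f = f'data/{client}/{master_categories_rev[c]}'
        with open(f, 'r') as fin:
            data = fin.read()
        reference_data += f'{c}:\n{data}\n\n'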
    
    # Step 3: Define the Q&A prompt and inject the reference data
    qa_prompt = """You will be provided with queries about people. \
The query will be delimited with \
{{delimiter}} characters.

Reference the following information when answering the question.
{reference_data}

{{delimiter}}{{user_input}}{{delimiter}}

Your response here.
    """.format(reference_data=reference_data)

    response = get_completion_from_messages(model, user_input, qa_prompt,
                                        delimiter, verbose=verbose)

    return response
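
Both templates in `client_qa` are formatted twice: once inside the function, to inject the categories or the reference data, and once in `get_completion_from_messages`, to inject the delimiter and user input. The doubled braces (`{{delimiter}}`) survive the first pass as single braces. One caveat, shown in the sketch below, is that reference data containing literal braces would be reinterpreted on the second pass unless it is escaped first. `escape_braces` and the sample data string are hypothetical and not part of the notebook.

```python
def escape_braces(text):
    """Double any braces so a later str.format call leaves them as literals."""
    return text.replace('{', '{{').replace('}', '}}')


template = ("Reference:\n{reference_data}\n"
            "{{delimiter}}{{user_input}}{{delimiter}}")

# Hypothetical data that happens to contain braces.
data = 'notes: {"vip": true}'

# First pass: inject the (escaped) data; {{...}} collapses to {...}.
stage_one = template.format(reference_data=escape_braces(data))

# Second pass: fill in the delimiter and the question.
stage_two = stage_one.format(delimiter='###', user_input='Is Richie a VIP?')

print(stage_one)
print(stage_two)
```

Without the escaping step, the second `format` call would treat `{"vip": true}` as a replacement field and raise an error.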

In [20]:
# career and education
response = client_qa(model, 'richie mcrface', 
                     'Summarize Richie\'s education and work history in prose.',
                     verbose=True,
                    )

['Bank Account', 'Transactions', 'Career/Education']

INFO: Step 1: Determine the right categories
### Input: ###
You will be provided with queries about people. The query will be delimited with ### characters.

Indicate which cagetories the query is relevant to among the following categories: Bank Account, Career/Education, Transactions.

Example Question: Tell me Bill's current position and deposit balance.
Example Output: Bank Account, Career/Education

###Summarize Richie's education and work history in prose.###

Your response here.
    
### Output: ###
Career/Education

INFO: Using the following categories: ['Career/Education']

INFO: Entering Stage 2
### Input: ###
You will be provided with queries about people. The query will be delimited with ### characters.

Reference the following information when answering the question.
Career/Education:
Richie currently works as an executive in banking.

Work Experience:
1. Big Shot @ Old Boys Bank (2005 - Present)
2. Intern @ Vater and Co

In [21]:
# test banking
response = client_qa(model, 'richie mcrface', 
                     'Summarize in prose Richie\'s banking relationship',
                     verbose=True,
                    )

['Bank Account', 'Transactions', 'Career/Education']

INFO: Step 1: Determine the right categories
### Input: ###
You will be provided with queries about people. The query will be delimited with ### characters.

Indicate which cagetories the query is relevant to among the following categories: Bank Account, Career/Education, Transactions.

Example Question: Tell me Bill's current position and deposit balance.
Example Output: Bank Account, Career/Education

###Summarize in prose Richie's banking relationship###

Your response here.
    
### Output: ###
Bank Account

INFO: Using the following categories: ['Bank Account']

INFO: Entering Stage 2
### Input: ###
You will be provided with queries about people. The query will be delimited with ### characters.

Reference the following information when answering the question.
Bank Account:
deposits: $1.6M
loans: $20M
wealth: $0.5M




###Summarize in prose Richie's banking relationship###

Your response here.
    
### Output: ###
Richie has $1,

In [23]:
# test transactions
response = client_qa(model, 'richie mcrface', 
                     'What was the largest transation?',
                     verbose=True,
                    )

['Bank Account', 'Transactions', 'Career/Education']

INFO: Step 1: Determine the right categories
### Input: ###
You will be provided with queries about people. The query will be delimited with ### characters.

Indicate which cagetories the query is relevant to among the following categories: Bank Account, Career/Education, Transactions.

Example Question: Tell me Bill's current position and deposit balance.
Example Output: Bank Account, Career/Education

###What was the largest transation?###

Your response here.
    
### Output: ###
Transactions

INFO: Using the following categories: ['Transactions']

INFO: Entering Stage 2
### Input: ###
You will be provided with queries about people. The query will be delimited with ### characters.

Reference the following information when answering the question.
Transactions:
This table shows incoming and outgoing transactions.  Negative amounts are outgoing. Positive amounts are incoming.
Amount Recipient_or_Sender
-5 Starbucks coffee
-100 Trans

In [26]:
# career and education
response = client_qa(model, 'richie mcrface', 
                     'Does Richie know very important people',
                     verbose=True,
                    )

['Bank Account', 'Transactions', 'Career/Education']

INFO: Step 1: Determine the right categories
### Input: ###
You will be provided with queries about people. The query will be delimited with ### characters.

Indicate which cagetories the query is relevant to among the following categories: Bank Account, Career/Education, Transactions.

Example Question: Tell me Bill's current position and deposit balance.
Example Output: Bank Account, Career/Education

###Does Richie know very important people###

Your response here.
    
### Output: ###
Transactions

INFO: Using the following categories: ['Transactions']

INFO: Entering Stage 2
### Input: ###
You will be provided with queries about people. The query will be delimited with ### characters.

Reference the following information when answering the question.
Transactions:
This table shows incoming and outgoing transactions.  Negative amounts are outgoing. Positive amounts are incoming.
Amount Recipient_or_Sender
-5 Starbucks coffee
-100