# Prediction Correction

- Given a prediction and one or more observations, do the observations certify the prediction?

In [1]:
import os
import sys

import pandas as pd

from tqdm import tqdm

# Get the current working directory of the notebook
notebook_dir = os.getcwd()
print(notebook_dir)
# Add the parent directory to the system path
sys.path.append(os.path.join(notebook_dir, '../'))

from text_generation_models import TextGenerationModelFactory

/Users/detraviousjamaribrinkley/Documents/Development/research_labs/uf_ds/predictions/prediction_correctness_experiments


In [2]:
pd.set_option('max_colwidth', 800)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

## Load Data

In [3]:
entailment_df = pd.read_csv('../data/entailment/entailment-cosine_similarity-v1.csv')  
entailment_df

Unnamed: 0,Prediction,Observation,Entailment Label,Spacy,SentenceTransformer,BERT
0,"Citigroup predicts on 2024-08-21, the operating income at Alphabet may rise.","In Q2 2025, a financial research advisor envisions that the research and development expenses at Google has some probability to remain stable.",NEUTRAL,0.773352,0.328824,0.899819
1,JPMorgan Chase forecasts that the net profit at Amazon potentially decrease in Q3 of 2027.,"According to the financial top executive at BlackRock, the stock price at Amazon may rise in Q4 2028.",NEUTRAL,0.866826,0.64474,0.941268
2,JPMorgan Chase forecasts that the net profit at Amazon potentially decrease in Q3 of 2027.,"In the fourth quarter of 2027, Wells Fargo envisions that the operating cash flow at Intel has some probability to remain stable.",NEUTRAL,0.865429,0.311582,0.896069
3,JPMorgan Chase forecasts that the net profit at Amazon potentially decrease in Q3 of 2027.,"Intel stock price should stay the same in January 2028, according to a financial expert at Harvard University.",NEUTRAL,0.812578,0.321734,0.903572
4,JPMorgan Chase forecasts that the net profit at Amazon potentially decrease in Q3 of 2027.,"On March 1, 2029, the financial advisor at Wells Fargo envisions that the inflation rate at the Federal Reserve has some probability to remain stable.",NEUTRAL,0.845318,0.282765,0.819832
5,JPMorgan Chase forecasts that the net profit at Amazon potentially decrease in Q3 of 2027.,"On November 10, 2022, to November 10, 2023, Citigroup speculates the research and development expenses at Amazon will likely increase.",NEUTRAL,0.815297,0.490033,0.913402
6,JPMorgan Chase forecasts that the net profit at Amazon potentially decrease in Q3 of 2027.,"The trading volume at Apple should stay same in the fourth quarter of 2025, according to a financial expert at JPMorgan Chase.",NEUTRAL,0.859225,0.489393,0.918903
7,"On August 21, 2024, Bank of America speculates the revenue at Microsoft will likely increase.","According to Bank of America, the net profit at Microsoft would fall in the second quarter of 2026.",NEUTRAL,0.927874,0.771567,0.91856
8,"On August 21, 2024, Bank of America speculates the revenue at Microsoft will likely increase.","Apple stock price decreased in August 2024, according to Roger.",NEUTRAL,0.876008,0.375553,0.888524
9,"On August 21, 2024, Bank of America speculates the revenue at Microsoft will likely increase.","Intel stock price should stay the same in January 2028, according to a financial expert at Harvard University.",NEUTRAL,0.900117,0.394057,0.874122
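The `Spacy`, `SentenceTransformer`, and `BERT` columns hold cosine similarity scores between the embeddings of each prediction and observation. As a reminder of what that number measures, here is a minimal pure-Python sketch. The helper name `cosine_similarity` and the toy vectors are illustrative assumptions; the scores in the table come from real model embeddings, not from this function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as having no similarity to anything
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors (assumed for illustration only)
parallel_score = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal_score = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(parallel_score, orthogonal_score)
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, so a high score only says the two sentences are embedded near each other, not that one certifies the other.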


In [4]:
from json import loads, dumps

# Round-trip through JSON to get a list of plain dicts, one per row
entailment_dict = loads(entailment_df.to_json(orient='records'))
print(dumps(entailment_dict, indent=4))

[
    {
        "Prediction": "Citigroup predicts on 2024-08-21, the operating income at Alphabet may rise.",
        "Observation": "In Q2 2025, a financial research advisor envisions that the research and development expenses at Google has some probability to remain stable.",
        "Entailment Label": "NEUTRAL",
        "Spacy": 0.77335185,
        "SentenceTransformer": 0.32882446,
        "BERT": 0.89981854
    },
    {
        "Prediction": "JPMorgan Chase forecasts that the net profit at Amazon potentially decrease in Q3 of 2027.",
        "Observation": "According to the financial top executive at BlackRock, the stock price at Amazon may rise in Q4 2028.",
        "Entailment Label": "NEUTRAL",
        "Spacy": 0.8668256,
        "SentenceTransformer": 0.64473987,
        "BERT": 0.9412681
    },
    {
        "Prediction": "JPMorgan Chase forecasts that the net profit at Amazon potentially decrease in Q3 of 2027.",
        "Observation": "In the fourth quarter of 2027, Wells 

## Properties of Both Prediction and Observations

In [5]:
prediction_properties = """A prediction <p> = (<p_s>, <p_t>, <p_d>, <p_o>) consists of the following four properties:

    1. <p_s>, any source entity in the domain.
        - Can be a named person, a person identified by role (such as a reporter, analyst, expert, top executive, or senior-level person), or a civilian.
        - Can be an organization, but only one associated with the prediction.
    2. <p_t>, any target entity in the domain.
        - Can be a named person or a person identified by role (such as a reporter, analyst, expert, top executive, or senior-level person).
        - Can be an organization, but only one associated with the prediction.
    3. <p_d>, date or time range when <p> is expected to come to fruition or when one should observe the <p>.
        - Forecast can range from a second to anytime in the future.
        - Answers the questions: "How far to go out from today?" or "Where to stop?".
    4. <p_o>, prediction outcome.
        - Relevant details such as the outcome, a quantifiable metric, or a slope.
"""

observation_properties = """An observation <o> = (<o_s>, <o_t>, <o_d>, <o_a>) consists of the following four properties:

    1. <o_s>, any source entity in the domain.
        - Can be a named person, a person identified by role (such as a reporter, analyst, expert, top executive, or senior-level person), or a civilian.
        - Can be an organization, but only one associated with the observation.
    2. <o_t>, any target entity in the domain.
        - Can be a named person or a person identified by role (such as a reporter, analyst, expert, top executive, or senior-level person).
        - Can be an organization, but only one associated with the observation.
    3. <o_d>, date or time range when <o> is expected to come to fruition or when one should observe the <o>.
        - Forecast can range from a second to anytime in the future.
        - Answers the questions: "How far to go out from today?" or "Where to stop?".
    4. <o_a>, observation output.
        - Characteristics of domain-specific outputs, such as quantifiable metrics relevant to the domain.
        - Some examples are {observation_domain_output}.
"""
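The two definitions above describe four-field tuples. One way to hold them in code is a pair of small dataclasses; this is a minimal sketch, and the class and field names (`Prediction`, `Observation`, `source`, `target`, `date`, `outcome`, `output`) are assumed names that do not appear in the notebook. The example values are a hand-made reading of one row of the loaded data, not the result of any automatic extraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    source: str   # <p_s>: who makes the prediction
    target: str   # <p_t>: who or what the prediction is about
    date: str     # <p_d>: when the prediction should come to fruition
    outcome: str  # <p_o>: the predicted outcome


@dataclass(frozen=True)
class Observation:
    source: str  # <o_s>
    target: str  # <o_t>
    date: str    # <o_d>
    output: str  # <o_a>


# Hand-decomposed from row 1 of the table above (an assumption, not notebook output)
example_prediction = Prediction(
    source="JPMorgan Chase",
    target="Amazon",
    date="Q3 of 2027",
    outcome="net profit potentially decrease",
)
example_observation = Observation(
    source="financial top executive at BlackRock",
    target="Amazon",
    date="Q4 2028",
    output="stock price may rise",
)

# Same target, but different dates and different metrics
print(example_prediction.target == example_observation.target)
```

Laying the fields side by side like this makes it easier to see why a pair can share a target and still be unrelated: here the dates and the metrics both differ.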

## Models to Certify

In [6]:
tgmf = TextGenerationModelFactory()

# Groq Cloud (https://console.groq.com/docs/overview)
gemma_29b_generation_model = tgmf.create_instance('gemma2-9b-it') 
llama_318b_instant_generation_model = tgmf.create_instance('llama-3.1-8b-instant') 
llama_3370b_versatile_generation_model = tgmf.create_instance('llama-3.3-70b-versatile')  
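# Note: Llama Guard is a safety classifier, so it answers with safe/unsafe category codes
# (see the 'safe' and 'S6' labels in the results) rather than certify / not certify / neither.
llama_guard_4_12b_generation_model = tgmf.create_instance('meta-llama/llama-guard-4-12b')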

models = [gemma_29b_generation_model, llama_318b_instant_generation_model, llama_3370b_versatile_generation_model, llama_guard_4_12b_generation_model]

## Certify

In [7]:
def llm_certifier(data: list, prediction_properties: str, observation_properties: str):
    """Ask every model in the global `models` list whether each observation certifies its prediction.

    Returns a list of (prediction, observation, label, model_name) tuples.
    """
    preds_obs_labels = []
    for record in tqdm(data):  # iterate over the argument, not the global entailment_dict
        prediction = record['Prediction']
        observation = record['Observation']
        entailment_label = record['Entailment Label']
        spacy_score = record['Spacy']
        sentence_transformer_score = record['SentenceTransformer']
        bert_score = record['BERT']

        prompt = (
            f"We have {prediction_properties} and {observation_properties}. "
            "The main difference at the moment is a prediction is future tense and observation is past tense. "
            f"Return if the observation ({observation}) certifies this prediction ({prediction}). "
            f"Take into account entailment label ({entailment_label}) using roberta-large-mnli, "
            "embeddings (of prediction and observation) measuring cosine similarity score "
            "(between prediction and observation using sklearn.metrics.pairwise.cosine_similarity): "
            f"({spacy_score}) using Spacy, ({sentence_transformer_score}) using all-MiniLM-L6-v2, "
            f"and ({bert_score}) using bert-base-uncased. "
            "The return label MUST be certify, not certify, or neither and nothing more (as in do not state why)."
        )

        for model in models:
            input_prompt = model.user(prompt)
            raw_text_llm_generation = model.chat_completion([input_prompt])

            # Keep the last non-empty line of the response as the label.
            # Reset per model so an empty response cannot reuse the previous model's label.
            label = None
            for line in raw_text_llm_generation.split("\n"):
                if line.strip():
                    label = line
            preds_obs_labels.append((prediction, observation, label, model.__name__()))
    return preds_obs_labels

In [8]:
certifier_predictions = llm_certifier(entailment_dict, prediction_properties, observation_properties)
certifier_predictions

100%|██████████| 12/12 [00:48<00:00,  4.04s/it]


[('Citigroup predicts on 2024-08-21, the operating income at Alphabet may rise.',
  'In Q2 2025, a financial research advisor envisions that the research and development expenses at Google has some probability to remain stable.',
  'certify ',
  'gemma2-9b-it'),
 ('Citigroup predicts on 2024-08-21, the operating income at Alphabet may rise.',
  'In Q2 2025, a financial research advisor envisions that the research and development expenses at Google has some probability to remain stable.',
  'certify',
  'llama-3.1-8b-instant'),
 ('Citigroup predicts on 2024-08-21, the operating income at Alphabet may rise.',
  'In Q2 2025, a financial research advisor envisions that the research and development expenses at Google has some probability to remain stable.',
  'not certify',
  'llama-3.3-70b-versatile'),
 ('Citigroup predicts on 2024-08-21, the operating income at Alphabet may rise.',
  'In Q2 2025, a financial research advisor envisions that the research and development expenses at Google h

In [9]:
df = pd.DataFrame(certifier_predictions, columns=['Prediction Sentence', 'Observation Sentence', 'Label', 'Model'])
df


Unnamed: 0,Prediction Sentence,Observation Sentence,Label,Model
0,"Citigroup predicts on 2024-08-21, the operating income at Alphabet may rise.","In Q2 2025, a financial research advisor envisions that the research and development expenses at Google has some probability to remain stable.",certify,gemma2-9b-it
1,"Citigroup predicts on 2024-08-21, the operating income at Alphabet may rise.","In Q2 2025, a financial research advisor envisions that the research and development expenses at Google has some probability to remain stable.",certify,llama-3.1-8b-instant
2,"Citigroup predicts on 2024-08-21, the operating income at Alphabet may rise.","In Q2 2025, a financial research advisor envisions that the research and development expenses at Google has some probability to remain stable.",not certify,llama-3.3-70b-versatile
3,"Citigroup predicts on 2024-08-21, the operating income at Alphabet may rise.","In Q2 2025, a financial research advisor envisions that the research and development expenses at Google has some probability to remain stable.",safe,meta-llama/llama-guard-4-12b
4,JPMorgan Chase forecasts that the net profit at Amazon potentially decrease in Q3 of 2027.,"According to the financial top executive at BlackRock, the stock price at Amazon may rise in Q4 2028.",certify,gemma2-9b-it
5,JPMorgan Chase forecasts that the net profit at Amazon potentially decrease in Q3 of 2027.,"According to the financial top executive at BlackRock, the stock price at Amazon may rise in Q4 2028.","Therefore, the entailment label for the observation and prediction is ""certify"".",llama-3.1-8b-instant
6,JPMorgan Chase forecasts that the net profit at Amazon potentially decrease in Q3 of 2027.,"According to the financial top executive at BlackRock, the stock price at Amazon may rise in Q4 2028.",not certify,llama-3.3-70b-versatile
7,JPMorgan Chase forecasts that the net profit at Amazon potentially decrease in Q3 of 2027.,"According to the financial top executive at BlackRock, the stock price at Amazon may rise in Q4 2028.",S6,meta-llama/llama-guard-4-12b
8,JPMorgan Chase forecasts that the net profit at Amazon potentially decrease in Q3 of 2027.,"In the fourth quarter of 2027, Wells Fargo envisions that the operating cash flow at Intel has some probability to remain stable.",certify,gemma2-9b-it
9,JPMorgan Chase forecasts that the net profit at Amazon potentially decrease in Q3 of 2027.,"In the fourth quarter of 2027, Wells Fargo envisions that the operating cash flow at Intel has some probability to remain stable.","Therefore, the return label is: certify",llama-3.1-8b-instant
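The `Label` column above is not yet usable as-is: some responses carry trailing whitespace, some wrap the label in a full sentence, and the Llama Guard rows contain `safe` or `S6` instead of one of the three requested labels. A possible cleanup step is sketched below. The names `normalize_label` and `ALLOWED_LABELS` are hypothetical and not part of the notebook, and the rule of taking the last label mentioned in a sentence is an assumption that may need tuning.

```python
import re
from collections import Counter

ALLOWED_LABELS = ("not certify", "certify", "neither")

# "not certify" is listed first so it is matched before the shorter "certify"
_LABEL_PATTERN = re.compile(r"\b(?:not certify|certify|neither)\b")


def normalize_label(raw):
    """Map a raw response line to one of ALLOWED_LABELS, or None if none is found."""
    if raw is None:
        return None
    text = raw.strip().lower().strip('."\'')
    if text in ALLOWED_LABELS:
        return text
    # Fall back to the last allowed label mentioned inside a longer sentence
    matches = _LABEL_PATTERN.findall(text)
    return matches[-1] if matches else None


# Raw labels copied from the results above
raw_labels = [
    'certify ',
    'not certify',
    'safe',
    'S6',
    'Therefore, the entailment label for the observation and prediction is "certify".',
    'Therefore, the return label is: certify',
]
label_counts = Counter(normalize_label(raw) for raw in raw_labels)
print(label_counts)
```

Responses that map to `None` (here, the Llama Guard outputs) can then be filtered out or reported separately before comparing models.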
