# 1-Generate Predictions

- **Goal:** Prediction Recognition

- **Purpose:** Implement step 1 of the prediction recognition pipeline, including its sub-steps:
    1. Generate predictions
        1. Create several prediction prompt templates
        2. Use open-source LLMs to generate predictions

- **Misc:**
    - `%store`: IPython line magic that persists a variable so it can be loaded in another notebook with `%store -r`

In [1]:
import os
import sys

import pandas as pd

# Get the current working directory of the notebook
notebook_dir = os.getcwd()
# Add the parent directory to the system path
sys.path.append(os.path.join(notebook_dir, '../'))

from pipelines import BasePipeline

In [2]:
# Raw string so that LaTeX sequences such as \hat are not parsed as (invalid) escape sequences
prediction_template = r"""My variables are
- $\hat{y}$, prediction
    - $\hat{y}_{s}$, source that predicted $\hat{y}$
        - Source can be person, organization, and any type of entity.
    - $\hat{y}_{t}$, time when $\hat{y}$ was made
        - Time is the exact moment that can be measured in day, hour, minutes, seconds, etc.
    - $\hat{y}_{f}$, forecast time when $\hat{y}$ is expected to come to fruition
        - Forecast can be from seconds to decades in the future.
        - How far to go out? Or where to stop?
    - $\hat{y}_a$, prediction attribute
        - Financial based attributes such as stock price, net profit, revenue
    - $\hat{y}_m$, prediction metric outcome
        - How much will the  $\hat{y}_a$ rise/increase or fall/decrease
    - $\hat{y}_v$, future verb tense
        - A verb that is associated with the future such as will, would, be going to, should, etc.

- Template: On [$\hat{y}_{t}$], [$\hat{y}_{s}$ person name] predicts that the [$\hat{y}_a$] at [$\hat{y}_s$ company name] [$\hat{y}_v$] [$\hat{y}_m$] by [$\hat{y}_m$] in [$\hat{y}_{f}$]

Suppose you are the Chief Financial Officer at a publicly traded company on the US Stock Exchange. Please generate nine company-based financial predictions that will occur in the future following the requirements below:

1. Should be based on real-world earnings reports
2. Each prediction should be a single simple sentence (NOT a compound sentence joined with "and" or "or")
3. Should be either positive, negative, or neutral for metric outcome
4. Suppose the time when $\hat{y}$ was made is during any earning season
5. Include attributes ($\hat{y}_a$) like stock price, net profit, revenue, etc
6. Include at least 5 stocks from all industries such as technology, energy, etc
7. Should diversify the metric outcome
8. Should use any future tense word such as will, may, should, could, etc and phrases such as high chance/probability/degree of...
9. Should have a forecast time when $\hat{y}$ is expected to come to fruition ($\hat{y}_{f}$) between 2025 and 2030
10. Diversify the name ($\hat{y}_{s}$)
11. Should use synonyms of predicts such as forecasts, speculates, foresees, envisions, etc
12. Only include the predictions without "Here are nine company-based financial predictions..." or anything similar and without the numbers in front

- Examples:
    1. On [Monday, December 16, 2024], [Detravious] forecasts that the [revenue] at [Apple] [will] [rise] by [8% to $120 per share] in [Q1 of 2025].
    2. On [Tuesday, November 19, 2024], [Ava Lee] predicts that the [operating cash flow] at [ExxonMobil (XOM)] [should] [decrease] by [5% to $20 billion] in [Q2 of 2027].
    3. On [Wednesday, October 23, 2024], [Julian Hall] envisions that the [stock price] at [NVIDIA (NVDA)] [will likely] [rise] by [25% to $1,000 per share] in [Q3 of 2028].
    4. On [Thursday, September 19, 2024], [Mia Patel] speculates that the [dividend payout ratio] at [Coca-Cola (KO)] [will probably] [remain] at [75%] in [Q1 of 2026].
    5. On [Friday, August 16, 2024], [Logan White] predicts that the [research and development expenses] at [Pfizer (PFE)] [may] [increase] by [8% to $10 billion] in [FY 2029].
    6. On [Monday, July 22, 2024], [Hannah Brooks] forecasts that the [return on equity (ROE)] at [JPMorgan Chase (JPM)] [has a high probability of] [improving] by [2% to 15%] in [Q4 of 2027].
    7. On [Tuesday, June 18, 2024], [Detravious Martin] predicts that the [capital expenditures] at [UnitedHealth Group (UNH)] [should] [decrease] by [3% to $2 billion] in [Q2 of 2028].
    8. On [Wednesday, May 15, 2024], [Raj Taylor] envisions that the [gross profit margin] at [McDonald's (MCD)] [will likely] [expand] by [1% to 18%] in [Q3 of 2026].
    9. On [Thursday, April 18, 2024], [Jackson Lee] forsees that the [total debt] at [Intel (INTC)] [will probably] [decrease] by [10% to $20 billion] in [Q1 of 2029].
    10. On [Friday, March 15, 2024], [Ethan Patel] predicts that the [earnings before interest and taxes (EBIT)] at [Verizon Communications (VZ)] [may] [increase] by [5% to $20 billion] in [FY 2028].
"""

prediction_label = 1
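
To make the slot structure of the template concrete, here is a minimal sketch that fills the bracketed slots from a small record. The `PredictionSlots` dataclass and the `render_prediction` helper are hypothetical names introduced for illustration; they are not part of the `pipelines` package.

```python
from dataclasses import dataclass


@dataclass
class PredictionSlots:
    """Hypothetical container for the variables defined in the prompt."""
    made_on: str      # y_t: time the prediction was made
    source: str       # y_s: person making the prediction
    attribute: str    # y_a: financial attribute
    company: str      # company the attribute belongs to
    future_verb: str  # y_v: future-tense marker
    direction: str    # y_m: rise / fall / remain
    amount: str       # y_m: size of the change
    horizon: str      # y_f: when the prediction should come true


def render_prediction(slots: PredictionSlots, verb: str = "predicts") -> str:
    """Fill the sentence template with one set of slot values."""
    return (
        f"On {slots.made_on}, {slots.source} {verb} that the {slots.attribute} "
        f"at {slots.company} {slots.future_verb} {slots.direction} "
        f"by {slots.amount} in {slots.horizon}."
    )


example = PredictionSlots(
    made_on="Tuesday, November 19, 2024",
    source="Ava Lee",
    attribute="operating cash flow",
    company="ExxonMobil (XOM)",
    future_verb="should",
    direction="decrease",
    amount="5% to $20 billion",
    horizon="Q2 of 2027",
)
sentence = render_prediction(example)
print(sentence)
```

Passing a different `verb` (for example `"forecasts"` or `"envisions"`) covers the synonym requirement without changing the slot layout.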

In [3]:
# Widen the column display so full sentences are not truncated
pd.set_option('display.max_colwidth', 800)

base_pipeline = BasePipeline()

predictions_df = base_pipeline.generate_predictions(text=prediction_template, label=prediction_label)
predictions_df

| | Base Predictions | Prediction Label |
|---|---|---|
| 0 | On Monday, December 16, 2024, Detravious forecasts that the revenue at Apple will rise by 8% to $120 per share in Q1 of 2025. | 1 |
| 1 | On Tuesday, November 19, 2024, Ava Lee predicts that the operating cash flow at ExxonMobil (XOM) should decrease by 5% to $20 billion in Q2 of 2027. | 1 |
| 2 | On Wednesday, October 23, 2024, Julian Hall envisions that the stock price at NVIDIA (NVDA) will likely rise by 25% to $1,000 per share in Q3 of 2028. | 1 |
| 3 | On Thursday, September 19, 2024, Mia Patel speculates that the dividend payout ratio at Coca-Cola (KO) will probably remain at 75% in Q1 of 2026. | 1 |
| 4 | On Friday, August 16, 2024, Logan White predicts that the research and development expenses at Pfizer (PFE) may increase by 8% to $10 billion in FY 2029. | 1 |
| 5 | On Monday, July 22, 2024, Hannah Brooks forecasts that the return on equity (ROE) at JPMorgan Chase (JPM) has a high probability of improving by 2% to 15% in Q4 of 2027. | 1 |
| 6 | On Tuesday, June 18, 2024, Detravious Martin predicts that the capital expenditures at UnitedHealth Group (UNH) should decrease by 3% to $2 billion in Q2 of 2028. | 1 |
| 7 | On Wednesday, May 15, 2024, Raj Taylor envisions that the gross profit margin at McDonald's (MCD) will likely expand by 1% to 18% in Q3 of 2026. | 1 |
| 8 | On Thursday, April 18, 2024, Jackson Lee forsees that the total debt at Intel (INTC) will probably decrease by 10% to $20 billion in Q1 of 2029. | 1 |
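
`BasePipeline.generate_predictions` is not shown in this notebook. Judging only from the output above, it returns one row per generated sentence with the square brackets removed. A minimal sketch of that post-processing step, assuming the model returns one sentence per line, could look like the following. `parse_generations` is a hypothetical helper, not the pipeline's actual implementation.

```python
import re


def parse_generations(raw_text: str, label: int) -> list[dict]:
    """Turn raw LLM text into rows shaped like the table above (hypothetical helper)."""
    rows = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between sentences
        # Drop a leading list number such as "3. " in case the model adds one.
        line = re.sub(r"^\d+\.\s*", "", line)
        # Remove the square brackets that mark the template slots.
        line = line.replace("[", "").replace("]", "")
        rows.append({"Base Predictions": line, "Prediction Label": label})
    return rows


raw = """
1. On [Monday, July 22, 2024], [Hannah Brooks] forecasts that the [return on equity (ROE)] at [JPMorgan Chase (JPM)] [has a high probability of] [improving] by [2% to 15%] in [Q4 of 2027].

On [Thursday, September 19, 2024], [Mia Patel] speculates that the [dividend payout ratio] at [Coca-Cola (KO)] [will probably] [remain] at [75%] in [Q1 of 2026].
"""
rows = parse_generations(raw, label=1)
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame` would give a frame with the same two column names as the output shown here.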


In [4]:
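# Raw string so that LaTeX sequences such as \hat are not parsed as (invalid) escape sequences
non_prediction_template = r"""Generate any sentence that's not a prediction. A prediction is defined by the variables below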
    - $\hat{y}$, prediction
        - $\hat{y}_{s}$, source that predicted $\hat{y}$
            - Source can be person, organization, and any type of entity.
        - $\hat{y}_{t}$, time when $\hat{y}$ was made
            - Time is the exact moment that can be measured in day, hour, minutes, seconds, etc.
        - $\hat{y}_{f}$, forecast time when $\hat{y}$ is expected to come to fruition
            - Forecast can be from seconds to decades in the future.
            - How far to go out? Or where to stop?
        - $\hat{y}_a$, prediction attribute
            - Financial based attributes such as stock price, net profit, revenue
        - $\hat{y}_m$, prediction metric outcome
            - How much will the  $\hat{y}_a$ rise/increase or fall/decrease
        - $\hat{y}_v$, future verb tense
            - A verb that is associated with the future such as will, would, be going to, should, etc.

    Please generate nine sentences that meet the requirements below:

    1. Only a simple sentence (NOT a compound sentence joined with "and" or "or")
    2. Include no additional information such as "Here are nine simple sentences that are not predictions:"
"""

non_prediction_label = 0

In [5]:
non_predictions_df = base_pipeline.generate_predictions(text=non_prediction_template, label=non_prediction_label)
non_predictions_df

| | Base Predictions | Prediction Label |
|---|---|---|
| 0 | The sun is shining brightly in the clear sky. | 0 |
| 1 | The book fell off the table. | 0 |
| 2 | The color blue is often associated with calmness. | 0 |
| 3 | The city has a large population. | 0 |
| 4 | The teacher wrote on the blackboard. | 0 |
| 5 | The students are working on their project. | 0 |
| 6 | The company has a strong reputation. | 0 |
| 7 | The music is playing loudly in the room. | 0 |
| 8 | The dog is running quickly around the corner. | 0 |
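
One cheap sanity check on the negative class is to confirm that none of the generated sentences contains a future-tense marker ($\hat{y}_v$). The sketch below uses a small, hand-picked marker list drawn from the examples in the prompts. The list and the `has_future_marker` helper are assumptions for illustration, and a keyword check like this is only a rough filter.

```python
import re

# Markers taken from the future-tense examples given in the prompts (assumed list).
FUTURE_MARKERS = ["will", "would", "going to", "should", "may", "could"]

# Word boundaries prevent matches inside longer words (e.g. "will" in "willow").
# Matching is case-sensitive so the month name "May" is not mistaken for "may".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(marker) for marker in FUTURE_MARKERS) + r")\b"
)


def has_future_marker(sentence: str) -> bool:
    """Return True if the sentence contains any of the assumed future markers."""
    return _PATTERN.search(sentence) is not None


non_predictions = [
    "The sun is shining brightly in the clear sky.",
    "The book fell off the table.",
    "The color blue is often associated with calmness.",
    "The city has a large population.",
    "The teacher wrote on the blackboard.",
    "The students are working on their project.",
    "The company has a strong reputation.",
    "The music is playing loudly in the room.",
    "The dog is running quickly around the corner.",
]

# Any sentence listed here would deserve a manual look before being labeled 0.
flagged = [s for s in non_predictions if has_future_marker(s)]
print(flagged)
```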


In [6]:
%store predictions_df
%store non_predictions_df

Stored 'predictions_df' (DataFrame)
Stored 'non_predictions_df' (DataFrame)
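
`%store` only works inside IPython. If a later step runs as a plain Python script, one possible alternative, which this notebook does not use, is to hand the labeled sentences over through a CSV file. The sketch below uses only the standard library; the file name and the `save_rows` and `load_rows` helpers are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

FIELDS = ["Base Predictions", "Prediction Label"]


def save_rows(rows: list[dict], path: Path) -> None:
    """Write labeled sentences to a CSV file with a header row."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def load_rows(path: Path) -> list[dict]:
    """Read the CSV back, restoring the label to an integer."""
    with path.open(newline="", encoding="utf-8") as handle:
        return [
            {
                "Base Predictions": record["Base Predictions"],
                "Prediction Label": int(record["Prediction Label"]),
            }
            for record in csv.DictReader(handle)
        ]


rows = [
    {"Base Predictions": "The book fell off the table.", "Prediction Label": 0},
    {
        "Base Predictions": "On Friday, August 16, 2024, Logan White predicts that the research and development expenses at Pfizer (PFE) may increase by 8% to $10 billion in FY 2029.",
        "Prediction Label": 1,
    },
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "base_predictions.csv"  # hypothetical file name
    save_rows(rows, csv_path)
    restored = load_rows(csv_path)

print(restored == rows)
```

With pandas frames, `DataFrame.to_csv` and `pd.read_csv` would play the same roles.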


# References

1. PAPER: [On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey](http://arxiv.org/abs/2406.15126)
    - Used as a guide for formatting the prompt templates.