# 1-Generate Predictions

- **Goal:** Prediction Recognition

- **Purpose:** Implement step 1 of the prediction recognition pipeline, including its sub-steps:
    1. Generate predictions
        1. Create several prediction prompt templates
        2. Use open-source LLMs to generate predictions

- **Misc:**
    - `%store`: Line magic that stores a variable of interest so it can be loaded in another notebook (with `%store -r`)

In [1]:
import os
import sys

import pandas as pd

# Get the current working directory of the notebook
notebook_dir = os.getcwd()
# Add the parent directory to the system path
sys.path.append(os.path.join(notebook_dir, '../'))

from pipelines import BasePipeline

In [7]:
# Raw string so the LaTeX backslashes (e.g., \hat) are kept literally
prediction_template = r"""My variables are
- $\hat{y}$, prediction
    - $\hat{y}_{s}$, source that predicted $\hat{y}$
        - Source can be person, organization, and any type of entity.
    - $\hat{y}_{t}$, time when $\hat{y}$ was made
        - Time is the exact moment that can be measured in day, hour, minutes, seconds, etc.
    - $\hat{y}_{f}$, forecast time when $\hat{y}$ is expected to come to fruition
        - Forecast can be from seconds to decades in the future.
        - How far to go out? Or where to stop?
    - $\hat{y}_{a}$, prediction attribute
        - Financial based attributes such as stock price, net profit, revenue, etc.
        - Weather-based attributes such as temperature, precipitation, wind speed, humidity, etc.
    - $\hat{y}_{m}$, prediction metric outcome
        - How much will the $\hat{y}_{a}$ rise/increase, fall/decrease, change
    - $\hat{y}_{v}$, future verb tense
        - A verb that is associated with the future such as will, would, be going to, should, etc.

- Requirements to use for each prediction
    1. Should be based on real-world earnings reports
    2. Only a simple sentence (prediction) (and NOT compounding using "and" or "or")
    3. Should be either positive, negative, or neutral for metric outcome
    4. Suppose the time when $\hat{y}$ was made is during any earning season
    5. Include attributes ($\hat{y}_{a}$) like stock price, net profit, revenue, temperature, precipitation, wind speed, etc.
    6. Include at least 5 stocks from across industries such as technology, energy, etc.
    7. Should diversify the metric outcome (e.g., temperature rising 5 degrees, rainfall increasing by 20%).
    8. Should use any future-tense word such as will, may, should, could, etc., and phrases such as high chance/probability/degree of...
    9. Should have a forecast time when $\hat{y}$ is expected to come to fruition ($\hat{y}_{f}$) between 2025 and 2030
    10. Diversify the name ($\hat{y}_{s}$)
    11. Should use synonyms of "predicts" such as forecasts, speculates, foresees, envisions, etc.
    12. Only include the predictions without "Here are 10 company-based financial prediction..." or anything similar and without the numbers in front
    13. Use the five different templates above
    14. The prediction should be unique and not repeated
    15. The $\hat{y}_{f}$ should always be after $\hat{y}_{t}$
    16. Do not number the predictions
    17. Do not say, "As the Chief Financial Officer at a publicly traded company on the US Stock Exchange, I will generate five company-based financial predictions using the provided templates." or "As the Chief Meteorologist at a national weather forecasting agency, I will generate five weather-based predictions using the provided templates." or anything similar

- Financial Template 1: On [ $\hat{y}_{t}$ ], [ $\hat{y}_{s}$ person name ] predicts that the [ $\hat{y}_{a}$ ] at [ $\hat{y}_{s}$ company name ] [ $\hat{y}_{v}$ ] [ $\hat{y}_{m}$ ] by [ $\hat{y}_{m}$ ] in [ $\hat{y}_{f}$ ].
- Template 2: On [ $\hat{y}_{t}$ ], [ $\hat{y}_{s}$ person name ] from [ $\hat{y}_{s}$ company name ] predicts that the [ $\hat{y}_{a}$ ] [ $\hat{y}_{v}$ ] by [ $\hat{y}_{m}$ ] in [ $\hat{y}_{f}$ ].
- Template 3: [ $\hat{y}_{s}$ person name ] predicts on [ $\hat{y}_{t}$ ] that the [ $\hat{y}_{a}$ ] at [ $\hat{y}_{s}$ company name ] [ $\hat{y}_{v}$ ] by [ $\hat{y}_{m}$ ] in [ $\hat{y}_{f}$ ].
- Template 4: According to [ $\hat{y}_{s}$ person name ] from [ $\hat{y}_{s}$ company name ], on [ $\hat{y}_{t}$ ], the [ $\hat{y}_{a}$ ] [ $\hat{y}_{v}$ ] by [ $\hat{y}_{m}$ ] in the timeframe of [ $\hat{y}_{f}$ ].
- Template 5: In [ $\hat{y}_{f}$ ], the [ $\hat{y}_{a}$ ] at [ $\hat{y}_{s}$ company name ] is expected to [ $\hat{y}_{v}$ ] by [ $\hat{y}_{m}$ ], as predicted by [ $\hat{y}_{s}$ person name ] on [ $\hat{y}_{t}$ ].

Suppose you are the Chief Financial Officer at a publicly traded company on the US Stock Exchange. Using the above templates, please generate one of each (so five total) company-based financial prediction that will occur in the future following the requirements above.

- Examples for template 1:
    1. On [Monday, December 16, 2024], [Detravious] forecasts that the [revenue] at [Apple] [will] [rise] by [8% to $120 per share] in [Q1 of 2025].
    2. On [Tuesday, November 19, 2024], [Ava Lee] predicts that the [operating cash flow] at [ExxonMobil (XOM)] [should] [decrease] by [5% to $20 billion] in [Q2 of 2027].
- Examples for template 2:
    3. On [Wednesday, October 23, 2024], [Julian Hall] from [NVIDIA (NVDA)] envisions that the [stock price] [will likely] [rise] by [25% to $1,000 per share] in [Q3 of 2028].
    4. On [Thursday, September 19, 2024], [Raj Taylor] from [McDonald's (MCD)] predicts that the [net profit] [will] [fall] by [5% to $5 billion] in [Q4 of 2026].
- Examples for template 3:
    5. [Ava Lee] predicts on [Wednesday, October 23, 2024], that the [research and development expenses] at [Alphabet (GOOGL)] [will] [rise] by [10% to $20 billion] in [Q4 of 2027].
    6. [Michael Johnson] predicts on [Monday, March 18, 2024], that the [operating income] at [Microsoft (MSFT)] [will] [fall] by [15% to $50 billion] in [Q2 of 2026].
- Examples for template 4:
    7. According to [Ava Morales] from [Chevron (CVX)], on [Wednesday, August 21, 2024], the [net profit at Coca-Cola (KO)] [is expected to] [increase] by [5% to $10 billion] in the timeframe of [Q3 of 2029].
    8. According to [Sophia Martinez] from [Tesla (TSLA)], on [Friday, July 12, 2024], the [gross profit] [is expected to] [increase] by [15% to $30 billion] in the timeframe of [Q1 of 2026].
- Examples for template 5:
    9. In [Q1 of 2026], the [net profit] at [Amazon (AMZN)] is expected to [increase] by [10% to $15 billion], as predicted by [Emily Davis] on [Monday, January 15, 2024].
    10. In [Q3 of 2027], the [revenue] at [Facebook (META)] is expected to [rise] by [20% to $50 billion], as predicted by [John Smith] on [Tuesday, February 20, 2024].

Suppose you are the Chief Meteorologist at a national weather forecasting agency. Using the above templates, please generate one of each (so five total) weather-based predictions that will occur in the future following the requirements above.

- Weather Template 1: On [ $\hat{y}_{t}$ ], [ $\hat{y}_{s}$ meteorologist name ] predicts that the [ $\hat{y}_{a}$ ] [ $\hat{y}_{v}$ ] [ $\hat{y}_{m}$ ] in [ $\hat{y}_{f}$ ].
- Weather Template 2: On [ $\hat{y}_{t}$ ], [ $\hat{y}_{s}$ meteorologist name ] from [ $\hat{y}_{s}$ weather organization ] forecasts that the [ $\hat{y}_{a}$ ] [ $\hat{y}_{v}$ ] by [ $\hat{y}_{m}$ ] in [ $\hat{y}_{f}$ ].
- Weather Template 3: [ $\hat{y}_{s}$ meteorologist name ] predicts on [ $\hat{y}_{t}$ ] that the [ $\hat{y}_{a}$ ] [ $\hat{y}_{v}$ ] [ $\hat{y}_{m}$ ] in [ $\hat{y}_{f}$ ].
- Weather Template 4: According to [ $\hat{y}_{s}$ meteorologist name ] from [ $\hat{y}_{s}$ weather organization ], on [ $\hat{y}_{t}$ ], the [ $\hat{y}_{a}$ ] [ $\hat{y}_{v}$ ] by [ $\hat{y}_{m}$ ] in the timeframe of [ $\hat{y}_{f}$ ].
- Weather Template 5: In [ $\hat{y}_{f}$ ], the [ $\hat{y}_{a}$ ] is expected to [ $\hat{y}_{v}$ ] by [ $\hat{y}_{m}$ ], as predicted by [ $\hat{y}_{s}$ meteorologist name ] on [ $\hat{y}_{t}$ ].

- Examples for template 1:
    1. On [Tuesday, February 13, 2025], [Dr. Melissa Carter] predicts that the [temperature] [will] [rise] by [5°C] in [New York City] by [Friday, February 16, 2025].
    2. On [Monday, April 8, 2025], [Ethan James] forecasts that [precipitation levels] [are likely to] [increase] by [20%] in [San Francisco] in [May 2025].
- Examples for template 2:
    3. On [Wednesday, March 20, 2025], [Samantha Lin] from [NOAA] forecasts that the [wind speed] [should] [decrease] by [15 mph] in [Chicago] by [Friday, March 22, 2025].
    4. On [Saturday, June 15, 2025], [Carlos Rivera] from [Weather.com] predicts that the [humidity] [will] [rise] by [30%] in [Miami] in [July 2025].
- Examples for template 3:
    5. [Amanda Green] predicts on [Sunday, January 19, 2025] that the [temperature] in [Seattle] [will] [fall] by [10°F] in [late January 2025].
    6. [Tommy Wu] predicts on [Friday, November 22, 2024], that [snowfall levels] in [Denver] [will likely] [increase] by [8 inches] in [December 2024].
- Examples for template 4:
    7. According to [Sophia Lewis] from [AccuWeather], on [Monday, May 6, 2024], the [rainfall] in [Portland] [is expected to] [decrease] by [10%] in the timeframe of [early June 2024].
    8. According to [David Harper] from [Weather Underground], on [Friday, August 9, 2024], the [air quality index] in [Los Angeles] [is likely to] [improve] by [20%] in the timeframe of [fall 2024].
- Examples for template 5:
    9. In [April 2025], the [average temperature] in [Houston] is expected to [rise] by [5°F], as predicted by [Emily Cooper] on [Monday, February 18, 2025].
    10. In [January 2025], the [wind chill] in [Minneapolis] is expected to [fall] by [10°F], as predicted by [James Ortiz] on [Tuesday, December 3, 2024].

"""

prediction_label = 1
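
To make the bracketed slots in the templates concrete, the following minimal sketch fills Financial Template 1 by hand using Python's standard `string.Template`. The constant `FINANCIAL_TEMPLATE_1`, the function `render_prediction`, and the field names (`t`, `s_person`, `a`, and so on) are illustrative assumptions that mirror the $\hat{y}$ variables; they are not part of `BasePipeline`, which delegates this step to the LLM.

```python
from string import Template

# Hypothetical plain-text version of Financial Template 1.
# Field names are illustrative stand-ins for the y-hat variables:
#   t = time made, s_* = source, a = attribute, v = future verb,
#   m_* = metric outcome (direction and amount), f = forecast time.
FINANCIAL_TEMPLATE_1 = Template(
    "On $t, $s_person predicts that the $a at $s_company "
    "$v $m_direction by $m_amount in $f."
)


def render_prediction(fields: dict) -> str:
    """Fill every slot of the template; raises KeyError if a slot is missing."""
    return FINANCIAL_TEMPLATE_1.substitute(fields)


# Values taken from the second example for template 1 in the prompt.
example_fields = {
    "t": "Tuesday, November 19, 2024",
    "s_person": "Ava Lee",
    "a": "operating cash flow",
    "s_company": "ExxonMobil (XOM)",
    "v": "should",
    "m_direction": "decrease",
    "m_amount": "5% to $20 billion",
    "f": "Q2 of 2027",
}

print(render_prediction(example_fields))
```

Rendering a template this way can be handy for spot-checking that a generated sentence has every slot filled, although the notebook itself relies on the model to do the filling.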

In [8]:
# Widen columns so the full prediction sentences are visible in the output
pd.set_option('display.max_colwidth', 800)

base_pipeline = BasePipeline()

predictions_df = base_pipeline.generate_predictions(text=prediction_template, label=prediction_label)
predictions_df

| | Base Predictions | Prediction Label |
|---|---|---|
| 0 | On Friday, November 17, 2024, Olivia Brown predicts that the revenue at Johnson & Johnson will rise by 12% to $25 billion in Q2 of 2026. | 1 |
| 1 | On Thursday, October 25, 2024, Ethan Kim from Cisco Systems forecasts that the net profit will increase by 8% to $10 billion in Q4 of 2028. | 1 |
| 2 | Ava Lee predicts on Wednesday, September 20, 2024, that the operating cash flow at 3M will fall by 5% to $5 billion in Q1 of 2027. | 1 |
| 3 | According to Liam Chen from Intel, on Tuesday, August 15, 2024, the gross profit is expected to rise by 10% to $20 billion in the timeframe of Q3 of 2029. | 1 |
| 4 | In Q2 of 2027, the stock price at Visa is expected to rise by 15% to $200 per share, as predicted by Julianne Lee on Monday, July 10, 2024. | 1 |
| 5 | On Monday, March 11, 2025, Dr. Ryan Mitchell predicts that the temperature will rise by 3°C in Dallas by Wednesday, March 14, 2025. | 1 |
| 6 | On Friday, February 22, 2025, Maya Patel from the National Weather Service forecasts that the precipitation levels should decrease by 15% in Phoenix by Sunday, February 24, 2025. | 1 |
| 7 | Emily Wong predicts on Thursday, January 24, 2025, that the wind speed in Boston will fall by 8 mph in late January 2025. | 1 |
| 8 | According to Jackson Hall from the Weather Channel, on Wednesday, November 14, 2024, the humidity is expected to rise by 25% in Atlanta in the timeframe of early December 2024. | 1 |
| 9 | In June 2026, the average precipitation is expected to increase by 12% in Seattle, as predicted by Benjamin Brooks on Tuesday, April 16, 2025. | 1 |
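
Requirements 12, 14, and 16 in the prompt ask the model to skip preambles and numbering and to avoid repeats, but open-source LLMs do not always comply. A minimal post-processing sketch, assuming the raw model response is plain text with one sentence per line, is shown here. The helper `clean_predictions` is hypothetical and is not known to be part of `BasePipeline`; it only strips leading numbering, blank lines, and exact (case-insensitive) duplicates.

```python
import re

# Matches list numbering such as "1. " or "2) " at the start of a line.
_LEADING_NUMBER = re.compile(r"^\s*\d+[.)]\s+")


def clean_predictions(raw_text: str) -> list[str]:
    """Return unique, un-numbered sentences in their original order."""
    seen = set()
    cleaned = []
    for line in raw_text.splitlines():
        sentence = _LEADING_NUMBER.sub("", line).strip()
        if not sentence:
            continue  # skip blank lines
        key = sentence.casefold()
        if key in seen:
            continue  # skip exact repeats, ignoring case
        seen.add(key)
        cleaned.append(sentence)
    return cleaned


raw = "1. A will rise.\n\n2) B will fall.\nA will rise.\n"
print(clean_predictions(raw))
```

Preamble lines such as "Here are 10 ..." are not detected by this sketch and would need an additional filter.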


In [4]:
# Raw string so the LaTeX backslashes (e.g., \hat) are kept literally
non_prediction_template = r"""Generate any sentence that is not a prediction. A prediction is defined by the variables below:
    - $\hat{y}$, prediction
        - $\hat{y}_{s}$, source that predicted $\hat{y}$
            - Source can be person, organization, and any type of entity.
        - $\hat{y}_{t}$, time when $\hat{y}$ was made
            - Time is the exact moment that can be measured in day, hour, minutes, seconds, etc.
        - $\hat{y}_{f}$, forecast time when $\hat{y}$ is expected to come to fruition
            - Forecast can be from seconds to decades in the future.
            - How far to go out? Or where to stop?
        - $\hat{y}_a$, prediction attribute
            - Financial based attributes such as stock price, net profit, revenue
        - $\hat{y}_m$, prediction metric outcome
            - How much will the  $\hat{y}_a$ rise/increase or fall/decrease
        - $\hat{y}_v$, future verb tense
            - A verb that is associated with the future such as will, would, be going to, should, etc.

    Please generate nine sentences that meet the following requirements:

    1. Only a simple sentence (NOT a compound sentence joined with "and" or "or")
    2. Include no additional information such as "Here are nine simple sentences that are not predictions:"
"""

non_prediction_label = 0

In [None]:
non_predictions_df = base_pipeline.generate_predictions(text=non_prediction_template, label=non_prediction_label)
non_predictions_df
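
Because the non-prediction prompt gives the model a lot of freedom, it may be worth spot-checking that the generated sentences really lack the future-verb component $\hat{y}_{v}$. The sketch below is a crude heuristic, assuming the marker words listed in the variable definitions (will, would, be going to, should); the names `FUTURE_MARKERS` and `has_future_marker` are hypothetical and not part of the pipeline.

```python
import re

# Future-tense markers taken from the y-hat_v definition in the prompt.
# "going to" also covers "is going to" and "are going to".
FUTURE_MARKERS = ("will", "would", "going to", "should")

_FUTURE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(marker) for marker in FUTURE_MARKERS) + r")\b",
    re.IGNORECASE,
)


def has_future_marker(sentence: str) -> bool:
    """Return True if the sentence contains one of the future-tense markers."""
    return _FUTURE_PATTERN.search(sentence) is not None


print(has_future_marker("The revenue will rise by 5% in 2026."))
print(has_future_marker("The company reported revenue of $5 billion last quarter."))
```

A sentence flagged by this check is not necessarily a prediction (for example, "Will" can be a person's name), so flagged rows are best reviewed by hand before being dropped.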

In [None]:
%store predictions_df
%store non_predictions_df

# References

1. PAPER: [On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey](http://arxiv.org/abs/2406.15126)
    - Used as a guide for formatting the prompt templates.