# 1-Generate Predictions

- **Goal:** Prediction Recognition

- **Purpose:** Implement step 1 of the prediction recognition pipeline, including its sub-steps:
    1. Generate predictions
        1. Create several prediction prompt templates
        2. Utilize open-source LLMs to generate predictions

- **Misc:**
    - `%store`: IPython line magic that persists a variable so it can be restored in another notebook with `%store -r`

In [1]:
import os
import sys

import pandas as pd

from pathlib import Path

# Get the current working directory of the notebook
notebook_dir = os.getcwd()
# Add the parent directory to the system path
sys.path.append(os.path.join(notebook_dir, '../'))

from pipelines import BasePipeline

In [2]:
file_path = Path('../prediction_template.txt')
prediction_template = file_path.read_text()
print(prediction_template)
prediction_label = 1

"""My variables are
- $\hat{y}$, prediction
    - $\hat{y}_{s}$, source that predicted $\hat{y}$
        - Source can be person, organization, and any type of entity.
        - Analyst forecasts, financial reports, company executives, etc.
        - Meteorologists, weather organizations, or any type of weather-predicting entity.
        - Health organization, researcher, doctor, physical therapist, physician assistant, nurse practictioners, fitness expert, etc.
    - $\hat{y}_{t}$, time when $\hat{y}$ was made
        - Time is the exact moment that can be measured in day, hour, minutes, seconds, etc.
    - $\hat{y}_{f}$, forecast time when $\hat{y}$ is expected to come to fruition
        - Forecast can be from seconds to decades in the future.
        - How far to go out? Or where to stop?
    - $\hat{y}_{a}$, prediction attribute
        - Financial based attributes such as stock price, net profit, revenue, etc.
        - Weather-based attributes such as temperature, precipitation, 
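
The template variables above can be pictured as a small record type. The following is a minimal sketch, not part of the `pipelines` package: the `Prediction` class, its field names, and the free-text `outcome` argument are hypothetical and only illustrate how $\hat{y}_{s}$, $\hat{y}_{t}$, $\hat{y}_{f}$, and $\hat{y}_{a}$ could combine into one sentence.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Prediction:
    """Hypothetical container for the template variables (illustration only)."""

    source: str         # y_hat_s: who made the prediction
    made_on: date       # y_hat_t: when the prediction was made
    forecast_time: str  # y_hat_f: when it is expected to come to fruition
    attribute: str      # y_hat_a: what is being predicted

    def to_sentence(self, outcome: str) -> str:
        """Render the fields in one of many possible word orders."""
        made = self.made_on.strftime("%A, %B %d, %Y")
        return (
            f"On {made}, {self.source} predicts that the "
            f"{self.attribute} will {outcome} in {self.forecast_time}."
        )


# Made-up example values, chosen only to exercise the class.
example = Prediction(
    source="Jane Doe",
    made_on=date(2024, 8, 16),
    forecast_time="Q1 of 2028",
    attribute="revenue",
)
sentence = example.to_sentence("rise by 5%")
print(sentence)
```

A structure like this makes it easy to check that every generated sentence mentions all four variables, although the LLM itself is free to vary the word order.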

In [3]:
# Widen columns so long generated sentences are not truncated in the display
pd.set_option('display.max_colwidth', 800)

base_pipeline = BasePipeline()

predictions_df = base_pipeline.generate_predictions(text=prediction_template, label=prediction_label)
predictions_df

|   | Base Predictions | Prediction Label |
|---|---|---|
| 0 | I'll provide the predictions for each category. | 1 |
| 1 | \*\*Company-Based Financial Predictions\*\* | 1 |
| 2 | On Wednesday, October 25, 2024, Emily Chen predicts that the revenue at Amazon (AMZN) will rise by 12% to $150 billion in Q2 of 2026. | 1 |
| 3 | On Thursday, September 12, 2024, David Lee from NVIDIA (NVDA) forecasts that the stock price will likely increase by 20% to $1,200 per share in Q4 of 2027. | 1 |
| 4 | Ava Morales predicts on Friday, August 16, 2024, that the operating cash flow at ExxonMobil (XOM) will decrease by 8% to $15 billion in Q1 of 2028. | 1 |
| 5 | According to Julian Hall from Microsoft (MSFT), on Tuesday, July 23, 2024, the net profit will increase by 10% to $25 billion in the timeframe of Q3 of 2029. | 1 |
| 6 | In Q2 of 2026, the gross profit at Apple (AAPL) is expected to rise by 15% to $30 billion, as predicted by Sophia Patel on Monday, June 10, 2024. | 1 |
| 7 | \*\*Weather-Based Predictions\*\* | 1 |
| 8 | On Monday, March 17, 2025, Dr. Ethan Kim predicts that the temperature will rise by 3°C in New York City by Friday, March 21, 2025. | 1 |
| 9 | On Tuesday, April 15, 2025, Samantha Taylor from NOAA forecasts that the precipitation levels will increase by 15% in San Francisco in May 2025. | 1 |
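
Rows 0, 1, and 7 in the output above are a conversational preamble and two section headers, yet they carry the prediction label 1. The following is a minimal sketch of one way to drop such lines before labeling. `clean_generated_lines` is a hypothetical helper, not a method of `BasePipeline`, and the preamble prefixes it checks are assumptions that would need tuning per model.

```python
import re

# Assumed prefixes of conversational filler; adjust for the LLM in use.
PREAMBLE_PREFIXES = ("i'll ", "here are", "sure")


def clean_generated_lines(raw_text: str) -> list[str]:
    """Hypothetical helper: keep only lines that look like generated sentences."""
    kept = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Skip markdown-style headers such as "**Weather-Based Predictions**".
        if re.fullmatch(r"\*\*.+\*\*", line):
            continue
        # Skip conversational preamble.
        if line.lower().startswith(PREAMBLE_PREFIXES):
            continue
        kept.append(line)
    return kept


raw = (
    "I'll provide the predictions for each category.\n"
    "**Weather-Based Predictions**\n"
    "On Monday, March 17, 2025, Dr. Ethan Kim predicts that the temperature "
    "will rise by 3°C in New York City by Friday, March 21, 2025.\n"
)
cleaned = clean_generated_lines(raw)
print(cleaned)
```

The cleaned list could then be placed in the `Base Predictions` column with a constant `Prediction Label`, mirroring the DataFrame layout shown above.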


In [4]:
file_path = Path('../non_prediction_template.txt')
non_prediction_template = file_path.read_text()
print(non_prediction_template)
non_prediction_label = 0

"""Generate any sentence that's not a prediction, which we name non-prediction. A prediction is below with variables
    - $\hat{y}$, prediction
        - $\hat{y}_{s}$, source that predicted $\hat{y}$
            - Source can be person, organization, and any type of entity.
        - $\hat{y}_{t}$, time when $\hat{y}$ was made
            - Time is the exact moment that can be measured in day, hour, minutes, seconds, etc.
        - $\hat{y}_{f}$, forecast time when $\hat{y}$ is expected to come to fruition
            - Forecast can be from seconds to decades in the future.
            - How far to go out? Or where to stop?
        - $\hat{y}_a$, prediction attribute
            - Financial based attributes such as stock price, net profit, revenue
        - $\hat{y}_m$, prediction metric outcome
            - How much will the  $\hat{y}_a$ rise/increase or fall/decrease
        - $\hat{y}_v$, future verb tense
            - A verb that is associated with the future such as will, would

In [5]:
non_predictions_df = base_pipeline.generate_predictions(text=non_prediction_template, label=non_prediction_label)
non_predictions_df

|   | Base Predictions | Prediction Label |
|---|---|---|
| 0 | The company is currently hiring new employees for various positions available now. | 0 |
| 1 | The weather today is mostly sunny with some clouds in the sky. | 0 |
| 2 | The new policy has been implemented to improve customer service quality. | 0 |
| 3 | The meeting will be rescheduled due to unforeseen circumstances that occurred. | 0 |
| 4 | The current stock price is lower than expected by investors now. | 0 |
| 5 | The team is working diligently to resolve the ongoing technical issues. | 0 |
| 6 | The product is available for purchase online and in stores now. | 0 |
| 7 | The company's mission statement is to provide excellent customer service always. | 0 |
| 8 | The employees are required to attend a mandatory training session tomorrow. | 0 |
| 9 | The new restaurant is open for lunch and dinner daily now. | 0 |
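
The non-prediction template lists a future verb ($\hat{y}_v$, such as "will" or "would") as one ingredient of a prediction, so a quick keyword scan can surface non-prediction rows that deserve a manual look. The following is a naive sketch under that assumption. `has_future_verb` is a hypothetical helper, and a match does not by itself make a sentence a prediction, since the template also requires a source, a time, and an attribute.

```python
import re

# The two future verbs named as examples in the template.
FUTURE_VERBS = ("will", "would")
_FUTURE_PATTERN = re.compile(
    r"\b(?:" + "|".join(FUTURE_VERBS) + r")\b", re.IGNORECASE
)


def has_future_verb(sentence: str) -> bool:
    """Return True if the sentence contains one of the listed future verbs."""
    return _FUTURE_PATTERN.search(sentence) is not None


sentences = [
    "The weather today is mostly sunny with some clouds in the sky.",
    "The meeting will be rescheduled due to unforeseen circumstances that occurred.",
]
flagged = [s for s in sentences if has_future_verb(s)]
print(flagged)
```

Flagged rows are candidates for review only; whether to keep, relabel, or regenerate them is a judgment call for the dataset author.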


In [6]:
%store predictions_df
%store non_predictions_df

Stored 'predictions_df' (DataFrame)
Stored 'non_predictions_df' (DataFrame)
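
A downstream notebook would typically restore both DataFrames and merge them into one labeled dataset. The following is a minimal sketch of that step, assuming the two column names shown above. The small stand-in DataFrames, the `dataset_df` name, and the shuffle seed are illustrative choices, not part of the original pipeline.

```python
import pandas as pd

# Stand-ins for the stored DataFrames. In a downstream notebook they would
# instead be restored with: %store -r predictions_df non_predictions_df
predictions_df = pd.DataFrame(
    {
        "Base Predictions": ["Prediction sentence A.", "Prediction sentence B."],
        "Prediction Label": [1, 1],
    }
)
non_predictions_df = pd.DataFrame(
    {
        "Base Predictions": ["Plain sentence C.", "Plain sentence D."],
        "Prediction Label": [0, 0],
    }
)

# Stack both classes, then shuffle so the labels are interleaved.
dataset_df = pd.concat([predictions_df, non_predictions_df], ignore_index=True)
dataset_df = dataset_df.sample(frac=1, random_state=42).reset_index(drop=True)

print(dataset_df["Prediction Label"].value_counts())
```

Fixing `random_state` keeps the shuffle reproducible across notebook runs.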


# References

1. PAPER: [On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey](http://arxiv.org/abs/2406.15126)
    - Used as guidance for formatting the prompt templates.