# 1-Generate Predictions using LangChain

- **Goal:** Prediction Recognition

- **Purpose:** To implement step 1, including its substeps, of the prediction recognition pipeline:
    1. Generate predictions
        1. Create several prediction prompt templates
        2. Utilize open-source LLMs to generate predictions

- **Misc:**
    - `%store`: IPython line magic that stores a variable of interest so it can be loaded in another notebook

In [1]:
# !pip3 install pandas langchain spacy numpy Groq

In [2]:
# !pip3 install -U scikit-learn

In [3]:
# !pip3 install python-dotenv

In [38]:
import os, sys

import pandas as pd

from tqdm import tqdm
from langchain_core.prompts import PipelinePromptTemplate, PromptTemplate

# Get the current working directory of the notebook
notebook_dir = os.getcwd()
# Add the parent directory to the system path
sys.path.append(os.path.join(notebook_dir, '../'))

from log_files import DataFrameLogger
from data_processing import DataProcessing
from text_generation_models import (
    TextGenerationModelFactory,
    LlamaVersatileTextGenerationModel,
    LlamaInstantTextGenerationModel,
    Llama70B8192TextGenerationModel,
    Llama8B8192TextGenerationModel,
    MixtralTextGenerationModel,
)

In [5]:
# pd.set_option('max_colwidth', 800)

llama_versatile_generation_model = LlamaVersatileTextGenerationModel()
llama_instant_generation_model = LlamaInstantTextGenerationModel()
llama_70b_8192_generation_model = Llama70B8192TextGenerationModel()
llama_8b_8192_generation_model = Llama8B8192TextGenerationModel()
mixtral_generation_model = MixtralTextGenerationModel()

## LangChain Templates for Domain Predictions

In [6]:
full_prediction_template = """{prediction_properties}

{prediction_requirements}

{prediction_templates}

{prediction_examples}"""
full_prediction_prompt = PromptTemplate.from_template(full_prediction_template)
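
The final template above is just a string with four named placeholders, each filled later by a sub-prompt. As a minimal sketch using only the standard library (the helper name `placeholder_names` is hypothetical and not part of LangChain), the placeholders a template expects can be listed like this. LangChain's `PromptTemplate` exposes comparable information through its `input_variables` attribute.

```python
from string import Formatter


def placeholder_names(template: str) -> list:
    """Return the distinct {placeholder} names of a format-style template, in order."""
    names = []
    for _, field_name, _, _ in Formatter().parse(template):
        # field_name is None for trailing literal text and "" for positional "{}"
        if field_name and field_name not in names:
            names.append(field_name)
    return names


full_prediction_template = """{prediction_properties}

{prediction_requirements}

{prediction_templates}

{prediction_examples}"""

print(placeholder_names(full_prediction_template))
```

A check like this can help confirm that every name in a pipeline's prompt list matches a placeholder in the final template before any model is called.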

Google predictive spelling/autocomplete 

In [7]:
prediction_properties_template = """A prediction ($p$) consists of the following nine properties:

    1. $p_p$, {prediction_domain} person that predicted $p$
        - Can be a person (with a name) or a {prediction_domain} person such as a {prediction_domain} reporter, {prediction_domain} analyst, {prediction_domain} expert, {prediction_domain} top executive, {prediction_domain} senior level person, etc).
    2. $p_o$, {prediction_domain} organization 
        - Can only be an organization or entity that is associated with the {prediction_domain} prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, {prediction_domain} prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the {prediction_domain} domain.
        - Some examples are {prediction_domain_attribute}.
    6. $p_s$, slope that indicates the direction of change in $p_a$
        - Change of directions can be rise/increase/as much as, fall/decrease/as little as, change, stay stable, high/low chance/probability/degree of, etc.
    7. $p_m$, metric outcome
        - How much will the $p_a$ $p_s$?
    8. $p_v$, future verb tense
        - A verb that is associated with the future such as will, would, be going to, should, etc.
    9. $p_l$, location
        - The location is attached to attribute $p_a$ if {prediction_domain} == 'weather'
    """
prediction_properties_prompt = PromptTemplate.from_template(prediction_properties_template)

    - Keep the brackets around the prediction properties when generating predictions and be sure to include brackets around dates such as "2024-10-15", "2024/08/20", "Q4 of 2024", "2025", "2027 Q1", "Q3 2027", "On 21 Aug 2024".

In [None]:
prediction_requirements_template = """{prediction_domain} requirements to use for each prediction:

    - Should be based on real-world {prediction_domain} data and not hallucinate.
    - Only a simple sentence (prediction) (and NOT compounding using "and" or "or").
    - Should diversify all nine properties of the prediction ($p$), meaning vary them and do not reuse the same (p_p, p_o, p_t, p_f, p_a, p_s, p_m, p_v, p_l).
    - Should use synonyms of the prediction word ($p_w$) such as forecasts, speculates, foresees, envisions, etc., and not use any of them more than ten times.
    - The prediction should be unique and not repeated.
    - The forecast time ($p_f$) should always be after current time ($p_t$) of when forecast ($p$) was made.
    - Do not number the predictions.
    - Do not say, "As the {prediction_domain} at organization ($p_o$), I will generate company-based {prediction_domain} predictions using the provided templates." or anything similar.
    - Should have a forecast time ($p_f$) when $p$ is expected to come to fruition between 2025 to 2050.
    - Use the five different templates and examples provided.
    - Change how the current time ($p_t$) and forecast time ($p_f$) are written in the prediction with examples of (1) Wednesday, August 21, 2024; (2) Wed, August 21, 2024; (3) 08/21/2024; (4) 08/21/2024; (5) 21/08/2024; (6) 21 August 2024; (7) 2024/08/21; (8) 2024-08-21; (9) August 21, 2024; (10) Aug 21, 2024; (11) 21 August 2024, (12) 21 Aug 2024, Q3 of 2027, 2029 of Q3, etc (with removing day of week).
    {domain_requirements}
    - Stop saying, "Here are {predictions_N} unique {prediction_domain} predictions based on the provided templates and examples:" in the prompt.
    - Do not use any of the examples in the prompt.
    - In front of every prediction, put the template number in the format of "T1:", "T2:", etc. and do not number them like "1.", "2.", etc.
    - Disregard brackets: "[]"
    - Should never say "Here are {predictions_N} unique {prediction_domain} predictions based on the provided templates and examples:" 
    - Be sure to space words when generating the prediction metric ($p_m$) like "from _ to _"."""
prediction_requirements_prompt = PromptTemplate.from_template(prediction_requirements_template)
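
The requirements ask the model to vary how dates are written and to keep the forecast time ($p_f$) after the current time ($p_t$). The following is a small sketch of both ideas with the standard library. The function names `date_variants` and `forecast_after_current` are hypothetical helpers for illustration, not part of this project, and the month and weekday names assume Python's default (English) locale.

```python
from datetime import date


def date_variants(d: date) -> list:
    """Render one date in several of the styles listed in the requirements."""
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, and so on
    return [
        d.strftime("%A, %B %d, %Y"),  # Wednesday, August 21, 2024
        d.strftime("%a, %B %d, %Y"),  # Wed, August 21, 2024
        d.strftime("%m/%d/%Y"),       # 08/21/2024
        d.strftime("%d/%m/%Y"),       # 21/08/2024
        d.strftime("%d %B %Y"),       # 21 August 2024
        d.strftime("%Y/%m/%d"),       # 2024/08/21
        d.strftime("%Y-%m-%d"),       # 2024-08-21
        d.strftime("%B %d, %Y"),      # August 21, 2024
        d.strftime("%b %d, %Y"),      # Aug 21, 2024
        d.strftime("%d %b %Y"),       # 21 Aug 2024
        f"Q{quarter} of {d.year}",    # Q3 of 2024
    ]


def forecast_after_current(p_t: date, p_f: date) -> bool:
    """True when the forecast date falls strictly after the date the prediction was made."""
    return p_f > p_t


for variant in date_variants(date(2024, 8, 21)):
    print(variant)
print(forecast_after_current(date(2024, 8, 21), date(2025, 1, 1)))
```

Helpers like these could also be used after generation to spot-check model output, since the model is not guaranteed to follow the prompt.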

In [None]:
prediction_templates_template = """Here are some {prediction_domain} templates:

- {prediction_domain} template 1: On [ $p_t$ ], [ $p_p$ ] [ $p_w$ ] that the [ $p_a$ ] at [ $p_o$ ] [ $p_v$ ] [ $p_s$ ] by [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 2: In [ $p_t$ ], [ $p_p$ ] from [ $p_o$ ], [ $p_w$ ] that the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] from [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 3: [ $p_p$ ] [ $p_w$ ] on [ $p_t$ ] that the [ $p_a$ ] at [ $p_o$ ] [ $p_v$ ] [ $p_s$ ] under [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 4: According to a [ $p_w$ ] from a [ $p_p$ ] from [ $p_o$ ], on [ $p_t$ ], the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] beyond [ $p_m$ ] in the time frame of [ $p_f$ ].
- {prediction_domain} template 5: In [ $p_f$ ], the [ $p_a$ ] at [ $p_o$ ] [ $p_v$ ] a [ $p_m$ ] [ $p_s$ ], as [ $p_w$ ] by [ $p_p$ ] on [ $p_t$ ]."""
prediction_templates_prompt = PromptTemplate.from_template(prediction_templates_template)

In [10]:
prediction_examples_template = """Here are some examples of {prediction_domain} predictions:

{domain_examples}

With the above, generate a unique set of {predictions_N} predictions. Think from the perspective of a {prediction_domain} analyst, expert, top executive, or senior level person."""
prediction_examples_prompt = PromptTemplate.from_template(prediction_examples_template)

In [11]:
prediction_input_prompts = [
    ("prediction_properties", prediction_properties_prompt),
    ("prediction_requirements", prediction_requirements_prompt),
    ("prediction_templates", prediction_templates_prompt),
    ("prediction_examples", prediction_examples_prompt),
]

pipeline_prompt = PipelinePromptTemplate(
    final_prompt=full_prediction_prompt, pipeline_prompts=prediction_input_prompts
)
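
The cell above composes four sub-prompts into one final prompt. To make the idea concrete, here is a plain-Python sketch of the same two-stage formatting using `str.format`. It is an illustration under simplifying assumptions, not LangChain's implementation: `compose_prompt` is a hypothetical name, and the short templates are stand-ins for the long ones in this notebook.

```python
def compose_prompt(final_template: str, named_templates: list, **inputs) -> str:
    """Format each named sub-template, then slot the results into the final template."""
    rendered = {}
    for name, template in named_templates:
        # Unused keyword arguments are ignored by str.format; missing ones raise KeyError.
        rendered[name] = template.format(**inputs)
    return final_template.format(**rendered)


# Shortened stand-ins for the notebook's templates (assumed for illustration only).
final_template = "{prediction_properties}\n\n{prediction_examples}"
parts = [
    ("prediction_properties", "A {prediction_domain} prediction has nine properties."),
    ("prediction_examples", "Generate {predictions_N} {prediction_domain} predictions."),
]

prompt = compose_prompt(
    final_template, parts, prediction_domain="financial", predictions_N=10
)
print(prompt)
```

Because each domain only changes the input dictionary, the same set of sub-templates can be reused for the financial, health, and policy prompts.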


## Generate Domain Predictions

In [12]:
predictions_N = 10

### Generate Financial Predictions

In [None]:
financial_attributes = """stock price, net profit, revenue, operating cash flow, research and development expenses, operating income, gross profit."""
financial_requirements = """- Should be based on real-world financial earnings reports.
    - Suppose the time when $p$ was made is during any earning season.
    - Include stocks from all sectors such as consumer staples, energy, finance, health care, industrials, materials, media, real estate, retail, technology, utilities, defense, etc.
    - Include the US Dollar sign ($) before or USD after the amount of the financial attribute."""

# For each template, have a rise, fall, or stable example, respectively.
financial_examples = """
- financial examples for template 1:
    1. On [Monday, December 16, 2024], [Detravious, an investor] [forecasts] that the [revenue] at [Apple] [will likely] [increase] from [$70 to $97 billion] in [2025 Q1].
    2. On [Tue, November 19, 2024], [Ava Lee] [predicts] that the [operating cash flow] at [ExxonMobil] [should] [decrease] by [5 percent to $20 billion] in [08/21/2025].
- financial examples for template 2:
    3. In [October 2024], [Julian Hall] from [Yahoo Finance], [envisions] that the [stock price] [will] [rise] from [$800 to $1,000 per share] in [Quarter 3 of 2028].
    4. In [8/15/2027], [Mrs. Kalia] from [McDonald's], [speculates] that the [net profit] [will] [fall] under [5% to $5 billion] in [January of 2029].
- financial examples for template 3:
    5. [Dija Gabe, a financial expert] [speculates] on [23 October 2024] that the [research and development expenses] at [Alphabet] [may] [stay stable] at [$20 million] in [2027 Quarter 4].
    6. [Mr. Mike] [forecasts] in [Q2 2026] that the [operating income] at [Microsoft] [will] [fall] by [407 percent to $50M] on [Monday, Nov 18, 2026].
- financial examples for template 4:
    7. According to a [prediction] from a [top executive] from [Chevron], on [08/21/2024], the [net profit] [is expected to] [increase] beyond [10,000 USD] in the time frame of [Q3 of 2029].
    8. According to a [envisions] from a [Brittany] from [Tesla], on [Fri, July 12, 2024], the [gross profit] [may] [decrease] as much as [$30,000,000, reflecting a 1209 percent increase] by [21 Aug 2024].
- financial examples for template 5:
    9. In [2025-08-21], the [net profit] at [Amazon] has a [probability] of [11 percent to reach $30k] [decrease], as [speculated] by [Emily Davis, a financial reporter] on [21 Oct 24].
    10. In [Quarter of 2027], the [revenue] at [Facebook] [is expected] to be [$30 billion, which is a 15%] [rise], as [predicted] by [a financial analyst] on [Sun, February 20, 2024]."""

In [14]:
financial_input_dict = {
    "prediction_domain": "financial",
    "prediction_domain_attribute": financial_attributes,
    "domain_requirements": financial_requirements,
    "domain_examples": financial_examples,
    "predictions_N": predictions_N
}
financial_prompt_output = pipeline_prompt.format(**financial_input_dict)
print(financial_prompt_output)
# prompt_template = "Your prompt here"
# label = 1  # or "0" for non-prediction
# domain = "finance" 


# pd.set_option('max_colwidth', 800)

# llama_versatile_generation_model = LlamaVersatileTextGenerationModel()
# llama_instant_generation_model = LlamaInstantTextGenerationModel()
# llama_70b_8192_generation_model = Llama70B8192TextGenerationModel()
# llama_8b_8192_generation_model = Llama8B8192TextGenerationModel()
# mixtral_generation_model = MixtralTextGenerationModel()


# versatile_financial_df = llama_versatile_generation_model.generate_predictions(financial_prompt_output, label, domain)
# instant_financial_df = llama_instant_generation_model.generate_predictions(financial_prompt_output, label, domain)
# seventy_financial_df = llama_70b_8192_generation_model.generate_predictions(financial_prompt_output, label, domain)
# eight_financial_df = llama_8b_8192_generation_model.generate_predictions(financial_prompt_output, label, domain)
# mixtral_financial_df = mixtral_generation_model.generate_predictions(financial_prompt_output, label, domain)

# financial_df = [versatile_financial_df, instant_financial_df, seventy_financial_df, eight_financial_df, mixtral_financial_df]
# DataProcessing.concat_dfs(financial_df)

A prediction ($p$) consists of the following nine properties:

    1. $p_p$, financial person that predicted $p$
        - Can be a person (with a name) or a financial person such as a financial reporter, financial analyst, financial expert, financial top executive, financial senior level person, etc).
    2. $p_o$, financial organization 
        - Can only be an organization or entity that is associated with the financial prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, financial prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the financial domain.
        - Some examples are stock price, net profi

In [15]:
# gaurd_financial_df

### Generate Health Predictions

In [None]:
health_attributes = """obesity rates, prevalence of chronic illnesses, average physical activity levels, nutritional intake, etc."""
health_requirements = """- Should be based on real-world health reports.
    - Suppose the time when $p$ was made is during any season such as flu season, allergy season, pandemic, epidemic, etc.
    - Include reports from all health organizations, researchers, doctors, physical therapists, physician assistants, nurse practitioners, fitness experts, etc."""

# For each template, have a rise, fall, or stable example, respectively.
health_examples = """
- health examples for template 1:
    1. On [Monday, December 16, 2024], [Detravious, an investor] forecasts that the [obesity rate] at the [United States] [will likely] [decrease] by [5%] in [2025 Q1].
    2. On [Tue, November 19, 2024], [medical professional Sophia Rodriguez] predicts that the [cancer rate] in [Georgia] [should] [decrease] by [4 percent] in [08/21/2025].
- health examples for template 2:
    3. In [October 2024], [Arjun Patel, Ph.D] from [Florida Department of Health] envisions that the [average daily caloric intake] [may] [rise] from [100 to 300] in [Quarter 3 of 2028].
    4. In [8/15/2027], [Dr. Michael Brown] from the [Centers for Disease Control and Prevention], foresees that the [average daily caloric intake] [will] [fall] [8 percent] in [2027].
- health examples for template 3:
    5. [A trusted expert] [speculates] on [23 October 2024] that the [global vaccination rate for measles] in the [US] [should] [stay stable] at [100K people] in [2027 Quarter 4].
    6. [Dr. Sarah Johnson] foresees in [Q2 2026] that the [prevalence of hypertension] in [California] [will] [fall] by [407 percent] by [Monday, Nov 18, 2026].
- health examples for template 4:
    7. According to [Olivia Martinez] from [Stanford University], on [08/21/2024], the prevalence of [type 2 diabetes in adults] [is expected to] [increase] beyond [8.5 percent] in the time frame of [Q3 of 2029].
    8. According to [Rachel Kim, MD] from the [University of California], on [Fri, July 12, 2024], the prevalence of [type 2 diabetes in adults] [may] [increase] as much as [30,000,000, reflecting a 1209 percent increase] by [21 Aug 2024].
- health examples for template 5:
    9. In [2025-08-21], the [average weekly exercise hours] in [United States] has a [probability] of [20 percent to reach 30k], as predicted by [Emily Davis, Harvard School of Public Health] on [21 Oct 24].
    10. In [Quarter of 2027], the [average weekly walking hours] in [Atlanta] is [expected to rise] by [15%], as predicted by the [Monique, National Institutes of Health] on [Sun, February 20, 2024]."""

In [17]:
health_input_dict = {
    "prediction_domain": "health",
    "prediction_domain_attribute": health_attributes,
    "domain_requirements": health_requirements,
    "domain_examples": health_examples,
    "predictions_N": predictions_N
}

health_prompt_output = pipeline_prompt.format(**health_input_dict)
# print(health_prompt_output)

# domain = "health" 


# health_df = llama_generation_model.generate_predictions(health_prompt_output, label, domain)
# weather_df

### Generate Policy Predictions

In [18]:
policy_attributes = """election outcomes, economic reforms, legislative impacts."""
policy_requirements = """- Should be based on real-world policy reports.
    - Suppose the time when $p$ was made is during an election cycle or non-election cycles.
    - Include policies & laws, from all sectors such as consumer staples, energy, finance, health care, industrials, materials, media, real estate, retail, technology, utilities, defense, etc."""

policy_examples = """
- policy examples for template 1:
    1. On [Monday, December 16, 2024], [President John Doe] forecasts that the [unemployment rate] at [the United States] [will likely] [decrease] by [2%] in [2025 Q1].
    2. On [Tue, November 19, 2024], [Dr. Jane Smith] foresees that the [population growth rate] in [California] [is likely to] [decrease] by [5 percent to 20 billion] in [08/21/2025].
- policy examples for template 2:
    3. In [October 2024], [Senator Emily Johnson] from [the Senate Committee on Finance], envisions that the [inflation rate] [should] [rise] from [1.3 percent to 89 percent] in [Quarter 3 of 2028].
    4. In [8/15/2027], [Governor Michael Brown] from [the State of Texas], predicts that the [number of registered voters] [will] [fall] under [5B] in [Dec of 2029].
- policy examples for template 3:
    5. [Dija Gabe in the Congressional Budget Office] predicts on [23 October 2024] that [national debt] in [USA] [may] [stay stable] at [20 million] in [2027 Quarter 4].
    6. [Dr. Sarah Lee] foresees in [Q2 2026] that the [median household income] in [NY] [should] [fall] by [629 percent to $15,000] on [Monday, Nov 18, 2026].
- policy examples for template 4:
    7. According to [General Robert Williams] from [the Department of Defense], on [08/21/2024], the [number of active-duty soldiers] [is expected to] [increase] beyond [10,000] in the time frame of [Q3 of 2029].
    8. According to [Dr. Olivia Martinez] from [the Census Bureau], on [Fri, July 12, 2024], the [population density] in [urban areas] [is likely to] [increase] as much as [100,000,000, reflecting a 1209 percent increase] by [21 Aug 2024].
- policy examples for template 5:
    9. In [2025-08-21], the [number of citizens] in [Thomson, GA, 30824] has a [probability] of [92 percent to reach 30k] [decrease], as predicted by [Shirly Tisdale, a policy reporter] on [21 Oct 24].
    10. In [Quarter of 2027], the [number of Navy members] in [the United States] [is expected] to be [300K, which is a 15%] [rise], as predicted by [a policy analyst] on [Sun, February 20, 2024]."""

In [19]:
policy_input_dict = {
    "prediction_domain": "policy",
    "prediction_domain_attribute": policy_attributes,
    "domain_requirements": policy_requirements,
    "domain_examples": policy_examples,
    "predictions_N": predictions_N
}

policy_prompt_output = pipeline_prompt.format(**policy_input_dict)
# print(policy_prompt_output)
# domain = "policy" 

# policy_df = llama_generation_model.generate_predictions(policy_prompt_output, label, domain)
# policy_df

### Generate Weather Predictions

In [20]:
weather_prediction_templates_template = """Here are some {prediction_domain} templates:

- {prediction_domain} template 1: On [ $p_t$ ], [ $p_p$ ] predicts that the [ $p_a$ ] at [ $p_o$ ] in [ $p_l$ ] [ $p_v$ ] [ $p_s$ ] by [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 2: In [ $p_t$ ], [ $p_p$ ] from [ $p_o$ ] in [ $p_l$ ], predicts that the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] from [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 3: [ $p_p$ ] predicts on [ $p_t$ ] that the [ $p_a$ ] at [ $p_o$ ] in [ $p_l$ ] [ $p_v$ ] [ $p_s$ ] under [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 4: According to a [ $p_p$ ] from [ $p_o$ ] in [ $p_l$ ], on [ $p_t$ ], the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] beyond [ $p_m$ ] in the time frame of [ $p_f$ ].
- {prediction_domain} template 5: In [ $p_f$ ], the [ $p_a$ ] at [ $p_o$ ] in [ $p_l$ ] [ $p_v$ ] a [ $p_m$ ] [ $p_s$ ], as predicted by [ $p_p$ ] on [ $p_t$ ]."""
weather_prediction_templates_prompt = PromptTemplate.from_template(weather_prediction_templates_template)

In [21]:
weather_attributes = """temperature, precipitation, wind speed, humidity, etc."""
weather_requirements = """- Should be based on real-world weather reports.
    - Suppose the time when $p$ was made is during any season and any location (e.g., Florida known for hurricanes, California known for wildfires, etc).
    - Include reports from all meteorologists, weather organizations, or any type of weather entity."""

# For each template, have a rise, fall, or stable example, respectively.
weather_examples = """
- weather examples for template 1:
    1. On [Monday, December 16, 2024], [Dr. Melissa Carter] a weather expert at the [National Weather Service], forecasts that the [temperature], in [New York City] [will likely] [decrease] from [5°C to 3°C] on [February 16, 2025 (Fri)].
    2. On [Tue, 19 November 2024], [Ethan James] at the [US Weather Center] predicts that the [precipitation levels], in [San Francisco] [are likely to] [increase] by [20%] in the time frame of [08/21/2025].
- weather examples for template 2:
    3. In [October 2024], [Samantha Lin] from [NOAA], envisions that the [wind speed] [should] [decrease] by [15 mph] in [Chicago] by [Friday, March 22, 2025].
    4. In [8/15/2027], [Carlos Rivera] from [Weather.com] predicts that the [humidity] [will] [rise] by [30%] in [Miami] in [July of 2025].
- weather examples for template 3:
    5. [Amanda Green], a weather reporter from [Bureau of Meteorology]  predicts on [23 October 2024] that the [temperature] in [Seattle], [will] [fall] by [10°F] in [2025 Quarter 1].
    6. [Mr. Tommy Wu], from [US Weather Center] predicts in [Q2 2026] that [snowfall levels], in [Denver] [will likely] [increase] by [8 inches] in [Monday, Nov 18, 2026].
- weather examples for template 4:
    7. According to a [top executive] from [AccuWeather], on [12/21/2024], the [rainfall] in [Portland] [is expected to] [increase] beyond [10 percent] in the time frame of [early 2025].
    8. According to [David Harper] from [Weather Underground], on [Fri, August 9, 2024], the [air quality index] in [Los Angeles] [is likely to] [improve] by [20%] in [21 Aug 2024].
- weather examples for template 5:
    9. In [2025-08-21], the [average temperature] in [Houston] has a [probability] of [5 percent to] [decrease], as predicted by [King, a weather reporter] from [Meteorological Department] on [21 Oct 24].
    10. In [Quarter of 2027], the [wind chill] in [Minneapolis] [is expected] to be [10°F, which is a 15%] [rise], as predicted by [a weather analyst named Ortiz] on [Sun, February 20, 2024]."""

In [22]:
weather_input_prompts = [
    ("prediction_properties", prediction_properties_prompt),
    ("prediction_requirements", prediction_requirements_prompt),
    ("prediction_templates", weather_prediction_templates_prompt),
    ("prediction_examples", prediction_examples_prompt),
]

weather_pipeline_prompt = PipelinePromptTemplate(
    final_prompt=full_prediction_prompt, pipeline_prompts=weather_input_prompts
)

weather_input_dict = {
    "prediction_domain": "weather",
    "prediction_domain_attribute": weather_attributes,
    "domain_requirements": weather_requirements,
    "domain_examples": weather_examples,
    "predictions_N": predictions_N
}
weather_prompt_output = weather_pipeline_prompt.format(**weather_input_dict)
# print(weather_prompt_output)


# domain = "weather" 

# weather_df = llama_generation_model.generate_predictions(weather_prompt_output, label, domain)
# weather_df

## LangChain Templates for Any Domain Non-Predictions

In [23]:
full_non_prediction_template = """{non_prediction_properties}

{non_prediction_requirements}

{non_prediction_examples}"""
full_non_prediction_prompt = PromptTemplate.from_template(full_non_prediction_template)

In [24]:
non_prediction_properties_template = """Generate any sentence that is not a prediction, which we call a non-prediction. A prediction ($p$) consists of the following nine properties:
    1. $p_p$, {prediction_domain} person that predicted $p$
        - Can be a person (with a name) or a {prediction_domain} person such as a {prediction_domain} reporter, {prediction_domain} analyst, {prediction_domain} expert, {prediction_domain} top executive, {prediction_domain} senior level person, etc).
    2. $p_o$, {prediction_domain} organization 
        - Can only be an organization or entity that is associated with the {prediction_domain} prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, {prediction_domain} prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the {prediction_domain} domain.
    6. $p_s$, slope that indicates the direction of change in $p_a$
        - Change of directions can be rise/increase/as much as, fall/decrease/as little as, change, stay stable, high/low chance/probability/degree of, etc.
    7. $p_m$, metric outcome
        - How much will the $p_a$ $p_s$?
    8. $p_v$, future verb tense
        - A verb that is associated with the future such as will, would, be going to, should, etc.
    9. $p_l$, location
        - The location is attached to attribute $p_a$ if {prediction_domain} == 'weather'
    """
non_prediction_properties_prompt = PromptTemplate.from_template(non_prediction_properties_template)

In [25]:
non_prediction_requirements = """Requirements to use for each non-prediction:

    - Should be based on real-world {prediction_domain} data and not hallucinate.
    - Should be a simple sentence (non-prediction) (and NOT compounding using "and" or "or").
    - The non-prediction should be unique and not repeated.
    - Do not number the non-predictions.
    - Do not say, "Here are {non_predictions_N} unique non-predictions based on the provided templates and examples:" or anything similar in the prompt.
    - Do not use any of the examples in the prompt.
    - In front of every non-prediction, put the template number in the format of "T0:" and only use "T0:" as the template number.
    - Should be between 10 (ten) to 40 (forty) words."""
non_prediction_requirements_prompt = PromptTemplate.from_template(non_prediction_requirements)

In [26]:
non_prediction_examples_template = """Here are some examples of {prediction_domain} non-predictions:

{domain_examples}

With the above, generate a unique set of {non_predictions_N} non-predictions. Think from the perspective of an {prediction_domain} person."""
non_prediction_examples_prompt = PromptTemplate.from_template(non_prediction_examples_template)

In [27]:
non_prediction_input_prompts = [
    ("non_prediction_properties", non_prediction_properties_prompt),
    ("non_prediction_requirements", non_prediction_requirements_prompt),
    ("non_prediction_examples", non_prediction_examples_prompt),
]

non_prediction_pipeline_prompt = PipelinePromptTemplate(
    final_prompt=full_non_prediction_prompt, pipeline_prompts=non_prediction_input_prompts
)

## Generate Non-Predictions

- The model does not always generate the specified number of sentences, so we will loop until the wanted amount is generated.
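
One way to picture the looping strategy is sketched below. Everything here is an assumption for illustration: `generate_until` and `make_fake_model` are hypothetical names, and the fake model simply returns two placeholder sentences per call instead of querying an LLM. The `max_calls` cap is added so the loop cannot run forever if the model keeps returning too little.

```python
from typing import Callable, List


def generate_until(
    n_wanted: int, generate_once: Callable[[], List[str]], max_calls: int = 10
) -> List[str]:
    """Call the generator repeatedly until n_wanted unique sentences are collected."""
    collected: List[str] = []
    for _ in range(max_calls):
        for sentence in generate_once():
            if sentence not in collected:  # keep each sentence only once
                collected.append(sentence)
        if len(collected) >= n_wanted:
            break
    return collected[:n_wanted]


def make_fake_model() -> Callable[[], List[str]]:
    """Stand-in for an LLM call: returns two new placeholder sentences per call."""
    state = {"calls": 0}

    def generate_once() -> List[str]:
        start = state["calls"] * 2
        state["calls"] += 1
        return [f"T0: placeholder sentence {start + i}" for i in range(2)]

    return generate_once


sentences = generate_until(5, make_fake_model())
print(len(sentences))
```

With the real models, `generate_once` would wrap a single generation call and return the parsed sentences.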

In [28]:
non_predictions_N = 5

In [29]:
non_prediction_attributes = """Any sentence that does not include prediction variables such as $p$, $p_s$, $p_t$, $p_f$, $p_a$, $p_m$, $p_v$."""

non_prediction_examples = """
- non-prediction examples for template 0:
    1. Reading to children helps develop their language skills and fosters a love for books and learning.
    2. She enjoys reading books on rainy days including mornings and evenings.
    3. The Grand Canyon in Arizona, a natural wonder, is over 277 miles long and up to 18 miles wide, showcasing stunning geological formations.
    4. Programming languages, such as Python, Java, and C++, are essential tools for software development.
    5. As the sun sets in the west, it gradually dips below the horizon, casting a warm, golden glow across the landscape.
    6. Professionals in health informatics work on developing and implementing systems like electronic health records (EHRs), telemedicine platforms, and decision support tools.
    7. Juneteenth is celebrated on June 19th because it marks the end of slavery in the United States, as not everyone was free when the Emancipation Proclamation was signed.
    8. She baked a cake for her friend's birthday party and they went for a hike in the mountains and enjoyed the view.
    9. The human brain, a complex organ, contains approximately 86 billion neurons, enabling a vast range of cognitive functions and abilities.
    10. He wrote a letter to his mother, grandmother, father, and grandfather, telling her about his new job."""


In [30]:
non_predictions_input_dict = {
    "prediction_domain": "any",
    "any_non_prediction_domain_attribute": non_prediction_attributes,
    "domain_examples": non_prediction_examples,
    "non_predictions_N": non_predictions_N
}

non_prediction_prompt_output = non_prediction_pipeline_prompt.format(**non_predictions_input_dict)
# print(non_prediction_prompt_output)

# label = 0
# domain = "any"

# non_predictions_df = llama_generation_model.generate_predictions(non_prediction_prompt_output, label, domain)
# non_predictions_df

In [31]:
# updated_non_predictions_df = DataProcessing.reformat_df_with_template_number(non_predictions_df, col_name="Base Sentence")
# updated_non_predictions_df
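
The implementation of `DataProcessing.reformat_df_with_template_number` is not shown in this notebook. As a hypothetical sketch of the underlying idea only, the "T0:" / "T1:" style prefix that the prompts request can be separated from the sentence with a regular expression. The name `split_template_number` is an assumption and does not come from the project.

```python
import re
from typing import Optional, Tuple

# Matches an optional leading "T<number>:" prefix followed by the sentence text.
TEMPLATE_PREFIX = re.compile(r"^\s*T(\d+):\s*(.*)$", re.DOTALL)


def split_template_number(sentence: str) -> Tuple[Optional[int], str]:
    """Return (template_number, sentence_without_prefix); number is None if absent."""
    match = TEMPLATE_PREFIX.match(sentence)
    if match is None:
        return None, sentence.strip()
    return int(match.group(1)), match.group(2).strip()


print(split_template_number("T0: The Grand Canyon is over 277 miles long."))
print(split_template_number("A sentence the model forgot to label."))
```

Returning `None` for unlabeled sentences makes it easy to count how often a model ignores the prefix requirement.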

## Batch Generation Data

In [32]:
tgmf = TextGenerationModelFactory()

N_batches = 1

# text_generation_models = [llama_versatile_generation_model, llama_instant_generation_model, llama_70b_8192_generation_model, llama_8b_8192_generation_model, mixtral_generation_model]
text_generation_models = [llama_instant_generation_model, llama_8b_8192_generation_model]

# text_generation_models = [llama_versatile_generation_model, llama_70b_8192_generation_model, mixtral_generation_model]

In [33]:
prediction_domains = ["finance", "health", "policy", "weather"]
prediction_prompt_outputs = {
    "finance": financial_prompt_output,
    "health": health_prompt_output,
    "policy": policy_prompt_output,
    "weather": weather_prompt_output
}
prediction_label = 1

batched_predictions_df = tgmf.batch_generate_predictions(N_batches=N_batches, 
                                text_generation_models=text_generation_models, 
                                domains=prediction_domains,
                                prompt_outputs=prediction_prompt_outputs,
                                sentence_label=prediction_label)

  0%|          | 0/1 [00:00<?, ?it/s]

finance --- <text_generation_models.LlamaInstantTextGenerationModel object at 0x134c2fad0>
finance --- <text_generation_models.Llama8B8192TextGenerationModel object at 0x1371b6bd0>

health --- <text_generation_models.LlamaInstantTextGenerationModel object at 0x134c2fad0>
health --- <text_generation_models.Llama8B8192TextGenerationModel object at 0x1371b6bd0>

policy --- <text_generation_models.LlamaInstantTextGenerationModel object at 0x134c2fad0>
policy --- <text_generation_models.Llama8B8192TextGenerationModel object at 0x1371b6bd0>

weather --- <text_generation_models.LlamaInstantTextGenerationModel object at 0x134c2fad0>
weather --- <text_generation_models.Llama8B8192TextGenerationModel object at 0x1371b6bd0>


100%|██████████| 1/1 [00:45<00:00, 45.68s/it]
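
The output above suggests that the batch call visits every domain and, within each domain, every model. A minimal sketch of that loop order follows. It is not the actual `TextGenerationModelFactory` implementation: `EchoModel` and `batch_generate` are hypothetical stand-ins, no LLM API is called, and the record keys simply mirror the column names shown in the results table.

```python
from itertools import product


class EchoModel:
    """Hypothetical stand-in for a text generation model; makes no API calls."""

    def __init__(self, name):
        self.name = name

    def generate(self, prompt):
        # A real model would send the prompt to an LLM; this stub returns a fixed string.
        return f"[{self.name}] response to: {prompt}"


def batch_generate(n_batches, models, domains, prompt_outputs, sentence_label):
    """Collect one record per (batch, domain, model) combination."""
    records = []
    # product() iterates batches outermost, then domains, then models.
    for batch_idx, domain, model in product(range(n_batches), domains, models):
        records.append({
            "Base Sentence": model.generate(prompt_outputs[domain]),
            "Sentence Label": sentence_label,
            "Model Name": model.name,
            "Domain": domain,
            "Batch Index": batch_idx,
        })
    return records


models = [EchoModel("model-a"), EchoModel("model-b")]
domains = ["finance", "health"]
prompts = {"finance": "finance prompt", "health": "health prompt"}
records = batch_generate(1, models, domains, prompts, sentence_label=1)
```

Because the prompt is looked up by domain name, a domain list that repeats the same key (as the non-prediction cell does with `"any"`) would request the same prompt once per repetition in this sketch.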

In [34]:
pd.set_option('max_colwidth', 800)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
batched_predictions_df

[                                                                                                                                                                         Base Sentence  \
 0   T1: On August 21, 2024, Emily Davis, a financial reporter, predicts that the operating cash flow at ExxonMobil will likely decrease from $20 billion to $15 billion in Q3 of 2027.   
 1                           T2: In October 2024, Julian Hall from Yahoo Finance, envisions that the stock price at Apple will rise from $800 to $1,000 per share in Quarter 4 of 2029.   
 2               T3: Dija Gabe, a financial expert, predicts on November 18, 2024, that the research and development expenses at Alphabet may stay stable at $20 million in Q2 of 2028.   
 3                        T4: According to a top executive from Chevron, on August 21, 2024, the net profit is expected to increase beyond $10,000 USD in the time frame of Q3 of 2029.   
 4                                                   T5: In Q2 of

In [35]:
predictions_df = DataProcessing.concat_dfs(batched_predictions_df)
predictions_df

Unnamed: 0,Base Sentence,Sentence Label,Model Name,Domain,Batch Index
0,"T1: On August 21, 2024, Emily Davis, a financial reporter, predicts that the operating cash flow at ExxonMobil will likely decrease from $20 billion to $15 billion in Q3 of 2027.",1,llama-3.1-8b-instant,finance,0
1,"T2: In October 2024, Julian Hall from Yahoo Finance, envisions that the stock price at Apple will rise from $800 to $1,000 per share in Quarter 4 of 2029.",1,llama-3.1-8b-instant,finance,0
2,"T3: Dija Gabe, a financial expert, predicts on November 18, 2024, that the research and development expenses at Alphabet may stay stable at $20 million in Q2 of 2028.",1,llama-3.1-8b-instant,finance,0
3,"T4: According to a top executive from Chevron, on August 21, 2024, the net profit is expected to increase beyond $10,000 USD in the time frame of Q3 of 2029.",1,llama-3.1-8b-instant,finance,0
4,"T5: In Q2 of 2026, Mr. Mike predicts that the operating income at Microsoft will fall by 407 percent to $50M on November 18, 2026.",1,llama-3.1-8b-instant,finance,0
5,"T1: On August 21, 2024, Ava Lee predicts that the gross profit at Tesla may increase as much as $30,000,000, reflecting a 1209 percent increase, in Q3 of 2027.",1,llama-3.1-8b-instant,finance,0
6,"T2: In Q4 of 2027, Julian Hall from Yahoo Finance, envisions that the revenue at Facebook will rise from $25 billion to $30 billion in Q2 of 2029.",1,llama-3.1-8b-instant,finance,0
7,"T3: Emily Davis, a financial reporter, predicts on August 21, 2024, that the net profit at Amazon has a probability of 11 percent to reach $30,000 USD in Q3 of 2028.",1,llama-3.1-8b-instant,finance,0
8,"T4: According to a top executive from McDonald's, on August 21, 2024, the operating income is expected to decrease beyond $5 billion in the time frame of Q2 of 2029.",1,llama-3.1-8b-instant,finance,0
9,"T5: In Q3 of 2028, Dija Gabe, a financial expert, predicts that the stock price at Alphabet will fall from $1,000 to $800 per share on November 18, 2028.",1,llama-3.1-8b-instant,finance,0
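
`batched_predictions_df` is displayed as a list of DataFrames, and `concat_dfs` returns a single table with a continuous row index. One common way to get that result is `pd.concat` with `ignore_index=True`. The sketch below assumes this behaviour; the internals of `DataProcessing.concat_dfs` are not shown in this notebook, and the two small frames are made-up placeholders.

```python
import pandas as pd

# Two placeholder batches sharing the same columns (illustrative data only).
batch_a = pd.DataFrame({"Base Sentence": ["s1", "s2"], "Domain": ["finance", "finance"]})
batch_b = pd.DataFrame({"Base Sentence": ["s3"], "Domain": ["health"]})

# Stack the frames row-wise and rebuild the index as 0..n-1.
combined = pd.concat([batch_a, batch_b], ignore_index=True)
```

Without `ignore_index=True`, each frame would keep its own index and the combined table would contain duplicate row labels.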


In [39]:
logger = DataFrameLogger()
logger.log_df(predictions_df)

In [None]:
logged_data = logger.load_log()
logged_data

In [None]:
# Non-predictions share one prompt under the key "any"; the key is listed four times,
# matching the number of prediction domains used above.
non_prediction_domains = ["any", "any", "any", "any"]
non_prediction_prompt_outputs = {
    "any": non_prediction_prompt_output,
}
# Sentence label: 0 marks a non-prediction
non_prediction_label = 0

batched_non_predictions_df = tgmf.batch_generate_predictions(
    N_batches=N_batches,
    text_generation_models=text_generation_models,
    domains=non_prediction_domains,
    prompt_outputs=non_prediction_prompt_outputs,
    sentence_label=non_prediction_label,
)


In [None]:
pd.set_option('max_colwidth', 800)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
batched_non_predictions_df

In [None]:
non_predictions_df = DataProcessing.concat_dfs(batched_non_predictions_df)
non_predictions_df

In [None]:
# Log the non-predictions with the same DataFrameLogger used for the predictions
logger.log_df(non_predictions_df)

## Store Predictions and Non-Predictions

In [None]:
# Persist both DataFrames so another notebook can restore them with `%store -r <name>`
%store updated_predictions_df
%store updated_non_predictions_df