# 1-Generate Predictions using LangChain

- **Goal:** Prediction Recognition

- **Purpose:** Implement step 1 of the prediction recognition pipeline, including its sub-steps:
    1. Generate predictions
        1. Create several prediction prompt templates
        2. Use open-source LLMs to generate predictions

- **Misc:**
    - `%store`: IPython line magic that stores a variable of interest so it can be loaded in another notebook

In [1]:
import os
import sys

import pandas as pd

from langchain_core.prompts import PipelinePromptTemplate, PromptTemplate

# Get the current working directory of the notebook
notebook_dir = os.getcwd()
# Add the parent directory to the system path
sys.path.append(os.path.join(notebook_dir, '../'))

from pipelines import BasePipeline
from data_processing import DataProcessing

In [2]:
pd.set_option('max_colwidth', 800)
base_pipeline = BasePipeline()

## LangChain Templates

In [3]:
full_prediction_template = """{prediction_properties}

{prediction_requirements}

{prediction_templates}

{prediction_examples}"""
full_prediction_prompt = PromptTemplate.from_template(full_prediction_template)

Google predictive spelling/autocomplete 

In [4]:
prediction_properties_template = """A prediction ($p$) consists of the following eight properties:

    1. $p_p$, {prediction_domain} person that predicted $p$
        - Can be a person (with a name) or a {prediction_domain} person such as a {prediction_domain} reporter, {prediction_domain} analyst, {prediction_domain} expert, {prediction_domain} top executive, {prediction_domain} senior level person, etc).
    2. $p_o$, {prediction_domain} organization 
        - Can only be an organization or entity that is associated with the {prediction_domain} prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, {prediction_domain} prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the {prediction_domain} domain.
        - Some examples are {prediction_domain_attribute}.
    6. $p_s$, slope that indicates the direction of change in $p_a$
        - Change of directions can be rise/increase/as much as, fall/decrease/as little as, change, stay stable, high/low chance/probability/degree of, etc.
    7. $p_m$, metric outcome
        - How much will the $p_a$ $p_s$?
    8. $p_v$, future verb tense
        - A verb that is associated with the future such as will, would, be going to, should, etc.
    9. $p_l$, location
        - The location is attached to attribute $p_a$ if {prediction_domain} == 'weather'
    """
prediction_properties_prompt = PromptTemplate.from_template(prediction_properties_template)
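The template above only works if every `{placeholder}` receives a value at format time. As a minimal sketch (standard library only, using a shortened stand-in for the real template), the placeholders can be listed with `string.Formatter`. The helper name `template_fields` is hypothetical; LangChain's `PromptTemplate` should expose comparable information through its `input_variables` attribute.

```python
from string import Formatter


def template_fields(template: str) -> set[str]:
    """Return the names of all {placeholders} in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


# Shortened stand-in for the properties template (assumption: same placeholder style).
sample_template = (
    "1. $p_p$, {prediction_domain} person that predicted $p$\n"
    "5. $p_a$, {prediction_domain} prediction attribute\n"
    "    - Some examples are {prediction_domain_attribute}."
)

fields = template_fields(sample_template)
print(sorted(fields))  # ['prediction_domain', 'prediction_domain_attribute']
```

The `$p_p$` notation passes through untouched because only curly braces mark placeholders.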

    - Keep the brackets around the prediction properties when generating predictions and be sure to include brackets around dates such as "2024-10-15", "2024/08/20", "Q4 of 2024", "2025", "2027 Q1", "Q3 2027", "On 21 Aug 2024".

In [5]:
prediction_requirements_template = """{prediction_domain} requirements to use for each prediction:

    - Should be based on real-world {prediction_domain} data and not hallucinate.
    - Only a simple sentence (prediction) (and NOT compounding using "and" or "or").
    - Should diversify all eight properties of the prediction ($p$).
    - Should use synonyms of predicts such as forecasts, speculates, foresee, envision, etc., and not use any of them more than ten times.
    - The prediction should be unique and not repeated.
    - The forecast time ($p_f$) should always be after current time ($p_t$) of when forecast ($p$) was made.
    - Do not number the predictions.
    - Do not say, "As the {prediction_domain} at organization ($p_o$), I will generate company-based {prediction_domain} predictions using the provided templates." or anything similar.
    - Should have a forecast time ($p_f$) when $p$ is expected to come to fruition between 2025 to 2050.
    - Use the five different templates and examples provided.
    - Change how the current time ($p_t$) and forecast time ($p_f$) are written in the prediction with examples of (1) Wednesday, August 21, 2024; (2) Wed, August 21, 2024; (3) 08/21/2024; (4) 08/21/2024; (5) 21/08/2024; (6) 21 August 2024; (7) 2024/08/21; (8) 2024-08-21; (9) August 21, 2024; (10) Aug 21, 2024; (11) 21 August 2024, (12) 21 Aug 2024, Q3 of 2027, 2029 of Q3, etc (with removing day of week).
    {domain_requirements}
    - Do not say, "Here are 10 unique weather predictions based on the provided templates and examples:" in the prompt.
    - Do not use any of the examples in the prompt.
    - In front of every prediction, put the template number in the format of "T1:", "T2:", etc."""
prediction_requirements_prompt = PromptTemplate.from_template(prediction_requirements_template)
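Two of the requirements above are checkable after generation: the forecast time must come after the time the prediction was made, and it must fall between 2025 and 2050. The following is a rough sketch of such a check at year granularity, not part of the original pipeline. It assumes the earliest four-digit year in a sentence is $p_t$ and the latest is $p_f$; the function names are hypothetical, and two-digit years such as "21 Oct 24" or amounts that look like years are not handled.

```python
import re

# Four-digit years starting with 19 or 20, e.g. 2024 in "2024-10-15" or "08/21/2024".
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def years_in(sentence: str) -> list[int]:
    """Extract every four-digit year mentioned in the sentence."""
    return [int(year) for year in YEAR_RE.findall(sentence)]


def forecast_window_ok(sentence: str, earliest: int = 2025, latest: int = 2050) -> bool:
    """Return True when the latest year is later than the earliest and inside the window.

    Same-year pairs return False so they can be reviewed by hand,
    because year granularity cannot order them.
    """
    years = years_in(sentence)
    if len(years) < 2:
        return False
    made, forecast = min(years), max(years)
    return made < forecast and earliest <= forecast <= latest


good = "T1: On 2024-10-15, an analyst predicts that revenue will rise by 5% in Q2 of 2026."
bad = "According to an analyst, on Fri, July 12, 2024, profit may increase by 21 Aug 2024."
print(forecast_window_ok(good), forecast_window_ok(bad))  # True False
```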

In [6]:
prediction_templates_template = """Here are some {prediction_domain} templates:

- {prediction_domain} template 1: On [ $p_t$ ], [ $p_p$ ] predicts that the [ $p_a$ ] at [ $p_o$ ] [ $p_v$ ] [ $p_s$ ] by [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 2: In [ $p_t$ ], [ $p_p$ ] from [ $p_o$ ], predicts that the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] from [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 3: [ $p_p$ ] predicts on [ $p_t$ ] that the [ $p_a$ ] at [ $p_o$ ] [ $p_v$ ] [ $p_s$ ] under [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 4: According to a [ $p_p$ ] from [ $p_o$ ], on [ $p_t$ ], the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] beyond [ $p_m$ ] in the timeframe of [ $p_f$ ].
- {prediction_domain} template 5: In [ $p_f$ ], the [ $p_a$ ] at [ $p_o$ ] [ $p_v$ ] a [ $p_m$ ] [ $p_s$ ], as predicted by [ $p_p$ ] on [ $p_t$ ]."""
prediction_templates_prompt = PromptTemplate.from_template(prediction_templates_template)

In [7]:
prediction_examples_template = """Here are some examples of {prediction_domain} predictions:

{domain_examples}


With the above, generate a unique set of {predictions_N} {prediction_domain} predictions. Think from the perspective of a {prediction_domain} analyst, expert, top executive, or senior level person."""
prediction_examples_prompt = PromptTemplate.from_template(prediction_examples_template)

In [8]:
input_prompts = [
    ("prediction_properties", prediction_properties_prompt),
    ("prediction_requirements", prediction_requirements_prompt),
    ("prediction_templates", prediction_templates_prompt),
    ("prediction_examples", prediction_examples_prompt),
]

pipeline_prompt = PipelinePromptTemplate(
    final_prompt=full_prediction_prompt, pipeline_prompts=input_prompts
)
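To make the composition step concrete, here is a plain-Python sketch of what the pipeline prompt does conceptually: each named sub-template is formatted first, and the rendered text is then slotted into the final template under that name. This is an illustration with `str.format`, not LangChain's implementation; `compose_prompt` and the toy templates are hypothetical.

```python
def compose_prompt(final_template: str, pipeline_templates: list[tuple[str, str]], inputs: dict) -> str:
    """Format each named sub-template, then fill the final template with the results."""
    rendered = dict(inputs)
    for name, template in pipeline_templates:
        # A sub-template may use the caller's inputs and any part rendered before it.
        rendered[name] = template.format(**rendered)
    return final_template.format(**rendered)


# Toy templates standing in for the four real sub-prompts.
final = "{header}\n\n{body}"
parts = [
    ("header", "Domain: {prediction_domain}"),
    ("body", "Generate {predictions_N} {prediction_domain} predictions."),
]

prompt_text = compose_prompt(final, parts, {"prediction_domain": "weather", "predictions_N": 10})
print(prompt_text)
```

Because `str.format` treats every curly brace as a placeholder, this sketch only works for templates without literal braces, which holds for the templates in this notebook.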


## Generate Domain Predictions

### Generate Financial Predictions

In [9]:
financial_attributes = """stock price, net profit, revenue, operating cash flow, research and development expenses, operating income, gross profit."""
financial_requirements = """- Should be based on real-world financial earnings reports.
    - Suppose the time when $p$ was made is during any earning season.
    - Include stocks from all sectors such as consumer staples, energy, finance, health care, industrials, materials, media, real estate, retail, technology, utilities, defense, etc.
    - Include the US Dollar sign ($) before or USD after the amount of the financial attribute."""

# For each template, have a rise, fall, or stable example, respectively.
financial_examples = """
- financial examples for template 1:
    1. On [Monday, December 16, 2024], [Detravious, an investor] forecasts that the [revenue] at [Apple] [will likely] [decrease] from [$87B to $50 billion] in [2025 Q1].
    2. On [Tue, November 19, 2024], [Ava Lee] predicts that the [operating cash flow] at [ExxonMobil] [should] [decrease] by [5 percent to $20 billion] in [08/21/2025].
- financial examples for template 2:
    3. In [October 2024], [Julian Hall] from [Yahoo Finance], envisions that the [stock price] [will] [rise] from [$800 to $1,000 per share] in [Quarter 3 of 2028].
    4. In [8/15/2027], [Mrs. Kalia] from [McDonald's], predicts that the [net profit] [will] [fall] under [5% to $5 billion] in [January of 2029].
- financial examples for template 3:
    5. [Dija Gabe, a financial expert] predicts on [23 October 2024] that the [research and development expenses] at [Alphabet] [may] [stay stable] at [$20 million] in [2027 Quarter 4].
    6. [Mr. Mike] predicts in [Q2 2026] that the [operating income] at [Microsoft] [will] [fall] by [407 percent to $50M] on [Monday, Nov 18, 2026].
- financial examples for template 4:
    7. According to a [top executive] from [Chevron], on [08/21/2024], the [net profit] [is expected to] [increase] beyond [$10,000] in the timeframe of [Q3 of 2029].
    8. According to [Brittany] from [Tesla], on [Fri, July 12, 2024], the [gross profit] [may] [increase] as much as [$30,000,000, reflecting a 1209 percent increase] by [21 Aug 2024].
- financial examples for template 5:
    9. In [2025-08-21], the [net profit] at [Amazon] has a [probability] of [11 percent to reach $30k] [decrease], as predicted by [Emily Davis, a financial reporter] on [21 Oct 24].
    10. In [Quarter of 2027], the [revenue] at [Facebook] [is expected] to be [$30 billion, which is a 15%] [rise], as predicted by [a financial analyst] on [Sun, February 20, 2024]."""
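The examples mark each prediction property with square brackets. As a small sketch (the helper names are hypothetical, and nested brackets are assumed not to occur), the bracketed values can be pulled out of an example sentence, or the brackets removed to obtain the plain sentence:

```python
import re

# Matches one [bracketed] span with no nested brackets inside.
BRACKET_RE = re.compile(r"\[([^\[\]]+)\]")


def bracketed_values(sentence: str) -> list[str]:
    """Return the text inside each [ ... ] span, in order of appearance."""
    return [value.strip() for value in BRACKET_RE.findall(sentence)]


def strip_brackets(sentence: str) -> str:
    """Return the sentence with the bracket markers removed."""
    return BRACKET_RE.sub(lambda match: match.group(1).strip(), sentence)


example = (
    "On [Tue, November 19, 2024], [Ava Lee] predicts that the [operating cash flow] "
    "at [ExxonMobil] [should] [decrease] by [5 percent to $20 billion] in [08/21/2025]."
)

print(bracketed_values(example))
print(strip_brackets(example))
```

For this template 1 example the eight extracted values follow the template order: $p_t$, $p_p$, $p_a$, $p_o$, $p_v$, $p_s$, $p_m$, $p_f$.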

In [10]:
financial_input_dict = {
    "prediction_domain": "financial",
    "prediction_domain_attribute": financial_attributes,
    "domain_requirements": financial_requirements,
    "domain_examples": financial_examples,
    "predictions_N": 10
}
financial_prompt_output = pipeline_prompt.format(**financial_input_dict)
print(financial_prompt_output)

financial_df = base_pipeline.generate_predictions(financial_prompt_output, 1, "financial")
financial_df

A prediction ($p$) consists of the following eight properties:

    1. $p_p$, financial person that predicted $p$
        - Can be a person (with a name) or a financial person such as a financial reporter, financial analyst, financial expert, financial top executive, financial senior level person, etc).
    2. $p_o$, financial organization 
        - Can only be an organization or entity that is associated with the financial prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, financial prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the financial domain.
        - Some examples are stock price, net prof

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,"T1: On 2024-10-15, Rachel Patel, a financial analyst, predicts that the operating cash flow at General Motors will likely decrease by $5 billion to $10 billion in Q2 of 2026.",1,llama-3.3-70b-versatile,financial
1,"T2: In 2024, Julian Sanchez from Bank of America, forecasts that the stock price will rise from $50 to $75 per share in 2028.",1,llama-3.3-70b-versatile,financial
2,"T3: Emily Wilson, a financial expert, predicts on 20/08/2024 that the research and development expenses at Pfizer may stay stable at $15 million in 2029.",1,llama-3.3-70b-versatile,financial
3,"T4: According to a senior executive from Cisco, on 2024/08/20, the net profit is expected to increase beyond $8 billion in the timeframe of Q4 of 2027.",1,llama-3.3-70b-versatile,financial
4,"T5: In 2025-02-18, the revenue at Visa has a probability of 20 percent to reach $25 billion, which is a 10% increase, as predicted by David Lee, a financial reporter, on 15 Oct 2024.",1,llama-3.3-70b-versatile,financial
5,"T1: On Wednesday, November 20, 2024, Michael Davis, a financial analyst, predicts that the gross profit at 3M will likely decrease by 15% to $12 billion in Q1 of 2026.",1,llama-3.3-70b-versatile,financial
6,"T2: In Q3 of 2024, Olivia Brown from Johnson & Johnson, envisions that the operating income will rise from $10 billion to $15 billion in 2028.",1,llama-3.3-70b-versatile,financial
7,"T3: Kevin White, a financial expert, predicts on 10/10/2024 that the revenue at AT&T may increase by $5 billion to $20 billion in 2027.",1,llama-3.3-70b-versatile,financial
8,"T4: According to a top executive from Intel, on 2024-07-25, the net profit is expected to increase beyond $12 billion in the timeframe of Q2 of 2029.",1,llama-3.3-70b-versatile,financial
9,"T5: In 2026-08-25, the stock price at McDonald's is expected to be $200 per share, which is a 25% increase, as predicted by Sophia Rodriguez, a financial analyst, on 25 July 2024.",1,llama-3.3-70b-versatile,financial


### Generate Weather Predictions

In [11]:
weather_prediction_templates_template = """Here are some {prediction_domain} templates:

- {prediction_domain} template 1: On [ $p_t$ ], [ $p_p$ ] predicts that the [ $p_a$ ] at [ $p_o$ ] in [ $p_l$ ] [ $p_v$ ] [ $p_s$ ] by [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 2: In [ $p_t$ ], [ $p_p$ ] from [ $p_o$ ] in [ $p_l$ ], predicts that the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] from [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 3: [ $p_p$ ] predicts on [ $p_t$ ] that the [ $p_a$ ] at [ $p_o$ ] in [ $p_l$ ] [ $p_v$ ] [ $p_s$ ] under [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 4: According to a [ $p_p$ ] from [ $p_o$ ] in [ $p_l$ ], on [ $p_t$ ], the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] beyond [ $p_m$ ] in the timeframe of [ $p_f$ ].
- {prediction_domain} template 5: In [ $p_f$ ], the [ $p_a$ ] at [ $p_o$ ] in [ $p_l$ ] [ $p_v$ ] a [ $p_m$ ] [ $p_s$ ], as predicted by [ $p_p$ ] on [ $p_t$ ]."""
weather_prediction_templates_prompt = PromptTemplate.from_template(weather_prediction_templates_template)

In [12]:
weather_attributes = """temperature, precipitation, wind speed, humidity, etc."""
weather_requirements = """- Should be based on real-world weather reports.
    - Suppose the time when $p$ was made is during any season and any location (ie: Florida known for hurricanes, California known for wildfires, etc).
    - Include reports from all meteorologists, weather organizations, or any type of weather entity."""

# For each template, have a rise, fall, or stable example, respectively.
weather_examples = """
- weather examples for template 1:
    1. On [Monday, December 16, 2024], [Dr. Melissa Carter] a weather expert at the [National Weather Service], forecasts that the [temperature], in [New York City] [will likely] [decrease] from [5°C to 3°C] on [February 16, 2025 (Fri)].
    2. On [Tue, 19 November 2024], [Ethan James] at the [US Weather Center] predicts that the [precipitation levels], in [San Francisco] [are likely to] [increase] by [20%] in the timeframe of [08/21/2025].
- weather examples for template 2:
    3. In [October 2024], [Samantha Lin] from [NOAA], envisions that the [wind speed] [should] [decrease] by [15 mph] in [Chicago] by [Friday, March 22, 2025].
    4. In [8/15/2027], [Carlos Rivera] from [Weather.com] predicts that the [humidity] [will] [rise] by [30%] in [Miami] in [July of 2025].
- weather examples for template 3:
    5. [Amanda Green], a weather reporter from [Bureau of Meteorology] predicts on [23 October 2024] that the [temperature] in [Seattle], [will] [fall] by [10°F] in [2025 Quarter 1].
    6. [Mr. Tommy Wu], from [US Weather Center] predicts in [Q2 2026] that [snowfall levels], in [Denver] [will likely] [increase] by [8 inches] in [Monday, Nov 18, 2026].
- weather examples for template 4:
    7. According to a [top executive] from [AccuWeather], on [12/21/2024], the [rainfall] in [Portland] [is expected to] [increase] beyond [10 percent] in the timeframe of [early 2025].
    8. According to [David Harper] from [Weather Underground], on [Fri, August 9, 2024], the [air quality index] in [Los Angeles] [is likely to] [improve] by [20%] in [21 Aug 2024].
- weather examples for template 5:
    9. In [2025-08-21], the [average temperature] in [Houston] has a [probability] of [5 percent to] [decrease], as predicted by [King, a weather reporter] from [Meteorological Department] on [21 Oct 24].
    10. In [Quarter of 2027], the [wind chill] in [Minneapolis] [is expected] to be [10°F, which is a 15%] [rise], as predicted by [a weather analyst named Ortiz] on [Sun, February 20, 2024]."""

In [13]:
weather_input_prompts = [
    ("prediction_properties", prediction_properties_prompt),
    ("prediction_requirements", prediction_requirements_prompt),
    ("prediction_templates", weather_prediction_templates_prompt),
    ("prediction_examples", prediction_examples_prompt),
]

weather_pipeline_prompt = PipelinePromptTemplate(
    final_prompt=full_prediction_prompt, pipeline_prompts=weather_input_prompts
)

weather_input_dict = {
    "prediction_domain": "weather",
    "prediction_domain_attribute": weather_attributes,
    "domain_requirements": weather_requirements,
    "domain_examples": weather_examples,
    "predictions_N": 10
}
weather_prompt_output = weather_pipeline_prompt.format(**weather_input_dict)
print(weather_prompt_output)

weather_df = base_pipeline.generate_predictions(weather_prompt_output, 1, "weather")
weather_df

A prediction ($p$) consists of the following eight properties:

    1. $p_p$, weather person that predicted $p$
        - Can be a person (with a name) or a weather person such as a weather reporter, weather analyst, weather expert, weather top executive, weather senior level person, etc).
    2. $p_o$, weather organization 
        - Can only be an organization or entity that is associated with the weather prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, weather prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the weather domain.
        - Some examples are temperature, precipitation, wind speed, hum

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,"T1: On 2024-10-15, Dr. Rachel Kim, a senior meteorologist at the National Oceanic and Atmospheric Administration, forecasts that the precipitation levels in Los Angeles will likely increase by 15% in Q2 of 2026.",1,llama-3.3-70b-versatile,weather
1,"T2: In 2024-08, Emily Chen from the European Centre for Medium-Range Weather Forecasts envisions that the humidity in Beijing should decrease by 12% in 2027-03.",1,llama-3.3-70b-versatile,weather
2,"T3: Michael Davis, a weather expert from the Meteorological Office, predicts on 2024-09-20 that the temperature in Sydney will fall by 8°C in the timeframe of 2025-09.",1,llama-3.3-70b-versatile,weather
3,"T4: According to a top executive from the World Meteorological Organization, on 2024-11-01, the wind speed in Tokyo is expected to rise beyond 20 mph in the timeframe of 2026-06.",1,llama-3.3-70b-versatile,weather
4,"T5: In 2028-02, the average sea level in Miami has a probability of 10% to increase by 2 inches, as predicted by Dr. Lisa Nguyen, a weather analyst from the National Weather Service, on 2024-07-15.",1,llama-3.3-70b-versatile,weather
5,"T1: On 2024-12-25, James Lee, a weather reporter from the Korean Meteorological Administration, forecasts that the snowfall levels in New York City will likely decrease by 5 inches in Q1 of 2027.",1,llama-3.3-70b-versatile,weather
6,"T2: In 2025-01, David Taylor from the Australian Bureau of Meteorology predicts that the air quality index in Melbourne will improve by 15% in 2026-10.",1,llama-3.3-70b-versatile,weather
7,"T3: Dr. Sophia Patel, a weather expert from the Indian Meteorological Department, predicts on 2024-06-01 that the temperature in Mumbai will rise by 5°C in the timeframe of 2027-05.",1,llama-3.3-70b-versatile,weather
8,"T4: According to a senior meteorologist from the Chinese Meteorological Administration, on 2024-03-15, the rainfall in Shanghai is expected to increase beyond 20% in the timeframe of 2026-08.",1,llama-3.3-70b-versatile,weather
9,"T5: In 2029-04, the wind chill in Chicago has a probability of 12% to decrease by 8°F, as predicted by Dr. Kevin White, a weather analyst from the National Centers for Environmental Prediction, on 2024-02-20.",1,llama-3.3-70b-versatile,weather


### Generate Health Predictions

In [14]:
health_attributes = """obesity rates, prevalence of chronic illnesses, average physical activity levels, nutritional intake, etc."""
health_requirements = """- Should be based on real-world health reports.
    - Suppose the time when $p$ was made is during any season such as flu season, allergy season, pandemic, epidemic, etc.
    - Include reports from all health organizations, researchers, doctors, physical therapists, physician assistants, nurse practitioners, fitness experts, etc."""

# For each template, have a rise, fall, or stable example, respectively.
health_examples = """
- health examples for template 1:
    1. On [Monday, December 16, 2024], [Detravious, an investor] forecasts that the [obesity rate] at the [United States] [will likely] [decrease] by [5%] in [2025 Q1].
    2. On [Tue, November 19, 2024], [medical professional Sophia Rodriguez] predicts that the [cancer rate] in [Georgia] [should] [decrease] by [4 percent] in [08/21/2025].
- health examples for template 2:
    3. In [October 2024], [Arjun Patel, Ph.D] from [Florida Department of Health] envisions that the [average daily caloric intake] [may] [rise] from [100 to 300] in [Quarter 3 of 2028].
    4. In [8/15/2027], [Dr. Michael Brown] from the [Centers for Disease Control and Prevention], foresees that the [average daily caloric intake] [will] [fall] [8 percent] in [2027].
- health examples for template 3:
    5. [A trusted expert] predicts on [23 October 2024] that the [global vaccination rate for measles] in the [US] [should] [stay stable] at [100K people] in [2027 Quarter 4].
    6. [Dr. Sarah Johnson] foresees in [Q2 2026] that the [prevalence of hypertension] in [California] [will] [fall] by [407 percent] by [Monday, Nov 18, 2026].
- health examples for template 4:
    7. According to a [Olivia Martinez] from [Stanford University], on [08/21/2024], the prevalence of [type 2 diabetes in adults] [is expected to] [increase] beyond [8.5 percent] in the timeframe of [Q3 of 2029].
    8. According to [Rachel Kim, MD] from the [University of California], on [Fri, July 12, 2024], the prevalence of [type 2 diabetes in adults] [may] [increase] as much as [30,000,000, reflecting a 1209 percent increase] by [21 Aug 2024].
- health examples for template 5:
    9. In [2025-08-21], the [average weekly exercise hours] in [United States] has a [probability] of [20 percent to reach 30k], as predicted by [Emily Davis, Harvard School of Public Health] on [21 Oct 24].
    10. In [Quarter of 2027], the [average weekly walking hours] in [Atlanta] is [expected to rise] by [15%], as predicted by the [Monique, National Institutes of Health] on [Sun, February 20, 2024]."""

In [15]:
health_input_dict = {
    "prediction_domain": "health",
    "prediction_domain_attribute": health_attributes,
    "domain_requirements": health_requirements,
    "domain_examples": health_examples,
    "predictions_N": 10
}

health_prompt_output = pipeline_prompt.format(**health_input_dict)
print(health_prompt_output)

health_df = base_pipeline.generate_predictions(health_prompt_output, 1, "health")
health_df

A prediction ($p$) consists of the following eight properties:

    1. $p_p$, health person that predicted $p$
        - Can be a person (with a name) or a health person such as a health reporter, health analyst, health expert, health top executive, health senior level person, etc).
    2. $p_o$, health organization 
        - Can only be an organization or entity that is associated with the health prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, health prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the health domain.
        - Some examples are obesity rates, prevalence of chronic illnesses, averag

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,"T1: On 2024-10-15, Dr. David Lee, a health expert, predicts that the average blood pressure at the American Heart Association will likely decrease by 10% in Q2 of 2026.",1,llama-3.3-70b-versatile,health
1,"T2: In 2024, Dr. Maria Rodriguez from the World Health Organization envisions that the global life expectancy may rise from 72 to 75 years in 2028.",1,llama-3.3-70b-versatile,health
2,T3: Dr. John Smith predicts on 08/20/2024 that the prevalence of mental health disorders in the United States should stay stable at 20% in 2027.,1,llama-3.3-70b-versatile,health
3,"T4: According to a report by Dr. Emily Chen from the Centers for Disease Control and Prevention, on 21 Aug 2024, the obesity rate among children is expected to increase beyond 25% in the timeframe of Q4 of 2029.",1,llama-3.3-70b-versatile,health
4,"T5: In Q1 of 2025, the average daily step count in Australia is expected to rise by 15%, as predicted by Dr. Michael Brown from the National Institutes of Health on 2024-10-10.",1,llama-3.3-70b-versatile,health
5,"T1: On 2024/11/12, Dr. Sophia Patel, a health analyst, forecasts that the cancer survival rate at the Mayo Clinic will likely increase by 12% in 2026.",1,llama-3.3-70b-versatile,health
6,"T2: In 2025, Dr. Kevin White from the American Cancer Society predicts that the average daily caloric intake may fall from 2500 to 2000 calories in Quarter 2 of 2028.",1,llama-3.3-70b-versatile,health
7,T3: Dr. Lisa Nguyen predicts on 2024-08-25 that the global vaccination rate for influenza in the European Union should increase by 15% in 2027.,1,llama-3.3-70b-versatile,health
8,"T4: According to a report by Dr. Daniel Kim from the University of California, on 10/15/2024, the prevalence of chronic diseases among adults is expected to decrease beyond 30% in the timeframe of Q3 of 2030.",1,llama-3.3-70b-versatile,health
9,"T5: In 2029-08-20, the average weekly exercise hours in Canada is expected to rise by 20%, as predicted by Dr. Olivia Martin from the Canadian Medical Association on 2024/10/20.",1,llama-3.3-70b-versatile,health


### Generate Policy Predictions

In [16]:
policy_attributes = """election outcomes, economic reforms, legislative impacts."""
policy_requirements = """- Should be based on real-world policy reports.
    - Suppose the time when $p$ was made is during an election cycle or non-election cycles.
    - Include policies & laws, from all sectors such as consumer staples, energy, finance, health care, industrials, materials, media, real estate, retail, technology, utilities, defense, etc."""

policy_examples = """
- policy examples for template 1:
    1. On [Monday, December 16, 2024], [President John Doe] forecasts that the [unemployment rate] at [the United States] [will likely] [decrease] by [2%] in [2025 Q1].
    2. On [Tue, November 19, 2024], [Dr. Jane Smith] foresees that the [population growth rate] in [California] [is likely to] [decrease] by [5 percent to 20 billion] in [08/21/2025].
- policy examples for template 2:
    3. In [October 2024], [Senator Emily Johnson] from [the Senate Committee on Finance], envisions that the [inflation rate] [should] [rise] from [1.3 percent to 89 percent] in [Quarter 3 of 2028].
    4. In [8/15/2027], [Governor Michael Brown] from [the State of Texas], predicts that the [number of registered voters] [will] [fall] under [5B] in [Dec of 2029].
- policy examples for template 3:
    5. [Dija Gabe in the Congressional Budget Office] predicts on [23 October 2024] that [national debt] in [USA] [may] [stay stable] at [20 million] in [2027 Quarter 4].
    6. [Dr. Sarah Lee] foresees in [Q2 2026] that the [median household income] in [NY] [should] [fall] by [629 percent to $15,000] on [Monday, Nov 18, 2026].
- policy examples for template 4:
    7. According to a [General Robert Williams] from [the Department of Defense], on [08/21/2024], the [number of active-duty soldiers] [is expected to] [increase] beyond [$10,000] in the timeframe of [Q3 of 2029].
    8. According to [Dr. Olivia Martinez] from [the Census Bureau], on [Fri, July 12, 2024], the [population density] in [urban areas] [is likely to] [increase] as much as [100,000,000, reflecting a 1209 percent increase] by [21 Aug 2024].
- policy examples for template 5:
    9. In [2025-08-21], the [number of citizens] in [Thomson, GA, 30824] has a [probability] of [92 percent to reach 30k] [decrease], as predicted by [Shirly Tisdale, a policy reporter] on [21 Oct 24].
    10. In [Quarter of 2027], the [number of Navy members] in [the United States] [is expected] to be [300K, which is a 15%] [rise], as predicted by [a policy analyst] on [Sun, February 20, 2024]."""

In [17]:
policy_input_dict = {
    "prediction_domain": "policy",
    "prediction_domain_attribute": policy_attributes,
    "domain_requirements": policy_requirements,
    "domain_examples": policy_examples,
    "predictions_N": 10
}

policy_prompt_output = pipeline_prompt.format(**policy_input_dict)
print(policy_prompt_output)

policy_df = base_pipeline.generate_predictions(policy_prompt_output, 1, "policy")
policy_df

A prediction ($p$) consists of the following eight properties:

    1. $p_p$, policy person that predicted $p$
        - Can be a person (with a name) or a policy person such as a policy reporter, policy analyst, policy expert, policy top executive, policy senior level person, etc).
    2. $p_o$, policy organization 
        - Can only be an organization or entity that is associated with the policy prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, policy prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the policy domain.
        - Some examples are election outcomes, economic reforms, legislative impac

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,"T1: On 2024-10-15, policy expert, John Smith, predicts that the stock price at Goldman Sachs will rise by 10% in Q2 of 2026.",1,llama-3.3-70b-versatile,policy
1,"T2: In 2024, policy analyst, Emily Johnson, from the Federal Reserve, envisions that the interest rate will increase from 2% to 4% in 2028.",1,llama-3.3-70b-versatile,policy
2,"T3: Policy senior level person, Michael Brown, predicts on 08/20/2024 that the GDP in the United States may stay stable at $20 trillion in 2027.",1,llama-3.3-70b-versatile,policy
3,"T4: According to a policy top executive, Robert Williams, from JPMorgan Chase, on 2024-08-22, the number of bank branches is expected to decrease beyond 5,000 in the timeframe of Q4 of 2029.",1,llama-3.3-70b-versatile,policy
4,"T5: In 2025-08-20, the inflation rate in the European Union has a probability of 80% to reach 3%, as predicted by policy reporter, Sarah Lee, on 2024-10-10.",1,llama-3.3-70b-versatile,policy
5,"T1: On 2024/11/12, policy expert, David White, forecasts that the exchange rate at the European Central Bank will likely decrease by 5% in Q1 of 2027.",1,llama-3.3-70b-versatile,policy
6,"T2: In Q3 of 2024, policy analyst, Kevin Davis, from the International Monetary Fund, predicts that the unemployment rate will fall from 5% to 3% in 2026.",1,llama-3.3-70b-versatile,policy
7,"T3: Policy senior level person, Olivia Martinez, predicts on 2024-09-18 that the national debt in Japan may increase by 10% in 2028.",1,llama-3.3-70b-versatile,policy
8,"T4: According to a policy top executive, James Wilson, from Citigroup, on 2024-10-25, the number of credit card holders is expected to rise beyond 100 million in the timeframe of Q2 of 2030.",1,llama-3.3-70b-versatile,policy
9,"T5: In Q4 of 2027, the number of initial public offerings in the United States is expected to be 500, which is a 20% increase, as predicted by policy analyst, Daniel Kim, on 2024-08-15.",1,llama-3.3-70b-versatile,policy


### Combine Domain Predictions

In [18]:
predictions_df = DataProcessing.concat_dfs([financial_df, weather_df, health_df, policy_df])
predictions_df

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,"T1: On 2024-10-15, Rachel Patel, a financial analyst, predicts that the operating cash flow at General Motors will likely decrease by $5 billion to $10 billion in Q2 of 2026.",1,llama-3.3-70b-versatile,financial
1,"T2: In 2024, Julian Sanchez from Bank of America, forecasts that the stock price will rise from $50 to $75 per share in 2028.",1,llama-3.3-70b-versatile,financial
2,"T3: Emily Wilson, a financial expert, predicts on 20/08/2024 that the research and development expenses at Pfizer may stay stable at $15 million in 2029.",1,llama-3.3-70b-versatile,financial
3,"T4: According to a senior executive from Cisco, on 2024/08/20, the net profit is expected to increase beyond $8 billion in the timeframe of Q4 of 2027.",1,llama-3.3-70b-versatile,financial
4,"T5: In 2025-02-18, the revenue at Visa has a probability of 20 percent to reach $25 billion, which is a 10% increase, as predicted by David Lee, a financial reporter, on 15 Oct 2024.",1,llama-3.3-70b-versatile,financial
5,"T1: On Wednesday, November 20, 2024, Michael Davis, a financial analyst, predicts that the gross profit at 3M will likely decrease by 15% to $12 billion in Q1 of 2026.",1,llama-3.3-70b-versatile,financial
6,"T2: In Q3 of 2024, Olivia Brown from Johnson & Johnson, envisions that the operating income will rise from $10 billion to $15 billion in 2028.",1,llama-3.3-70b-versatile,financial
7,"T3: Kevin White, a financial expert, predicts on 10/10/2024 that the revenue at AT&T may increase by $5 billion to $20 billion in 2027.",1,llama-3.3-70b-versatile,financial
8,"T4: According to a top executive from Intel, on 2024-07-25, the net profit is expected to increase beyond $12 billion in the timeframe of Q2 of 2029.",1,llama-3.3-70b-versatile,financial
9,"T5: In 2026-08-25, the stock price at McDonald's is expected to be $200 per share, which is a 25% increase, as predicted by Sophia Rodriguez, a financial analyst, on 25 July 2024.",1,llama-3.3-70b-versatile,financial
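
The internals of `DataProcessing.concat_dfs` are not shown in this notebook. As a rough sketch, assuming it simply stacks the per-domain frames row-wise, the same step can be reproduced with `pd.concat`. The helper name `concat_dfs_sketch` and the toy rows are hypothetical and only illustrate the shape of the result.

```python
import pandas as pd


def concat_dfs_sketch(dfs):
    """Hypothetical stand-in for DataProcessing.concat_dfs: stack frames row-wise."""
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating each frame's index
    return pd.concat(dfs, ignore_index=True)


# Toy frames (made-up rows) that mimic the columns used in this notebook
financial = pd.DataFrame({
    "Base Sentence": ["T1: first financial sentence", "T2: second financial sentence"],
    "Prediction Label": [1, 1],
    "Domain": ["financial", "financial"],
})
weather = pd.DataFrame({
    "Base Sentence": ["T1: first weather sentence"],
    "Prediction Label": [1],
    "Domain": ["weather"],
})

combined = concat_dfs_sketch([financial, weather])
# Quick sanity check that every domain made it into the combined frame
domain_counts = combined["Domain"].value_counts().to_dict()
print(domain_counts)
```

Checking `value_counts()` on `Domain` after combining is a cheap way to confirm that no domain frame was dropped, since the displayed output only shows the first rows.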


In [19]:
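# The helper is called on the class itself, as with concat_dfs above, so no alias or instance is needed.
# It strips the leading "T<n>: " tag from each sentence and records it in a "Template Number" column.
updated_predictions_df = DataProcessing.reformat_df_with_template_number(predictions_df, col_name="Base Sentence")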
updated_predictions_df

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain,Template Number
0,"On 2024-10-15, Rachel Patel, a financial analyst, predicts that the operating cash flow at General Motors will likely decrease by $5 billion to $10 billion in Q2 of 2026.",1,llama-3.3-70b-versatile,financial,1
1,"In 2024, Julian Sanchez from Bank of America, forecasts that the stock price will rise from $50 to $75 per share in 2028.",1,llama-3.3-70b-versatile,financial,2
2,"Emily Wilson, a financial expert, predicts on 20/08/2024 that the research and development expenses at Pfizer may stay stable at $15 million in 2029.",1,llama-3.3-70b-versatile,financial,3
3,"According to a senior executive from Cisco, on 2024/08/20, the net profit is expected to increase beyond $8 billion in the timeframe of Q4 of 2027.",1,llama-3.3-70b-versatile,financial,4
4,"In 2025-02-18, the revenue at Visa has a probability of 20 percent to reach $25 billion, which is a 10% increase, as predicted by David Lee, a financial reporter, on 15 Oct 2024.",1,llama-3.3-70b-versatile,financial,5
5,"On Wednesday, November 20, 2024, Michael Davis, a financial analyst, predicts that the gross profit at 3M will likely decrease by 15% to $12 billion in Q1 of 2026.",1,llama-3.3-70b-versatile,financial,1
6,"In Q3 of 2024, Olivia Brown from Johnson & Johnson, envisions that the operating income will rise from $10 billion to $15 billion in 2028.",1,llama-3.3-70b-versatile,financial,2
7,"Kevin White, a financial expert, predicts on 10/10/2024 that the revenue at AT&T may increase by $5 billion to $20 billion in 2027.",1,llama-3.3-70b-versatile,financial,3
8,"According to a top executive from Intel, on 2024-07-25, the net profit is expected to increase beyond $12 billion in the timeframe of Q2 of 2029.",1,llama-3.3-70b-versatile,financial,4
9,"In 2026-08-25, the stock price at McDonald's is expected to be $200 per share, which is a 25% increase, as predicted by Sophia Rodriguez, a financial analyst, on 25 July 2024.",1,llama-3.3-70b-versatile,financial,5
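
Judging by the output above, `reformat_df_with_template_number` removes the leading `T<n>: ` tag from each sentence and stores `<n>` in a new `Template Number` column. Its implementation is not shown here, so the following is only a minimal sketch of one way to get that behavior with a regular expression. The function name `split_template_number` and the demo sentences are hypothetical.

```python
import pandas as pd


def split_template_number(df, col_name="Base Sentence"):
    """Hypothetical sketch: move a leading 'T<n>: ' tag into a 'Template Number' column."""
    out = df.copy()
    # Capture the digits after 'T' and the remaining sentence text
    parts = out[col_name].str.extract(r"^T(?P<num>\d+):\s*(?P<text>.*)$")
    # Rows without a tag get <NA> as their template number
    out["Template Number"] = pd.to_numeric(parts["num"]).astype("Int64")
    # Rows without a tag keep their original text
    out[col_name] = parts["text"].fillna(out[col_name])
    return out


# Made-up demo rows: two tagged sentences and one untagged sentence
demo = pd.DataFrame({
    "Base Sentence": [
        "T1: On 2024-10-15, an analyst predicts a rise.",
        "T5: In 2026, revenue is expected to grow.",
        "No tag here.",
    ]
})
result = split_template_number(demo)
print(result)
```

Keeping untagged rows instead of dropping them makes it easier to spot generations where the model ignored the template format.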


## Generate Non-Predictions

In [20]:
non_prediction_template = """Generate any sentence that's not a prediction, which we name non-prediction. A prediction is below with variables
    - $p$, prediction
        - $p_s$, source that predicted $p$
            - Source can be person, organization, and any type of entity.
        - $p_t$, time when $p$ was made
            - Time is the exact moment that can be measured in day, hour, minutes, seconds, etc.
        - $p_f$, forecast time when $p$ is expected to come to fruition
            - Forecast can be from seconds to decades in the future.
            - How far to go out? Or where to stop?
        - $p_a$, prediction attribute
            - Financial based attributes such as stock price, net profit, revenue
        - $p_m$, prediction metric outcome
            - How much will the  $p_a$ rise/increase or fall/decrease
        - $p_v$, future verb tense
            - A verb that is associated with the future such as will, would, be going to, should, etc.

    Please generate 40 non-predictions with the following requirements below:

    1. Only a simple non-prediction (sentence) (and NOT compounding using "and" or "or")
    2. Include no additional information such as "Here are nine simple sentences that are not predictions:", number before non-prediction
    3. At least 10 words and no more than 30 words in the non-prediction
    4. Do not generate redundant non-predictions
"""

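# Show the exact prompt that is sent to the model
print(non_prediction_template)
# Label 0 marks non-predictions; the generated predictions above carry label 1
non_prediction_label = 0

non_predictions_df = base_pipeline.generate_predictions(text=non_prediction_template, label=non_prediction_label, domain="any")
non_predictions_df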

Generate any sentence that's not a prediction, which we name non-prediction. A prediction is below with variables
    - $p$, prediction
        - $p_s$, source that predicted $p$
            - Source can be person, organization, and any type of entity.
        - $p_t$, time when $p$ was made
            - Time is the exact moment that can be measured in day, hour, minutes, seconds, etc.
        - $p_f$, forecast time when $p$ is expected to come to fruition
            - Forecast can be from seconds to decades in the future.
            - How far to go out? Or where to stop?
        - $p_a$, prediction attribute
            - Financial based attributes such as stock price, net profit, revenue
        - $p_m$, prediction metric outcome
            - How much will the  $p_a$ rise/increase or fall/decrease
        - $p_v$, future verb tense
            - A verb that is associated with the future such as will, would, be going to, should, etc.

    Please generate 40 non-predictions with th

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,The company is currently undergoing a major restructuring effort internally.,0,llama-3.3-70b-versatile,any
1,The new employee is very happy with the office working environment.,0,llama-3.3-70b-versatile,any
2,The manager is responsible for overseeing daily business operations effectively.,0,llama-3.3-70b-versatile,any
3,The team is working diligently to meet the project deadline successfully.,0,llama-3.3-70b-versatile,any
4,The marketing department is analyzing customer feedback very carefully.,0,llama-3.3-70b-versatile,any
5,The sales team is performing exceptionally well this quarter financially.,0,llama-3.3-70b-versatile,any
6,The IT department is resolving technical issues quickly and efficiently always.,0,llama-3.3-70b-versatile,any
7,The company culture values transparency and open communication highly.,0,llama-3.3-70b-versatile,any
8,The employees are enjoying the new office amenities greatly already.,0,llama-3.3-70b-versatile,any
9,The customer service team is available to assist 24 hours daily.,0,llama-3.3-70b-versatile,any
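
The prompt asks for 10 to 30 words per sentence and no redundant sentences, but the model does not always comply: row 4 above, for example, has only nine words. A minimal sketch of a post-generation filter for those two requirements follows. The function name `filter_non_predictions` is hypothetical and is not part of `BasePipeline` or `DataProcessing`.

```python
import pandas as pd


def filter_non_predictions(df, col_name="Base Sentence", min_words=10, max_words=30):
    """Hypothetical filter: keep sentences within the word limits and drop exact duplicates."""
    # Count whitespace-separated tokens in each sentence
    word_counts = df[col_name].str.split().str.len()
    # between() is inclusive on both ends by default
    in_range = word_counts.between(min_words, max_words)
    deduped = df[in_range].drop_duplicates(subset=col_name)
    return deduped.reset_index(drop=True)


# Rows 0 and 4 from the output above, with row 0 repeated to simulate a redundant generation
sample = pd.DataFrame({
    "Base Sentence": [
        "The company is currently undergoing a major restructuring effort internally.",
        "The company is currently undergoing a major restructuring effort internally.",
        "The marketing department is analyzing customer feedback very carefully.",
    ],
    "Prediction Label": [0, 0, 0],
})
kept = filter_non_predictions(sample)
print(kept)
```

This only catches exact duplicates. Near-duplicates with small wording changes would need a similarity measure, which is outside the scope of this sketch.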


## Store Predictions and Non-Predictions

In [21]:
%store predictions_df
%store non_predictions_df

Stored 'predictions_df' (DataFrame)
Stored 'non_predictions_df' (DataFrame)


## Further Explanation

- Need to optimize code
- Include multiple models next

### Multiple Models to Generate Predictions

In [21]:
# Sketch only: LlamaTextGenerationModel and SomeModel are placeholders, so this cell stays commented out.
# domains = ["financial", "weather", "health care"]
# models = [LlamaTextGenerationModel(), SomeModel()]

# # Collect one DataFrame per (model, domain) pair in a list instead of overwriting
# # llama_df / model_2_df on every call, then concatenate once with pd.concat().
# # NOTE: financial_prompt_output is reused for every domain here; each domain should get its own prompt output.
# model_dfs = []
# for domain in domains:
#     for model in models:
#         model_dfs.append(model.generate_predictions(financial_prompt_output, 1, domain))

# all_models_df = pd.concat(model_dfs, ignore_index=True)