# 1-Generate Predictions using LangChain

- **Goal:** Prediction Recognition

- **Purpose:** Implement step 1 of the prediction recognition pipeline, including its sub-steps:
    1. Generate predictions
        1. Create several prediction prompt templates
        2. Utilize open-source LLMs to generate predictions

- **Misc:**
    - `%store`: IPython line magic that persists a variable of interest so it can be loaded in another notebook

In [1]:
import os
import sys

import pandas as pd

from langchain_core.prompts import PipelinePromptTemplate, PromptTemplate

# Get the current working directory of the notebook
notebook_dir = os.getcwd()
# Add the parent directory to the system path
sys.path.append(os.path.join(notebook_dir, '../'))

from pipelines import BasePipeline
from data_processing import DataProcessing

In [2]:
# Show up to 800 characters per column so long prediction sentences are not truncated
pd.set_option('display.max_colwidth', 800)
base_pipeline = BasePipeline()

## LangChain Templates for Domain Predictions

In [3]:
full_prediction_template = """{prediction_properties}

{prediction_requirements}

{prediction_templates}

{prediction_examples}"""
full_prediction_prompt = PromptTemplate.from_template(full_prediction_template)


In [4]:
prediction_properties_template = """A prediction ($p$) consists of the following nine properties:

    1. $p_p$, {prediction_domain} person that predicted $p$
        - Can be a person (with a name) or a {prediction_domain} person such as a {prediction_domain} reporter, {prediction_domain} analyst, {prediction_domain} expert, {prediction_domain} top executive, {prediction_domain} senior level person, etc).
    2. $p_o$, {prediction_domain} organization 
        - Can only be an organization or entity that is associated with the {prediction_domain} prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, {prediction_domain} prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the {prediction_domain} domain.
        - Some examples are {prediction_domain_attribute}.
    6. $p_s$, slope that indicates the direction of change in $p_a$
        - Change of directions can be rise/increase/as much as, fall/decrease/as little as, change, stay stable, high/low chance/probability/degree of, etc.
    7. $p_m$, metric outcome
        - How much will the $p_a$ $p_s$?
    8. $p_v$, future verb tense
        - A verb that is associated with the future such as will, would, be going to, should, etc.
    9. $p_l$, location
        - The location is attached to attribute $p_a$ if {prediction_domain} == 'weather'
    """
prediction_properties_prompt = PromptTemplate.from_template(prediction_properties_template)

    - Keep the brackets around the prediction properties when generating predictions and be sure to include brackets around dates such as "2024-10-15", "2024/08/20", "Q4 of 2024", "2025", "2027 Q1", "Q3 2027", "On 21 Aug 2024".
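The date spellings in the note above can all be derived from one calendar date. The following is a minimal sketch using only the standard library; `date_variants` is a hypothetical helper for illustration and is not part of the `pipelines` or `data_processing` modules.

```python
from datetime import date


def date_variants(d: date) -> list[str]:
    """Return several bracketed spellings of the same date (hypothetical helper)."""
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
    spellings = [
        d.strftime("%Y-%m-%d"),               # ISO style
        d.strftime("%Y/%m/%d"),               # slash-separated
        f"Q{quarter} of {d.year}",            # quarter first
        f"{d.year} Q{quarter}",               # year first
        f"{d.day} {d.strftime('%b')} {d.year}",  # day, abbreviated month, year
    ]
    # Keep the brackets so the date stays marked as a prediction property.
    return [f"[{s}]" for s in spellings]


variants = date_variants(date(2024, 8, 21))
for v in variants:
    print(v)
```

Month abbreviations come from `strftime`, so they follow the process locale; the default locale yields English names.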

In [5]:
prediction_requirements_template = """{prediction_domain} requirements to use for each prediction:

    - Should be based on real-world {prediction_domain} data and not hallucinate.
    - Only a simple sentence (prediction) (and NOT compounding using "and" or "or").
    - Should diversify all nine properties of the prediction ($p$).
    - Should use synonyms of predicts such as forecasts, speculates, foresees, envisions, etc., and not use any of them more than ten times.
    - The prediction should be unique and not repeated.
    - The forecast time ($p_f$) should always be after current time ($p_t$) of when forecast ($p$) was made.
    - Do not number the predictions.
    - Do not say, "As the {prediction_domain} at organization ($p_o$), I will generate company-based {prediction_domain} predictions using the provided templates." or anything similar.
    - Should have a forecast time ($p_f$) when $p$ is expected to come to fruition between 2025 to 2050.
    - Use the five different templates and examples provided.
    - Change how the current time ($p_t$) and forecast time ($p_f$) are written in the prediction with examples of (1) Wednesday, August 21, 2024; (2) Wed, August 21, 2024; (3) 08/21/2024; (4) 08/21/2024; (5) 21/08/2024; (6) 21 August 2024; (7) 2024/08/21; (8) 2024-08-21; (9) August 21, 2024; (10) Aug 21, 2024; (11) 21 August 2024, (12) 21 Aug 2024, Q3 of 2027, 2029 of Q3, etc (with removing day of week).
    {domain_requirements}
    - Do not say, "Here are 10 unique {prediction_domain} predictions based on the provided templates and examples:" or anything similar in the response.
    - Do not use any of the examples in the prompt.
    - In front of every prediction, put the template number in the format of "T1:", "T2:", etc."""
prediction_requirements_prompt = PromptTemplate.from_template(prediction_requirements_template)
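Some of the requirements above can be checked mechanically after generation. The sketch below covers two of them: that no prediction is repeated, and that some year in the sentence falls in the 2025 to 2050 forecast window. `check_predictions` is a hypothetical helper, and the year check is a rough heuristic (it only sees four-digit years, so a date such as "21 Oct 24" is not recognized). The second sample line is a deliberately altered sentence used to trigger the window check.

```python
import re
from collections import Counter

# Four-digit years starting with 19 or 20; two-digit years are not detected.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def check_predictions(lines, first_year=2025, last_year=2050):
    """Return (index, problem) pairs for a batch of generated predictions."""
    problems = []
    counts = Counter(lines)
    for i, line in enumerate(lines):
        if counts[line] > 1:
            problems.append((i, "duplicate prediction"))
        years = [int(y) for y in YEAR.findall(line)]
        if not any(first_year <= y <= last_year for y in years):
            problems.append((i, "no year inside the forecast window"))
    return problems


good = ("T1: On 2024-10-15, Rachel Brown, a financial analyst, predicts that the "
        "operating cash flow at General Motors will likely decrease by $5 billion in Q2 of 2026.")
# Hypothetical bad sentence: both years fall before the forecast window.
bad = "T2: In 2023, an analyst forecasts that the stock price will rise from $50 to $75 per share in 2024."

problems = check_predictions([good, bad, good])
for index, problem in problems:
    print(index, problem)
```

A check like this only flags candidates for review; it cannot tell which of several years in a sentence is the forecast time $p_f$.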

In [6]:
prediction_templates_template = """Here are some {prediction_domain} templates:

- {prediction_domain} template 1: On [ $p_t$ ], [ $p_p$ ] predicts that the [ $p_a$ ] at [ $p_o$ ] [ $p_v$ ] [ $p_s$ ] by [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 2: In [ $p_t$ ], [ $p_p$ ] from [ $p_o$ ], predicts that the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] from [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 3: [ $p_p$ ] predicts on [ $p_t$ ] that the [ $p_a$ ] at [ $p_o$ ] [ $p_v$ ] [ $p_s$ ] under [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 4: According to a [ $p_p$ ] from [ $p_o$ ], on [ $p_t$ ], the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] beyond [ $p_m$ ] in the timeframe of [ $p_f$ ].
- {prediction_domain} template 5: In [ $p_f$ ], the [ $p_a$ ] at [ $p_o$ ] [ $p_v$ ] a [ $p_m$ ] [ $p_s$ ], as predicted by [ $p_p$ ] on [ $p_t$ ]."""
prediction_templates_prompt = PromptTemplate.from_template(prediction_templates_template)
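To make the slot notation concrete, the sketch below fills template 1 with one value per property and keeps the brackets, which reproduces the shape of the hand-written examples. `fill_template` and the `SLOT` pattern are hypothetical and used for illustration only; in the notebook itself the LLM performs this substitution.

```python
import re

# Template 1 from the cell above, with {prediction_domain} already resolved.
TEMPLATE_1 = (
    "On [ $p_t$ ], [ $p_p$ ] predicts that the [ $p_a$ ] at [ $p_o$ ] "
    "[ $p_v$ ] [ $p_s$ ] by [ $p_m$ ] in [ $p_f$ ]."
)

# Matches a slot such as "[ $p_t$ ]" and captures the property name ("p_t").
SLOT = re.compile(r"\[\s*\$(p_[a-z])\s*\$\s*\]")


def fill_template(template: str, properties: dict[str, str]) -> str:
    """Replace every [ $p_x$ ] slot with its bracketed value."""
    return SLOT.sub(lambda m: f"[{properties[m.group(1)]}]", template)


example = fill_template(TEMPLATE_1, {
    "p_t": "Tue, November 19, 2024",
    "p_p": "Ava Lee",
    "p_a": "operating cash flow",
    "p_o": "ExxonMobil",
    "p_v": "should",
    "p_s": "decrease",
    "p_m": "5 percent to $20 billion",
    "p_f": "08/21/2025",
})
print(example)
```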

In [7]:
prediction_examples_template = """Here are some examples of {prediction_domain} predictions:

{domain_examples}

With the above, generate a unique set of {predictions_N} predictions. Think from the perspective of a {prediction_domain} analyst, expert, top executive, or senior level person."""
prediction_examples_prompt = PromptTemplate.from_template(prediction_examples_template)

In [8]:
prediction_input_prompts = [
    ("prediction_properties", prediction_properties_prompt),
    ("prediction_requirements", prediction_requirements_prompt),
    ("prediction_templates", prediction_templates_prompt),
    ("prediction_examples", prediction_examples_prompt),
]

pipeline_prompt = PipelinePromptTemplate(
    final_prompt=full_prediction_prompt, pipeline_prompts=prediction_input_prompts
)

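Conceptually, the pipeline renders each named section and then substitutes the rendered text into the final template. The sketch below shows that idea with plain `str.format`; it is an illustration under that assumption, not LangChain's implementation, and `compose_prompt` is a hypothetical name. It may also serve as a fallback if `PipelinePromptTemplate` is unavailable or emits a deprecation warning in your installed version.

```python
def compose_prompt(final_template, sections, **inputs):
    """Render each named section, then substitute the results into the final template."""
    values = dict(inputs)
    for name, template in sections:
        # Each rendered section is stored under its name, so the final
        # template (and any later section) can refer to it.
        values[name] = template.format(**values)
    return final_template.format(**values)


# Shortened stand-ins for the section templates defined in the cells above.
sections = [
    ("prediction_templates", "Here are some {prediction_domain} templates:"),
    ("prediction_examples", "Generate a unique set of {predictions_N} predictions."),
]
final_template = "{prediction_templates}\n\n{prediction_examples}"

prompt = compose_prompt(
    final_template, sections, prediction_domain="weather", predictions_N=20
)
print(prompt)
```

Because `str.format` ignores unused keyword arguments, every section can receive the full input dictionary even when it uses only one or two of the variables.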


## Generate Domain Predictions

In [9]:
predictions_N = 20

### Generate Financial Predictions

In [10]:
financial_attributes = """stock price, net profit, revenue, operating cash flow, research and development expenses, operating income, gross profit."""
financial_requirements = """- Should be based on real-world financial earnings reports.
    - Suppose the time when $p$ was made is during any earning season.
    - Include stocks from all sectors such as consumer staples, energy, finance, health care, industrials, materials, media, real estate, retail, technology, utilities, defense, etc.
    - Include the US Dollar sign ($) before or USD after the amount of the financial attribute."""

# For each template, have a rise, fall, or stable example, respectively.
financial_examples = """
- financial examples for template 1:
    1. On [Monday, December 16, 2024], [Detravious, an investor] forecasts that the [revenue] at [Apple] [will likely] [decrease] from [$87B to $50 billion] in [2025 Q1].
    2. On [Tue, November 19, 2024], [Ava Lee] predicts that the [operating cash flow] at [ExxonMobil] [should] [decrease] by [5 percent to $20 billion] in [08/21/2025].
- financial examples for template 2:
    3. In [October 2024], [Julian Hall] from [Yahoo Finance], envisions that the [stock price] [will] [rise] from [$800 to $1,000 per share] in [Quarter 3 of 2028].
    4. In [8/15/2027], [Mrs. Kalia] from [McDonald's], predicts that the [net profit] [will] [fall] under [5% to $5 billion] in [January of 2029].
- financial examples for template 3:
    5. [Dija Gabe, a financial expert] predicts on [23 October 2024] that the [research and development expenses] at [Alphabet] [may] [stay stable] at [$20 million] in [2027 Quarter 4].
    6. [Mr. Mike] predicts in [Q2 2026] that the [operating income] at [Microsoft] [will] [fall] by [407 percent to $50M] on [Monday, Nov 18, 2026].
- financial examples for template 4:
    7. According to a [top executive] from [Chevron], on [08/21/2024], the [net profit] [is expected to] [increase] beyond [10,000 USD] in the timeframe of [Q3 of 2029].
    8. According to [Brittany] from [Tesla], on [Fri, July 12, 2024], the [gross profit] [may] [increase] as much as [$30,000,000, reflecting a 1209 percent increase] by [21 Aug 2024].
- financial examples for template 5:
    9. In [2025-08-21], the [net profit] at [Amazon] has a [probability] of [11 percent to reach $30k] [decrease], as predicted by [Emily Davis, a financial reporter] on [21 Oct 24].
    10. In [Quarter of 2027], the [revenue] at [Facebook] [is expected] to be [$30 billion, which is a 15%] [rise], as predicted by [a financial analyst] on [Sun, February 20, 2024]."""

In [11]:
financial_input_dict = {
    "prediction_domain": "financial",
    "prediction_domain_attribute": financial_attributes,
    "domain_requirements": financial_requirements,
    "domain_examples": financial_examples,
    "predictions_N": predictions_N
}
financial_prompt_output = pipeline_prompt.format(**financial_input_dict)
print(financial_prompt_output)

financial_df = base_pipeline.generate_predictions(financial_prompt_output, 1, "financial")
financial_df

A prediction ($p$) consists of the following nine properties:

    1. $p_p$, financial person that predicted $p$
        - Can be a person (with a name) or a financial person such as a financial reporter, financial analyst, financial expert, financial top executive, financial senior level person, etc).
    2. $p_o$, financial organization 
        - Can only be an organization or entity that is associated with the financial prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, financial prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the financial domain.
        - Some examples are stock price, net profi

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,"T1: On 2024-10-15, Rachel Brown, a financial analyst, predicts that the operating cash flow at General Motors will likely decrease by $5 billion in Q2 of 2026.",1,llama-3.3-70b-versatile,financial
1,"T2: In 2024, Michael Davis from Goldman Sachs, forecasts that the stock price will rise from $50 to $75 per share in 2028.",1,llama-3.3-70b-versatile,financial
2,"T3: Emily Chen, a financial expert, predicts on 08/20/2024 that the research and development expenses at Pfizer may stay stable at $15 million in 2027.",1,llama-3.3-70b-versatile,financial
3,"T4: According to a senior executive from Boeing, on 21 Aug 2024, the net profit is expected to increase beyond $25 billion in the timeframe of Q4 of 2029.",1,llama-3.3-70b-versatile,financial
4,"T5: In 2025-08-20, the gross profit at Cisco Systems has a probability of 20 percent to reach $40 billion, which is a 10% increase, as predicted by David Lee, a financial reporter, on 10/10/2024.",1,llama-3.3-70b-versatile,financial
5,"T1: On Wednesday, November 20, 2024, Kevin White, an investor, predicts that the revenue at Visa will likely decrease by 5% to $20 billion in Q1 of 2026.",1,llama-3.3-70b-versatile,financial
6,"T2: In Q3 of 2024, Sophia Patel from JPMorgan Chase, envisions that the operating income will fall from $30 billion to $20 billion in 2027.",1,llama-3.3-70b-versatile,financial
7,"T3: Daniel Kim, a financial analyst, predicts on 2024/08/22 that the net profit at UnitedHealth Group may increase by 15% to $25 billion in 2028.",1,llama-3.3-70b-versatile,financial
8,"T4: According to a top executive from 3M, on 2024-08-25, the research and development expenses are expected to increase beyond $10 billion in the timeframe of Q2 of 2030.",1,llama-3.3-70b-versatile,financial
9,"T5: In 2026-08-22, the stock price at Intel has a probability of 15% to reach $60 per share, which is a 20% increase, as predicted by Olivia Brown, a financial expert, on 08/25/2024.",1,llama-3.3-70b-versatile,financial
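Each generated sentence starts with the template tag requested in the prompt ("T1:", "T2:", and so on). A small sketch of how the tag could be separated from the sentence and used to see how evenly the five templates are represented follows; `split_tag` is a hypothetical helper, and the sample rows are copied from the output above.

```python
import re
from collections import Counter

# A tag "T1:" to "T5:" at the start of the line, followed by the sentence.
TAGGED = re.compile(r"^(T[1-5]):\s*(.+)$")


def split_tag(sentence):
    """Return (template_tag, sentence_without_tag); the tag is None when absent."""
    match = TAGGED.match(sentence)
    if match is None:
        return None, sentence
    return match.group(1), match.group(2)


rows = [
    "T1: On 2024-10-15, Rachel Brown, a financial analyst, predicts that the operating cash flow at General Motors will likely decrease by $5 billion in Q2 of 2026.",
    "T2: In 2024, Michael Davis from Goldman Sachs, forecasts that the stock price will rise from $50 to $75 per share in 2028.",
    "T1: On Wednesday, November 20, 2024, Kevin White, an investor, predicts that the revenue at Visa will likely decrease by 5% to $20 billion in Q1 of 2026.",
]

tag_counts = Counter(split_tag(row)[0] for row in rows)
print(tag_counts)
```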


### Generate Weather Predictions

In [12]:
weather_prediction_templates_template = """Here are some {prediction_domain} templates:

- {prediction_domain} template 1: On [ $p_t$ ], [ $p_p$ ] predicts that the [ $p_a$ ] at [ $p_o$ ] in [ $p_l$ ] [ $p_v$ ] [ $p_s$ ] by [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 2: In [ $p_t$ ], [ $p_p$ ] from [ $p_o$ ] in [ $p_l$ ], predicts that the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] from [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 3: [ $p_p$ ] predicts on [ $p_t$ ] that the [ $p_a$ ] at [ $p_o$ ] in [ $p_l$ ] [ $p_v$ ] [ $p_s$ ] under [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 4: According to a [ $p_p$ ] from [ $p_o$ ] in [ $p_l$ ], on [ $p_t$ ], the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] beyond [ $p_m$ ] in the timeframe of [ $p_f$ ].
- {prediction_domain} template 5: In [ $p_f$ ], the [ $p_a$ ] at [ $p_o$ ] in [ $p_l$ ] [ $p_v$ ] a [ $p_m$ ] [ $p_s$ ], as predicted by [ $p_p$ ] on [ $p_t$ ]."""
weather_prediction_templates_prompt = PromptTemplate.from_template(weather_prediction_templates_template)

In [13]:
weather_attributes = """temperature, precipitation, wind speed, humidity, etc."""
weather_requirements = """- Should be based on real-world weather reports.
    - Suppose the time when $p$ was made is during any season and any location (e.g., Florida known for hurricanes, California known for wildfires, etc.).
    - Include reports from all meteorologists, weather organizations, or any type of weather entity."""

# For each template, have a rise, fall, or stable example, respectively.
weather_examples = """
- weather examples for template 1:
    1. On [Monday, December 16, 2024], [Dr. Melissa Carter] a weather expert at the [National Weather Service], forecasts that the [temperature], in [New York City] [will likely] [decrease] from [5°C to 3°C] on [February 16, 2025 (Fri)].
    2. On [Tue, 19 November 2024], [Ethan James] at the [US Weather Center] predicts that the [precipitation levels], in [San Francisco] [are likely to] [increase] by [20%] in the timeframe of [08/21/2025].
- weather examples for template 2:
    3. In [October 2024], [Samantha Lin] from [NOAA], envisions that the [wind speed] [should] [decrease] by [15 mph] in [Chicago] by [Friday, March 22, 2025].
    4. In [8/15/2027], [Carlos Rivera] from [Weather.com] predicts that the [humidity] [will] [rise] by [30%] in [Miami] in [July of 2025].
- weather examples for template 3:
    5. [Amanda Green], a weather reporter from [Bureau of Meteorology] predicts on [23 October 2024] that the [temperature] in [Seattle], [will] [fall] by [10°F] in [2025 Quarter 1].
    6. [Mr. Tommy Wu], from [US Weather Center] predicts in [Q2 2026] that [snowfall levels], in [Denver] [will likely] [increase] by [8 inches] in [Monday, Nov 18, 2026].
- weather examples for template 4:
    7. According to a [top executive] from [AccuWeather], on [12/21/2024], the [rainfall] in [Portland] [is expected to] [increase] beyond [10 percent] in the timeframe of [early 2025].
    8. According to [David Harper] from [Weather Underground], on [Fri, August 9, 2024], the [air quality index] in [Los Angeles] [is likely to] [improve] by [20%] in [21 Aug 2024].
- weather examples for template 5:
    9. In [2025-08-21], the [average temperature] in [Houston] has a [probability] of [5 percent to] [decrease], as predicted by [King, a weather reporter] from [Meteorological Department] on [21 Oct 24].
    10. In [Quarter of 2027], the [wind chill] in [Minneapolis] [is expected] to be [10°F, which is a 15%] [rise], as predicted by [a weather analyst named Ortiz] on [Sun, February 20, 2024]."""

In [14]:
weather_input_prompts = [
    ("prediction_properties", prediction_properties_prompt),
    ("prediction_requirements", prediction_requirements_prompt),
    ("prediction_templates", weather_prediction_templates_prompt),
    ("prediction_examples", prediction_examples_prompt),
]

weather_pipeline_prompt = PipelinePromptTemplate(
    final_prompt=full_prediction_prompt, pipeline_prompts=weather_input_prompts
)

weather_input_dict = {
    "prediction_domain": "weather",
    "prediction_domain_attribute": weather_attributes,
    "domain_requirements": weather_requirements,
    "domain_examples": weather_examples,
    "predictions_N": predictions_N
}
weather_prompt_output = weather_pipeline_prompt.format(**weather_input_dict)
print(weather_prompt_output)

weather_df = base_pipeline.generate_predictions(weather_prompt_output, 1, "weather")
weather_df

A prediction ($p$) consists of the following nine properties:

    1. $p_p$, weather person that predicted $p$
        - Can be a person (with a name) or a weather person such as a weather reporter, weather analyst, weather expert, weather top executive, weather senior level person, etc).
    2. $p_o$, weather organization 
        - Can only be an organization or entity that is associated with the weather prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, weather prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the weather domain.
        - Some examples are temperature, precipitation, wind speed, humi

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,"T1: On 2024-10-15, Dr. Rachel Kim, a senior meteorologist at the National Weather Service, forecasts that the precipitation levels at the Weather Center in Los Angeles will likely increase by 15% in Q2 of 2026.",1,llama-3.3-70b-versatile,weather
1,"T2: In 2024-08, Emily Chen from the Bureau of Meteorology in Sydney, Australia, predicts that the humidity will rise by 25% in Perth by 2025-03-01.",1,llama-3.3-70b-versatile,weather
2,"T3: Dr. Michael Brown, a weather expert from the UK Met Office, predicts on 2024-09-20 that the temperature in London will fall by 8°C in the winter of 2027.",1,llama-3.3-70b-versatile,weather
3,"T4: According to a top executive from the European Weather Center, on 2024-11-01, the wind speed in Berlin will be expected to decrease beyond 10 mph in the timeframe of 2026-06.",1,llama-3.3-70b-versatile,weather
4,"T5: In 2028-04, the average sea level pressure at the Japanese Meteorological Agency in Tokyo has a probability of 20% to decrease, as predicted by Dr. Sophia Patel, a weather analyst, on 2024-07-15.",1,llama-3.3-70b-versatile,weather
5,"T1: On 2024-12-01, James Davis, a weather reporter from the US Weather Center, forecasts that the snowfall levels in Denver will likely increase by 12 inches in Q1 of 2027.",1,llama-3.3-70b-versatile,weather
6,"T2: In 2025-01, David Lee from the Korean Meteorological Administration in Seoul, envisions that the air quality index will improve by 18% in Busan by 2026-09-01.",1,llama-3.3-70b-versatile,weather
7,"T3: Dr. Olivia Martin, a weather expert from the French Weather Service, predicts on 2024-10-25 that the precipitation levels in Paris will rise by 20% in the summer of 2028.",1,llama-3.3-70b-versatile,weather
8,"T4: According to a senior level person from the Indian Meteorological Department, on 2024-08-20, the temperature in Mumbai will be expected to increase beyond 35°C in the timeframe of 2027-05.",1,llama-3.3-70b-versatile,weather
9,"T5: In 2029-10, the wind chill at the Canadian Weather Center in Toronto has a probability of 15% to rise, as predicted by Dr. William White, a weather analyst, on 2024-06-01.",1,llama-3.3-70b-versatile,weather


### Generate Health Predictions

In [15]:
health_attributes = """obesity rates, prevalence of chronic illnesses, average physical activity levels, nutritional intake, etc."""
health_requirements = """- Should be based on real-world health reports.
    - Suppose the time when $p$ was made is during any season such as flu season, allergy season, pandemic, epidemic, etc.
    - Include reports from all health organizations, researchers, doctors, physical therapists, physician assistants, nurse practitioners, fitness experts, etc."""

# For each template, have a rise, fall, or stable example, respectively.
health_examples = """
- health examples for template 1:
    1. On [Monday, December 16, 2024], [Detravious, an investor] forecasts that the [obesity rate] at the [United States] [will likely] [decrease] by [5%] in [2025 Q1].
    2. On [Tue, November 19, 2024], [medical professional Sophia Rodriguez] predicts that the [cancer rate] in [Georgia] [should] [decrease] by [4 percent] in [08/21/2025].
- health examples for template 2:
    3. In [October 2024], [Arjun Patel, Ph.D] from [Florida Department of Health] envisions that the [average daily caloric intake] [may] [rise] from [100 to 300] in [Quarter 3 of 2028].
    4. In [8/15/2027], [Dr. Michael Brown] from the [Centers for Disease Control and Prevention], foresees that the [average daily caloric intake] [will] [fall] [8 percent] in [2027].
- health examples for template 3:
    5. [A trusted expert] predicts on [23 October 2024] that the [global vaccination rate for measles] in the [US] [should] [stay stable] at [100K people] in [2027 Quarter 4].
    6. [Dr. Sarah Johnson] foresees in [Q2 2026] that the [prevalence of hypertension] in [California] [will] [fall] by [407 percent] by [Monday, Nov 18, 2026].
- health examples for template 4:
    7. According to [Olivia Martinez] from [Stanford University], on [08/21/2024], the prevalence of [type 2 diabetes in adults] [is expected to] [increase] beyond [8.5 percent] in the timeframe of [Q3 of 2029].
    8. According to [Rachel Kim, MD] from the [University of California], on [Fri, July 12, 2024], the prevalence of [type 2 diabetes in adults] [may] [increase] as much as [30,000,000, reflecting a 1209 percent increase] by [21 Aug 2024].
- health examples for template 5:
    9. In [2025-08-21], the [average weekly exercise hours] in [United States] has a [probability] of [20 percent to reach 30k], as predicted by [Emily Davis, Harvard School of Public Health] on [21 Oct 24].
    10. In [Quarter of 2027], the [average weekly walking hours] in [Atlanta] is [expected to rise] by [15%], as predicted by the [Monique, National Institutes of Health] on [Sun, February 20, 2024]."""

In [16]:
health_input_dict = {
    "prediction_domain": "health",
    "prediction_domain_attribute": health_attributes,
    "domain_requirements": health_requirements,
    "domain_examples": health_examples,
    "predictions_N": predictions_N
}

health_prompt_output = pipeline_prompt.format(**health_input_dict)
print(health_prompt_output)

health_df = base_pipeline.generate_predictions(health_prompt_output, 1, "health")
health_df

A prediction ($p$) consists of the following nine properties:

    1. $p_p$, health person that predicted $p$
        - Can be a person (with a name) or a health person such as a health reporter, health analyst, health expert, health top executive, health senior level person, etc).
    2. $p_o$, health organization 
        - Can only be an organization or entity that is associated with the health prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, health prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the health domain.
        - Some examples are obesity rates, prevalence of chronic illnesses, average

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,"T1: On 2024-10-15, Dr. David Lee, a health expert from the World Health Organization, predicts that the obesity rate at the United States will likely decrease by 3% in 2026 Q2.",1,llama-3.3-70b-versatile,health
1,"T2: In Q4 of 2024, Emily Chen, a researcher from the Centers for Disease Control and Prevention, predicts that the average daily caloric intake may rise from 2000 to 2500 in 2028.",1,llama-3.3-70b-versatile,health
2,"T3: Dr. Rachel Hall, a senior level person at the National Institutes of Health, predicts on 2024/08/20 that the global vaccination rate for influenza in the US should stay stable at 90% in 2027 Q1.",1,llama-3.3-70b-versatile,health
3,"T4: According to a health analyst, Michael Kim, from the University of California, on 2024-08-22, the prevalence of type 2 diabetes in adults is expected to increase beyond 9.5% in the timeframe of 2030 Q2.",1,llama-3.3-70b-versatile,health
4,"T5: In 2025-08-20, the average weekly exercise hours in Australia has a probability of 25% to reach 25k, as predicted by Dr. Sophia Patel, a health expert from the Australian Institute of Health, on 2024-10-18.",1,llama-3.3-70b-versatile,health
5,"T1: On 2024/11/12, Dr. James Davis, a health expert from the Harvard School of Public Health, predicts that the cancer rate at the United Kingdom will likely decrease by 2% in 2026 Q3.",1,llama-3.3-70b-versatile,health
6,"T2: In Q1 of 2025, Dr. Lisa Nguyen, a researcher from the Stanford University, envisions that the average daily physical activity levels may increase from 30 to 60 minutes in 2029 Q1.",1,llama-3.3-70b-versatile,health
7,"T3: Dr. Kevin White, a top executive at the American Heart Association, predicts on 2024-09-15 that the prevalence of hypertension in Canada should fall by 10% in 2028 Q2.",1,llama-3.3-70b-versatile,health
8,"T4: According to a health expert, Dr. Maria Rodriguez, from the University of Michigan, on 2024/10/10, the prevalence of mental health disorders in adults is expected to increase beyond 15% in the timeframe of 2030 Q3.",1,llama-3.3-70b-versatile,health
9,"T5: In 2026-08-15, the average weekly walking hours in Germany has a probability of 30% to reach 20k, as predicted by Dr. Thomas Brown, a health analyst from the German Institute of Health, on 2024-11-20.",1,llama-3.3-70b-versatile,health


### Generate Policy Predictions

In [17]:
policy_attributes = """election outcomes, economic reforms, legislative impacts."""
policy_requirements = """- Should be based on real-world policy reports.
    - Suppose the time when $p$ was made is during an election cycle or non-election cycles.
    - Include policies & laws, from all sectors such as consumer staples, energy, finance, health care, industrials, materials, media, real estate, retail, technology, utilities, defense, etc."""

policy_examples = """
- policy examples for template 1:
    1. On [Monday, December 16, 2024], [President John Doe] forecasts that the [unemployment rate] at [the United States] [will likely] [decrease] by [2%] in [2025 Q1].
    2. On [Tue, November 19, 2024], [Dr. Jane Smith] foresees that the [population growth rate] in [California] [is likely to] [decrease] by [5 percent to 20 billion] in [08/21/2025].
- policy examples for template 2:
    3. In [October 2024], [Senator Emily Johnson] from [the Senate Committee on Finance], envisions that the [inflation rate] [should] [rise] from [1.3 percent to 89 percent] in [Quarter 3 of 2028].
    4. In [8/15/2027], [Governor Michael Brown] from [the State of Texas], predicts that the [number of registered voters] [will] [fall] under [5B] in [Dec of 2029].
- policy examples for template 3:
    5. [Dija Gabe in the Congressional Budget Office] predicts on [23 October 2024] that [national debt] in [USA] [may] [stay stable] at [20 million] in [2027 Quarter 4].
    6. [Dr. Sarah Lee] foresees in [Q2 2026] that the [median household income] in [NY] [should] [fall] by [629 percent to $15,000] on [Monday, Nov 18, 2026].
- policy examples for template 4:
    7. According to a [General Robert Williams] from [the Department of Defense], on [08/21/2024], the [number of active-duty soldiers] [is expected to] [increase] beyond [$10,000] in the timeframe of [Q3 of 2029].
    8. According to [Dr. Olivia Martinez] from [the Census Bureau], on [Fri, July 12, 2024], the [population density] in [urban areas] [is likely to] [increase] as much as [100,000,000, reflecting a 1209 percent increase] by [21 Aug 2024].
- policy examples for template 5:
    9. In [2025-08-21], the [number of citizens] in [Thomson, GA, 30824] has a [probability] of [92 percent to reach 30k] [decrease], as predicted by [Shirly Tisdale, a policy reporter] on [21 Oct 24].
    10. In [Quarter of 2027], the [number of Navy members] in [the United States] [is expected] to be [300K, which is a 15%] [rise], as predicted by [a policy analyst] on [Sun, February 20, 2024]."""

In [18]:
policy_input_dict = {
    "prediction_domain": "policy",
    "prediction_domain_attribute": policy_attributes,
    "domain_requirements": policy_requirements,
    "domain_examples": policy_examples,
    "predictions_N": predictions_N
}

policy_prompt_output = pipeline_prompt.format(**policy_input_dict)
print(policy_prompt_output)

policy_df = base_pipeline.generate_predictions(policy_prompt_output, 1, "policy")
policy_df

A prediction ($p$) consists of the following nine properties:

    1. $p_p$, policy person that predicted $p$
        - Can be a person (with a name) or a policy person such as a policy reporter, policy analyst, policy expert, policy top executive, policy senior level person, etc).
    2. $p_o$, policy organization 
        - Can only be an organization or entity that is associated with the policy prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, policy prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the policy domain.
        - Some examples are election outcomes, economic reforms, legislative impact

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,"T1: On 2024-10-15, policy expert, Rachel Kim, predicts that the economic growth rate at the European Union will likely rise by 3.5% in Q2 of 2026.",1,llama-3.3-70b-versatile,policy
1,"T2: In 2024/08/20, Senator James Davis from the Senate Committee on Energy and Natural Resources, forecasts that the renewable energy consumption will should increase from 20% to 50% in 2028.",1,llama-3.3-70b-versatile,policy
2,"T3: Dr. David Lee, a policy analyst, predicts on 08/22/2024 that the healthcare expenditure in the United States may stay stable at $3.5 trillion in 2029.",1,llama-3.3-70b-versatile,policy
3,"T4: According to a policy reporter, Emily Chen, from the Congressional Budget Office, on 21 August 2024, the federal budget deficit is expected to decrease beyond $1 trillion in the timeframe of Q4 of 2027.",1,llama-3.3-70b-versatile,policy
4,"T5: In 2025-08-20, the number of electric vehicles in California has a probability of 85% to reach 5 million, which is a 25% increase, as predicted by Michael Brown, a policy expert, on 2024-10-18.",1,llama-3.3-70b-versatile,policy
5,"T1: On Wednesday, November 20, 2024, Governor Sarah Taylor predicts that the unemployment rate at the State of New York will likely decrease by 1.2% in Q1 of 2026.",1,llama-3.3-70b-versatile,policy
6,"T2: In Q3 of 2024, Dr. Kevin White from the Department of Labor, envisions that the job market growth will should rise from 2% to 5% in 2027.",1,llama-3.3-70b-versatile,policy
7,"T3: Policy analyst, Lisa Nguyen, predicts on 2024/09/15 that the consumer price index in the United Kingdom may fall by 0.5% in 2028.",1,llama-3.3-70b-versatile,policy
8,"T4: According to a senior level person, Mark Davis, from the Federal Reserve, on 2024-08-25, the interest rate is expected to increase beyond 3.5% in the timeframe of Q2 of 2029.",1,llama-3.3-70b-versatile,policy
9,"T5: In 2026-08-22, the number of students in higher education in Australia has a probability of 90% to reach 3.5 million, which is a 10% increase, as predicted by Dr. Sophia Patel, a policy expert, on 2024-10-12.",1,llama-3.3-70b-versatile,policy


### Combine Domain Predictions

In [19]:
predictions_df = DataProcessing.concat_dfs([financial_df, weather_df, health_df, policy_df])
predictions_df

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,"T1: On 2024-10-15, Rachel Brown, a financial analyst, predicts that the operating cash flow at General Motors will likely decrease by $5 billion in Q2 of 2026.",1,llama-3.3-70b-versatile,financial
1,"T2: In 2024, Michael Davis from Goldman Sachs, forecasts that the stock price will rise from $50 to $75 per share in 2028.",1,llama-3.3-70b-versatile,financial
2,"T3: Emily Chen, a financial expert, predicts on 08/20/2024 that the research and development expenses at Pfizer may stay stable at $15 million in 2027.",1,llama-3.3-70b-versatile,financial
3,"T4: According to a senior executive from Boeing, on 21 Aug 2024, the net profit is expected to increase beyond $25 billion in the timeframe of Q4 of 2029.",1,llama-3.3-70b-versatile,financial
4,"T5: In 2025-08-20, the gross profit at Cisco Systems has a probability of 20 percent to reach $40 billion, which is a 10% increase, as predicted by David Lee, a financial reporter, on 10/10/2024.",1,llama-3.3-70b-versatile,financial
...,...,...,...,...
75,"T1: On 2024-10-25, Governor Michael Brown predicts that the economic growth rate at the State of Texas will likely rise by 2.5% in Q3 of 2026.",1,llama-3.3-70b-versatile,policy
76,"T2: In Q2 of 2024, Dr. Sophia Patel from the Department of Commerce, envisions that the international trade will should increase from 10% to 20% in 2027.",1,llama-3.3-70b-versatile,policy
77,"T3: Policy analyst, Mark Davis, predicts on 2024/09/20 that the healthcare expenditure in the United Kingdom may fall by 1% in 2028.",1,llama-3.3-70b-versatile,policy
78,"T4: According to a senior level person, James Wilson, from the Federal Reserve, on 2024-08-30, the inflation rate is expected to decrease beyond 2.5% in the timeframe of Q4 of 2029.",1,llama-3.3-70b-versatile,policy


In [20]:
updated_predictions_df = DataProcessing.reformat_df_with_template_number(predictions_df, col_name="Base Sentence")
updated_predictions_df

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain,Template Number
0,"On 2024-10-15, Rachel Brown, a financial analyst, predicts that the operating cash flow at General Motors will likely decrease by $5 billion in Q2 of 2026.",1,llama-3.3-70b-versatile,financial,1
1,"In 2024, Michael Davis from Goldman Sachs, forecasts that the stock price will rise from $50 to $75 per share in 2028.",1,llama-3.3-70b-versatile,financial,2
2,"Emily Chen, a financial expert, predicts on 08/20/2024 that the research and development expenses at Pfizer may stay stable at $15 million in 2027.",1,llama-3.3-70b-versatile,financial,3
3,"According to a senior executive from Boeing, on 21 Aug 2024, the net profit is expected to increase beyond $25 billion in the timeframe of Q4 of 2029.",1,llama-3.3-70b-versatile,financial,4
4,"In 2025-08-20, the gross profit at Cisco Systems has a probability of 20 percent to reach $40 billion, which is a 10% increase, as predicted by David Lee, a financial reporter, on 10/10/2024.",1,llama-3.3-70b-versatile,financial,5
...,...,...,...,...,...
75,"On 2024-10-25, Governor Michael Brown predicts that the economic growth rate at the State of Texas will likely rise by 2.5% in Q3 of 2026.",1,llama-3.3-70b-versatile,policy,1
76,"In Q2 of 2024, Dr. Sophia Patel from the Department of Commerce, envisions that the international trade will should increase from 10% to 20% in 2027.",1,llama-3.3-70b-versatile,policy,2
77,"Policy analyst, Mark Davis, predicts on 2024/09/20 that the healthcare expenditure in the United Kingdom may fall by 1% in 2028.",1,llama-3.3-70b-versatile,policy,3
78,"According to a senior level person, James Wilson, from the Federal Reserve, on 2024-08-30, the inflation rate is expected to decrease beyond 2.5% in the timeframe of Q4 of 2029.",1,llama-3.3-70b-versatile,policy,4
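
The implementation of `DataProcessing.reformat_df_with_template_number` is not shown in this notebook. Judging by the output above, it removes the leading `T<n>:` tag from `Base Sentence` and records `n` in a `Template Number` column. Below is a minimal sketch of that idea on plain strings. `split_template_number` is a hypothetical helper for illustration, not the project's code.

```python
import re
from typing import Optional, Tuple

# Matches a leading tag such as "T3:" and captures the number.
TEMPLATE_PREFIX = re.compile(r"^\s*T(\d+):\s*")


def split_template_number(sentence: str) -> Tuple[str, Optional[int]]:
    """Return (sentence without its 'T<n>:' tag, template number or None)."""
    match = TEMPLATE_PREFIX.match(sentence)
    if match is None:
        # No tag found: keep the sentence and report no template number.
        return sentence.strip(), None
    return sentence[match.end():].strip(), int(match.group(1))


# Two sentences taken from the outputs in this notebook.
tagged = [
    "T3: Policy analyst, Mark Davis, predicts on 2024/09/20 that the healthcare expenditure in the United Kingdom may fall by 1% in 2028.",
    "T0: The baby laughed at the silly clown.",
]
for raw in tagged:
    text, number = split_template_number(raw)
    print(number, "|", text)
```

Returning `None` for untagged sentences keeps malformed model output visible instead of silently assigning it a template.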


## LangChain Templates for Any Domain Non-Predictions

In [21]:
full_non_prediction_template = """{non_prediction_properties}

{non_prediction_requirements}

{non_prediction_examples}"""
full_non_prediction_prompt = PromptTemplate.from_template(full_non_prediction_template)

In [22]:
non_prediction_properties_template = """Generate any sentence that's not a prediction, which we name non-prediction. A prediction is below with variables
    1. $p_p$, {prediction_domain} person that predicted $p$
        - Can be a person (with a name) or a {prediction_domain} person such as a {prediction_domain} reporter, {prediction_domain} analyst, {prediction_domain} expert, {prediction_domain} top executive, {prediction_domain} senior level person, etc).
    2. $p_o$, {prediction_domain} organization 
        - Can only be an organization or entity that is associated with the {prediction_domain} prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, {prediction_domain} prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the {prediction_domain} domain.
    6. $p_s$, slope that indicates the direction of change in $p_a$
        - Change of directions can be rise/increase/as much as, fall/decrease/as little as, change, stay stable, high/low chance/probability/degree of, etc.
    7. $p_m$, metric outcome
        - How much will the $p_a$ $p_s$?
    8. $p_v$, future verb tense
        - A verb that is associated with the future such as will, would, be going to, should, etc.
    9. $p_l$, location
        - The location is attached to attribute $p_a$ if {prediction_domain} == 'weather'
    """
non_prediction_properties_prompt = PromptTemplate.from_template(non_prediction_properties_template)

In [23]:
non_prediction_requirements = """ requirements to use for each non-prediction:

    - Should be based on real-world {prediction_domain} data and not hallucinate.
    - Should be a simple sentence (non-prediction) (and NOT compounding using "and" or "or").
    - Each non-prediction should be unique and not repeated.
    - Do not number the non-predictions.
    - Do not say, "Here are 10 unique non-predictions based on the provided templates and examples:" in the prompt.
    - Do not use any of the examples in the prompt.
    - In front of every non-prediction, put the template number in the format of "T0:" and only use "T0:" as the template number.
    - Should be between 10 and 30 words."""
non_prediction_requirements_prompt = PromptTemplate.from_template(non_prediction_requirements)
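
The requirements above are instructions to the model, so nothing guarantees the generated sentences follow them. A small post-generation check can flag violations. The sketch below is an assumption about how such a check could look; `check_non_prediction` is a hypothetical helper, and the "and"/"or" test is a rough heuristic that also flags non-compounding uses.

```python
import re
from typing import List


def check_non_prediction(sentence: str, min_words: int = 10, max_words: int = 30) -> List[str]:
    """Return a list of requirement violations (empty if none were found)."""
    problems = []
    if sentence.startswith("T0:"):
        body = sentence[len("T0:"):].strip()
    else:
        problems.append("missing 'T0:' prefix")
        body = sentence.strip()

    # Word count is whitespace-based, so punctuation stays attached to words.
    n_words = len(body.split())
    if not min_words <= n_words <= max_words:
        problems.append(f"{n_words} words, expected {min_words}-{max_words}")

    # Heuristic only: any standalone "and"/"or" is treated as compounding.
    if re.search(r"\b(and|or)\b", body, flags=re.IGNORECASE):
        problems.append("contains 'and'/'or'")
    return problems


print(check_non_prediction("T0: The baby laughed at the silly clown."))
print(check_non_prediction("T0: He wrote a letter to his grandmother, telling her about his new job."))
```

Short sentences such as the first one have only 7 words and fail the 10 to 30 word requirement, which is one reason a filter step after generation may be worth adding.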

In [24]:
non_prediction_examples_template = """Here are some examples of {prediction_domain} non-predictions:

{domain_examples}

With the above, generate a unique set of {non_predictions_N} non-predictions. Think from the perspective of an {prediction_domain} person."""
non_prediction_examples_prompt = PromptTemplate.from_template(non_prediction_examples_template)

In [25]:
non_prediction_input_prompts = [
    ("non_prediction_properties", non_prediction_properties_prompt),
    ("non_prediction_requirements", non_prediction_requirements_prompt),
    ("non_prediction_examples", non_prediction_examples_prompt),
]

non_prediction_pipeline_prompt = PipelinePromptTemplate(
    final_prompt=full_non_prediction_prompt, pipeline_prompts=non_prediction_input_prompts
)
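
To make the composition step concrete, the sketch below mimics with plain `str.format` what the pipeline prompt does: each named sub-prompt is rendered with the inputs it needs, and the rendered text is then substituted into the final template. This is a simplified stand-in written for illustration, not LangChain's code, and the template strings are toy examples.

```python
from string import Formatter
from typing import Dict, List, Set, Tuple


def variables(template: str) -> Set[str]:
    """Names of the {placeholders} used in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def format_pipeline(final_template: str, pipeline_templates: List[Tuple[str, str]], **inputs) -> str:
    """Render each sub-template, then fill the final template with the results."""
    rendered: Dict[str, object] = dict(inputs)
    for name, template in pipeline_templates:
        # Pass only the variables this sub-template actually references.
        needed = {key: rendered[key] for key in variables(template)}
        rendered[name] = template.format(**needed)
    final_needed = {key: rendered[key] for key in variables(final_template)}
    return final_template.format(**final_needed)


final = "{properties}\n\n{examples}"
parts = [
    ("properties", "Rules for {prediction_domain} non-predictions."),
    ("examples", "Generate {non_predictions_N} {prediction_domain} non-predictions."),
]
prompt = format_pipeline(
    final, parts, prediction_domain="any", non_predictions_N=80, unused_key="ignored"
)
print(prompt)
```

In this sketch, input keys that no template references are simply dropped rather than raising an error.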

## Generate Non-Predictions

- The model does not reliably return the requested number of non-predictions in a single call, so generation is repeated in a loop until the desired amount is reached.
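
One way to implement such a loop is to keep calling the generator, drop repeats, and stop once enough unique sentences are collected. The sketch below assumes this approach. `collect_unique` and `fake_batch` are hypothetical names, and `fake_batch` stands in for the real LLM call.

```python
import itertools
from typing import Callable, List


def collect_unique(generate_batch: Callable[[], List[str]], target: int, max_calls: int = 20) -> List[str]:
    """Call generate_batch until `target` unique sentences are collected.

    Stops early after `max_calls` calls so a stubborn model cannot loop forever.
    """
    seen = set()
    collected: List[str] = []
    for _ in range(max_calls):
        for sentence in generate_batch():
            # Normalize case and whitespace before comparing for duplicates.
            key = " ".join(sentence.lower().split())
            if key in seen:
                continue
            seen.add(key)
            collected.append(sentence)
            if len(collected) >= target:
                return collected
    return collected


_counter = itertools.count()


def fake_batch() -> List[str]:
    """Stand-in for an LLM call: two new sentences plus one repeat per call."""
    i = next(_counter)
    return [f"T0: Sentence {2 * i}.", f"T0: Sentence {2 * i + 1}.", "T0: Repeated sentence."]


sentences = collect_unique(fake_batch, target=8)
print(len(sentences))
```

Exact-match deduplication will not catch near-duplicates that differ by a word, so a fuzzier comparison may be needed in practice.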

In [26]:
non_predictions_N = 80

In [27]:
non_prediction_attributes = """Any sentence that does not include prediction variables such as $p$, $p_s$, $p_t$, $p_f$, $p_a$, $p_m$, $p_v$."""

non_prediction_examples = """
- non-prediction examples for template 0:
    1. The cat sat on the mat and looked out the window.
    2. She enjoys reading books on a rainy afternoon.
    3. The quick brown fox jumps over the lazy dog.
    4. He likes to play basketball with his friends on weekends.
    5. The sun sets in the west, painting the sky with hues of orange.
    6. They went for a hike in the mountains and enjoyed the view.
    7. The coffee shop on the corner serves the best lattes in town.
    8. She baked a cake for her friend's birthday party.
    9. The children played in the park until it got dark.
    10. He wrote a letter to his grandmother, telling her about his new job."""


In [28]:
non_predictions_input_dict = {
    "prediction_domain": "any",
    "any_non_prediction_domain_attribute": non_prediction_attributes,
    "domain_examples": non_prediction_examples,
    "non_predictions_N": non_predictions_N
}

non_prediction_prompt_output = non_prediction_pipeline_prompt.format(**non_predictions_input_dict)
print(non_prediction_prompt_output)

non_prediction_label = 0
non_predictions_df = base_pipeline.generate_predictions(text=non_prediction_prompt_output, label=non_prediction_label, domain="any")
non_predictions_df

Generate any sentence that's not a prediction, which we name non-prediction. A prediction is below with variables
    1. $p_p$, any person that predicted $p$
        - Can be a person (with a name) or a any person such as a any reporter, any analyst, any expert, any top executive, any senior level person, etc).
    2. $p_o$, any organization 
        - Can only be an organization or entity that is associated with the any prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, any prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the any domain.
    6. $p_s$, slope that indicates the direction of change in $p_

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,T0: The dog ran quickly around the corner of the house.,0,llama-3.3-70b-versatile,any
1,T0: She ate a sandwich for lunch at her desk.,0,llama-3.3-70b-versatile,any
2,T0: The baby laughed at the silly clown.,0,llama-3.3-70b-versatile,any
3,T0: He played guitar in a band on Friday nights.,0,llama-3.3-70b-versatile,any
4,T0: The flowers bloomed in the garden after spring rain.,0,llama-3.3-70b-versatile,any
...,...,...,...,...
86,T0: The dog ran around the corner of the house.,0,llama-3.3-70b-versatile,any
87,T0: She drank coffee every morning to wake up.,0,llama-3.3-70b-versatile,any
88,T0: The kids played with playdough in the classroom.,0,llama-3.3-70b-versatile,any
89,T0: The woman read a magazine on her tablet.,0,llama-3.3-70b-versatile,any


In [29]:
updated_non_predictions_df = DataProcessing.reformat_df_with_template_number(non_predictions_df, col_name="Base Sentence")
updated_non_predictions_df

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain,Template Number
0,The dog ran quickly around the corner of the house.,0,llama-3.3-70b-versatile,any,0
1,She ate a sandwich for lunch at her desk.,0,llama-3.3-70b-versatile,any,0
2,The baby laughed at the silly clown.,0,llama-3.3-70b-versatile,any,0
3,He played guitar in a band on Friday nights.,0,llama-3.3-70b-versatile,any,0
4,The flowers bloomed in the garden after spring rain.,0,llama-3.3-70b-versatile,any,0
...,...,...,...,...,...
86,The dog ran around the corner of the house.,0,llama-3.3-70b-versatile,any,0
87,She drank coffee every morning to wake up.,0,llama-3.3-70b-versatile,any,0
88,The kids played with playdough in the classroom.,0,llama-3.3-70b-versatile,any,0
89,The woman read a magazine on her tablet.,0,llama-3.3-70b-versatile,any,0


In [30]:


# Repeat generation because a single call may return fewer non-predictions than requested.
# non_predictions_dfs = []
# for i in range(4):
#     non_predictions_df = base_pipeline.generate_predictions(text=non_prediction_prompt_output, label=non_prediction_label, domain="any")
#     non_predictions_dfs.append(non_predictions_df)

# non_predictions_df = DataProcessing.concat_dfs(non_predictions_dfs, ignore_index=True)
# non_predictions_df

## Store Predictions and Non-Predictions

In [31]:
%store updated_predictions_df
%store updated_non_predictions_df

Stored 'updated_predictions_df' (DataFrame)
Stored 'updated_non_predictions_df' (DataFrame)


## Further Explanation

- The code in this notebook still needs to be optimized.
- Next step: generate predictions with multiple models.

### Multiple Models to Generate Predictions

In [32]:
# domains = ["financial", "weather", "health care"]
# llama_model = LlamaTextGenerationModel()
# model_2 = SomeModel()

# # Should all model dfs be a list of dataframes? Then we can concatenate them all together with pd.concat()
# for domain in domains:
#     llama_df = llama_model.generate_predictions(financial_prompt_output, 1, domain)
#     model_2_df = model_2.generate_predictions(financial_prompt_output, 1, domain)


# # or 

# llama_df = llama_model.generate_predictions(financial_prompt_output, 1, "financial")
# model_2_df = model_2.generate_predictions(financial_prompt_output, 1, "financial")

# llama_df = llama_model.generate_predictions(financial_prompt_output, 1, "weather")
# model_2_df = model_2.generate_predictions(financial_prompt_output, 1, "weather")

# llama_df = llama_model.generate_predictions(financial_prompt_output, 1, "health care")
# model_2_df = model_2.generate_predictions(financial_prompt_output, 1, "health care")
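
The commented draft above reuses `financial_prompt_output` for every domain and overwrites `llama_df` and `model_2_df` on each call. One possible way to answer the open question is to pair each domain with its own prompt and accumulate every result in a single list. The sketch below uses a hypothetical `StubModel` and placeholder prompts, and it returns plain dictionaries instead of DataFrames to stay self-contained.

```python
from typing import Dict, List


class StubModel:
    """Hypothetical stand-in for a text-generation model wrapper."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate_predictions(self, text: str, label: int, domain: str) -> List[Dict[str, object]]:
        # A real model would send `text` to an LLM; this stub returns one placeholder row.
        return [{
            "Base Sentence": f"placeholder output for: {text}",
            "Prediction Label": label,
            "Model Name": self.name,
            "Domain": domain,
        }]


# Placeholder prompts; in the notebook these would be the formatted prompt for each domain.
prompts_by_domain = {
    "financial": "<financial prompt>",
    "weather": "<weather prompt>",
    "health care": "<health care prompt>",
}
models = [StubModel("model_a"), StubModel("model_b")]

rows: List[Dict[str, object]] = []
for domain, prompt in prompts_by_domain.items():
    for model in models:
        # Accumulate results instead of overwriting a variable per call.
        rows.extend(model.generate_predictions(prompt, 1, domain))

print(len(rows))
```

With DataFrames, the same pattern would append each model's frame to a list and concatenate once at the end.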