# 1-Generate Predictions using LangChain

- **Goal:** Prediction Recognition

- **Purpose:** Implement step 1 of the prediction recognition pipeline, including its sub-steps:
    1. Generate predictions
        1. Create several prediction prompt templates
        2. Use open-source LLMs to generate predictions

- **Misc:**
    - `%store`: IPython line magic that stores a variable of interest so it can be loaded in another notebook

In [1]:
import os
import sys

import pandas as pd

from langchain_core.prompts import PipelinePromptTemplate, PromptTemplate

# Get the current working directory of the notebook
notebook_dir = os.getcwd()
# Add the parent directory to the system path
sys.path.append(os.path.join(notebook_dir, '../'))

from pipelines import BasePipeline
from data_processing import DataProcessing

In [2]:
# Show long generated sentences in full instead of truncating them
pd.set_option('display.max_colwidth', 800)
base_pipeline = BasePipeline()

## LangChain Templates

In [3]:
full_prediction_template = """{prediction_properties}

{prediction_requirements}

{prediction_templates}

{prediction_examples}"""
full_prediction_prompt = PromptTemplate.from_template(full_prediction_template)

In [4]:
prediction_properties_template = """A prediction ($y$) consists of the following eight properties:

    1. $y_p$, {prediction_domain} person that predicted $y$
        - Can be a person (with a name) or a {prediction_domain} person such as a {prediction_domain} reporter, {prediction_domain} analyst, {prediction_domain} expert, {prediction_domain} top executive, {prediction_domain} senior level person, etc).
    2. $y_o$, {prediction_domain} organization 
        - Can only be an organization or entity that is associated with the {prediction_domain} prediction.
    3. $y_t$, current time when $y$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $y_f$, forecast time when $y$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $y_a$, {prediction_domain} prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the {prediction_domain} domain.
        - Some examples are {prediction_domain_attribute}.
    6. $y_s$, slope that indicates the direction of change in $y_a$
        - Change of directions can be rise/increase/as much as, fall/decrease/as little as, change, stay stable, high/low chance/probability/degree of.
    7. $y_m$, metric outcome
        - How much will the $y_a$ $y_s$?
    8. $y_v$, future verb tense
        - A verb that is associated with the future such as will, would, be going to, should, etc.
    9. $y_l$, location
        - The location is attached to attribute $y_a$ if {prediction_domain} == 'weather'
    """
prediction_properties_prompt = PromptTemplate.from_template(prediction_properties_template)

In [5]:
prediction_requirements_template = """{prediction_domain} requirements to use for each prediction:

    - Should be based on real-world {prediction_domain} data and not hallucinate.
    - Only a simple sentence (prediction) (and NOT compounding using "and" or "or").
    - Should diversify all eight properties of the prediction ($y$).
    - Should use synonyms of predicts such as forecasts, speculates, foresee, envision, etc., and not use any of them more than ten times.
    - The prediction should be unique and not repeated.
    - The forecast time ($y_f$) should always be after current time ($y_t$) of when forecast ($y$) was made.
    - Do not number the predictions.
    - Do not say, "As the {prediction_domain} at organization ($y_o$), I will generate company-based {prediction_domain} predictions using the provided templates." or anything similar.
    - Should have a forecast time ($y_f$) when $y$ is expected to come to fruition between 2025 to 2050.
    - Use the five different templates and examples provided.
    - Change how the current time ($y_t$) and forecast time ($y_f$) are written in the prediction with examples of (1) Wednesday, August 21, 2024; (2) Wed, August 21, 2024; (3) 08/21/2024; (4) 08/21/2024; (5) 21/08/2024; (6) 21 August 2024; (7) 2024/08/21; (8) 2024-08-21; (9) August 21, 2024; (10) Aug 21, 2024; (11) 21 August 2024, (12) 21 Aug 2024, Q3 of 2027, 2029 of Q3, etc (with removing day of week).
    {domain_requirements}
    - Do not say, "Here are 10 unique weather predictions based on the provided templates and examples:" in the prompt."""
prediction_requirements_prompt = PromptTemplate.from_template(prediction_requirements_template)
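
Two of the requirements above concern dates: the forecast time must come after the time the prediction was made (and fall between 2025 and 2050), and dates should be written in varied styles. The following is a minimal sketch of how both could be expressed with the standard library. `DATE_FORMATS` and `forecast_is_valid` are hypothetical names used for illustration and are not part of the notebook's `pipelines` module; the `strftime` patterns are assumed equivalents of a subset of the listed styles.

```python
from datetime import date

# Assumed strftime equivalents for a subset of the date styles in the requirements.
# Weekday and month names follow the active locale (English in the default C locale).
DATE_FORMATS = [
    "%A, %B %d, %Y",  # Wednesday, August 21, 2024
    "%a, %B %d, %Y",  # Wed, August 21, 2024
    "%m/%d/%Y",       # 08/21/2024
    "%d/%m/%Y",       # 21/08/2024
    "%d %B %Y",       # 21 August 2024
    "%Y-%m-%d",       # 2024-08-21
    "%b %d, %Y",      # Aug 21, 2024
]


def forecast_is_valid(made_on, forecast_for, first_year=2025, last_year=2050):
    """Return True if the forecast is after `made_on` and within the year range."""
    return forecast_for > made_on and first_year <= forecast_for.year <= last_year


made_on = date(2024, 8, 21)

# Render the same date in every style.
rendered = [made_on.strftime(fmt) for fmt in DATE_FORMATS]
for text in rendered:
    print(text)

print(forecast_is_valid(made_on, date(2027, 7, 1)))   # forecast inside the range
print(forecast_is_valid(made_on, date(2024, 12, 1)))  # forecast year before 2025
```

A check like this could be applied to generated sentences once their dates have been parsed, since the prompt alone does not guarantee that the model follows the requirements.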

In [6]:
prediction_templates_template = """Here are some {prediction_domain} templates:

- {prediction_domain} template 1: On [ $y_t$ ], [ $y_p$ ] predicts that the [ $y_a$ ] at [ $y_o$ ] [ $y_v$ ] [ $y_s$ ] by [ $y_m$ ] in [ $y_f$ ].
- {prediction_domain} template 2: In [ $y_t$ ], [ $y_p$ ] from [ $y_o$ ], predicts that the [ $y_a$ ] [ $y_v$ ] [ $y_s$ ] from [ $y_m$ ] in [ $y_f$ ].
- {prediction_domain} template 3: [ $y_p$ ] predicts on [ $y_t$ ] that the [ $y_a$ ] at [ $y_o$ ] [ $y_v$ ] [ $y_s$ ] under [ $y_m$ ] in [ $y_f$ ].
- {prediction_domain} template 4: According to a [ $y_p$ ] from [ $y_o$ ], on [ $y_t$ ], the [ $y_a$ ] [ $y_v$ ] [ $y_s$ ] beyond [ $y_m$ ] in the timeframe of [ $y_f$ ].
- {prediction_domain} template 5: In [ $y_f$ ], the [ $y_a$ ] at [ $y_o$ ] [ $y_v$ ] a [ $y_m$ ] [ $y_s$ ], as predicted by [ $y_p$ ] on [ $y_t$ ]."""
prediction_templates_prompt = PromptTemplate.from_template(prediction_templates_template)

In [7]:
prediction_examples_template = """Here are some examples of {prediction_domain} predictions:

{domain_examples}


With the above, generate a unique set of {predictions_N} {prediction_domain} predictions. Think from the perspective of a {prediction_domain} analyst, expert, top executive, or senior level person."""
prediction_examples_prompt = PromptTemplate.from_template(prediction_examples_template)

In [8]:
input_prompts = [
    ("prediction_properties", prediction_properties_prompt),
    ("prediction_requirements", prediction_requirements_prompt),
    ("prediction_templates", prediction_templates_prompt),
    ("prediction_examples", prediction_examples_prompt),
]

pipeline_prompt = PipelinePromptTemplate(
    final_prompt=full_prediction_prompt, pipeline_prompts=input_prompts
)
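
The composition performed by the pipeline prompt can be sketched without LangChain: each named sub-template is rendered with the shared inputs, and the results are slotted into the final template. This is only an illustration of the idea, not LangChain's implementation; `compose_prompt` is a hypothetical helper, and the templates below are shortened stand-ins for the ones defined in this notebook. It may also serve as a fallback if `PipelinePromptTemplate` is unavailable or flagged as deprecated in your `langchain-core` version.

```python
def compose_prompt(final_template, named_templates, inputs):
    """Render each named sub-template, then slot the results into the final template."""
    rendered = dict(inputs)
    for name, template in named_templates:
        # Each rendered sub-prompt becomes available under its own name.
        rendered[name] = template.format(**rendered)
    return final_template.format(**rendered)


# Shortened stand-ins for the full templates (assumption for illustration).
final_template = "{prediction_properties}\n\n{prediction_examples}"
named_templates = [
    ("prediction_properties", "A {prediction_domain} prediction has a source and a forecast time."),
    ("prediction_examples", "Generate {predictions_N} {prediction_domain} predictions."),
]

prompt = compose_prompt(
    final_template,
    named_templates,
    {"prediction_domain": "weather", "predictions_N": 10},
)
print(prompt)
```

Note that `str.format` treats every `{...}` as a placeholder, so literal braces in a template would need to be doubled (`{{` and `}}`).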


## Generate Domain Predictions

### Generate Financial Predictions

In [9]:
financial_attributes = """stock price, net profit, revenue, operating cash flow, research and development expenses, operating income, gross profit."""
financial_requirements = """- Should be based on real-world financial earnings reports.
    - Suppose the time when $y$ was made is during any earning season.
    - Include stocks from all sectors such as consumer staples, energy, finance, health care, industrials, materials, media, real estate, retail, technology, utilities, defense, etc."""

# For each template, have a rise, fall, or stable example, respectively.
financial_examples = """
- financial examples for template 1:
    1. On [Monday, December 16, 2024], [Detravious, an investor] forecasts that the [revenue] at [Apple] [will likely] [decrease] from [$87B to $50 billion] in [2025 Q1].
    2. On [Tue, November 19, 2024], [Ava Lee] predicts that the [operating cash flow] at [ExxonMobil] [should] [decrease] by [5 percent to $20 billion] in [08/21/2025].
- financial examples for template 2:
    3. In [October 2024], [Julian Hall] from [Yahoo Finance], envisions that the [stock price] [will] [rise] from [$800 to $1,000 per share] in [Quarter 3 of 2028].
    4. In [8/15/2027], [Mrs. Kalia] from [McDonald's], predicts that the [net profit] [will] [fall] under [5% to $5 billion] in [January of 2029].
- financial examples for template 3:
    5. [Dija Gabe, a financial expert] predicts on [23 October 2024] that the [research and development expenses] at [Alphabet] [may] [stay stable] at [$20 million] in [2027 Quarter 4].
    6. [Mr. Mike] predicts in [Q2 2026] that the [operating income] at [Microsoft] [will] [fall] by [407 percent to $50M] on [Monday, Nov 18, 2026].
- financial examples for template 4:
    7. According to a [top executive] from [Chevron], on [08/21/2024], the [net profit] [is expected to] [increase] beyond [$10,000] in the timeframe of [Q3 of 2029].
    8. According to [Brittany] from [Tesla], on [Fri, July 12, 2024], the [gross profit] [may] [increase] as much as [$30,000,000, reflecting a 1209 percent increase] by [21 Aug 2024].
- financial examples for template 5:
    9. In [2025-08-21], the [net profit] at [Amazon] has a [probability] of [11 percent to reach $30k] [decrease], as predicted by [Emily Davis, a financial reporter] on [21 Oct 24].
    10. In [Quarter of 2027], the [revenue] at [Facebook] [is expected] to be [$30 billion, which is a 15%] [rise], as predicted by [a financial analyst] on [Sun, February 20, 2024]."""

In [10]:
financial_input_dict = {
    "prediction_domain": "financial",
    "prediction_domain_attribute": financial_attributes,
    "domain_requirements": financial_requirements,
    "domain_examples": financial_examples,
    "predictions_N": 10
}
financial_prompt_output = pipeline_prompt.format(**financial_input_dict)
print(financial_prompt_output)

financial_df = base_pipeline.generate_predictions(financial_prompt_output, 1, "financial")
financial_df

A prediction ($y$) consists of the following eight properties:

    1. $y_p$, financial person that predicted $y$
        - Can be a person (with a name) or a financial person such as a financial reporter, financial analyst, financial expert, financial top executive, financial senior level person, etc).
    2. $y_o$, financial organization 
        - Can only be an organization or entity that is associated with the financial prediction.
    3. $y_t$, current time when $y$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $y_f$, forecast time when $y$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $y_a$, financial prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the financial domain.
        - Some examples are stock price, net prof

Unnamed: 0,Base Sentences,Prediction Label,Model Name,Domain
0,"On 2024-10-15, Rachel Patel, a financial analyst, predicts that the operating income at Johnson & Johnson will likely increase by 10 percent to $20 billion in Q2 of 2026.",1,llama-3.3-70b-versatile,financial
1,"In 2024/08/20, Michael Chen from Goldman Sachs envisions that the stock price will rise from $500 to $700 per share in 2028.",1,llama-3.3-70b-versatile,financial
2,"According to a senior executive from Boeing, on 21 August 2024, the revenue is expected to increase beyond $100 billion in the timeframe of Q4 of 2029.",1,llama-3.3-70b-versatile,financial
3,"On 08/22/2024, David Lee, a financial expert, predicts that the research and development expenses at Netflix may stay stable at $1.5 billion in 2027.",1,llama-3.3-70b-versatile,financial
4,"In Q3 of 2025, the net profit at Visa will likely be $10 billion, which is a 5 percent increase, as predicted by a financial reporter on 2024-10-18.",1,llama-3.3-70b-versatile,financial
5,"On 2024/10/12, Emily Wong, a financial senior level person, predicts that the gross profit at Procter & Gamble will decrease by 2 percent to $15 billion in 2026.",1,llama-3.3-70b-versatile,financial
6,"In 2024-10-25, the operating cash flow at Coca-Cola is expected to increase by 15 percent to $12 billion, as predicted by a financial top executive on 2024/08/25.",1,llama-3.3-70b-versatile,financial
7,"On 21 Oct 2024, Kevin White, a financial analyst, predicts that the revenue at AT&T will likely decrease by 8 percent to $150 billion in Q1 of 2027.",1,llama-3.3-70b-versatile,financial
8,"According to a financial expert from Intel, on 2024/10/10, the stock price may rise as much as $100 per share, reflecting a 20 percent increase, in the timeframe of Q2 of 2028.",1,llama-3.3-70b-versatile,financial
9,"In 2029-08-20, the operating income at 3M will likely be $5 billion, which is a 10 percent decrease, as predicted by a financial senior level person on 2024/10/15.",1,llama-3.3-70b-versatile,financial
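
The implementation of `BasePipeline.generate_predictions` is not shown in this notebook. As a rough guess at the post-processing involved, the sketch below turns raw model text (one sentence per line) into records with the same four columns as the table above. `to_records` is a hypothetical helper, and the assumption that the model returns one prediction per line is mine, not the notebook's.

```python
def to_records(raw_text, model_name, domain, label=1):
    """Split raw model output into one labeled record per non-empty line."""
    records = []
    for line in raw_text.splitlines():
        sentence = line.strip()
        if not sentence:
            continue  # skip blank lines between sentences
        records.append(
            {
                "Base Sentences": sentence,
                "Prediction Label": label,
                "Model Name": model_name,
                "Domain": domain,
            }
        )
    return records


# Two sentences taken from the output table above, separated by a blank line.
raw_text = (
    "On 2024-10-15, Rachel Patel, a financial analyst, predicts that the operating income "
    "at Johnson & Johnson will likely increase by 10 percent to $20 billion in Q2 of 2026.\n"
    "\n"
    "In 2024/08/20, Michael Chen from Goldman Sachs envisions that the stock price "
    "will rise from $500 to $700 per share in 2028.\n"
)

records = to_records(raw_text, "llama-3.3-70b-versatile", "financial")
for record in records:
    print(record)
```

A list of dictionaries in this shape can be passed directly to `pd.DataFrame(records)`.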


### Generate Weather Predictions

In [11]:
weather_prediction_templates_template = """Here are some {prediction_domain} templates:

- {prediction_domain} template 1: On [ $y_t$ ], [ $y_p$ ] predicts that the [ $y_a$ ] at [ $y_o$ ] in [ $y_l$ ] [ $y_v$ ] [ $y_s$ ] by [ $y_m$ ] in [ $y_f$ ].
- {prediction_domain} template 2: In [ $y_t$ ], [ $y_p$ ] from [ $y_o$ ] in [ $y_l$ ], predicts that the [ $y_a$ ] [ $y_v$ ] [ $y_s$ ] from [ $y_m$ ] in [ $y_f$ ].
- {prediction_domain} template 3: [ $y_p$ ] predicts on [ $y_t$ ] that the [ $y_a$ ] at [ $y_o$ ] in [ $y_l$ ] [ $y_v$ ] [ $y_s$ ] under [ $y_m$ ] in [ $y_f$ ].
- {prediction_domain} template 4: According to a [ $y_p$ ] from [ $y_o$ ] in [ $y_l$ ], on [ $y_t$ ], the [ $y_a$ ] [ $y_v$ ] [ $y_s$ ] beyond [ $y_m$ ] in the timeframe of [ $y_f$ ].
- {prediction_domain} template 5: In [ $y_f$ ], the [ $y_a$ ] at [ $y_o$ ] in [ $y_l$ ] [ $y_v$ ] a [ $y_m$ ] [ $y_s$ ], as predicted by [ $y_p$ ] on [ $y_t$ ]."""
weather_prediction_templates_prompt = PromptTemplate.from_template(weather_prediction_templates_template)

In [12]:
weather_attributes = """temperature, precipitation, wind speed, humidity, etc."""
weather_requirements = """- Should be based on real-world weather reports.
    - Suppose the time when $y$ was made is during any season and any location (e.g., Florida is known for hurricanes, California for wildfires, etc.).
    - Include reports from all meteorologists, weather organizations, or any type of weather entity."""

# For each template, have a rise, fall, or stable example, respectively.
weather_examples = """
- weather examples for template 1:
    1. On [Monday, December 16, 2024], [Dr. Melissa Carter] a weather expert at the [National Weather Service], forecasts that the [temperature], in [New York City] [will likely] [decrease] from [5°C to 3°C] on [February 16, 2025 (Fri)].
    2. On [Tue, 19 November 2024], [Ethan James] at the [US Weather Center] predicts that the [precipitation levels], in [San Francisco] [are likely to] [increase] by [20%] in the timeframe of [08/21/2025].
- weather examples for template 2:
    3. In [October 2024], [Samantha Lin] from [NOAA], envisions that the [wind speed] [should] [decrease] by [15 mph] in [Chicago] by [Friday, March 22, 2025].
    4. In [8/15/2027], [Carlos Rivera] from [Weather.com] predicts that the [humidity] [will] [rise] by [30%] in [Miami] in [July of 2025].
- weather examples for template 3:
    5. [Amanda Green], a weather reporter from [Bureau of Meteorology] predicts on [23 October 2024] that the [temperature] in [Seattle], [will] [fall] by [10°F] in [2025 Quarter 1].
    6. [Mr. Tommy Wu], from [US Weather Center] predicts in [Q2 2026] that [snowfall levels], in [Denver] [will likely] [increase] by [8 inches] in [Monday, Nov 18, 2026].
- weather examples for template 4:
    7. According to a [top executive] from [AccuWeather], on [12/21/2024], the [rainfall] in [Portland] [is expected to] [increase] beyond [10 percent] in the timeframe of [early 2025].
    8. According to [David Harper] from [Weather Underground], on [Fri, August 9, 2024], the [air quality index] in [Los Angeles] [is likely to] [improve] by [20%] in [21 Aug 2024].
- weather examples for template 5:
    9. In [2025-08-21], the [average temperature] in [Houston] has a [probability] of [5 percent to] [decrease], as predicted by [King, a weather reporter] from [Meteorological Department] on [21 Oct 24].
    10. In [Quarter of 2027], the [wind chill] in [Minneapolis] [is expected] to be [10°F, which is a 15%] [rise], as predicted by [a weather analyst named Ortiz] on [Sun, February 20, 2024]."""

In [13]:
weather_input_prompts = [
    ("prediction_properties", prediction_properties_prompt),
    ("prediction_requirements", prediction_requirements_prompt),
    ("prediction_templates", weather_prediction_templates_prompt),
    ("prediction_examples", prediction_examples_prompt),
]

weather_pipeline_prompt = PipelinePromptTemplate(
    final_prompt=full_prediction_prompt, pipeline_prompts=weather_input_prompts
)

weather_input_dict = {
    "prediction_domain": "weather",
    "prediction_domain_attribute": weather_attributes,
    "domain_requirements": weather_requirements,
    "domain_examples": weather_examples,
    "predictions_N": 10
}
weather_prompt_output = weather_pipeline_prompt.format(**weather_input_dict)
print(weather_prompt_output)

weather_df = base_pipeline.generate_predictions(weather_prompt_output, 1, "weather")
weather_df

A prediction ($y$) consists of the following eight properties:

    1. $y_p$, weather person that predicted $y$
        - Can be a person (with a name) or a weather person such as a weather reporter, weather analyst, weather expert, weather top executive, weather senior level person, etc).
    2. $y_o$, weather organization 
        - Can only be an organization or entity that is associated with the weather prediction.
    3. $y_t$, current time when $y$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $y_f$, forecast time when $y$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $y_a$, weather prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the weather domain.
        - Some examples are temperature, precipitation, wind speed, hum

Unnamed: 0,Base Sentences,Prediction Label,Model Name,Domain
0,"On 2024-10-15, Dr. Rachel Kim, a senior meteorologist at the National Weather Service, forecasts that the precipitation levels in Los Angeles will likely increase by 15% in Q2 of 2026.",1,llama-3.3-70b-versatile,weather
1,"In 2024/08/20, Michael Davis from the US Weather Center predicts that the humidity in New York City will rise by 25% in 2027-03-15.",1,llama-3.3-70b-versatile,weather
2,"Amanda Taylor, a weather expert from the Bureau of Meteorology, predicts on 2024-11-12 that the temperature in Chicago will fall by 12°F in the first quarter of 2028.",1,llama-3.3-70b-versatile,weather
3,"According to a weather analyst from AccuWeather, on 2024-09-18, the wind speed in Miami will decrease beyond 10 mph in the timeframe of 2025-06-01.",1,llama-3.3-70b-versatile,weather
4,"In 2029-09-20, the average temperature in Seattle has a probability of 10% to decrease, as predicted by James, a weather reporter from the Meteorological Department, on 2024-10-22.",1,llama-3.3-70b-versatile,weather
5,"On 2024-08-25, Emily Chen, a weather top executive at Weather.com, predicts that the snowfall levels in Denver will likely increase by 10 inches in 2026-12-01.",1,llama-3.3-70b-versatile,weather
6,"In Q4 of 2025, the rainfall in Portland is expected to increase by 20%, as predicted by David Lee, a weather senior level person from the National Oceanic and Atmospheric Administration, on 2024-09-15.",1,llama-3.3-70b-versatile,weather
7,"On 2024/11/10, Dr. Sophia Patel, a weather expert from the US Weather Center, forecasts that the air quality index in Los Angeles will improve by 15% in 2027-05-01.",1,llama-3.3-70b-versatile,weather
8,"According to a weather reporter from the Bureau of Meteorology, on 2024-10-28, the temperature in Houston will rise beyond 85°F in the timeframe of 2026-08-15.",1,llama-3.3-70b-versatile,weather
9,"In 2028-03-01, the wind chill in Minneapolis is expected to be 5°F, which is a 10% decrease, as predicted by Mark, a weather analyst from the National Weather Service, on 2024-09-22.",1,llama-3.3-70b-versatile,weather


### Generate Health Predictions

In [14]:
health_attributes = """obesity rates, prevalence of chronic illnesses, average physical activity levels, nutritional intake, etc."""
health_requirements = """- Should be based on real-world health reports.
    - Suppose the time when $y$ was made is during any season such as flu season, allergy season, pandemic, epidemic, etc.
    - Include reports from all health organizations, researchers, doctors, physical therapists, physician assistants, nurse practitioners, fitness experts, etc."""

# For each template, have a rise, fall, or stable example, respectively.
health_examples = """
- health examples for template 1:
    1. On [Monday, December 16, 2024], [Detravious, an investor] forecasts that the [obesity rate] at the [United States] [will likely] [decrease] by [5%] in [2025 Q1].
    2. On [Tue, November 19, 2024], [medical professional Sophia Rodriguez] predicts that the [cancer rate] in [Georgia] [should] [decrease] by [4 percent] in [08/21/2025].
- health examples for template 2:
    3. In [October 2024], [Arjun Patel, Ph.D] from [Florida Department of Health] envisions that the [average daily caloric intake] [may] [rise] from [100 to 300] in [Quarter 3 of 2028].
    4. In [8/15/2027], [Dr. Michael Brown] from the [Centers for Disease Control and Prevention], foresees that the [average daily caloric intake] [will] [fall] [8 percent] in [2027].
- health examples for template 3:
    5. [A trusted expert] predicts on [23 October 2024] that the [global vaccination rate for measles] in the [US] [should] [stay stable] at [100K people] in [2027 Quarter 4].
    6. [Dr. Sarah Johnson] foresees in [Q2 2026] that the [prevalence of hypertension] in [California] [will] [fall] by [407 percent] by [Monday, Nov 18, 2026].
- health examples for template 4:
    7. According to a [Olivia Martinez] from [Stanford University], on [08/21/2024], the prevalence of [type 2 diabetes in adults] [is expected to] [increase] beyond [8.5 percent] in the timeframe of [Q3 of 2029].
    8. According to [Rachel Kim, MD] from the [University of California], on [Fri, July 12, 2024], the prevalence of [type 2 diabetes in adults] [may] [increase] as much as [30,000,000, reflecting a 1209 percent increase] by [21 Aug 2024].
- health examples for template 5:
    9. In [2025-08-21], the [average weekly exercise hours] in [United States] has a [probability] of [20 percent to reach 30k], as predicted by [Emily Davis, Harvard School of Public Health] on [21 Oct 24].
    10. In [Quarter of 2027], the [average weekly walking hours] in [Atlanta] is [expected to rise] by [15%], as predicted by the [Monique, National Institutes of Health] on [Sun, February 20, 2024]."""

In [15]:
health_input_dict = {
    "prediction_domain": "health",
    "prediction_domain_attribute": health_attributes,
    "domain_requirements": health_requirements,
    "domain_examples": health_examples,
    "predictions_N": 10
}

health_prompt_output = pipeline_prompt.format(**health_input_dict)
print(health_prompt_output)

health_df = base_pipeline.generate_predictions(health_prompt_output, 1, "health")
health_df

A prediction ($y$) consists of the following eight properties:

    1. $y_p$, health person that predicted $y$
        - Can be a person (with a name) or a health person such as a health reporter, health analyst, health expert, health top executive, health senior level person, etc).
    2. $y_o$, health organization 
        - Can only be an organization or entity that is associated with the health prediction.
    3. $y_t$, current time when $y$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $y_f$, forecast time when $y$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $y_a$, health prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the health domain.
        - Some examples are obesity rates, prevalence of chronic illnesses, averag

Unnamed: 0,Base Sentences,Prediction Label,Model Name,Domain
0,"On 2024-10-15, Dr. Thompson, a health expert from the World Health Organization, predicts that the obesity rate at the United States will likely decrease by 3% in Q2 of 2026.",1,llama-3.3-70b-versatile,health
1,"In 2024, Dr. Lee, a researcher from the Centers for Disease Control and Prevention, envisions that the average daily caloric intake may rise from 2500 to 3000 in Quarter 4 of 2028.",1,llama-3.3-70b-versatile,health
2,"Dr. Patel, a trusted health analyst, predicts on 20 August 2024 that the global vaccination rate for influenza in the US should stay stable at 90% in 2027 Quarter 3.",1,llama-3.3-70b-versatile,health
3,"According to a report by Dr. Kim, a senior health executive from the University of California, on 12 July 2024, the prevalence of type 2 diabetes in adults is expected to increase beyond 9.5% in the timeframe of Q3 of 2030.",1,llama-3.3-70b-versatile,health
4,"In 2025-08-20, the average weekly exercise hours in Australia has a probability of 25% to reach 25k, as predicted by Dr. Davis, a health expert from Harvard School of Public Health, on 15 October 2024.",1,llama-3.3-70b-versatile,health
5,"On Wednesday, 21 August 2024, Dr. Rodriguez, a medical professional, forecasts that the cancer rate in Canada will decrease by 6% in 2026 Q1.",1,llama-3.3-70b-versatile,health
6,"In Q2 of 2027, the average daily physical activity levels in the UK are expected to rise by 10%, as predicted by Dr. Brown, a health researcher from the National Health Service, on 10 February 2024.",1,llama-3.3-70b-versatile,health
7,"Dr. Johnson, a health expert, predicts on 18 November 2024 that the prevalence of hypertension in Texas will fall by 500 cases by Monday, 18 November 2026.",1,llama-3.3-70b-versatile,health
8,"According to a report by Dr. Martin, a health analyst from the University of Oxford, on 25 August 2024, the average daily sugar intake is expected to decrease beyond 20 grams in the timeframe of Q4 of 2029.",1,llama-3.3-70b-versatile,health
9,"In 2029-08-22, the average weekly walking hours in Germany has a high chance of rising by 12%, as predicted by Dr. Taylor, a health expert from the World Health Organization, on 22 August 2024.",1,llama-3.3-70b-versatile,health


### Generate Policy Predictions

In [16]:
policy_attributes = """election outcomes, economic reforms, legislative impacts."""
policy_requirements = """- Should be based on real-world policy reports.
    - Suppose the time when $y$ was made is during an election cycle or non-election cycles.
    - Include policies & laws, from all sectors such as consumer staples, energy, finance, health care, industrials, materials, media, real estate, retail, technology, utilities, defense, etc."""

policy_examples = """
- policy examples for template 1:
    1. On [Monday, December 16, 2024], [President John Doe] forecasts that the [unemployment rate] at [the United States] [will likely] [decrease] by [2%] in [2025 Q1].
    2. On [Tue, November 19, 2024], [Dr. Jane Smith] foresees that the [population growth rate] in [California] [is likely to] [decrease] by [5 percent to 20 billion] in [08/21/2025].
- policy examples for template 2:
    3. In [October 2024], [Senator Emily Johnson] from [the Senate Committee on Finance], envisions that the [inflation rate] [should] [rise] from [1.3 percent to 89 percent] in [Quarter 3 of 2028].
    4. In [8/15/2027], [Governor Michael Brown] from [the State of Texas], predicts that the [number of registered voters] [will] [fall] under [5B] in [Dec of 2029].
- policy examples for template 3:
    5. [Dija Gabe in the Congressional Budget Office] predicts on [23 October 2024] that [national debt] in [USA] [may] [stay stable] at [20 million] in [2027 Quarter 4].
    6. [Dr. Sarah Lee] foresees in [Q2 2026] that the [median household income] in [NY] [should] [fall] by [629 percent to $15,000] on [Monday, Nov 18, 2026].
- policy examples for template 4:
    7. According to a [General Robert Williams] from [the Department of Defense], on [08/21/2024], the [number of active-duty soldiers] [is expected to] [increase] beyond [$10,000] in the timeframe of [Q3 of 2029].
    8. According to [Dr. Olivia Martinez] from [the Census Bureau], on [Fri, July 12, 2024], the [population density] in [urban areas] [is likely to] [increase] as much as [100,000,000, reflecting a 1209 percent increase] by [21 Aug 2024].
- policy examples for template 5:
    9. In [2025-08-21], the [number of citizens] in [Thomson, GA, 30824] has a [probability] of [92 percent to reach 30k] [decrease], as predicted by [Shirly Tisdale, a policy reporter] on [21 Oct 24].
    10. In [Quarter of 2027], the [number of Navy members] in [the United States] [is expected] to be [300K, which is a 15%] [rise], as predicted by [a policy analyst] on [Sun, February 20, 2024]."""

In [17]:
policy_input_dict = {
    "prediction_domain": "policy",
    "prediction_domain_attribute": policy_attributes,
    "domain_requirements": policy_requirements,
    "domain_examples": policy_examples,
    "predictions_N": 10
}

policy_prompt_output = pipeline_prompt.format(**policy_input_dict)
print(policy_prompt_output)

policy_df = base_pipeline.generate_predictions(policy_prompt_output, 1, "policy")
policy_df

A prediction ($y$) consists of the following eight properties:

    1. $y_p$, policy person that predicted $y$
        - Can be a person (with a name) or a policy person such as a policy reporter, policy analyst, policy expert, policy top executive, policy senior level person, etc).
    2. $y_o$, policy organization 
        - Can only be an organization or entity that is associated with the policy prediction.
    3. $y_t$, current time when $y$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $y_f$, forecast time when $y$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $y_a$, policy prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the policy domain.
        - Some examples are election outcomes, economic reforms, legislative impac

Unnamed: 0,Base Sentences,Prediction Label,Model Name,Domain
0,"On 2024-10-15, policy expert, Rachel Kim, predicts that the stock market index at the New York Stock Exchange will likely rise by 10% in 2026 Q2.",1,llama-3.3-70b-versatile,policy
1,"In 2025, Dr. Michael Davis from the Federal Reserve, forecasts that the interest rate will increase from 2% to 5% in 2028 Q4.",1,llama-3.3-70b-versatile,policy
2,"According to a policy analyst, Emily Chen, from the Securities and Exchange Commission, on 08/20/2024, the number of initial public offerings is expected to increase beyond 500 in the timeframe of 2029 Q1.",1,llama-3.3-70b-versatile,policy
3,"Policy senior level person, David Lee, predicts on 2024/11/12 that the national budget deficit in the United States may stay stable at $1 trillion in 2027 Q3.",1,llama-3.3-70b-versatile,policy
4,"In 2027-08-22, the GDP growth rate at the European Union is expected to be 3%, which is a 10% rise, as predicted by a policy top executive, James Wilson, on 2024-02-20.",1,llama-3.3-70b-versatile,policy
5,"On Wednesday, August 21, 2024, policy reporter, Kevin White, forecasts that the unemployment rate at the United Kingdom will likely decrease by 1.5% in 2025 Q4.",1,llama-3.3-70b-versatile,policy
6,"In Q2 of 2026, Dr. Sophia Patel from the World Bank, envisions that the foreign direct investment will rise from $100 billion to $500 billion in 2028 Q1.",1,llama-3.3-70b-versatile,policy
7,"According to a policy expert, Mark Davis, from the International Monetary Fund, on 21 August 2024, the inflation rate is expected to increase beyond 3% in the timeframe of 2029 Q2.",1,llama-3.3-70b-versatile,policy
8,"On 2024/10/18, policy analyst, Olivia Brown, predicts that the consumer price index at the United States will likely fall by 2% in 2026 Q1.",1,llama-3.3-70b-versatile,policy
9,"In 2029 Q3, the number of people with health insurance at the United States is expected to be 300 million, which is a 5% increase, as predicted by a policy senior level person, Daniel Hall, on 2024-08-15.",1,llama-3.3-70b-versatile,policy


### Domain Predictions

In [None]:
predictions_df = DataProcessing.concat_dfs([financial_df, weather_df, health_df, policy_df])
predictions_df

## Generate Non-Predictions

In [22]:
non_prediction_template = r"""Generate any sentence that's not a prediction, which we name non-prediction. A prediction is below with variables
    - $\hat{y}$, prediction
        - $\hat{y}_{s}$, source that predicted $\hat{y}$
            - Source can be person, organization, and any type of entity.
        - $\hat{y}_{t}$, time when $\hat{y}$ was made
            - Time is the exact moment that can be measured in day, hour, minutes, seconds, etc.
        - $\hat{y}_{f}$, forecast time when $\hat{y}$ is expected to come to fruition
            - Forecast can be from seconds to decades in the future.
            - How far to go out? Or where to stop?
        - $\hat{y}_a$, prediction attribute
            - Financial based attributes such as stock price, net profit, revenue
        - $\hat{y}_m$, prediction metric outcome
            - How much will the  $\hat{y}_a$ rise/increase or fall/decrease
        - $\hat{y}_v$, future verb tense
            - A verb that is associated with the future such as will, would, be going to, should, etc.

    Please generate 15 non-predictions with the following requirements below:

    1. Only a simple non-prediction (sentence) (and NOT compounding using "and" or "or")
    2. Include no additional information such as "Here are nine simple sentences that are not predictions:", number before non-prediction
    3. At least 10 words and no more than 20 words in the non-prediction
    4. Do not generate redundant non-predictions
"""

print(non_prediction_template)
non_prediction_label = 0

non_predictions_df = base_pipeline.generate_predictions(text=non_prediction_template, label=non_prediction_label, domain="any")
non_predictions_df

Generate any sentence that's not a prediction, which we name non-prediction. A prediction is below with variables
    - $\hat{y}$, prediction
        - $\hat{y}_{s}$, source that predicted $\hat{y}$
            - Source can be person, organization, and any type of entity.
        - $\hat{y}_{t}$, time when $\hat{y}$ was made
            - Time is the exact moment that can be measured in day, hour, minutes, seconds, etc.
        - $\hat{y}_{f}$, forecast time when $\hat{y}$ is expected to come to fruition
            - Forecast can be from seconds to decades in the future.
            - How far to go out? Or where to stop?
        - $\hat{y}_a$, prediction attribute
            - Financial based attributes such as stock price, net profit, revenue
        - $\hat{y}_m$, prediction metric outcome
            - How much will the  $\hat{y}_a$ rise/increase or fall/decrease
        - $\hat{y}_v$, future verb tense
            - A verb that is associated with the future such as will, would, b

Unnamed: 0,Base Sentences,Prediction Label,Model Name,Domain
0,The company is currently hiring new employees for various positions.,0,llama-3.3-70b-versatile,any
1,The manager is responsible for overseeing daily operations effectively.,0,llama-3.3-70b-versatile,any
2,The team is working diligently to meet the project deadline.,0,llama-3.3-70b-versatile,any
3,The new policy has been implemented to improve customer service.,0,llama-3.3-70b-versatile,any
4,The employees are required to attend a mandatory training session.,0,llama-3.3-70b-versatile,any
5,The company's mission statement is to provide excellent products always.,0,llama-3.3-70b-versatile,any
6,The customer service department is open from Monday to Friday.,0,llama-3.3-70b-versatile,any
7,The employees are expected to follow the company's code conduct.,0,llama-3.3-70b-versatile,any
8,The new product has been designed to be more efficient always.,0,llama-3.3-70b-versatile,any
9,The company's values include integrity and transparency always.,0,llama-3.3-70b-versatile,any
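
The four requirements in the prompt can be checked mechanically once the sentences come back. The sketch below is one rough way to do that with the standard library only. `check_non_prediction` and `find_duplicates` are hypothetical helpers (they are not part of `BasePipeline` or `DataProcessing`), and the "and"/"or" test is a crude whole-word match, not a real test for compound sentences.

```python
import re


def check_non_prediction(sentence, min_words=10, max_words=20):
    """Return a list of requirement violations for one generated sentence."""
    problems = []
    words = sentence.split()
    # Requirement 3: at least 10 and no more than 20 words.
    if not (min_words <= len(words) <= max_words):
        problems.append(f"word count {len(words)} outside {min_words}-{max_words}")
    # Requirement 1 (heuristic): flag any whole-word "and" / "or".
    if re.search(r"\b(and|or)\b", sentence, flags=re.IGNORECASE):
        problems.append('contains "and" or "or"')
    # Requirement 2: no enumeration such as "1." or "2)" before the sentence.
    if re.match(r"\s*\d+[.)]", sentence):
        problems.append("starts with a number")
    return problems


def find_duplicates(sentences):
    """Requirement 4: return sentences that repeat an earlier one (case-insensitive)."""
    seen = set()
    duplicates = []
    for s in sentences:
        key = s.strip().lower()
        if key in seen:
            duplicates.append(s)
        seen.add(key)
    return duplicates


# Three sentences copied from the output table above.
samples = [
    "The company is currently hiring new employees for various positions.",
    "The manager is responsible for overseeing daily operations effectively.",
    "The company's values include integrity and transparency always.",
]
for sentence in samples:
    print(sentence, "->", check_non_prediction(sentence) or "ok")
```

On these samples the first sentence passes, the second is flagged for having 9 words, and the third is flagged both for having 8 words and for containing "and". This suggests the model does not always honor the length requirement, so a post-generation filter of this kind may be worth adding.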


## Store Predictions and Non-Predictions

In [None]:
%store predictions_df
%store non_predictions_df

## Further Explanation

- The code above still needs to be optimized
- Next step: generate predictions with multiple models

### Multiple Models to Generate Predictions

In [19]:
# domains = ["financial", "weather", "health care"]
# llama_model = LlamaTextGenerationModel()
# model_2 = SomeModel()

# # Should all model dfs be a list of dataframes? Then we can concatenate them all together with pd.concat()
# for domain in domains:
#     llama_df = llama_model.generate_predictions(financial_prompt_output, 1, domain)
#     model_2_df = model_2.generate_predictions(financial_prompt_output, 1, domain)


# # or 

# llama_df = llama_model.generate_predictions(financial_prompt_output, 1, "financial")
# model_2_df = model_2.generate_predictions(financial_prompt_output, 1, "financial")

# llama_df = llama_model.generate_predictions(financial_prompt_output, 1, "weather")
# model_2_df = model_2.generate_predictions(financial_prompt_output, 1, "weather")

# llama_df = llama_model.generate_predictions(financial_prompt_output, 1, "health care")
# model_2_df = model_2.generate_predictions(financial_prompt_output, 1, "health care")
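
One possible answer to the question in the comments above is to append each model's DataFrame to a single list and concatenate once at the end. The sketch below illustrates that pattern under stated assumptions: `StubModel` is a hypothetical stand-in for the model wrappers (it calls no LLM), the `prompts` dictionary is invented for illustration, and the column names are copied from the output tables in this notebook.

```python
import pandas as pd


class StubModel:
    """Hypothetical stand-in for a text-generation model wrapper."""

    def __init__(self, name):
        self.name = name

    def generate_predictions(self, text, label, domain):
        # A real wrapper would call an LLM here; the stub returns one fixed row
        # using the same columns as the notebook's output tables.
        return pd.DataFrame({
            "Base Sentences": [f"[{self.name}] output for: {text}"],
            "Prediction Label": [label],
            "Model Name": [self.name],
            "Domain": [domain],
        })


# Hypothetical per-domain prompts; in the notebook these would be the
# formatted prompt outputs for each domain.
prompts = {
    "financial": "financial prompt",
    "weather": "weather prompt",
    "health care": "health care prompt",
}
models = [StubModel("model_a"), StubModel("model_b")]

# Collect every (domain, model) result in one list, then concatenate once.
frames = []
for domain, prompt in prompts.items():
    for model in models:
        frames.append(model.generate_predictions(prompt, 1, domain))

all_models_df = pd.concat(frames, ignore_index=True)
print(all_models_df[["Model Name", "Domain"]])
```

Appending to a list avoids overwriting the per-model variable on each loop iteration, and pairing each domain with its own prompt avoids reusing the financial prompt for every domain.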

### Convert Sentence to DataFrame by Prediction Property

In [20]:
prediction_cols = ['y_p', 'y_o', 'y_t', 'y_f', 'y_a', 'y_s', 'y_m', 'y_v', 'y_l']
prediction_base_df = pd.DataFrame(columns=prediction_cols)

# Alias the class itself (no instance is created); ex_sentence_to_df is called on the class
data_processing = DataProcessing
sentence = "On [Monday, December 16, 2024], [Detravious, an investor] predicts that the [revenue] at [Apple] [will likely] [decrease] by [$87B to $50 billion] in [2025 Q1]"

# Extract prediction data and update prediction_base_df
df = data_processing.ex_sentence_to_df(sentence)
prediction_base_df = pd.concat([prediction_base_df, df], ignore_index=True)

prediction_base_df

Unnamed: 0,y_p,y_o,y_t,y_f,y_a,y_s,y_m,y_v,y_l
0,"Detravious, an investor",Apple,"Monday, December 16, 2024",2025 Q1,revenue,decrease,$87B to $50 billion,will likely,
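
To make the mapping from bracketed spans to columns concrete, here is a minimal sketch using a regular expression. It is not the implementation of `ex_sentence_to_df`, which is not shown in this notebook. `bracketed_sentence_to_row` and `BRACKET_ORDER` are hypothetical, and the span order is an assumption read off the example sentence and its output row, so it holds only for this one sentence template.

```python
import re

# Assumed order of the bracketed spans in the example sentence.
BRACKET_ORDER = ["y_t", "y_p", "y_a", "y_o", "y_v", "y_s", "y_m", "y_f"]
PREDICTION_COLS = ["y_p", "y_o", "y_t", "y_f", "y_a", "y_s", "y_m", "y_v", "y_l"]


def bracketed_sentence_to_row(sentence):
    """Map each [bracketed] span to a prediction column, by position."""
    spans = re.findall(r"\[([^\]]+)\]", sentence)
    if len(spans) != len(BRACKET_ORDER):
        raise ValueError(
            f"expected {len(BRACKET_ORDER)} bracketed spans, found {len(spans)}"
        )
    found = dict(zip(BRACKET_ORDER, spans))
    # Columns with no matching span (here y_l) are left empty.
    return {col: found.get(col, "") for col in PREDICTION_COLS}


sentence = (
    "On [Monday, December 16, 2024], [Detravious, an investor] predicts that "
    "the [revenue] at [Apple] [will likely] [decrease] by "
    "[$87B to $50 billion] in [2025 Q1]"
)
row = bracketed_sentence_to_row(sentence)
for col, value in row.items():
    print(f"{col}: {value}")
```

The resulting dictionary can be wrapped in a one-row DataFrame with `pd.DataFrame([row])` if a tabular form is needed. A positional mapping like this breaks as soon as a template reorders its slots, so a more robust version would probably tag each span with its property name.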
