# 1-Generate Predictions using LangChain

- **Goal:** Prediction Recognition

- **Purpose:** Implement step 1 of the prediction recognition pipeline, including its substeps:
    1. Generate predictions
        1. Create several prediction prompts templates
        2. Utilize open-source LLMs to generate predictions

- **Misc:**
    - `%store`: IPython line magic that stores a variable of interest so it can be loaded in another notebook

In [None]:
import os
import sys

import pandas as pd

from langchain_core.prompts import PipelinePromptTemplate, PromptTemplate

# Get the current working directory of the notebook
notebook_dir = os.getcwd()
# Add the parent directory to the system path
sys.path.append(os.path.join(notebook_dir, '../'))

from pipelines import BasePipeline
from data_processing import DataProcessing

In [None]:
pd.set_option('display.max_colwidth', 800)
base_pipeline = BasePipeline()

## LangChain Templates

In [None]:
full_prediction_template = """{prediction_properties}

{prediction_requirements}

{prediction_templates}

{prediction_examples}"""
full_prediction_prompt = PromptTemplate.from_template(full_prediction_template)

In [None]:
prediction_properties_template = """A prediction ($p$) consists of the following eight properties:

    1. $p_p$, {prediction_domain} person that predicted $p$
        - Can be a person (with a name) or a {prediction_domain} person such as a {prediction_domain} reporter, {prediction_domain} analyst, {prediction_domain} expert, {prediction_domain} top executive, {prediction_domain} senior level person, etc).
    2. $p_o$, {prediction_domain} organization 
        - Can only be an organization or entity that is associated with the {prediction_domain} prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, {prediction_domain} prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the {prediction_domain} domain.
        - Some examples are {prediction_domain_attribute}.
    6. $p_s$, slope that indicates the direction of change in $p_a$
        - Change of directions can be rise/increase/as much as, fall/decrease/as little as, change, stay stable, high/low chance/probability/degree of.
    7. $p_m$, metric outcome
        - How much will the $p_a$ $p_s$?
    8. $p_v$, future verb tense
        - A verb that is associated with the future such as will, would, be going to, should, etc.
    9. $p_l$, location
        - The location is attached to attribute $p_a$ if {prediction_domain} == 'weather'
    """
prediction_properties_prompt = PromptTemplate.from_template(prediction_properties_template)

In [None]:
prediction_requirements_template = """{prediction_domain} requirements to use for each prediction:

    - Should be based on real-world {prediction_domain} data and not hallucinate.
    - Only a simple sentence (prediction) (and NOT compounding using "and" or "or").
    - Should diversify all eight properties of the prediction ($p$).
    - Should use synonyms of predicts such as forecasts, speculates, foresee, envision, etc., and not use any of them more than ten times.
    - The prediction should be unique and not repeated.
    - The forecast time ($p_f$) should always be after current time ($p_t$) of when forecast ($p$) was made.
    - Do not number the predictions.
    - Do not say, "As the {prediction_domain} at organization ($p_o$), I will generate company-based {prediction_domain} predictions using the provided templates." or anything similar.
    - Should have a forecast time ($p_f$) when $p$ is expected to come to fruition between 2025 to 2050.
    - Use the five different templates and examples provided.
    - Change how the current time ($p_t$) and forecast time ($p_f$) are written in the prediction with examples of (1) Wednesday, August 21, 2024; (2) Wed, August 21, 2024; (3) 08/21/2024; (4) 08/21/2024; (5) 21/08/2024; (6) 21 August 2024; (7) 2024/08/21; (8) 2024-08-21; (9) August 21, 2024; (10) Aug 21, 2024; (11) 21 August 2024, (12) 21 Aug 2024, Q3 of 2027, 2029 of Q3, etc (with removing day of week).
    {domain_requirements}
    - Do not say, "Here are 10 unique weather predictions based on the provided templates and examples:" in the prompt.
    - Keep the brackets around the prediction properties when generating predictions and be sure to include brackets around dates such as "2024-10-15", "2024/08/20", "Q4 of 2024", "2025", "2027 Q1", "Q3 2027", "On 21 Aug 2024".
    - Do not use any of the examples in the prompt.
    - In front of every prediction, put the template number in the format of "T1:", "T2:", etc."""
prediction_requirements_prompt = PromptTemplate.from_template(prediction_requirements_template)
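The date-variation requirement above can be illustrated with a small sketch that renders one date in the formats the prompt lists. The helper name `date_variants` is hypothetical and not part of the pipeline; month and weekday names assume the default (English) locale.

```python
from datetime import date


def date_variants(d: date) -> list[str]:
    """Render one date in several of the written forms listed in the prompt."""
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
    return [
        d.strftime("%A, %B %d, %Y"),  # full weekday and month name
        d.strftime("%a, %B %d, %Y"),  # abbreviated weekday
        d.strftime("%m/%d/%Y"),       # month first
        d.strftime("%d/%m/%Y"),       # day first
        d.strftime("%d %B %Y"),
        d.strftime("%Y/%m/%d"),
        d.isoformat(),                # ISO 8601, e.g. 2024-08-21
        d.strftime("%B %d, %Y"),
        d.strftime("%b %d, %Y"),
        d.strftime("%d %b %Y"),
        f"Q{quarter} of {d.year}",    # quarter-level granularity
    ]


variants = date_variants(date(2024, 8, 21))
for text in variants:
    print(text)
```

Note that `%d` zero-pads the day, so single-digit days would print as `05` rather than `5`.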

In [None]:
prediction_templates_template = """Here are some {prediction_domain} templates:

- {prediction_domain} template 1: On [ $p_t$ ], [ $p_p$ ] predicts that the [ $p_a$ ] at [ $p_o$ ] [ $p_v$ ] [ $p_s$ ] by [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 2: In [ $p_t$ ], [ $p_p$ ] from [ $p_o$ ], predicts that the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] from [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 3: [ $p_p$ ] predicts on [ $p_t$ ] that the [ $p_a$ ] at [ $p_o$ ] [ $p_v$ ] [ $p_s$ ] under [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 4: According to a [ $p_p$ ] from [ $p_o$ ], on [ $p_t$ ], the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] beyond [ $p_m$ ] in the timeframe of [ $p_f$ ].
- {prediction_domain} template 5: In [ $p_f$ ], the [ $p_a$ ] at [ $p_o$ ] [ $p_v$ ] a [ $p_m$ ] [ $p_s$ ], as predicted by [ $p_p$ ] on [ $p_t$ ]."""
prediction_templates_prompt = PromptTemplate.from_template(prediction_templates_template)

In [None]:
prediction_examples_template = """Here are some examples of {prediction_domain} predictions:

{domain_examples}


With the above, generate a unique set of {predictions_N} {prediction_domain} predictions. Think from the perspective of a {prediction_domain} analyst, expert, top executive, or senior level person."""
prediction_examples_prompt = PromptTemplate.from_template(prediction_examples_template)

In [None]:
input_prompts = [
    ("prediction_properties", prediction_properties_prompt),
    ("prediction_requirements", prediction_requirements_prompt),
    ("prediction_templates", prediction_templates_prompt),
    ("prediction_examples", prediction_examples_prompt),
]

pipeline_prompt = PipelinePromptTemplate(
    final_prompt=full_prediction_prompt, pipeline_prompts=input_prompts
)
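As a rough illustration of what this composition amounts to, the sketch below reproduces the idea with plain `str.format`: each named sub-template is filled from the inputs, and the results are substituted into the final template. The helpers `template_fields` and `compose_prompt` and the shortened templates are hypothetical stand-ins, not LangChain's implementation.

```python
import string


def template_fields(template: str) -> set[str]:
    """Return the placeholder names used in a str.format-style template."""
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}


def compose_prompt(final_template, sub_templates, inputs):
    """Fill each named sub-template, then substitute the results into the final template."""
    values = dict(inputs)
    for name, template in sub_templates:
        # Each sub-template only receives the variables it actually uses.
        needed = {field: values[field] for field in template_fields(template)}
        values[name] = template.format(**needed)
    final_inputs = {field: values[field] for field in template_fields(final_template)}
    return final_template.format(**final_inputs)


# Shortened, illustrative templates (not the ones defined in this notebook).
final_template = "{properties}\n\n{examples}"
sub_templates = [
    ("properties", "A {prediction_domain} prediction has a source and a forecast time."),
    ("examples", "Generate {predictions_N} {prediction_domain} predictions."),
]

prompt = compose_prompt(
    final_template,
    sub_templates,
    {"prediction_domain": "weather", "predictions_N": 10},
)
print(prompt)
```

Because every sub-template draws from the same input dictionary, a single `prediction_domain` value propagates to all four sections of the full prompt.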



## Generate Domain Predictions

### Generate Financial Predictions

In [None]:
financial_attributes = """stock price, net profit, revenue, operating cash flow, research and development expenses, operating income, gross profit."""
financial_requirements = """- Should be based on real-world financial earnings reports.
    - Suppose the time when $p$ was made is during any earning season.
    - Include stocks from all sectors such as consumer staples, energy, finance, health care, industrials, materials, media, real estate, retail, technology, utilities, defense, etc."""

# For each template, have a rise, fall, or stable example, respectively.
financial_examples = """
- financial examples for template 1:
    1. On [Monday, December 16, 2024], [Detravious, an investor] forecasts that the [revenue] at [Apple] [will likely] [decrease] from [$87B to $50 billion] in [2025 Q1].
    2. On [Tue, November 19, 2024], [Ava Lee] predicts that the [operating cash flow] at [ExxonMobil] [should] [decrease] by [5 percent to $20 billion] in [08/21/2025].
- financial examples for template 2:
    3. In [October 2024], [Julian Hall] from [Yahoo Finance], envisions that the [stock price] [will] [rise] from [$800 to $1,000 per share] in [Quarter 3 of 2028].
    4. In [8/15/2027], [Mrs. Kalia] from [McDonald's], predicts that the [net profit] [will] [fall] under [5% to $5 billion] in [January of 2029].
- financial examples for template 3:
    5. [Dija Gabe, a financial expert] predicts on [23 October 2024] that the [research and development expenses] at [Alphabet] [may] [stay stable] at [$20 million] in [2027 Quarter 4].
    6. [Mr. Mike] predicts in [Q2 2026] that the [operating income] at [Microsoft] [will] [fall] by [407 percent to $50M] on [Monday, Nov 18, 2026].
- financial examples for template 4:
    7. According to a [top executive] from [Chevron], on [08/21/2024], the [net profit] [is expected to] [increase] beyond [$10,000] in the timeframe of [Q3 of 2029].
    8. According to [Brittany] from [Tesla], on [Fri, July 12, 2024], the [gross profit] [may] [increase] as much as [$30,000,000, reflecting a 1209 percent increase] by [21 Aug 2024].
- financial examples for template 5:
    9. In [2025-08-21], the [net profit] at [Amazon] has a [probability] of [11 percent to reach $30k] [decrease], as predicted by [Emily Davis, a financial reporter] on [21 Oct 24].
    10. In [Quarter of 2027], the [revenue] at [Facebook] [is expected] to be [$30 billion, which is a 15%] [rise], as predicted by [a financial analyst] on [Sun, February 20, 2024]."""

In [None]:
financial_input_dict = {
    "prediction_domain": "financial",
    "prediction_domain_attribute": financial_attributes,
    "domain_requirements": financial_requirements,
    "domain_examples": financial_examples,
    "predictions_N": 10
}
financial_prompt_output = pipeline_prompt.format(**financial_input_dict)
print(financial_prompt_output)

financial_df = base_pipeline.generate_predictions(financial_prompt_output, 1, "financial")
financial_df

A prediction ($p$) consists of the following eight properties:

    1. $p_p$, financial person that predicted $p$
        - Can be a person (with a name) or a financial person such as a financial reporter, financial analyst, financial expert, financial top executive, financial senior level person, etc).
    2. $p_o$, financial organization 
        - Can only be an organization or entity that is associated with the financial prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, financial prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the financial domain.
        - Some examples are stock price, net prof

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,"T1: On [2024-10-15], [Samantha Thompson, a financial analyst] predicts that the [operating cash flow] at [Johnson & Johnson] [will likely] [increase] by [10 percent to $20 billion] in [2026 Q2].",1,llama-3.3-70b-versatile,financial
1,"T2: In [Q4 of 2024], [Ethan Kim] from [Bloomberg], forecasts that the [stock price] [will] [fall] from [$500 to $300 per share] in [2028 Q1].",1,llama-3.3-70b-versatile,financial
2,"T3: [Ava Morales, a financial expert] predicts on [2024/08/20] that the [research and development expenses] at [Intel] [may] [stay stable] at [$15 million] in [2027 Q3].",1,llama-3.3-70b-versatile,financial
3,"T4: According to a [senior executive] from [Coca-Cola], on [21 Aug 2024], the [net profit] [is expected to] [rise] beyond [$5 billion] in the timeframe of [2029 Q4].",1,llama-3.3-70b-versatile,financial
4,"T5: In [2025-02-15], the [revenue] at [Visa] [is expected] to [increase] by [15 percent to $25 billion] [rise], as predicted by [Liam Chen, a financial reporter] on [2024-08-22].",1,llama-3.3-70b-versatile,financial
5,"T1: On [Wednesday, November 20, 2024], [Julian Lee, an investor] predicts that the [gross profit] at [UnitedHealth Group] [will likely] [decrease] by [5 percent to $10 billion] in [2026 Q4].",1,llama-3.3-70b-versatile,financial
6,"T2: In [2027 Q2], [Sophia Patel] from [Morgan Stanley], envisions that the [operating income] [will] [increase] from [$10 billion to $15 billion] in [2028 Q3].",1,llama-3.3-70b-versatile,financial
7,"T3: [Noah Brooks, a financial analyst] predicts on [2024/10/18] that the [net profit] at [Procter & Gamble] [may] [fall] by [10 percent to $5 billion] in [2027 Q2].",1,llama-3.3-70b-versatile,financial
8,"T4: According to a [top executive] from [AT&T], on [2024-08-25], the [revenue] [is expected to] [stay stable] at [$40 billion] in the timeframe of [2029 Q1].",1,llama-3.3-70b-versatile,financial
9,"T5: In [2026-08-20], the [stock price] at [McDonald's] [has a probability] of [20 percent to reach $250 per share] [rise], as predicted by [Jackson Brown, a financial expert] on [2024-10-12].",1,llama-3.3-70b-versatile,financial
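Since the prompt asks for a `T<n>:` label and bracketed properties, the generated sentences can be split back into their parts. The following is a minimal sketch, assuming every property is wrapped in non-nested square brackets; `parse_prediction` is a hypothetical helper and not part of `BasePipeline`.

```python
import re

# "T1:" .. "T5:" label followed by the sentence body.
LINE_PATTERN = re.compile(r"^T(?P<template>[1-5]):\s*(?P<body>.+)$")
# Any non-nested [ ... ] span.
BRACKET_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def parse_prediction(sentence: str):
    """Return (template_number, bracketed_values), or None if the label is missing."""
    match = LINE_PATTERN.match(sentence.strip())
    if match is None:
        return None
    values = [value.strip() for value in BRACKET_PATTERN.findall(match.group("body"))]
    return int(match.group("template")), values


# Row 1 of the financial output above.
sample = (
    "T2: In [Q4 of 2024], [Ethan Kim] from [Bloomberg], forecasts that the "
    "[stock price] [will] [fall] from [$500 to $300 per share] in [2028 Q1]."
)
template_number, values = parse_prediction(sample)
print(template_number, values)
```

The order of the extracted values depends on the template, so mapping them to $p_t$, $p_p$, and so on would need a per-template lookup.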


### Generate Weather Predictions

In [None]:
weather_prediction_templates_template = """Here are some {prediction_domain} templates:

- {prediction_domain} template 1: On [ $p_t$ ], [ $p_p$ ] predicts that the [ $p_a$ ] at [ $p_o$ ] in [ $p_l$ ] [ $p_v$ ] [ $p_s$ ] by [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 2: In [ $p_t$ ], [ $p_p$ ] from [ $p_o$ ] in [ $p_l$ ], predicts that the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] from [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 3: [ $p_p$ ] predicts on [ $p_t$ ] that the [ $p_a$ ] at [ $p_o$ ] in [ $p_l$ ] [ $p_v$ ] [ $p_s$ ] under [ $p_m$ ] in [ $p_f$ ].
- {prediction_domain} template 4: According to a [ $p_p$ ] from [ $p_o$ ] in [ $p_l$ ], on [ $p_t$ ], the [ $p_a$ ] [ $p_v$ ] [ $p_s$ ] beyond [ $p_m$ ] in the timeframe of [ $p_f$ ].
- {prediction_domain} template 5: In [ $p_f$ ], the [ $p_a$ ] at [ $p_o$ ] in [ $p_l$ ] [ $p_v$ ] a [ $p_m$ ] [ $p_s$ ], as predicted by [ $p_p$ ] on [ $p_t$ ]."""
weather_prediction_templates_prompt = PromptTemplate.from_template(weather_prediction_templates_template)

In [None]:
weather_attributes = """temperature, precipitation, wind speed, humidity, etc."""
weather_requirements = """- Should be based on real-world weather reports.
    - Suppose the time when $p$ was made is during any season and in any location (e.g., Florida known for hurricanes, California known for wildfires, etc.).
    - Include reports from all meteorologists, weather organizations, or any type of weather entity."""

# For each template, have a rise, fall, or stable example, respectively.
weather_examples = """
- weather examples for template 1:
    1. On [Monday, December 16, 2024], [Dr. Melissa Carter] a weather expert at the [National Weather Service], forecasts that the [temperature], in [New York City] [will likely] [decrease] from [5°C to 3°C] on [February 16, 2025 (Fri)].
    2. On [Tue, 19 November 2024], [Ethan James] at the [US Weather Center] predicts that the [precipitation levels], in [San Francisco] [are likely to] [increase] by [20%] in the timeframe of [08/21/2025].
- weather examples for template 2:
    3. In [October 2024], [Samantha Lin] from [NOAA], envisions that the [wind speed] [should] [decrease] by [15 mph] in [Chicago] by [Friday, March 22, 2025].
    4. In [8/15/2027], [Carlos Rivera] from [Weather.com] predicts that the [humidity] [will] [rise] by [30%] in [Miami] in [July of 2025].
- weather examples for template 3:
    5. [Amanda Green], a weather reporter from [Bureau of Meteorology] predicts on [23 October 2024] that the [temperature] in [Seattle], [will] [fall] by [10°F] in [2025 Quarter 1].
    6. [Mr. Tommy Wu], from [US Weather Center] predicts in [Q2 2026] that [snowfall levels], in [Denver] [will likely] [increase] by [8 inches] in [Monday, Nov 18, 2026].
- weather examples for template 4:
    7. According to a [top executive] from [AccuWeather], on [12/21/2024], the [rainfall] in [Portland] [is expected to] [increase] beyond [10 percent] in the timeframe of [early 2025].
    8. According to [David Harper] from [Weather Underground], on [Fri, August 9, 2024], the [air quality index] in [Los Angeles] [is likely to] [improve] by [20%] in [21 Aug 2024].
- weather examples for template 5:
    9. In [2025-08-21], the [average temperature] in [Houston] has a [probability] of [5 percent to] [decrease], as predicted by [King, a weather reporter] from [Meteorological Department] on [21 Oct 24].
    10. In [Quarter of 2027], the [wind chill] in [Minneapolis] [is expected] to be [10°F, which is a 15%] [rise], as predicted by [a weather analyst named Ortiz] on [Sun, February 20, 2024]."""

In [None]:
weather_input_prompts = [
    ("prediction_properties", prediction_properties_prompt),
    ("prediction_requirements", prediction_requirements_prompt),
    ("prediction_templates", weather_prediction_templates_prompt),
    ("prediction_examples", prediction_examples_prompt),
]

weather_pipeline_prompt = PipelinePromptTemplate(
    final_prompt=full_prediction_prompt, pipeline_prompts=weather_input_prompts
)

weather_input_dict = {
    "prediction_domain": "weather",
    "prediction_domain_attribute": weather_attributes,
    "domain_requirements": weather_requirements,
    "domain_examples": weather_examples,
    "predictions_N": 10
}
weather_prompt_output = weather_pipeline_prompt.format(**weather_input_dict)
print(weather_prompt_output)

weather_df = base_pipeline.generate_predictions(weather_prompt_output, 1, "weather")
weather_df

A prediction ($p$) consists of the following eight properties:

    1. $p_p$, weather person that predicted $p$
        - Can be a person (with a name) or a weather person such as a weather reporter, weather analyst, weather expert, weather top executive, weather senior level person, etc).
    2. $p_o$, weather organization 
        - Can only be an organization or entity that is associated with the weather prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, weather prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the weather domain.
        - Some examples are temperature, precipitation, wind speed, hum

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,"T1: On [2024-10-20], [Emily Chen], a senior meteorologist at [National Weather Service], forecasts that the [precipitation levels] at [National Weather Service] in [Los Angeles] [will likely] [increase] by [15%] in [2025-02-15].",1,llama-3.3-70b-versatile,weather
1,"T2: In [Q4 of 2024], [Michael Brown] from [Weather.com] in [New York City], predicts that the [temperature] [will] [decrease] from [10°C to 5°C] in [2025-01-10].",1,llama-3.3-70b-versatile,weather
2,"T3: [Sarah Taylor], a weather expert from [Bureau of Meteorology], predicts on [2024/08/25] that the [humidity] in [Chicago] [will] [rise] by [20%] in [2025 Q2].",1,llama-3.3-70b-versatile,weather
3,"T4: According to a [weather analyst] from [AccuWeather], on [2024-08-18], the [wind speed] in [Miami] [is expected to] [increase] beyond [10 mph] in the timeframe of [2025-03-01].",1,llama-3.3-70b-versatile,weather
4,"T5: In [2027-08-20], the [average precipitation] in [Seattle] [is likely to] [decrease] by [5%], as predicted by [James Davis], a weather reporter from [Meteorological Department], on [2024-09-15].",1,llama-3.3-70b-versatile,weather
5,"T1: On [21 Aug 2024], [Lisa Nguyen], a top executive at [US Weather Center], forecasts that the [snowfall levels] at [US Weather Center] in [Denver] [will likely] [increase] by [10 inches] in [2025-11-20].",1,llama-3.3-70b-versatile,weather
6,"T2: In [2025 Q1], [Kevin White] from [NOAA] in [Boston], predicts that the [air quality index] [will] [improve] by [15%] in [2025-04-01].",1,llama-3.3-70b-versatile,weather
7,"T3: [Jessica Lee], a weather senior level person from [Weather Underground], predicts on [2024-10-10] that the [temperature] in [Dallas] [will] [fall] by [8°F] in [2025-02-01].",1,llama-3.3-70b-versatile,weather
8,"T4: According to a [weather expert] from [National Weather Service], on [2024/09/01], the [rainfall] in [San Francisco] [is expected to] [increase] beyond [15 percent] in the timeframe of [2025-06-01].",1,llama-3.3-70b-versatile,weather
9,"T5: In [2026-09-15], the [wind chill] in [Minneapolis] [is expected] to be [5°F, which is a 10%] [decrease], as predicted by [Mark Hall], a weather analyst from [Meteorological Department], on [2024-08-22].",1,llama-3.3-70b-versatile,weather


### Generate Health Predictions

In [None]:
health_attributes = """obesity rates, prevalence of chronic illnesses, average physical activity levels, nutritional intake, etc."""
health_requirements = """- Should be based on real-world health reports.
    - Suppose the time when $p$ was made is during any season such as flu season, allergy season, pandemic, epidemic, etc.
    - Include reports from all health organizations, researchers, doctors, physical therapists, physician assistants, nurse practitioners, fitness experts, etc."""

# For each template, have a rise, fall, or stable example, respectively.
health_examples = """
- health examples for template 1:
    1. On [Monday, December 16, 2024], [Detravious, an investor] forecasts that the [obesity rate] at the [United States] [will likely] [decrease] by [5%] in [2025 Q1].
    2. On [Tue, November 19, 2024], [medical professional Sophia Rodriguez] predicts that the [cancer rate] in [Georgia] [should] [decrease] by [4 percent] in [08/21/2025].
- health examples for template 2:
    3. In [October 2024], [Arjun Patel, Ph.D] from [Florida Department of Health] envisions that the [average daily caloric intake] [may] [rise] from [100 to 300] in [Quarter 3 of 2028].
    4. In [8/15/2027], [Dr. Michael Brown] from the [Centers for Disease Control and Prevention], foresees that the [average daily caloric intake] [will] [fall] [8 percent] in [2027].
- health examples for template 3:
    5. [A trusted expert] predicts on [23 October 2024] that the [global vaccination rate for measles] in the [US] [should] [stay stable] at [100K people] in [2027 Quarter 4].
    6. [Dr. Sarah Johnson] foresees in [Q2 2026] that the [prevalence of hypertension] in [California] [will] [fall] by [407 percent] by [Monday, Nov 18, 2026].
- health examples for template 4:
    7. According to a [Olivia Martinez] from [Stanford University], on [08/21/2024], the prevalence of [type 2 diabetes in adults] [is expected to] [increase] beyond [8.5 percent] in the timeframe of [Q3 of 2029].
    8. According to [Rachel Kim, MD] from the [University of California], on [Fri, July 12, 2024], the prevalence of [type 2 diabetes in adults] [may] [increase] as much as [30,000,000, reflecting a 1209 percent increase] by [21 Aug 2024].
- health examples for template 5:
    9. In [2025-08-21], the [average weekly exercise hours] in [United States] has a [probability] of [20 percent to reach 30k], as predicted by [Emily Davis, Harvard School of Public Health] on [21 Oct 24].
    10. In [Quarter of 2027], the [average weekly walking hours] in [Atlanta] is [expected to rise] by [15%], as predicted by the [Monique, National Institutes of Health] on [Sun, February 20, 2024]."""

In [None]:
health_input_dict = {
    "prediction_domain": "health",
    "prediction_domain_attribute": health_attributes,
    "domain_requirements": health_requirements,
    "domain_examples": health_examples,
    "predictions_N": 10
}

health_prompt_output = pipeline_prompt.format(**health_input_dict)
print(health_prompt_output)

health_df = base_pipeline.generate_predictions(health_prompt_output, 1, "health")
health_df

A prediction ($p$) consists of the following eight properties:

    1. $p_p$, health person that predicted $p$
        - Can be a person (with a name) or a health person such as a health reporter, health analyst, health expert, health top executive, health senior level person, etc).
    2. $p_o$, health organization 
        - Can only be an organization or entity that is associated with the health prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, health prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the health domain.
        - Some examples are obesity rates, prevalence of chronic illnesses, averag

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,"T1: On [2024-10-15], [Dr. Rachel Lee, a health expert] predicts that the [average blood pressure] at [New York City] [will likely] [decrease] by [3%] in [2026 Q2].",1,llama-3.3-70b-versatile,health
1,"T2: In [Q4 of 2024], [Dr. David Kim, a researcher] from [Johns Hopkins University] forecasts that the [prevalence of heart disease] [may] [rise] from [200 to 250 per 100,000 people] in [2028].",1,llama-3.3-70b-versatile,health
2,"T3: [Dr. Maria Rodriguez, a senior health analyst] predicts on [2024/08/20] that the [obesity rate among children] in [Texas] [should] [stay stable] at [12%] in [2027 Q1].",1,llama-3.3-70b-versatile,health
3,"T4: According to a [Dr. John Taylor, a physician] from [Cedars-Sinai Medical Center], on [2024-08-22], the [average daily step count] [is expected to] [increase] beyond [8,000 steps] in the timeframe of [2029 Q3].",1,llama-3.3-70b-versatile,health
4,"T5: In [2025-10-01], the [average weekly fruit consumption] in [California] [will likely] [rise] by [15%], as predicted by [Dr. Emily Chen, a nutrition expert] on [2024-09-15].",1,llama-3.3-70b-versatile,health
5,"T1: On [21 Aug 2024], [Dr. Kevin White, a health expert] predicts that the [prevalence of diabetes] at [Florida] [will] [decrease] by [2%] in [2026].",1,llama-3.3-70b-versatile,health
6,"T2: In [2027 Q1], [Dr. Sophia Patel, a researcher] from [Harvard University] envisions that the [average daily caloric intake] [may] [fall] from [2,500 to 2,000 calories] in [2028 Q2].",1,llama-3.3-70b-versatile,health
7,"T3: [Dr. Michael Brown, a senior health executive] predicts on [2024-11-01] that the [vaccination rate for influenza] in [United States] [should] [stay stable] at [50%] in [2027 Q4].",1,llama-3.3-70b-versatile,health
8,"T4: According to a [Dr. Lisa Nguyen, a physician assistant] from [Stanford Health Care], on [2024/09/10], the [prevalence of mental health disorders] [is expected to] [increase] beyond [20%] in the timeframe of [2030 Q1].",1,llama-3.3-70b-versatile,health
9,"T5: In [2028-07-01], the [average weekly exercise hours] in [New York State] [will likely] [rise] by [10%], as predicted by [Dr. Daniel Lee, a fitness expert] on [2024-10-01].",1,llama-3.3-70b-versatile,health


### Generate Policy Predictions

In [None]:
policy_attributes = """election outcomes, economic reforms, legislative impacts."""
policy_requirements = """- Should be based on real-world policy reports.
    - Suppose the time when $p$ was made is during an election cycle or non-election cycles.
    - Include policies & laws, from all sectors such as consumer staples, energy, finance, health care, industrials, materials, media, real estate, retail, technology, utilities, defense, etc."""

policy_examples = """
- policy examples for template 1:
    1. On [Monday, December 16, 2024], [President John Doe] forecasts that the [unemployment rate] at [the United States] [will likely] [decrease] by [2%] in [2025 Q1].
    2. On [Tue, November 19, 2024], [Dr. Jane Smith] foresees that the [population growth rate] in [California] [is likely to] [decrease] by [5 percent to 20 billion] in [08/21/2025].
- policy examples for template 2:
    3. In [October 2024], [Senator Emily Johnson] from [the Senate Committee on Finance], envisions that the [inflation rate] [should] [rise] from [1.3 percent to 89 percent] in [Quarter 3 of 2028].
    4. In [8/15/2027], [Governor Michael Brown] from [the State of Texas], predicts that the [number of registered voters] [will] [fall] under [5B] in [Dec of 2029].
- policy examples for template 3:
    5. [Dija Gabe in the Congressional Budget Office] predicts on [23 October 2024] that [national debt] in [USA] [may] [stay stable] at [20 million] in [2027 Quarter 4].
    6. [Dr. Sarah Lee] foresees in [Q2 2026] that the [median household income] in [NY] [should] [fall] by [629 percent to $15,000] on [Monday, Nov 18, 2026].
- policy examples for template 4:
    7. According to a [General Robert Williams] from [the Department of Defense], on [08/21/2024], the [number of active-duty soldiers] [is expected to] [increase] beyond [$10,000] in the timeframe of [Q3 of 2029].
    8. According to [Dr. Olivia Martinez] from [the Census Bureau], on [Fri, July 12, 2024], the [population density] in [urban areas] [is likely to] [increase] as much as [100,000,000, reflecting a 1209 percent increase] by [21 Aug 2024].
- policy examples for template 5:
    9. In [2025-08-21], the [number of citizens] in [Thomson, GA, 30824] has a [probability] of [92 percent to reach 30k] [decrease], as predicted by [Shirly Tisdale, a policy reporter] on [21 Oct 24].
    10. In [Quarter of 2027], the [number of Navy members] in [the United States] [is expected] to be [300K, which is a 15%] [rise], as predicted by [a policy analyst] on [Sun, February 20, 2024]."""

In [None]:
policy_input_dict = {
    "prediction_domain": "policy",
    "prediction_domain_attribute": policy_attributes,
    "domain_requirements": policy_requirements,
    "domain_examples": policy_examples,
    "predictions_N": 10
}

policy_prompt_output = pipeline_prompt.format(**policy_input_dict)
print(policy_prompt_output)

policy_df = base_pipeline.generate_predictions(policy_prompt_output, 1, "policy")
policy_df

A prediction ($p$) consists of the following eight properties:

    1. $p_p$, policy person that predicted $p$
        - Can be a person (with a name) or a policy person such as a policy reporter, policy analyst, policy expert, policy top executive, policy senior level person, etc).
    2. $p_o$, policy organization 
        - Can only be an organization or entity that is associated with the policy prediction.
    3. $p_t$, current time when $p$ was made
        - Time is the exact moment that can be measured in day, hour, minute, second, etc.
    4. $p_f$, forecast time when $p$ is expected to come to fruition
        - Forecast can be from a second to anytime in the future.
        - Answers the questions: "How far to go out?" or "Where to stop?".
    5. $p_a$, policy prediction attribute
        - Measurable domain-specific attributes such as various quantifiable metrics relevant to the policy domain.
        - Some examples are election outcomes, economic reforms, legislative impac

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,"T1: On [2024-10-15], [Chief Financial Officer, Michael Davis] predicts that the [stock market index] at [Wall Street] [will likely] [rise] by [10%] in [2026 Q2].",1,llama-3.3-70b-versatile,policy
1,"T2: In [Q4 of 2024], [Senior Policy Analyst, Emily Chen] from [the Federal Reserve], envisions that the [interest rate] [should] [fall] from [2.5% to 1.8%] in [2028 Q1].",1,llama-3.3-70b-versatile,policy
2,"T3: [Policy Expert, David Lee] predicts on [2024/08/20] that the [GDP growth rate] in [the United States] [may] [stay stable] at [2.2%] in [2027 Q3].",1,llama-3.3-70b-versatile,policy
3,"T4: According to a [Financial Advisor, Sarah Taylor] from [JPMorgan Chase], on [21 Aug 2024], the [number of mortgage applications] [is expected to] [increase] beyond [500,000] in the timeframe of [2029 Q4].",1,llama-3.3-70b-versatile,policy
4,"T5: In [2025-08-20], the [unemployment rate] in [New York City] [will] [decrease] by [1.5%] [fall], as predicted by [Economist, James Wilson] on [2024-08-15].",1,llama-3.3-70b-versatile,policy
5,"T1: On [Wednesday, November 20, 2024], [CEO, Mark Zuckerberg] forecasts that the [cryptocurrency market] at [NASDAQ] [will likely] [decrease] by [15%] in [2026 Q4].",1,llama-3.3-70b-versatile,policy
6,"T2: In [2027 Q1], [Policy Reporter, Rachel Kim] from [Bloomberg], predicts that the [inflation rate] [should] [rise] from [1.8% to 3.2%] in [2028 Q2].",1,llama-3.3-70b-versatile,policy
7,"T3: [Senior Economist, Kevin White] predicts on [2024/09/10] that the [consumer price index] in [the European Union] [may] [increase] by [2.5%] in [2027 Q2].",1,llama-3.3-70b-versatile,policy
8,"T4: According to a [Financial Analyst, Lisa Nguyen] from [Goldman Sachs], on [2024-09-15], the [number of initial public offerings] [is expected to] [fall] below [100] in the timeframe of [2029 Q1].",1,llama-3.3-70b-versatile,policy
9,"T5: In [2026-08-21], the [number of bankruptcies] in [the retail sector] [is likely to] [decrease] by [20%] [fall], as predicted by [Policy Expert, Daniel Hall] on [2024-08-20].",1,llama-3.3-70b-versatile,policy


### Combine Domain Predictions

In [None]:
predictions_df = DataProcessing.concat_dfs([financial_df, weather_df, health_df, policy_df])
predictions_df

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,"T1: On [2024-10-15], [Samantha Thompson, a financial analyst] predicts that the [operating cash flow] at [Johnson & Johnson] [will likely] [increase] by [10 percent to $20 billion] in [2026 Q2].",1,llama-3.3-70b-versatile,financial
1,"T2: In [Q4 of 2024], [Ethan Kim] from [Bloomberg], forecasts that the [stock price] [will] [fall] from [$500 to $300 per share] in [2028 Q1].",1,llama-3.3-70b-versatile,financial
2,"T3: [Ava Morales, a financial expert] predicts on [2024/08/20] that the [research and development expenses] at [Intel] [may] [stay stable] at [$15 million] in [2027 Q3].",1,llama-3.3-70b-versatile,financial
3,"T4: According to a [senior executive] from [Coca-Cola], on [21 Aug 2024], the [net profit] [is expected to] [rise] beyond [$5 billion] in the timeframe of [2029 Q4].",1,llama-3.3-70b-versatile,financial
4,"T5: In [2025-02-15], the [revenue] at [Visa] [is expected] to [increase] by [15 percent to $25 billion] [rise], as predicted by [Liam Chen, a financial reporter] on [2024-08-22].",1,llama-3.3-70b-versatile,financial
5,"T1: On [Wednesday, November 20, 2024], [Julian Lee, an investor] predicts that the [gross profit] at [UnitedHealth Group] [will likely] [decrease] by [5 percent to $10 billion] in [2026 Q4].",1,llama-3.3-70b-versatile,financial
6,"T2: In [2027 Q2], [Sophia Patel] from [Morgan Stanley], envisions that the [operating income] [will] [increase] from [$10 billion to $15 billion] in [2028 Q3].",1,llama-3.3-70b-versatile,financial
7,"T3: [Noah Brooks, a financial analyst] predicts on [2024/10/18] that the [net profit] at [Procter & Gamble] [may] [fall] by [10 percent to $5 billion] in [2027 Q2].",1,llama-3.3-70b-versatile,financial
8,"T4: According to a [top executive] from [AT&T], on [2024-08-25], the [revenue] [is expected to] [stay stable] at [$40 billion] in the timeframe of [2029 Q1].",1,llama-3.3-70b-versatile,financial
9,"T5: In [2026-08-20], the [stock price] at [McDonald's] [has a probability] of [20 percent to reach $250 per share] [rise], as predicted by [Jackson Brown, a financial expert] on [2024-10-12].",1,llama-3.3-70b-versatile,financial
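
`DataProcessing.concat_dfs` is defined in the project's `data_processing` module and is not shown in this notebook. As a rough sketch, assuming it is a thin wrapper around `pd.concat`, the combination step might look like the following. The helper name `concat_domain_dfs` and the toy frames are hypothetical and only stand in for the real per-domain DataFrames.

```python
import pandas as pd


def concat_domain_dfs(dfs: list[pd.DataFrame]) -> pd.DataFrame:
    """Stack per-domain DataFrames and renumber the index (hypothetical helper)."""
    return pd.concat(dfs, ignore_index=True)


# Toy stand-ins for financial_df and weather_df
financial_toy = pd.DataFrame(
    {"Base Sentence": ["T1: toy financial sentence"], "Prediction Label": [1], "Domain": ["financial"]}
)
weather_toy = pd.DataFrame(
    {"Base Sentence": ["T1: toy weather sentence"], "Prediction Label": [1], "Domain": ["weather"]}
)

combined = concat_domain_dfs([financial_toy, weather_toy])
domain_counts = combined["Domain"].value_counts().to_dict()
```

Checking `domain_counts` after the real call is a quick way to confirm that all four domains made it into `predictions_df`, since the displayed output only shows the first rows.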


## Generate Non-Predictions

In [None]:
non_prediction_template = """Generate any sentence that's not a prediction, which we name non-prediction. A prediction is below with variables
    - $p$, prediction
        - $p_s$, source that predicted $p$
            - Source can be person, organization, and any type of entity.
        - $p_t$, time when $p$ was made
            - Time is the exact moment that can be measured in day, hour, minutes, seconds, etc.
        - $p_f$, forecast time when $p$ is expected to come to fruition
            - Forecast can be from seconds to decades in the future.
            - How far to go out? Or where to stop?
        - $p_a$, prediction attribute
            - Financial based attributes such as stock price, net profit, revenue
        - $p_m$, prediction metric outcome
            - How much will the  $p_a$ rise/increase or fall/decrease
        - $p_v$, future verb tense
            - A verb that is associated with the future such as will, would, be going to, should, etc.

    Please generate 15 non-predictions with the following requirements below:

    1. Only a simple non-prediction (sentence) (and NOT compounding using "and" or "or")
    2. Include no additional information such as "Here are nine simple sentences that are not predictions:", number before non-prediction
    3. At least 10 words and no more than 20 words in the non-prediction
    4. Do not generate redundant non-predictions
"""

print(non_prediction_template)
non_prediction_label = 0

non_predictions_df = base_pipeline.generate_predictions(text=non_prediction_template, label=non_prediction_label, domain="any")
non_predictions_df

Generate any sentence that's not a prediction, which we name non-prediction. A prediction is below with variables
    - $p$, prediction
        - $p_s$, source that predicted $p$
            - Source can be person, organization, and any type of entity.
        - $p_t$, time when $p$ was made
            - Time is the exact moment that can be measured in day, hour, minutes, seconds, etc.
        - $p_f$, forecast time when $p$ is expected to come to fruition
            - Forecast can be from seconds to decades in the future.
            - How far to go out? Or where to stop?
        - $p_a$, prediction attribute
            - Financial based attributes such as stock price, net profit, revenue
        - $p_m$, prediction metric outcome
            - How much will the  $p_a$ rise/increase or fall/decrease
        - $p_v$, future verb tense
            - A verb that is associated with the future such as will, would, be going to, should, etc.

    Please generate 15 non-predictions with th

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,The company is currently hiring new employees for various positions available now.,0,llama-3.3-70b-versatile,any
1,The manager is evaluating the current market situation very carefully today.,0,llama-3.3-70b-versatile,any
2,The team is working diligently to resolve the ongoing issue immediately.,0,llama-3.3-70b-versatile,any
3,The new policy has been implemented successfully across the organization already.,0,llama-3.3-70b-versatile,any
4,The employees are receiving extensive training on the latest software tools available.,0,llama-3.3-70b-versatile,any
5,The customer service department is handling a high volume of calls daily.,0,llama-3.3-70b-versatile,any
6,The research team is conducting experiments to gather more accurate data now.,0,llama-3.3-70b-versatile,any
7,The company is expanding its operations to new international markets slowly.,0,llama-3.3-70b-versatile,any
8,The marketing team is creating a new advertising campaign for launch soon.,0,llama-3.3-70b-versatile,any
9,The sales department is tracking the current sales performance closely every day.,0,llama-3.3-70b-versatile,any
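
The prompt asks for 10 to 20 words per sentence and no redundant sentences, but nothing in the notebook verifies that the model complied. A minimal sketch of such a check is below; the function name `check_non_predictions` is hypothetical, and counting words by splitting on whitespace is an assumption about what "words" means here.

```python
def check_non_predictions(sentences: list[str], min_words: int = 10, max_words: int = 20) -> dict:
    """Flag sentences that break the length or uniqueness requirements."""
    seen = set()
    bad_length, duplicates = [], []
    for s in sentences:
        n_words = len(s.split())  # whitespace-based word count (assumption)
        if not (min_words <= n_words <= max_words):
            bad_length.append(s)
        key = s.strip().lower()
        if key in seen:
            duplicates.append(s)
        seen.add(key)
    return {"bad_length": bad_length, "duplicates": duplicates}


sample = [
    "The company is currently hiring new employees for various positions available now.",
    "The manager is evaluating the current market situation very carefully today.",
    "Sales rose.",
    "The manager is evaluating the current market situation very carefully today.",
]
report = check_non_predictions(sample)
```

In the notebook, the same function could be applied to `non_predictions_df["Base Sentence"].tolist()` before the DataFrame is stored.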


## Store Predictions and Non-Predictions

In [None]:
%store predictions_df
%store non_predictions_df

Stored 'predictions_df' (DataFrame)
Stored 'non_predictions_df' (DataFrame)


## Further Explanation

- Need to optimize code
- Include multiple models next

### Multiple Models to Generate Predictions

In [None]:
# domains = ["financial", "weather", "health care"]
# llama_model = LlamaTextGenerationModel()
# model_2 = SomeModel()

# # Should all model dfs be a list of dataframes? Then we can concatenate them all together with pd.concat()
# for domain in domains:
#     llama_df = llama_model.generate_predictions(financial_prompt_output, 1, domain)
#     model_2_df = model_2.generate_predictions(financial_prompt_output, 1, domain)


# # or 

# llama_df = llama_model.generate_predictions(financial_prompt_output, 1, "financial")
# model_2_df = model_2.generate_predictions(financial_prompt_output, 1, "financial")

# llama_df = llama_model.generate_predictions(financial_prompt_output, 1, "weather")
# model_2_df = model_2.generate_predictions(financial_prompt_output, 1, "weather")

# llama_df = llama_model.generate_predictions(financial_prompt_output, 1, "health care")
# model_2_df = model_2.generate_predictions(financial_prompt_output, 1, "health care")
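
One possible answer to the question in the comments above is to collect every model's DataFrame in a list and concatenate once at the end, so that no result is overwritten inside the loop. The sketch below assumes each model exposes `generate_predictions(text, label, domain)` as `BasePipeline` does in this notebook. `StubModel` and the `prompts` dictionary are hypothetical placeholders, not real project classes.

```python
import pandas as pd


class StubModel:
    """Hypothetical stand-in for a text-generation model wrapper."""

    def __init__(self, name: str):
        self.name = name

    def generate_predictions(self, text: str, label: int, domain: str) -> pd.DataFrame:
        # A real model would send `text` to an LLM; this stub returns one fixed row.
        return pd.DataFrame({
            "Base Sentence": [f"placeholder sentence for {domain}"],
            "Prediction Label": [label],
            "Model Name": [self.name],
            "Domain": [domain],
        })


# Hypothetical per-domain prompts; in the notebook these would be the formatted prompt outputs.
prompts = {"financial": "financial prompt", "weather": "weather prompt", "health care": "health prompt"}
models = [StubModel("model-a"), StubModel("model-b")]

frames = []
for domain, prompt in prompts.items():
    for model in models:
        # Each domain gets its own prompt, and each result is kept rather than overwritten
        frames.append(model.generate_predictions(prompt, 1, domain))

all_models_df = pd.concat(frames, ignore_index=True)
```

Keeping the `Model Name` column in every frame makes it possible to compare models later with a simple `groupby`.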

### Convert Sentence to DataFrame by Prediction Property

In [None]:
data_processing = DataProcessing

In [None]:
updated_predictions_df = data_processing.reformat_df_with_template_number(predictions_df, col_name="Base Sentence")
updated_predictions_df

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain,Template Number
0,"On [2024-10-15], [Samantha Thompson, a financial analyst] predicts that the [operating cash flow] at [Johnson & Johnson] [will likely] [increase] by [10 percent to $20 billion] in [2026 Q2].",1,llama-3.3-70b-versatile,financial,1
1,"In [Q4 of 2024], [Ethan Kim] from [Bloomberg], forecasts that the [stock price] [will] [fall] from [$500 to $300 per share] in [2028 Q1].",1,llama-3.3-70b-versatile,financial,2
2,"[Ava Morales, a financial expert] predicts on [2024/08/20] that the [research and development expenses] at [Intel] [may] [stay stable] at [$15 million] in [2027 Q3].",1,llama-3.3-70b-versatile,financial,3
3,"According to a [senior executive] from [Coca-Cola], on [21 Aug 2024], the [net profit] [is expected to] [rise] beyond [$5 billion] in the timeframe of [2029 Q4].",1,llama-3.3-70b-versatile,financial,4
4,"In [2025-02-15], the [revenue] at [Visa] [is expected] to [increase] by [15 percent to $25 billion] [rise], as predicted by [Liam Chen, a financial reporter] on [2024-08-22].",1,llama-3.3-70b-versatile,financial,5
5,"On [Wednesday, November 20, 2024], [Julian Lee, an investor] predicts that the [gross profit] at [UnitedHealth Group] [will likely] [decrease] by [5 percent to $10 billion] in [2026 Q4].",1,llama-3.3-70b-versatile,financial,1
6,"In [2027 Q2], [Sophia Patel] from [Morgan Stanley], envisions that the [operating income] [will] [increase] from [$10 billion to $15 billion] in [2028 Q3].",1,llama-3.3-70b-versatile,financial,2
7,"[Noah Brooks, a financial analyst] predicts on [2024/10/18] that the [net profit] at [Procter & Gamble] [may] [fall] by [10 percent to $5 billion] in [2027 Q2].",1,llama-3.3-70b-versatile,financial,3
8,"According to a [top executive] from [AT&T], on [2024-08-25], the [revenue] [is expected to] [stay stable] at [$40 billion] in the timeframe of [2029 Q1].",1,llama-3.3-70b-versatile,financial,4
9,"In [2026-08-20], the [stock price] at [McDonald's] [has a probability] of [20 percent to reach $250 per share] [rise], as predicted by [Jackson Brown, a financial expert] on [2024-10-12].",1,llama-3.3-70b-versatile,financial,5
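
`reformat_df_with_template_number` is also defined outside this notebook. Judging from the output above, it removes the leading `T<n>: ` tag from each sentence and records `n` in a `Template Number` column. A minimal sketch of that behavior, using `Series.str.extract` and a hypothetical function name `split_template_number`, could be:

```python
import pandas as pd


def split_template_number(df: pd.DataFrame, col_name: str = "Base Sentence") -> pd.DataFrame:
    """Move a leading 'T<n>: ' tag into an integer 'Template Number' column."""
    out = df.copy()
    # Named groups become the columns of the extracted DataFrame
    parts = out[col_name].str.extract(r"^T(?P<num>\d+):\s*(?P<rest>.*)$")
    out["Template Number"] = parts["num"].astype(int)
    out[col_name] = parts["rest"]
    return out


toy = pd.DataFrame({"Base Sentence": [
    "T1: On [2024-10-15], [Samantha Thompson, a financial analyst] predicts ...",
    "T2: In [Q4 of 2024], [Ethan Kim] from [Bloomberg], forecasts ...",
]})
result = split_template_number(toy)
```

Note that `astype(int)` raises if any sentence lacks the tag, which may be the desired behavior when the model ignores the template format.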


In [None]:
def select_pattern(template_number: int) -> str:
    """Select the pattern to use based on the template number
    
    Parameters:
    -----------
    template_number: `int`
        The template number that corresponds to the pattern to use

    Returns:
    --------
    `str`
        A string containing the pattern to use
    """

    if template_number == 1:
        pattern = r"On \[(.*?)\], \[(.*?)\] predicts that the \[(.*?)\] at \[(.*?)\] \[(.*?)\] \[(.*?)\] by \[(.*?)\] in \[(.*?)\]."
        return pattern
    elif template_number == 2:
        pattern = r"In \[(.*?)\], \[(.*?)\] from \[(.*?)\], forecasts that the \[(.*?)\] \[(.*?)\] \[(.*?)\] from \[(.*?)\]] in \[(.*?)\]."
        return pattern
    elif template_number == 3:
        pattern = r"\[(.*?)\] (.*?) on \[(.*?)\] that the \[(.*?)\] in \[(.*?)\] \[(.*?)\] \[(.*?)\] by \[(.*?)\] in \[(.*?)\]."
        return pattern
    elif template_number == 4:
        pattern = r"According to a \[(.*?)\] from \[(.*?)\], on \[(.*?)\], the \[(.*?)\] \[(.*?)\] \[(.*?)\] beyond \[(.*?)\] in the timeframe of \[(.*?)\]."
        return pattern
    elif template_number == 5:
        pattern = r"In \[(.*?)\], the \[(.*?)\] in \[(.*?)\] \[(.*?)\] \[(.*?)\] by \[(.*?)\], as (.*?) by \[(.*?)\] on \[(.*?)\]."
        return pattern
    else:
        raise ValueError("The template number is not recognized.")
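
Because the extraction step unpacks exactly eight values per match, it helps to confirm how many groups each pattern captures before running it on data. The sketch below copies the five patterns (with the stray `]` in template 2 dropped) and reads the group count from `re.compile(...).groups`; the variable names are illustrative.

```python
import re

patterns = {
    1: r"On \[(.*?)\], \[(.*?)\] predicts that the \[(.*?)\] at \[(.*?)\] \[(.*?)\] \[(.*?)\] by \[(.*?)\] in \[(.*?)\].",
    2: r"In \[(.*?)\], \[(.*?)\] from \[(.*?)\], forecasts that the \[(.*?)\] \[(.*?)\] \[(.*?)\] from \[(.*?)\] in \[(.*?)\].",
    3: r"\[(.*?)\] (.*?) on \[(.*?)\] that the \[(.*?)\] in \[(.*?)\] \[(.*?)\] \[(.*?)\] by \[(.*?)\] in \[(.*?)\].",
    4: r"According to a \[(.*?)\] from \[(.*?)\], on \[(.*?)\], the \[(.*?)\] \[(.*?)\] \[(.*?)\] beyond \[(.*?)\] in the timeframe of \[(.*?)\].",
    5: r"In \[(.*?)\], the \[(.*?)\] in \[(.*?)\] \[(.*?)\] \[(.*?)\] by \[(.*?)\], as (.*?) by \[(.*?)\] on \[(.*?)\].",
}

# Number of capturing groups per template
group_counts = {n: re.compile(p).groups for n, p in patterns.items()}
# Templates whose group count differs from the eight values the caller unpacks
mismatched = [n for n, count in group_counts.items() if count != 8]
```

Templates 3 and 5 capture nine groups because the verb ("predicts", "predicted") is itself a `(.*?)` group, so an eight-way unpack would fail on them even when the sentence matches.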

In [None]:
import re


def sentence_to_df(df: pd.DataFrame) -> pd.DataFrame:
    """Convert templated sentences to a DataFrame of prediction properties

    NOTE: The unpacking below assumes eight captured groups in template 1's order
    
    Parameters:
    -----------
    df: `pd.DataFrame`
        A DataFrame with 'Template Number' and 'Base Sentence' columns

    Returns:
    --------
    `pd.DataFrame`
        A DataFrame containing the extracted variables, one row per sentence
    """
    template_numbers = df['Template Number'].values
    base_sentences = df['Base Sentence'].values
    
    extracted_data = []

    for template_number, sentence in zip(template_numbers, base_sentences):
        print(sentence)
        pattern = select_pattern(template_number)
        print(pattern)
        
        match = re.match(pattern, sentence)
        print(match)
        
        if match:
            p_t, p_p, p_a, p_o, p_v, p_s, p_m, p_f = match.groups()
            print(p_t, p_p, p_a, p_o, p_v, p_s, p_m, p_f)
            data = {
                'p_p': p_p,
                'p_o': p_o,
                'p_t': p_t,
                'p_f': p_f,
                'p_a': p_a,
                'p_s': p_s,
                'p_m': p_m,
                'p_v': p_v,
                'p_l': None  # Assuming p_l is not used in this template
            }
            extracted_data.append(data)
        else:
            print(sentence)
            raise ValueError("The sentence does not match the expected template.")
        print()
    return pd.DataFrame(extracted_data)

In [None]:
prediction_cols = ['p_p', 'p_o', 'p_t', 'p_f', 'p_a', 'p_s', 'p_m', 'p_v', 'p_l']
prediction_base_df = pd.DataFrame(columns=prediction_cols)

# Extract prediction data and update prediction_base_df
df = sentence_to_df(updated_predictions_df)
prediction_base_df = pd.concat([prediction_base_df, df], ignore_index=True)

prediction_base_df

On [2024-10-15], [Samantha Thompson, a financial analyst] predicts that the [operating cash flow] at [Johnson & Johnson] [will likely] [increase] by [10 percent to $20 billion] in [2026 Q2].
On \[(.*?)\], \[(.*?)\] predicts that the \[(.*?)\] at \[(.*?)\] \[(.*?)\] \[(.*?)\] by \[(.*?)\] in \[(.*?)\].
<re.Match object; span=(0, 190), match='On [2024-10-15], [Samantha Thompson, a financial >
2024-10-15 Samantha Thompson, a financial analyst operating cash flow Johnson & Johnson will likely increase 10 percent to $20 billion 2026 Q2

In [Q4 of 2024], [Ethan Kim] from [Bloomberg], forecasts that the [stock price] [will] [fall] from [$500 to $300 per share] in [2028 Q1].
In \[(.*?)\], \[(.*?)\] from \[(.*?)\], forecasts that the \[(.*?)\] \[(.*?)\] \[(.*?)\] from \[(.*?)\]] in \[(.*?)\].
None
In [Q4 of 2024], [Ethan Kim] from [Bloomberg], forecasts that the [stock price] [will] [fall] from [$500 to $300 per share] in [2028 Q1].


ValueError: The sentence does not match the expected template.
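
The failure on the second sentence appears to come from the template 2 pattern rather than from the sentence: the pattern contains an extra `]` after the seventh group, so it requires a literal `]]` that never occurs in the text. A small reproduction, using only the sentence and pattern printed above:

```python
import re

sentence = (
    "In [Q4 of 2024], [Ethan Kim] from [Bloomberg], forecasts that the "
    "[stock price] [will] [fall] from [$500 to $300 per share] in [2028 Q1]."
)

# Pattern as written in select_pattern: note the extra "]" after the seventh group
buggy = r"In \[(.*?)\], \[(.*?)\] from \[(.*?)\], forecasts that the \[(.*?)\] \[(.*?)\] \[(.*?)\] from \[(.*?)\]] in \[(.*?)\]."
# Same pattern with the stray bracket removed and the final period escaped
fixed = r"In \[(.*?)\], \[(.*?)\] from \[(.*?)\], forecasts that the \[(.*?)\] \[(.*?)\] \[(.*?)\] from \[(.*?)\] in \[(.*?)\]\."

buggy_match = re.match(buggy, sentence)
fixed_match = re.match(fixed, sentence)
fields = fixed_match.groups() if fixed_match else ()
```

Even with the corrected pattern, the third and fourth fields of template 2 are the organization and the attribute, which is the reverse of template 1, so unpacking them in template 1's order would swap `p_a` and `p_o`.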

In [None]:
import re
import pandas as pd

def extract_predictions(df: pd.DataFrame) -> pd.DataFrame:
    """Extracts structured prediction data from sentences using predefined templates.

    Parameters:
    -----------
    df : `pd.DataFrame`
        DataFrame containing the base sentences and corresponding template numbers.

    Returns:
    --------
    `pd.DataFrame`
        DataFrame containing extracted prediction components.
    """
    # Define regex patterns for different templates
    patterns = {
        1: r"On \[(.*?)\], \[(.*?)\] predicts that the \[(.*?)\] at \[(.*?)\] \[(.*?)\] \[(.*?)\] by \[(.*?)\] in \[(.*?)\].",
        2: r"In \[(.*?)\], \[(.*?)\] from \[(.*?)\], forecasts that the \[(.*?)\] \[(.*?)\] \[(.*?)\] from \[(.*?)\] in \[(.*?)\].",
        3: r"\[(.*?)\] (.*?) on \[(.*?)\] that the \[(.*?)\] in \[(.*?)\] \[(.*?)\] \[(.*?)\] by \[(.*?)\] in \[(.*?)\].",
        4: r"According to a \[(.*?)\] from \[(.*?)\], on \[(.*?)\], the \[(.*?)\] \[(.*?)\] \[(.*?)\] beyond \[(.*?)\] in the timeframe of \[(.*?)\].",
        5: r"In \[(.*?)\], the \[(.*?)\] in \[(.*?)\] \[(.*?)\] \[(.*?)\] by \[(.*?)\], as (.*?) by \[(.*?)\] on \[(.*?)\]."
    }

    extracted_data = []

    for _, row in df.iterrows():
        template_number = row['Template Number']
        sentence = row['Base Sentence']

        pattern = patterns.get(template_number)
        if not pattern:
            raise ValueError(f"Template {template_number} not recognized.")

        match = re.search(pattern, sentence)
        if match:
            extracted_values = list(match.groups())
            # Index positions follow template 1's group order; names match sentence_to_df
            data = {
                'p_p': extracted_values[1],  # Person who made the prediction
                'p_o': extracted_values[3],  # Organization
                'p_t': extracted_values[0],  # Time the prediction was made
                'p_f': extracted_values[-1], # Forecast time
                'p_a': extracted_values[2],  # Prediction attribute (e.g. net profit)
                'p_v': extracted_values[4],  # Future verb (will/may/is expected to)
                'p_s': extracted_values[5],  # Direction (increase/fall/stay stable)
                'p_m': extracted_values[6],  # Metric outcome (percent change, dollar value)
                'p_l': None  # Placeholder in case additional logic is needed later
            }
            extracted_data.append(data)
        else:
            print(f"Warning: No match found for sentence:\n{sentence}")

    return pd.DataFrame(extracted_data)

# Initialize empty DataFrame with necessary columns
prediction_cols = ['p_p', 'p_o', 'p_t', 'p_f', 'p_a', 'p_s', 'p_m', 'p_v', 'p_l']
prediction_base_df = pd.DataFrame(columns=prediction_cols)

# Extract prediction data and update prediction_base_df
df_extracted = extract_predictions(updated_predictions_df)
prediction_base_df = pd.concat([prediction_base_df, df_extracted], ignore_index=True)


[Ava Morales, a financial expert] predicts on [2024/08/20] that the [research and development expenses] at [Intel] [may] [stay stable] at [$15 million] in [2027 Q3].
In [2025-02-15], the [revenue] at [Visa] [is expected] to [increase] by [15 percent to $25 billion] [rise], as predicted by [Liam Chen, a financial reporter] on [2024-08-22].
In [2027 Q2], [Sophia Patel] from [Morgan Stanley], envisions that the [operating income] [will] [increase] from [$10 billion to $15 billion] in [2028 Q3].
[Noah Brooks, a financial analyst] predicts on [2024/10/18] that the [net profit] at [Procter & Gamble] [may] [fall] by [10 percent to $5 billion] in [2027 Q2].
According to a [top executive] from [AT&T], on [2024-08-25], the [revenue] [is expected to] [stay stable] at [$40 billion] in the timeframe of [2029 Q1].
In [2026-08-20], the [stock price] at [McDonald's] [has a probability] of [20 percent to reach $250 per share] [rise], as predicted by [Jackson Brown, a financial expert] on [2024-10-12].
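
The sentences listed above fail to match because the model varied the connecting words ("at" instead of "in" or "by", "envisions" instead of "forecasts"), while the patterns hard-code them. One more tolerant approach, sketched here as an assumption rather than the notebook's method, is to pull out every bracketed span by position and map it to a property name per template. The `FIELD_ORDER` table is inferred from the template sentences and uses the same names as `sentence_to_df`; template 5 is left out because its sentences carry a ninth bracket whose role is not clear from the output.

```python
import re

# Assumed bracket order per template, following the names used in sentence_to_df
FIELD_ORDER = {
    1: ["p_t", "p_p", "p_a", "p_o", "p_v", "p_s", "p_m", "p_f"],
    2: ["p_t", "p_p", "p_o", "p_a", "p_v", "p_s", "p_m", "p_f"],
    3: ["p_p", "p_t", "p_a", "p_o", "p_v", "p_s", "p_m", "p_f"],
    4: ["p_p", "p_o", "p_t", "p_a", "p_v", "p_s", "p_m", "p_f"],
}


def extract_fields(sentence: str, template_number: int) -> dict:
    """Map bracketed spans to property names by position (hypothetical helper)."""
    values = re.findall(r"\[(.*?)\]", sentence)
    names = FIELD_ORDER[template_number]
    if len(values) != len(names):
        raise ValueError(f"Expected {len(names)} bracketed fields, found {len(values)}")
    return dict(zip(names, values))


row = extract_fields(
    "[Ava Morales, a financial expert] predicts on [2024/08/20] that the "
    "[research and development expenses] at [Intel] [may] [stay stable] at "
    "[$15 million] in [2027 Q3].",
    template_number=3,
)
```

This trades strictness for coverage: it no longer checks the words between brackets, so a sentence generated from the wrong template would be accepted as long as it has eight bracketed fields.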
