# 2-Clean Predictions

- **Goal:** Prediction Recognition

- **Purpose:** Clean irrelevant content out of the LLM-generated text

- **Misc:**
    - `%store`: IPython line magic that persists a variable of interest so another notebook can load it with `%store -r`
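
`%store` only works inside IPython. As a rough sketch of the same hand-off outside a notebook, pandas' own `to_pickle` and `read_pickle` can be used instead. The sample frame and the file path below are hypothetical and only illustrate the round trip.

```python
import os
import tempfile

import pandas as pd

# Small stand-in frame; the real notebook stores predictions_df and friends.
frame = pd.DataFrame({
    "Base Sentence": ["The CEO is giving a presentation."],
    "Prediction Label": [0],
})

# Hypothetical file location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "base_df.pkl")

frame.to_pickle(path)            # producer side, comparable to `%store base_df`
restored = pd.read_pickle(path)  # consumer side, comparable to `%store -r base_df`
```

Unlike `%store`, this approach makes the storage location explicit, which can be easier to manage when the consumer is a script rather than a notebook.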

In [1]:
import os
import sys

import pandas as pd
# Get the current working directory of the notebook
notebook_dir = os.getcwd()
# Add the parent directory to the system path
sys.path.append(os.path.join(notebook_dir, '../'))

from pipelines import BasePipeline
from data_processing import DataProcessing

In [2]:
%store -r predictions_df
%store -r non_predictions_df

# Widen columns so long sentences are not truncated in the output
pd.set_option('display.max_colwidth', 800)

In [3]:
predictions_df

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain,Template Number
0,"On 2024-10-15, Rachel Patel, a financial analyst, predicts that the operating cash flow at General Motors will likely decrease by $5 billion to $10 billion in Q2 of 2026.",1,llama-3.3-70b-versatile,financial,1
1,"In 2024, Michael Chen from Goldman Sachs envisions that the stock price will rise from $500 to $700 per share in 2028.",1,llama-3.3-70b-versatile,financial,2
2,"Emily Taylor, a financial expert, predicts on 08/20/2024 that the research and development expenses at Pfizer may stay stable at $15 million in 2029.",1,llama-3.3-70b-versatile,financial,3
3,"According to a senior executive from Boeing, on 21 Aug 2024, the net profit is expected to increase beyond $20 billion in the timeframe of Q4 of 2027.",1,llama-3.3-70b-versatile,financial,4
4,"In 2025-08-20, the revenue at Netflix has a probability of 20 percent to reach $25 billion, which is a 10% increase, as predicted by David Lee, a financial reporter, on 10/10/2024.",1,llama-3.3-70b-versatile,financial,5
5,"On Wednesday, November 20, 2024, Kevin White, a financial analyst, forecasts that the gross profit at Cisco Systems will likely decrease by 10% to $15 billion in Q1 of 2026.",1,llama-3.3-70b-versatile,financial,1
6,"In Q3 of 2024, Sophia Rodriguez from JPMorgan Chase predicts that the operating income will fall under 5% to $10 billion in 2027.",1,llama-3.3-70b-versatile,financial,2
7,"James Davis, a financial expert, predicts on 2024/08/22 that the revenue at Visa may rise by 15% to $25 billion in 2028.",1,llama-3.3-70b-versatile,financial,3
8,"According to a top executive from Intel, on 2024-08-25, the net profit is expected to increase beyond $30 billion in the timeframe of Q2 of 2029.",1,llama-3.3-70b-versatile,financial,4
9,"In 2026-10-15, the operating cash flow at AT&T has a probability of 15% to reach $20 billion, which is a 5% decrease, as predicted by Olivia Brown, a financial analyst, on 08/25/2024.",1,llama-3.3-70b-versatile,financial,5


In [4]:
non_predictions_df

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain
0,The company is currently undergoing a major restructuring effort to improve efficiency.,0,llama-3.3-70b-versatile,any
1,The new employee is struggling to learn the complex software system.,0,llama-3.3-70b-versatile,any
2,The manager is reviewing the sales report from last quarter carefully.,0,llama-3.3-70b-versatile,any
3,The team is working diligently to meet the tight project deadline.,0,llama-3.3-70b-versatile,any
4,The customer service department is receiving a high volume of calls.,0,llama-3.3-70b-versatile,any
5,The marketing campaign is focusing on social media platforms exclusively.,0,llama-3.3-70b-versatile,any
6,The company's financial records are being audited by an external firm.,0,llama-3.3-70b-versatile,any
7,The employees are participating in a mandatory training session today.,0,llama-3.3-70b-versatile,any
8,The IT department is troubleshooting the network connectivity issue.,0,llama-3.3-70b-versatile,any
9,The CEO is giving a presentation to the board of directors now.,0,llama-3.3-70b-versatile,any


In [5]:
dfs = [predictions_df, non_predictions_df]
base_df = DataProcessing.concat_dfs(dfs)
base_df

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain,Template Number
0,"On 2024-10-15, Rachel Patel, a financial analyst, predicts that the operating cash flow at General Motors will likely decrease by $5 billion to $10 billion in Q2 of 2026.",1,llama-3.3-70b-versatile,financial,1.0
1,"In 2024, Michael Chen from Goldman Sachs envisions that the stock price will rise from $500 to $700 per share in 2028.",1,llama-3.3-70b-versatile,financial,2.0
2,"Emily Taylor, a financial expert, predicts on 08/20/2024 that the research and development expenses at Pfizer may stay stable at $15 million in 2029.",1,llama-3.3-70b-versatile,financial,3.0
3,"According to a senior executive from Boeing, on 21 Aug 2024, the net profit is expected to increase beyond $20 billion in the timeframe of Q4 of 2027.",1,llama-3.3-70b-versatile,financial,4.0
4,"In 2025-08-20, the revenue at Netflix has a probability of 20 percent to reach $25 billion, which is a 10% increase, as predicted by David Lee, a financial reporter, on 10/10/2024.",1,llama-3.3-70b-versatile,financial,5.0
...,...,...,...,...,...
75,The company's financial reports are being audited by an external firm.,0,llama-3.3-70b-versatile,any,
76,The IT department is resolving a major technical issue with the network.,0,llama-3.3-70b-versatile,any,
77,The customer service team is handling a high volume of phone calls.,0,llama-3.3-70b-versatile,any,
78,The research team is conducting experiments to gather more data information.,0,llama-3.3-70b-versatile,any,
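
The source of `DataProcessing.concat_dfs` is not shown here, so the following is only a sketch of what such a helper might do, assuming it wraps `pd.concat` with a fresh index. The function body and the tiny sample frames are assumptions. The sketch also illustrates why `Template Number` shows up as `1.0` instead of `1` after concatenation: the non-prediction rows have no such column, so pandas fills them with `NaN` and the column becomes floating point.

```python
import pandas as pd


def concat_dfs(dfs):
    """Hypothetical stand-in for DataProcessing.concat_dfs: stack frames row-wise."""
    # ignore_index=True renumbers the rows 0..n-1 across all input frames.
    return pd.concat(dfs, ignore_index=True)


predictions = pd.DataFrame({
    "Base Sentence": ["Revenue will rise in 2028."],
    "Prediction Label": [1],
    "Template Number": [1],
})
non_predictions = pd.DataFrame({
    "Base Sentence": ["The team is meeting today."],
    "Prediction Label": [0],
})

combined = concat_dfs([predictions, non_predictions])

# The missing column is filled with NaN, which forces a float dtype.
template_dtype = str(combined["Template Number"].dtype)
```

If integer template numbers matter downstream, pandas' nullable `Int64` dtype is one option for keeping them integral while still allowing missing values.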


In [6]:
%store base_df

Stored 'base_df' (DataFrame)


In [7]:
base_pipeline = BasePipeline()

cleaned_predictions_df = base_pipeline.clean_predictions(base_df)
cleaned_predictions_df

Unnamed: 0,Base Sentence,Prediction Label,Model Name,Domain,Template Number
0,"on 2024-10-15, rachel patel, a financial analyst, predicts that the operating cash flow at general motors will likely decrease by $5 billion to $10 billion in q2 of 2026.",1,llama-3.3-70b-versatile,financial,1.0
1,"in 2024, michael chen from goldman sachs envisions that the stock price will rise from $500 to $700 per share in 2028.",1,llama-3.3-70b-versatile,financial,2.0
2,"emily taylor, a financial expert, predicts on 08/20/2024 that the research and development expenses at pfizer may stay stable at $15 million in 2029.",1,llama-3.3-70b-versatile,financial,3.0
3,"according to a senior executive from boeing, on 21 aug 2024, the net profit is expected to increase beyond $20 billion in the timeframe of q4 of 2027.",1,llama-3.3-70b-versatile,financial,4.0
4,"in 2025-08-20, the revenue at netflix has a probability of 20 percent to reach $25 billion, which is a 10% increase, as predicted by david lee, a financial reporter, on 10/10/2024.",1,llama-3.3-70b-versatile,financial,5.0
...,...,...,...,...,...
75,the company's financial reports are being audited by an external firm.,0,llama-3.3-70b-versatile,any,
76,the it department is resolving a major technical issue with the network.,0,llama-3.3-70b-versatile,any,
77,the customer service team is handling a high volume of phone calls.,0,llama-3.3-70b-versatile,any,
78,the research team is conducting experiments to gather more data information.,0,llama-3.3-70b-versatile,any,
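
Comparing this output with `base_df`, the only visible change is that `Base Sentence` has been lowercased. The actual `BasePipeline.clean_predictions` implementation is not shown, so the sketch below is an assumption about the kind of step involved: the function name `clean_sentences` is hypothetical, and the whitespace trimming is an extra guess not confirmed by the output above.

```python
import pandas as pd


def clean_sentences(df, column="Base Sentence"):
    """Hypothetical cleaner: lowercase and trim whitespace in one text column."""
    cleaned = df.copy()  # leave the caller's frame untouched
    cleaned[column] = cleaned[column].str.lower().str.strip()
    return cleaned


sample = pd.DataFrame({
    "Base Sentence": ["  In 2024, Michael Chen from Goldman Sachs envisions a rise. "],
    "Prediction Label": [1],
})

cleaned_sample = clean_sentences(sample)
```

Working on a copy keeps the original frame available for comparison, which mirrors how the notebook keeps both `base_df` and `cleaned_predictions_df`.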


In [8]:
%store cleaned_predictions_df

Stored 'cleaned_predictions_df' (DataFrame)
