<a href="https://colab.research.google.com/github/uptrain-ai/uptrain/blob/main/examples/experiments/llm_compression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 align="center">
  <a href="https://uptrain.ai">
    <img width="300" src="https://user-images.githubusercontent.com/108270398/214240695-4f958b76-c993-4ddd-8de6-8668f4d0da84.png" alt="uptrain">
  </a>
</h1>

## Experimenting with Compressed and Non-Compressed Models

**Overview**: In this notebook, we compare a compressed and a non-compressed version of the same model to measure the hit in accuracy caused by compression. We use around 30 randomly picked examples from the Financial QA dataset and evaluate the responses on different criteria to determine which of the two models performs better. The model is RoBERTa: the non-compressed variant is `deepset/roberta-base-squad2`, and the compressed variant is `deepset/tinyroberta-squad2`, produced with knowledge distillation. We also use UpTrain's metrics to further evaluate the outputs from both models.




In [1]:
!pip install openai uptrain -q

### Download the testing dataset

Note: Ground truth is optional, as UpTrain supports many checks (such as factual accuracy and response relevance) that don't require ground truth.

In [2]:
import polars as pl
import os
import re

url = "https://uptrain-assets.s3.ap-south-1.amazonaws.com/data/evaluations_dataset.jsonl"
dataset_path = os.path.join('./', "benchmark.jsonl")

if not os.path.exists(dataset_path):
    import httpx
    r = httpx.get(url)
    with open(dataset_path, "wb") as f:
        f.write(r.content)

filtered_dataset = pl.read_ndjson(dataset_path).select(pl.col(["question", "ground_truth", "context"]))





### Let's define a helper function to generate responses

In [5]:
from functools import lru_cache

from transformers import RobertaForQuestionAnswering, RobertaTokenizer
import torch


@lru_cache(maxsize=None)
def load_qa_model(model_name):
    # Load each checkpoint once and reuse it, instead of reloading it for every row
    model = RobertaForQuestionAnswering.from_pretrained(model_name)
    tokenizer = RobertaTokenizer.from_pretrained(model_name)
    model.eval()
    return model, tokenizer


def get_response(row, model_name, max_seq_length=512):
    question = row['question'][0]
    context = row['context'][0]

    # Fetch the (cached) question answering model and tokenizer
    model, tokenizer = load_qa_model(model_name)

    # Tokenize the question/context pair; truncation=True already caps the
    # input at max_seq_length tokens, so no manual truncation is needed
    inputs = tokenizer(question, context, return_tensors="pt", max_length=max_seq_length, truncation=True)

    # Perform inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # Pick the most likely start and end token of the answer span
    answer_start = torch.argmax(outputs.start_logits, dim=1).item()
    answer_end = torch.argmax(outputs.end_logits, dim=1).item() + 1

    # Decode the answer from the tokens
    answer = tokenizer.decode(inputs["input_ids"][0][answer_start:answer_end])

    # Print the pair so each response can be inspected in the cell output
    print(f"Question: {question}\nAnswer: {answer}\n")

    return {
        'question': question,
        'context': context,
        'response': answer,
        'ground_truth': row['ground_truth'][0]
    }
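
The span-selection step in `get_response` can be illustrated without a model. The sketch below is only an illustration: the token list and scores are made up, and `select_span` is a hypothetical helper that mirrors the two `argmax` calls using plain Python lists.

```python
def select_span(start_scores, end_scores):
    """Return (start, end) token indices, end exclusive, from per-token scores."""
    start = max(range(len(start_scores)), key=start_scores.__getitem__)
    end = max(range(len(end_scores)), key=end_scores.__getitem__) + 1
    return start, end


# Made-up tokenisation of a question/context pair in RoBERTa's layout
tokens = ["<s>", "Who", "owns", "it", "?", "</s>", "</s>", "NASDAQ", "OMX", "Group", "</s>"]

# Made-up scores: the start peaks at "NASDAQ" (index 7), the end at "Group" (index 9)
start_scores = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.3, 0.2, 0.0]
end_scores = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.4, 6.0, 0.0]

start, end = select_span(start_scores, end_scores)
answer_tokens = tokens[start:end]

# Start and end are chosen independently, so the end can land before the start;
# slicing then yields an empty span
empty_span = tokens[7:3]
```

Because the two positions are picked independently, a span can also start at index 0 (the `<s>` token) and run into the question or context, or come out empty when the end precedes the start.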

### Generate responses for both the models

Here are the responses from the "lighter" compressed model, a distilled version of roberta-base.



In [8]:

# knowledge distilled model
results = [get_response(filtered_dataset[idx], 'deepset/tinyroberta-squad2') for idx in range(len(filtered_dataset))]

print(results)


Question: What happen in this selling call option scenario
Answer:  leaves you with no position

Question: Appropriate model for deferred costs as a line-of-credit
Answer: <s>

Question: Dow Jones Industrial Average (DJIA), NASDAQ 100, and S&P 500 index historical membership listing?
Answer: <s>

Question: What is meant by the term “representative stock list” here?
Answer:  a list of stocks that would reasonably be expected to have about the same results as the whole market

Question: Balance sheet, Net Increase
Answer: <s>

Question: Cheapest USD to GBP transfer
Answer: <s>

Question: How to move (or not move) an LLC from Illinois to New Mexico?
Answer: <s>



Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Question: When a Company was expected and then made a profit of X $ then that X$ increased it's share price. or those the Sellers and Buyers [duplicate]
Answer: <s>

Question: Investing money 101
Answer: <s>

Question: Determining the minimum dividend that should be paid from my S corporation
Answer: <s>

Question: Can There Be Partial Trade Fill Percentage?
Answer: <s>

Question: How does a stock operate when it is listed between two exchanges?
Answer: <s>

Question: Is there a way to tell how many stocks have been shorted?
Answer: There is no way to know anything about who has shorted stuff

Question: Double-Taxation of Royalties paid for in Korea to a US Company
Answer: <s>



Question: What scrutiny to expect if making large purchase with physical cash? [duplicate]
Answer: <s>

Question: Would extending my mortgage cause the terms to be re-negotiated?
Answer:  Your re-negotiating position is limited

Question: Accounting for splits in a stock price graph
Answer: <s>

Question: Is it acceptable to receive payment from U.S. in Indian saving bank account via PayPal?
Answer: <s>



Question: How to prevent misusing my Account details
Answer: <s>

Question: What should I do with a savings account in another country?
Answer: <s>

Question: Can stockholders choose NOT to elect a board of directors?
Answer: <s>

Question: How do credit card banks detect fraudulent transactions without requiring a travel advisory?
Answer: <s>

Question: If I get cash compensation for my stocks (following a merger for example) does that qualify for capital gains tax?
Answer:  it would be reportable income

Question: What is the 'real' monthly cost of a car?
Answer: <s>



Question: Strange values in ARM.L price data 1998-2000 from Yahoo
Answer: <s>

Question: How can I report pump and dump scams?
Answer: <s>

Question: The formula equivalent of EBITDA for personal finance?
Answer: <s>

Question: Stock Dividends & Splits: Are they always applied over night?
Answer: <s>

Question: how stock market sale work?
Answer:  It depends on total number of shares in market and total turn over for that specific shares

Question: Shares in stock exchange and dividend payout relationship
Answer:  ex-dividend on or after the ex-dividend date

Question: How to find SEC filings that are important to stock market
Answer:  Securities Exchange Commission's EDGAR search engine

Question: Compute average price even if I do not have the prices before
Answer: <s>

Question: Where can I find out details about the actual network on which SWIFT banking works?
Answer: <s>



Question: what is the timezone that yahoo uses for stock information
Answer:  native

Question: Are there limits on frequency of withdrawal from Roth 401K?
Answer: <s>Are there limits on frequency of withdrawal from Roth 401K?</s></s>Just the amount contributed to the Roth 401k that you rolled over, not the conversions from regular 401k/traditional IRA (for those there are holding period limitation of 5 year from conversion), the earning on it or the employer's match (neither of these can be withdrawn without penalty as a non-qualified withdrawal). However, I'd suggest not to withdraw from Roth IRA unless you're sleeping on a bench in a park and beg strangers for a piece of bread. This is the best retirement investment you can make while you're in the lower tax brackets, and withdrawing it would reduce dramatically your tax-free retirement income.
If one separates from work at 55 or older, they can withdraw from that 401(k) with no penalty.  You might wish to consider a mix of Roth IRA

Question: Who owns NASDAQ? Does it collect fees from stock transactions?
Answer: NASDAQ OMX Group

Question: In what circumstances will a bank waive the annual credit card fee?
Answer:  if you have a certain minimum balance, use direct deposit, or something similar

Question: How can I understand why investors think a particular company should have a high PE ratio?
Answer: <s>

Question: Credit History and Outstanding Debts in Hungary
Answer: <s>

Question: T-mobile stock: difference between TMUSP vs TMUS
Answer: <s>



Question: Why was S&P 500 PE Ratio so high on May 2009
Answer: <s>



And here are the responses from the "larger" non-compressed model.

In [9]:

results_large = [get_response(filtered_dataset[idx], 'deepset/roberta-base-squad2') for idx in range(len(filtered_dataset))]

print(results_large)

Question: What happen in this selling call option scenario
Answer:  selling the call you bought. That leaves you with no position

Question: Appropriate model for deferred costs as a line-of-credit
Answer: <s>

Question: Dow Jones Industrial Average (DJIA), NASDAQ 100, and S&P 500 index historical membership listing?
Answer: <s>

Question: What is meant by the term “representative stock list” here?
Answer:  a list of stocks that would reasonably be expected to have about the same results as the whole market, i.e. be representative of an investment that invests in all those stocks

Question: Balance sheet, Net Increase
Answer: <s>

Question: Cheapest USD to GBP transfer
Answer: <s>

Question: How to move (or not move) an LLC from Illinois to New Mexico?
Answer: <s>



Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.


Question: When a Company was expected and then made a profit of X $ then that X$ increased it's share price. or those the Sellers and Buyers [duplicate]
Answer: <s>

Question: Investing money 101
Answer: <s>

Question: Determining the minimum dividend that should be paid from my S corporation
Answer: <s>

Question: Can There Be Partial Trade Fill Percentage?
Answer:  Most exchanges support partial fills

Question: How does a stock operate when it is listed between two exchanges?
Answer: <s>

Question: Is there a way to tell how many stocks have been shorted?
Answer: There is no way to know anything about who has shorted stuff or how concentrated the positions are in a few investors

Question: Double-Taxation of Royalties paid for in Korea to a US Company
Answer: <s>



Question: What scrutiny to expect if making large purchase with physical cash? [duplicate]
Answer: <s>What scrutiny to expect if making large purchase with physical cash? [duplicate]</s></s>http://www.consumerismcommentary.com/buying-house-with-cash/ It looks like you can, but it's a bad idea because you lack protection of a receipt, there's no record of you actually giving the money over, and the money would need to be counted - bill by bill - which increases time and likelihood of error.   In general, paying large amounts in cash won't bring up any scrutiny

Question: Would extending my mortgage cause the terms to be re-negotiated?
Answer:  Your re-negotiating position is limited

Question: Accounting for splits in a stock price graph
Answer: <s>

Question: Is it acceptable to receive payment from U.S. in Indian saving bank account via PayPal?
Answer:  I am not sure if PayPal would transfer money to these accounts or would convert



Question: How to prevent misusing my Account details
Answer:  unless you are sure of the person

Question: What should I do with a savings account in another country?
Answer: <s>

Question: Can stockholders choose NOT to elect a board of directors?
Answer: <s>

Question: How do credit card banks detect fraudulent transactions without requiring a travel advisory?
Answer: <s>

Question: If I get cash compensation for my stocks (following a merger for example) does that qualify for capital gains tax?
Answer: <s>If I get cash compensation for my stocks (following a merger for example) does that qualify for capital gains tax?</s></s>does it still count as a capital gain or loss? Yes. Is it essentially treated like you sold the stock at the price of the buy-out?  Yes. Do you still get a 1099-B from your broker? Yes.
Yes (most likely). If you are exchanging investments for cash, you will have to pay tax on that - disregarding capital losses, capital loss carryovers, AGI thresholds, and other 

Question: Strange values in ARM.L price data 1998-2000 from Yahoo
Answer: <s>

Question: How can I report pump and dump scams?
Answer: <s>

Question: The formula equivalent of EBITDA for personal finance?
Answer: <s>

Question: Stock Dividends & Splits: Are they always applied over night?
Answer: <s>

Question: how stock market sale work?
Answer:  the price will move down

Question: Shares in stock exchange and dividend payout relationship
Answer: <s>

Question: How to find SEC filings that are important to stock market
Answer:  Securities Exchange Commission's EDGAR search engine

Question: Compute average price even if I do not have the prices before
Answer: <s>

Question: Where can I find out details about the actual network on which SWIFT banking works?
Answer: <s>



Question: what is the timezone that yahoo uses for stock information
Answer: 

Question: Are there limits on frequency of withdrawal from Roth 401K?
Answer: <s>Are there limits on frequency of withdrawal from Roth 401K?</s></s>Just the amount contributed to the Roth 401k that you rolled over, not the conversions from regular 401k/traditional IRA (for those there are holding period limitation of 5 year from conversion

Question: Virtual Terminal WITHOUT merchant account?
Answer: <s>

Question: Can an IRA be taxed?
Answer:  You're not taxed on this dividend



Question: Who owns NASDAQ? Does it collect fees from stock transactions?
Answer: NASDAQ OMX Group

Question: In what circumstances will a bank waive the annual credit card fee?
Answer:  if you have a certain minimum balance, use direct deposit, or something similar

Question: How can I understand why investors think a particular company should have a high PE ratio?
Answer:  faster the market believes a company will grow

Question: Credit History and Outstanding Debts in Hungary
Answer: <s>

Question: T-mobile stock: difference between TMUSP vs TMUS
Answer: <s>



Question: Why was S&P 500 PE Ratio so high on May 2009
Answer: <s>
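
Many of the responses above are just `<s>`, and a few start with `<s>` followed by the question and context. With SQuAD 2.0-style checkpoints, a span that points at the first token is conventionally read as "no answer", so it may be worth normalising these before evaluation. The sketch below is one possible post-processing step, not part of the notebook or of UpTrain: `clean_answer` is a hypothetical helper, and treating every span that begins at `<s>` as unanswered is an assumption.

```python
def clean_answer(raw):
    """Normalise a decoded span: strip whitespace and map <s>-prefixed spans to ''."""
    text = raw.strip()
    # Assumption: a span starting at the <s> token means the model found no answer,
    # even if the end index ran on into the question or context
    if text.startswith("<s>"):
        return ""
    return text


no_answer = clean_answer("<s>")
overrun = clean_answer("<s>Can an IRA be taxed?</s></s>Some context")
kept = clean_answer(" leaves you with no position")
```

Applying such a step to both models' responses keeps the comparison symmetric, although it also discards the long overrun spans, some of which do contain relevant text.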



### Evaluating Experiments using UpTrain

UpTrain's EvalLLM provides an `evaluate_experiments` method, which takes the input data to be evaluated, the list of checks to run, and the names of the columns associated with the experiment. In this example, we generate the responses in the notebook itself, but you could equally generate them with your own setup and pass only the question-response pairs to UpTrain for evaluation. The cells below send the combined results from both models through `APIClient.log_and_evaluate`.

In [10]:
from uptrain import EvalLLM, Evals, APIClient, Settings
import json

data = results + results_large
print(data)
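
Concatenating the two lists drops the information about which model produced each row. A minimal sketch of one way to keep it is shown below. The `model` key and the `tag_rows` helper are hypothetical names introduced here, the rows are toy stand-ins for the notebook's results, and whether the evaluation endpoint accepts the extra column should be checked against the UpTrain documentation.

```python
def tag_rows(rows, model_name):
    """Return copies of the rows with an extra 'model' key (hypothetical column name)."""
    return [{**row, "model": model_name} for row in rows]


# Toy stand-ins for the `results` and `results_large` lists built earlier
small_rows = [{"question": "q1", "response": "short answer"}]
large_rows = [{"question": "q1", "response": "longer answer"}]

tagged = (
    tag_rows(small_rows, "deepset/tinyroberta-squad2")
    + tag_rows(large_rows, "deepset/roberta-base-squad2")
)
```

The original dictionaries are left untouched because each row is copied before the key is added.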



In [11]:
UPTRAIN_API_KEY = "up-**************************"  # Insert your UpTrain API key here

uptrain_client = APIClient(
    Settings(
        uptrain_access_token=UPTRAIN_API_KEY, response_format={"type": "json_object"}
    )
)

res = uptrain_client.log_and_evaluate(
    "LLM compression", data, [Evals.FACTUAL_ACCURACY, Evals.RESPONSE_COMPLETENESS_WRT_CONTEXT]
)

print(json.dumps(res, indent=3))

2024-03-14 16:44:59.099 | INFO     | uptrain.framework.remote:log_and_evaluate:669 - Sending evaluation request for rows 0 to <50 to the Uptrain server
2024-03-14 16:46:53.875 | INFO     | uptrain.framework.remote:log_and_evaluate:669 - Sending evaluation request for rows 50 to <100 to the Uptrain server


[
   {
      "question": "What happen in this selling call option scenario",
      "context": "If you sold bought a call option then as you stated sold it to someone else what you are doing is selling the call you bought. That leaves you with no position.  This is the case if you are talking about the same strike, same expiration.\nAn expired option is a stand-alone event, sold at $X, with a bought at $0 on the expiration date.  The way you phrased the question is ambiguous, as 'decrease toward zero' is not quite the same as expiring worthless, you'd need to buy it at the near-zero price to then sell another covered call at a lower strike.  Edit - If you entered the covered call sale properly, you find that an in-the-money option results in a sale of the shares at expiration. When entered incorrectly, there are two possibilities, the broker buys the option back at the market close, or you wake up Sunday morning (the options 'paperwork' clears on Saturday after expiration) finding yours

**Access UpTrain Dashboards**: We can access the evaluation results at https://demo.uptrain.ai/dashboard/ - the same API key can be used to access the dashboards.
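
Besides the dashboard, the list returned by `log_and_evaluate` can be summarised locally to compare the two models. The sketch below makes several assumptions: the output above is truncated, so the score key `score_factual_accuracy` is a guess at the field name; the returned rows are assumed to keep the input order (compressed model first, then the larger model); and `mean_score` is a hypothetical helper. The rows are toy values, not real results.

```python
def mean_score(rows, key):
    """Average a numeric score over rows, skipping rows where it is missing or None."""
    values = [row[key] for row in rows if row.get(key) is not None]
    return sum(values) / len(values) if values else None


# Toy stand-in for `res`; the key name is an assumption, not confirmed by the output above
res = [
    {"response": "a", "score_factual_accuracy": 1.0},
    {"response": "b", "score_factual_accuracy": 0.5},
    {"response": "c", "score_factual_accuracy": 1.0},
    {"response": "d", "score_factual_accuracy": None},
]

n_small = 2  # would be len(results) in the notebook
small_mean = mean_score(res[:n_small], "score_factual_accuracy")
large_mean = mean_score(res[n_small:], "score_factual_accuracy")
```

Comparing the two means per check gives a rough single-number view of the accuracy cost of compression on this sample.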

