<a href="https://colab.research.google.com/github/uptrain-ai/uptrain/blob/main/examples/experiments/llm_compression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 align="center">
  <a href="https://uptrain.ai">
    <img width="300" src="https://user-images.githubusercontent.com/108270398/214240695-4f958b76-c993-4ddd-8de6-8668f4d0da84.png" alt="uptrain">
  </a>
</h1>

## Experimenting with Compressed and Non-Compressed Models

**Overview**: In this notebook, we compare a compressed and a non-compressed version of the same model to measure the drop in accuracy caused by compression. We use around 30 randomly picked examples from the Financial QA dataset and evaluate the responses on different criteria to determine which of the two models performs better.




In [22]:
!pip install openai uptrain -q

### Download the testing dataset

Note: Ground truth is optional, as UpTrain supports many checks (such as factual accuracy and response relevance) that don't require ground truth.

In [23]:
import polars as pl
import os

url = "https://uptrain-assets.s3.ap-south-1.amazonaws.com/data/evaluations_dataset.jsonl"
dataset_path = os.path.join('./', "benchmark.jsonl")

if not os.path.exists(dataset_path):
    import httpx
    r = httpx.get(url)
    with open(dataset_path, "wb") as f:
        f.write(r.content)

dataset = pl.read_ndjson(dataset_path).select(pl.col(["question", "ground_truth", "context"]))
print(dataset)

shape: (43, 3)
┌────────────────────────────────┬────────────────────────────────┬────────────────────────────────┐
│ question                       ┆ ground_truth                   ┆ context                        │
│ ---                            ┆ ---                            ┆ ---                            │
│ str                            ┆ str                            ┆ str                            │
╞════════════════════════════════╪════════════════════════════════╪════════════════════════════════╡
│ What happen in this selling    ┆ "But what happen if the stock  ┆ If you sold bought a call      │
│ call…                          ┆ pr…                            ┆ option…                        │
│ Appropriate model for deferred ┆ There's no standard formula.   ┆ I would recommend that you     │
│ c…                             ┆ You…                           ┆ take …                         │
│ Dow Jones Industrial Average   ┆ Dow Jones:                     ┆ Dow Jone

### Let's define a simple function to generate responses

In [24]:
!pip install loguru
from functools import lru_cache

from transformers import RobertaForQuestionAnswering, RobertaTokenizer
import torch
from loguru import logger
import nest_asyncio
nest_asyncio.apply()


@lru_cache(maxsize=None)
def load_model(model_name):
    # Load each model and tokenizer once instead of once per row
    model = RobertaForQuestionAnswering.from_pretrained(model_name)
    tokenizer = RobertaTokenizer.from_pretrained(model_name)
    model.eval()
    return model, tokenizer


def get_response(row, model_name, max_seq_length=512):
    # `row` is a one-row polars DataFrame, so [0] extracts the scalar value
    question = row['question'][0]
    context = row['context'][0]

    model, tokenizer = load_model(model_name)

    # Tokenize the question-context pair. truncation=True already caps the
    # input at max_seq_length, so no manual truncation is needed afterwards
    inputs = tokenizer(question, context, return_tensors="pt", max_length=max_seq_length, truncation=True)

    # Perform inference without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # Pick the most likely start and end token positions (end is exclusive)
    answer_start = torch.argmax(outputs.start_logits, dim=1).item()
    answer_end = torch.argmax(outputs.end_logits, dim=1).item() + 1

    # Decode the answer from the tokens; if the predicted end precedes the
    # start, the slice is empty and the answer is an empty string
    answer = tokenizer.decode(inputs["input_ids"][0][answer_start:answer_end])

    # Log the answer
    logger.info(f"Answer: {answer}")

    return {
        'question': question,
        'context': context,
        'response': answer,
        'ground_truth': row['ground_truth'][0]
    }



### Generate responses for both the models

Here are the responses from the "lighter" compressed model, a distilled version of roberta-base.



In [25]:

results = [get_response(dataset[idx], 'deepset/tinyroberta-squad2') for idx in range(len(dataset))]

print(results)

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.



And here are the responses from the "larger" non-compressed model.

In [26]:

results_large = [get_response(dataset[idx], 'deepset/roberta-base-squad2') for idx in range(len(dataset))]

print(results_large)

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.



### Evaluating Experiments using UpTrain

UpTrain's EvalLLM provides an `evaluate_experiments` method, which takes the data to be evaluated, the list of checks to run, and the names of the columns associated with the experiment. The cell below uses the hosted `APIClient` and its `log_and_evaluate` method instead, passing a project name, the data, and the list of checks. Here we generate the responses in the notebook itself, but you could equally generate them in your own setup and pass only the question-response pairs to UpTrain for evaluation.
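
The rows returned by `get_response` do not record which model produced them, so once the two result lists are concatenated the experiments can no longer be told apart. A minimal sketch of one way to handle this is to tag every row with a `model` field before combining. The helper `tag_results` and the sample rows are hypothetical, and whether a given UpTrain method keeps or uses an extra `model` key is an assumption to check against the UpTrain documentation.

```python
from typing import Dict, List


def tag_results(rows: List[Dict[str, str]], model_name: str) -> List[Dict[str, str]]:
    """Return copies of the rows with a 'model' field added (hypothetical helper)."""
    return [{**row, "model": model_name} for row in rows]


# Stand-in rows with the same shape as the notebook's results
small_results = [{"question": "q1", "context": "c1", "response": "a", "ground_truth": "a"}]
large_results = [{"question": "q1", "context": "c1", "response": "b", "ground_truth": "a"}]

tagged_data = (
    tag_results(small_results, "deepset/tinyroberta-squad2")
    + tag_results(large_results, "deepset/roberta-base-squad2")
)
```

The `model` column is then the natural candidate for the experiment column name when comparing the two runs.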

In [27]:
from uptrain import EvalLLM, Evals, APIClient, Settings
import json

data = results + results_large


UPTRAIN_API_KEY = "up-**************************"  # Insert your UpTrain API key here

uptrain_client = APIClient(
    Settings(
        uptrain_access_token=UPTRAIN_API_KEY, response_format={"type": "json_object"}
    )
)

res = uptrain_client.log_and_evaluate(
    "LLM compression", data, [Evals.FACTUAL_ACCURACY, Evals.RESPONSE_COMPLETENESS_WRT_CONTEXT]
)

print(json.dumps(res, indent=3))

[
   {
      "question": "What happen in this selling call option scenario",
      "context": "If you sold bought a call option then as you stated sold it to someone else what you are doing is selling the call you bought. That leaves you with no position.  This is the case if you are talking about the same strike, same expiration.\nAn expired option is a stand-alone event, sold at $X, with a bought at $0 on the expiration date.  The way you phrased the question is ambiguous, as 'decrease toward zero' is not quite the same as expiring worthless, you'd need to buy it at the near-zero price to then sell another covered call at a lower strike.  Edit - If you entered the covered call sale properly, you find that an in-the-money option results in a sale of the shares at expiration. When entered incorrectly, there are two possibilities, the broker buys the option back at the market close, or you wake up Sunday morning (the options 'paperwork' clears on Saturday after expiration) finding yours

**Access UpTrain Dashboards**: We can access the evaluation results at https://demo.uptrain.ai/dashboard/ - the same API key can be used to access the dashboards.
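
As a rough local cross-check alongside the dashboard, the responses can also be scored against `ground_truth` with a SQuAD-style token-overlap F1. This is a sketch and not part of UpTrain; the function names `token_f1` and `mean_f1` are assumptions. Because the ground-truth answers in this dataset are long free-form text while the models extract short spans, absolute scores will likely be low, so only the gap between the two models is informative.

```python
import re
from collections import Counter
from typing import Dict, List


def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between two strings, case-insensitive."""
    pred_tokens = re.findall(r"\w+", prediction.lower())
    true_tokens = re.findall(r"\w+", ground_truth.lower())
    if not pred_tokens or not true_tokens:
        # Both empty counts as a match; one empty counts as a miss
        return float(pred_tokens == true_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_f1(rows: List[Dict[str, str]]) -> float:
    """Average token F1 over rows with 'response' and 'ground_truth' keys."""
    return sum(token_f1(r["response"], r["ground_truth"]) for r in rows) / len(rows)
```

Calling `mean_f1` once on each model's result list gives two numbers whose difference is a simple estimate of the accuracy lost to compression.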

