<a href="https://colab.research.google.com/github/uptrain-ai/uptrain/blob/main/examples/experiments/llm_compression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 align="center">
  <a href="https://uptrain.ai">
    <img width="300" src="https://user-images.githubusercontent.com/108270398/214240695-4f958b76-c993-4ddd-8de6-8668f4d0da84.png" alt="uptrain">
  </a>
</h1>

## Experimenting with Compressed and Non-Compressed Models

**Overview**: In this notebook, we compare a compressed and a non-compressed version of the same model to measure the hit in accuracy due to compression. We use around 30 randomly picked examples from the Financial QA dataset and evaluate the responses on several criteria to determine which of the two models performs better.




In [1]:
!pip install openai uptrain -q


### Download the testing dataset

Note: Ground truth is optional, since UpTrain supports many checks (such as factual accuracy and response relevance) that don't require ground truth.

In [8]:
import polars as pl
import os
import re

url = "https://uptrain-assets.s3.ap-south-1.amazonaws.com/data/evaluations_dataset.jsonl"
dataset_path = os.path.join('./', "benchmark.jsonl")

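# Download the dataset only if it is not already cached locally
if not os.path.exists(dataset_path):
    import httpx
    r = httpx.get(url)
    r.raise_for_status()  # fail early instead of saving an error page
    with open(dataset_path, "wb") as f:
        f.write(r.content)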

filtered_dataset = pl.read_ndjson(dataset_path).select(pl.col(["question", "ground_truth", "context"]))





Filtered dataset saved successfully.
shape: (4, 3)
┌────────────────────────────────┬────────────────────────────────┬────────────────────────────────┐
│ question                       ┆ ground_truth                   ┆ context                        │
│ ---                            ┆ ---                            ┆ ---                            │
│ str                            ┆ str                            ┆ str                            │
╞════════════════════════════════╪════════════════════════════════╪════════════════════════════════╡
│ How does a stock operate when  ┆ Say a stock is listed in       ┆ Say a stock is listed in       │
│ it…                            ┆ Nasdaq,…                       ┆ Nasdaq,…                       │
│ How do credit card banks       ┆ One bank is more willing to    ┆ "Having worked in the          │
│ detect …                       ┆ risk…                          ┆ financial …                    │
│ What is the 'real' monthly     ┆ How c

### Let's define a simple function to generate responses

In [25]:
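import logging

import torch
from transformers import RobertaForQuestionAnswering, RobertaTokenizer

# Logger used by get_response to record each extracted answer
logger = logging.getLogger(__name__)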


def get_response(row, model_name, max_seq_length=512):
    question = row['question'][0]
    context = row['context'][0]

    # Load the question answering model and tokenizer
    model = RobertaForQuestionAnswering.from_pretrained(model_name)
    tokenizer = RobertaTokenizer.from_pretrained(model_name)

    # Tokenize the input text; truncation=True already caps the sequence at
    # max_seq_length, so no further manual truncation is needed
    inputs = tokenizer(question, context, return_tensors="pt", max_length=max_seq_length, truncation=True)

    # Perform inference with the question answering model (no gradients needed)
    with torch.no_grad():
        outputs = model(**inputs)

    # Get the start and end scores from the output
    start_scores = outputs.start_logits
    end_scores = outputs.end_logits

    # Get the answer span from the start and end scores
    answer_start = torch.argmax(start_scores, dim=1).item()
    answer_end = torch.argmax(end_scores, dim=1).item() + 1

    # Decode the answer from the tokens
    answer = tokenizer.decode(inputs["input_ids"][0][answer_start:answer_end])

    # Log the answer
    logger.info(f"Answer: {answer}")

    return {
        'question': question,
        'context': context,
        'response': answer,
        'ground_truth': row['ground_truth'][0],
    }
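
`get_response` takes the argmax of the start and end logits independently, so the predicted end can land before the predicted start, which decodes to an empty answer. A common alternative is to score start/end pairs jointly and keep only valid spans. The following is a minimal pure-Python sketch of that idea, not part of the original notebook; the function name `best_span` and the `max_answer_len` limit are assumptions made for illustration.

```python
from typing import List, Tuple


def best_span(
    start_scores: List[float],
    end_scores: List[float],
    max_answer_len: int = 30,
) -> Tuple[int, int]:
    """Return (start, end) token indices, end inclusive.

    Picks the pair with the highest start_score + end_score subject to
    start <= end < start + max_answer_len.
    """
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider end positions at or after the start position
        last = min(len(end_scores), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Hypothetical scores: independent argmax would give start=3, end=1,
# which is an invalid (empty) span.
start = [0.1, 0.2, 0.3, 2.0]
end = [0.1, 1.5, 0.2, 1.0]
print(best_span(start, end))  # (3, 3)
```

With real model outputs, the two lists could be obtained from `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`, and the answer decoded from `input_ids[start:end + 1]`.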



### Generate responses for both the models

Here are the responses from the "lighter" compressed model, `deepset/tinyroberta-squad2`, which is a distilled version of roberta-base.



In [26]:

results = [get_response(filtered_dataset[idx], 'deepset/tinyroberta-squad2') for idx in range(len(filtered_dataset))]

print(results)



Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.




And here are the responses from the "larger" non-compressed model, `deepset/roberta-base-squad2`.

In [27]:

results_large = [get_response(filtered_dataset[idx], 'deepset/roberta-base-squad2') for idx in range(len(filtered_dataset))]

print(results_large)

Be aware, overflowing tokens are not returned for the setting you have chosen, i.e. sequence pairs with the 'longest_first' truncation strategy. So the returned list will always be empty even if some tokens have been removed.
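
Before sending anything to UpTrain, a quick local sanity check can already hint at how much accuracy the compressed model loses. The sketch below computes a SQuAD-style token-overlap F1 between each `response` and its `ground_truth`. It is an illustrative assumption, not one of UpTrain's checks, and the helper names (`normalize`, `token_f1`, `mean_f1`) and the sample rows are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list:
    """Lowercase, strip punctuation and English articles, split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and a reference answer."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_f1(rows: list) -> float:
    """Average token F1 over rows shaped like the dicts from get_response."""
    return sum(token_f1(r["response"], r["ground_truth"]) for r in rows) / len(rows)


# Hypothetical rows, only to show the expected input shape
sample = [
    {"response": "supply and demand", "ground_truth": "supply demand"},
    {"response": "arbitrage", "ground_truth": "The arbitrage"},
]
print(round(mean_f1(sample), 2))  # 0.9
```

Calling `mean_f1(results)` and `mean_f1(results_large)` would give one number per model. Long free-text ground truths, as in this dataset, tend to produce low overlap scores for short extractive answers, so the absolute values matter less than the gap between the two models.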




### Evaluating Experiments using UpTrain

UpTrain's EvalLLM provides an `evaluate_experiments` method, which takes the data to be evaluated, the list of checks to run, and the names of the columns that identify each experiment. The cells below instead send both sets of responses to UpTrain through `APIClient.log_and_evaluate`. In this example, we generate the responses in the notebook itself, but you could just as well generate them in your own setup and pass only the query-response pairs to UpTrain for evaluation.

In [28]:
from uptrain import EvalLLM, Evals, APIClient, Settings
import json

data = results + results_large

print(data)
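
Concatenating the two lists drops the information about which model produced which row. One way to keep it is to tag every row with the model name before merging. This is a minimal sketch: the helper `tag_rows` and the `model` column name are assumptions for illustration, and whether extra keys are passed through by UpTrain is not confirmed by this notebook.

```python
def tag_rows(rows: list, model_name: str) -> list:
    """Return copies of the rows with a hypothetical 'model' column added."""
    return [{**row, "model": model_name} for row in rows]


# Hypothetical rows standing in for `results` and `results_large`
small = [{"question": "q1", "response": "a1"}]
large = [{"question": "q1", "response": "a2"}]

tagged = tag_rows(small, "deepset/tinyroberta-squad2") + tag_rows(
    large, "deepset/roberta-base-squad2"
)
print([row["model"] for row in tagged])
```

Because `tag_rows` builds new dictionaries, the original `results` and `results_large` lists stay unchanged.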





In [29]:
UPTRAIN_API_KEY = "up-*******************************"  # Insert your UpTrain API key here

uptrain_client = APIClient(
    Settings(
        uptrain_access_token=UPTRAIN_API_KEY, response_format={"type": "json_object"}
    )
)

res = uptrain_client.log_and_evaluate(
    "LLM compression",
    data,
    [Evals.FACTUAL_ACCURACY, Evals.RESPONSE_COMPLETENESS_WRT_CONTEXT],
)

print(json.dumps(res, indent=3))

[
   {
      "question": "How does a stock operate when it is listed between two exchanges?",
      "context": "Say a stock is listed in Nasdaq, and the same company has a stock listed in Tsx. Does the Nasdaq price affect the Tsx price as trading commences? Not directly. Basically, an exchange is a market, and the price is defined only by supply and demand in that market. However, any substantial price differential for a commodity traded in multiple market creates an arbitrage opportunity, and there are many traders whose job it is exactly to find and use such opportunities. Their activity in turn has the effect of reducing the price differentials to the point where transaction costs make them unprofitable. With high-frequency traders around, the time for a price differential to disappear is nowadays measured in milliseconds. If a trader buys from one exchange, will it affect the price of the other? Only through the mechanism mentioned above.  Are there any benefits to being listed in 

**Access UpTrain Dashboards**: We can access the evaluation results at https://demo.uptrain.ai/dashboard/ - the same API key can be used to access the dashboards.
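
The returned rows can also be summarized locally to compare the two models side by side. The sketch below rests on two assumptions that this notebook does not confirm: that the results come back in the same order as `data` (compressed model first, then the larger model), and that each row carries a numeric score under a key such as `score_factual_accuracy`. The helper names are hypothetical.

```python
from statistics import mean


def mean_score(rows: list, key: str):
    """Average the numeric values stored under `key`, skipping missing ones."""
    values = [row[key] for row in rows if isinstance(row.get(key), (int, float))]
    return mean(values) if values else None


def compare_halves(rows: list, n_first: int, key: str) -> dict:
    """Split rows at n_first and report the mean score of each part."""
    return {
        "compressed": mean_score(rows[:n_first], key),
        "non_compressed": mean_score(rows[n_first:], key),
    }


# Hypothetical evaluation rows; the key name is an assumption
fake_res = [
    {"score_factual_accuracy": 1.0},
    {"score_factual_accuracy": 0.5},
    {"score_factual_accuracy": 1.0},
    {"score_factual_accuracy": None},
]
print(compare_halves(fake_res, n_first=2, key="score_factual_accuracy"))
```

In the notebook, this could be called as `compare_halves(res, len(results), "score_factual_accuracy")`, provided the two assumptions above hold for the actual response.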

