# Goal

This tutorial shows how to preprocess a small corpus of financial reports with LlamaIndex: split the documents into sentence-aware chunks (“nodes”) and generate synthetic question-answer (QA) pairs from them. The pairs are then used to fine-tune an embedding model and to measure whether retrieval improves. Specifically, you’ll learn how to:
- Download and parse PDF documents into nodes
- Generate synthetic QA pairs using an LLM (like GPT-3.5)
- Structure the output into a training-ready dataset for fine-tuning
- Fine-tune an embedding model and compare it with its base version on retrieval metrics

This forms the foundational step in building better question-answering systems using retrieval-augmented generation (RAG) or fine-tuning embeddings.

# 1. Download dataset

This section sets up a small document corpus with one training and one validation PDF, and splits each into “nodes” using LlamaIndex utilities. These nodes can then be used for tasks like:
- Embedding and similarity search
- RAG (Retrieval-Augmented Generation)
- Fine-tuning LLMs with sentence-level data

## 1.1 Imports

- SimpleDirectoryReader: A LlamaIndex utility to load documents from a directory.
- SentenceSplitter: Splits documents into smaller chunks (nodes) while trying to keep whole sentences together, so a node typically holds several sentences.

In [1]:
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

## 1.2. Creating Directories and Downloading PDFs

- Creates train and val directories under data/10k/.
- Downloads the Uber 10-K (2021) and Lyft 10-K (2021) PDFs from the LlamaIndex GitHub repository.
  - uber_2021.pdf → val
  - lyft_2021.pdf → train

In [2]:
!mkdir -p 'data/10k/train'
!mkdir -p 'data/10k/val'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/val/uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/train/lyft_2021.pdf'

--2025-04-10 23:40:28--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1880483 (1.8M) [application/octet-stream]
Saving to: ‘data/10k/val/uber_2021.pdf’


2025-04-10 23:40:29 (3.28 MB/s) - ‘data/10k/val/uber_2021.pdf’ saved [1880483/1880483]

--2025-04-10 23:40:29--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1440303 (1.4M) [application/octet-stream]
Saving to: ‘data/10k/train/lyft_2021.pdf’


2025-04-10 23:40:30 (3.15 MB/s) - ‘data/10k/

In [3]:
TRAIN_DIR = "./data/10k/train"
VAL_DIR = "./data/10k/val"

## 1.3. Define the load_corpus() function

1. Loads all .pdf documents from the given directory (including subdirectories) using SimpleDirectoryReader.
2. Splits the documents into smaller units called nodes using SentenceSplitter, which keeps whole sentences together within each chunk.
3. Returns the list of nodes.

Optional verbose flag: If set to True, prints the number of documents and nodes processed.
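
To make step 2 concrete, here is a minimal sketch of sentence-aware chunking in plain Python. It is not LlamaIndex's implementation: `split_into_chunks` and its `max_words` parameter are hypothetical, and words stand in for the tokens that a real splitter would count. The idea it illustrates is that whole sentences are packed into a chunk until a size budget is reached.

```python
import re


def split_into_chunks(text, max_words=40):
    """Pack whole sentences into chunks of at most max_words words (illustrative only)."""
    # Naive sentence boundary: split after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = (
    "Lyft operates a ridesharing marketplace. Revenue grew in 2021. "
    "Drivers are classified as independent contractors. Risks are described in Item 1A."
)
chunks = split_into_chunks(sample, max_words=10)
for chunk in chunks:
    print(chunk)
```

A sentence is never cut in the middle here, which is why chunk lengths vary; a sentence longer than the budget simply becomes its own oversized chunk in this sketch.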

In [4]:
def load_corpus(directory, verbose=False):
    """Load every PDF under `directory` and split it into nodes."""
    reader = SimpleDirectoryReader(
        input_dir=directory,
        required_exts=[".pdf"],
        recursive=True,
    )
    docs = reader.load_data()
    if verbose:
        print(f"Loaded {len(docs)} docs")

    # Default SentenceSplitter settings; chunks respect sentence boundaries.
    parser = SentenceSplitter()
    nodes = parser.get_nodes_from_documents(docs, show_progress=verbose)

    if verbose:
        print(f"Parsed {len(nodes)} nodes")

    return nodes

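## 1.4. Create the training and validation nodes

- Loads and parses lyft_2021.pdf into train_nodes (total 345 nodes)
- Loads and parses uber_2021.pdf into val_nodes (total 407 nodes)

The "docs" counts in the output are PDF pages rather than files: the reader emits one document per page, so the single Lyft PDF loads as 238 docs and the Uber PDF as 307.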

In [5]:
train_nodes = load_corpus(TRAIN_DIR, verbose=True)
val_nodes = load_corpus(VAL_DIR, verbose=True)

Loaded 238 docs


Parsing nodes:   0%|          | 0/238 [00:00<?, ?it/s]

Parsed 345 nodes
Loaded 307 docs


Parsing nodes:   0%|          | 0/307 [00:00<?, ?it/s]

Parsed 407 nodes


## 1.5. Display a sample Node

The first node from the training data looks like this (output truncated):

In [6]:
train_nodes[0].to_json()

'{"id_": "d72149af-72f4-4ffe-bcba-479381f4673a", "embedding": null, "metadata": {"page_label": "1", "file_name": "lyft_2021.pdf", "file_path": "/Users/tuhinsharma/Documents/Git/genai-playground/fine-tune/vanilla-embedding/data/10k/train/lyft_2021.pdf", "file_type": "application/pdf", "file_size": 1440303, "creation_date": "2025-04-10", "last_modified_date": "2025-04-10"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "abe43533-c71a-4081-b229-14f3301b057e", "node_type": "4", "metadata": {"page_label": "1", "file_name": "lyft_2021.pdf", "file_path": "/Users/tuhinsharma/Documents/Git/genai-playground/fine-tune/vanilla-embedding/data/10k/train/lyft_2021.pdf", "file_type": "application/pdf", "file_size": 1440303, "creation_date": "2025-04-1

# 2. Generate synthetic queries and prepare training dataset

This section prepares the dataset for fine-tuning an embedding model on a question-answering (QA) retrieval task. An LLM writes synthetic questions for each node, and each question is paired with the node it came from, so the embedding model can learn to place questions close to the passages that answer them. Here’s a breakdown of what each part of the code does:

## 2.1. Imports

- generate_qa_embedding_pairs: Generates QA pairs from a set of documents (called nodes) using an LLM and stores them in a format useful for fine-tuning an embedding model.
- EmbeddingQAFinetuneDataset: Utility class to load the generated QA pair dataset.
- OpenAI: Wrapper around OpenAI models (like GPT-3.5) for use in LlamaIndex. It is imported in the generation cell in section 2.2 rather than here.

In [7]:
from llama_index.finetuning import generate_qa_embedding_pairs
from llama_index.core.evaluation import EmbeddingQAFinetuneDataset

## 2.2. Generate QA Pairs for Training and Validation

- Takes train_nodes and val_nodes (documents or chunks of text) as input.
- Uses GPT-3.5 to generate question-answer pairs from each node.
- Saves the output to "train_dataset.json" and "val_dataset.json".
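
The shape of the generated data can be sketched without calling an LLM. In the sketch below, `fake_question_llm` is a hypothetical stand-in for the real model call and `build_qa_pairs` is an illustrative helper, not the LlamaIndex function. Two questions per node is an assumption, chosen because it matches the ratio in this tutorial's evaluation logs (814 queries for 407 validation nodes).

```python
import uuid


def fake_question_llm(text, num_questions):
    """Hypothetical stand-in for an LLM call; returns placeholder questions."""
    return [f"Question {i + 1} about: {text[:30]}" for i in range(num_questions)]


def build_qa_pairs(nodes, num_questions_per_chunk=2):
    """nodes maps node_id -> text. Returns queries, corpus and relevant_docs dictionaries."""
    corpus = dict(nodes)
    queries, relevant_docs = {}, {}
    for node_id, text in nodes.items():
        for question in fake_question_llm(text, num_questions_per_chunk):
            query_id = str(uuid.uuid4())
            queries[query_id] = question
            # Each synthetic question is labelled with the node it was generated from.
            relevant_docs[query_id] = [node_id]
    return {"queries": queries, "corpus": corpus, "relevant_docs": relevant_docs}


toy_nodes = {
    "node-a": "Lyft, Inc. annual report for fiscal year 2021.",
    "node-b": "Risk factors related to driver classification.",
}
toy_dataset = build_qa_pairs(toy_nodes)
print(len(toy_dataset["queries"]), "queries for", len(toy_dataset["corpus"]), "nodes")
```

Because every question is tied to its source node, the labels come for free: no human has to judge which passage answers which question.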

The generation cell below is commented out because it requires OpenAI API credits and takes about 15 minutes to run. `train_dataset.json` and `val_dataset.json` are already included in the code repository, so you can skip it; uncomment the cell if you want to regenerate them.

In [8]:
# from llama_index.llms.openai import OpenAI


# train_dataset = generate_qa_embedding_pairs(
#     llm=OpenAI(model="gpt-3.5-turbo"),
#     nodes=train_nodes,
#     output_path="train_dataset.json",
# )
# val_dataset = generate_qa_embedding_pairs(
#     llm=OpenAI(model="gpt-3.5-turbo"),
#     nodes=val_nodes,
#     output_path="val_dataset.json",
# )

## 2.3. Load the Datasets

Loads the generated JSON files and returns structured objects that can be used later for fine-tuning the embedding model.

In [9]:
train_dataset = EmbeddingQAFinetuneDataset.from_json("train_dataset.json")
val_dataset = EmbeddingQAFinetuneDataset.from_json("val_dataset.json")

## 2.4 Display a sample training record

Following is the first record from the training data. The dataset has three components:
 - corpus: Dictionary of documents
 - queries: Dictionary of search queries
 - relevant_docs: Mapping of which documents are relevant to which queries

In [10]:
print(next(iter(train_dataset.queries.items())))
print(next(iter(train_dataset.relevant_docs.items())))
print(next(iter(train_dataset.corpus.items())))

('e5c751eb-26ca-4f05-b92f-6a701e59f631', "What is the market value of Lyft, Inc.'s common stock held by non-affiliates as of June 30, 2021, and how was this value determined?")
('e5c751eb-26ca-4f05-b92f-6a701e59f631', ['07e36631-d1de-4dc7-b055-927731dc8c18'])
('07e36631-d1de-4dc7-b055-927731dc8c18', 'UNITED STATES\nSECURITIES AND EXCHANGE COMMISSIONWashington, D.C. 20549 FORM 10-K (Mark One)☒ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934For the fiscal year ended December 31, 2021 OR ☐TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 FOR THE TRANSITION PERIODFROM                      TOCommission File Number 001-38846 Lyft, Inc.(Exact name of Registrant as specified in its Charter)Delaware20-8809830 (State or other jurisdiction ofincorporation or organization)(I.R.S. Employer Identification No.)185 Berry Street, Suite 5000San Francisco, California94107 (Address of principal executive offices)(Zip Code) Registra
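
To see how the three dictionaries fit together, the sketch below joins each query to the text of its relevant document. It uses a hand-written toy dataset (the IDs and texts are made up), and `check_dataset` is a hypothetical helper; the key names are assumed to mirror the attribute names used above.

```python
import json

toy = {
    "queries": {"q1": "What fiscal year does the report cover?"},
    "corpus": {"d1": "For the fiscal year ended December 31, 2021", "d2": "Risk factors"},
    "relevant_docs": {"q1": ["d1"]},
}


def check_dataset(data):
    """Return (query, relevant_text) pairs, raising KeyError if any ID is dangling."""
    pairs = []
    for query_id, query in data["queries"].items():
        for doc_id in data["relevant_docs"][query_id]:
            pairs.append((query, data["corpus"][doc_id]))
    return pairs


# Round-trip through JSON to mimic loading a saved dataset file.
loaded = json.loads(json.dumps(toy))
pairs = check_dataset(loaded)
print(pairs[0])
```

A check like this is a cheap way to catch a dataset file whose query IDs and document IDs have drifted apart before spending time on training.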

# 3. Evaluation metrics

## 3.1 Imports

- InformationRetrievalEvaluator: An evaluator that tests how well a model retrieves relevant documents for given queries.
- SentenceTransformer: The model class used to load a pre-trained SentenceTransformer model.
- Path: Used to create output directories if they don’t already exist.

In [11]:
from sentence_transformers.evaluation import InformationRetrievalEvaluator
from sentence_transformers import SentenceTransformer
from pathlib import Path

INFO:datasets:PyTorch version 2.6.0 available.


## 3.2 Function: evaluate_st

This function takes:
- dataset: An object that must contain .corpus, .queries, and .relevant_docs.
- model_id: The model name or path to load from SentenceTransformer.
- name: A name identifier for the evaluation report.

It extracts the required components from the dataset, runs the evaluator on the loaded model, and writes the results to the results/ directory.

In [12]:
def evaluate_st(
    dataset,
    model_id,
    name,
):
    """Evaluate a SentenceTransformer model on a retrieval dataset; the report goes to results/."""
    corpus = dataset.corpus
    queries = dataset.queries
    relevant_docs = dataset.relevant_docs

    evaluator = InformationRetrievalEvaluator(
        queries, corpus, relevant_docs, name=name
    )
    model = SentenceTransformer(model_id)
    # Create the output directory if needed.
    output_path = "results/"
    Path(output_path).mkdir(exist_ok=True, parents=True)
    return evaluator(model, output_path=output_path)
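
The evaluator reports metrics such as Accuracy@k, Recall@k and MRR@k. As a rough illustration of what those numbers mean, the sketch below computes them by hand for a toy ranking. `retrieval_metrics` and the toy IDs are hypothetical, and this is a simplified reading of the metric definitions, not the library's code.

```python
def retrieval_metrics(ranked, relevant, k):
    """ranked: query_id -> ranked doc IDs; relevant: query_id -> set of relevant doc IDs."""
    hits = recall = mrr = 0.0
    for query_id, doc_ids in ranked.items():
        top_k = doc_ids[:k]
        gold = relevant[query_id]
        found = [d for d in top_k if d in gold]
        hits += 1.0 if found else 0.0      # Accuracy@k: any relevant doc in the top k
        recall += len(found) / len(gold)   # Recall@k: share of relevant docs found
        for rank, doc_id in enumerate(top_k, start=1):
            if doc_id in gold:
                mrr += 1.0 / rank          # MRR@k: reciprocal rank of the first hit
                break
    n = len(ranked)
    return {"accuracy": hits / n, "recall": recall / n, "mrr": mrr / n}


ranked = {"q1": ["d1", "d2", "d3"], "q2": ["d3", "d2", "d1"], "q3": ["d1", "d3", "d2"]}
relevant = {"q1": {"d1"}, "q2": {"d2"}, "q3": {"d2"}}
at_1 = retrieval_metrics(ranked, relevant, k=1)
at_3 = retrieval_metrics(ranked, relevant, k=3)
print(at_1)
print(at_3)
```

With one relevant document per query, as in this toy data, Accuracy@k and Recall@k coincide, while MRR additionally rewards placing the hit near the top.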

# 4. Finetune a public Hugging Face embedding model using SentenceTransformers

## 4.1 Define the model id (Hugging Face model id)

We will fine-tune the open-source model `mixedbread-ai/mxbai-embed-xsmall-v1`.

In [13]:
model_id = "mixedbread-ai/mxbai-embed-xsmall-v1"
finetuned_model_id = "fine-tuned-mxbai-embed-xsmall-v1"

## 4.2 Start the Fine-tuning process

The code fine-tunes a pre-trained Sentence Transformer model (a model that converts text into embeddings) using your own training and validation datasets, and saves the updated model to the specified location after 5 training epochs.

Here’s what each argument means:
 - train_dataset: The training data, made up of (query, relevant passage) pairs.
 - model_id: The base model to fine-tune (here `mixedbread-ai/mxbai-embed-xsmall-v1`).
 - model_output_path: Where to save the fine-tuned model after training (finetuned_model_id is a variable holding that path).
 - val_dataset: The validation dataset used to monitor performance during training.
 - epochs=5: The number of passes over the training data.
 - use_all_docs=True: Pairs each query with every document listed for it in relevant_docs, instead of only the first one.
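
The call below does not pass a loss function. Assuming the engine then falls back to an in-batch-negatives objective in the style of sentence-transformers' MultipleNegativesRankingLoss (an assumption, not something this tutorial confirms), the training signal can be sketched in plain Python: each query should score highest against its own passage, and the other passages in the batch act as negatives. The function names and the `scale` value are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def in_batch_negatives_loss(query_vecs, doc_vecs, scale=20.0):
    """Mean cross-entropy where doc i is the positive for query i (illustrative only)."""
    losses = []
    for i, query in enumerate(query_vecs):
        logits = [scale * cosine(query, doc) for doc in doc_vecs]
        # Numerically stable log-sum-exp over all docs in the batch.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_docs = [[1.0, 0.0], [0.0, 1.0]]     # each query matches its own doc
misaligned_docs = [[0.0, 1.0], [1.0, 0.0]]  # each query matches the other doc

aligned_loss = in_batch_negatives_loss(queries, aligned_docs)
misaligned_loss = in_batch_negatives_loss(queries, misaligned_docs)
print(aligned_loss, misaligned_loss)
```

The loss is near zero when queries sit next to their own passages and large when they sit next to someone else's, which is the behaviour fine-tuning tries to push the embeddings toward.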

In [14]:
from llama_index.finetuning import SentenceTransformersFinetuneEngine
finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id=model_id,
    model_output_path=finetuned_model_id,
    val_dataset=val_dataset,
    epochs=5,
    use_all_docs=True
)
finetune_engine.finetune()

INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: mps
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: mixedbread-ai/mxbai-embed-xsmall-v1


Computing widget examples:   0%|          | 0/1 [00:00<?, ?example/s]

Step,Training Loss,Validation Loss,Cosine Accuracy@1,Cosine Accuracy@3,Cosine Accuracy@5,Cosine Accuracy@10,Cosine Precision@1,Cosine Precision@3,Cosine Precision@5,Cosine Precision@10,Cosine Recall@1,Cosine Recall@3,Cosine Recall@5,Cosine Recall@10,Cosine Ndcg@10,Cosine Mrr@10,Cosine Map@100
50,No log,No log,0.538084,0.771499,0.837838,0.895577,0.538084,0.257166,0.167568,0.089558,0.538084,0.771499,0.837838,0.895577,0.722603,0.666528,0.670044
69,No log,No log,0.545455,0.762899,0.829238,0.89312,0.545455,0.2543,0.165848,0.089312,0.545455,0.762899,0.829238,0.89312,0.72446,0.669787,0.67368
100,No log,No log,0.560197,0.775184,0.831695,0.896806,0.560197,0.258395,0.166339,0.089681,0.560197,0.775184,0.831695,0.896806,0.733493,0.680677,0.684266
138,No log,No log,0.565111,0.77027,0.834152,0.894349,0.565111,0.256757,0.16683,0.089435,0.565111,0.77027,0.834152,0.894349,0.733484,0.681553,0.685234
150,No log,No log,0.566339,0.77027,0.834152,0.891892,0.566339,0.256757,0.16683,0.089189,0.566339,0.77027,0.834152,0.891892,0.734249,0.683176,0.687195
200,No log,No log,0.566339,0.780098,0.842752,0.896806,0.566339,0.260033,0.16855,0.089681,0.566339,0.780098,0.842752,0.896806,0.737364,0.685708,0.689692
207,No log,No log,0.566339,0.77887,0.840295,0.899263,0.566339,0.259623,0.168059,0.089926,0.566339,0.77887,0.840295,0.899263,0.738089,0.685955,0.689694
250,No log,No log,0.572482,0.782555,0.842752,0.895577,0.572482,0.260852,0.16855,0.089558,0.572482,0.782555,0.842752,0.895577,0.738781,0.688057,0.692054
276,No log,No log,0.568796,0.783784,0.841523,0.89312,0.568796,0.261261,0.168305,0.089312,0.568796,0.783784,0.841523,0.89312,0.736463,0.685608,0.689784
300,No log,No log,0.571253,0.786241,0.841523,0.89312,0.571253,0.26208,0.168305,0.089312,0.571253,0.786241,0.841523,0.89312,0.737726,0.687265,0.691447


INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Information Retrieval Evaluation of the model on the  dataset in epoch 0.7246376811594203 after 50 steps:
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Queries: 814
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Corpus: 407
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Score-Function: cosine
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Accuracy@1: 53.81%
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Accuracy@3: 77.15%
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Accuracy@5: 83.78%
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Accuracy@10: 89.56%
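
The step numbers in the training table can be reconciled with the dataset size. The arithmetic below is a sketch under stated assumptions: two synthetic questions per training node (consistent with the 814 validation queries for 407 nodes), a batch size of 10, and an evaluation every 50 steps. None of these three values is set explicitly in the call above.

```python
import math

num_train_queries = 345 * 2  # assumption: two synthetic questions per training node
batch_size = 10              # assumption: not set in the fine-tuning call
epochs = 5                   # from the fine-tuning call
evaluation_steps = 50        # assumption: suggested by the step column of the table

steps_per_epoch = math.ceil(num_train_queries / batch_size)
total_steps = steps_per_epoch * epochs
epoch_at_step_50 = evaluation_steps / steps_per_epoch

print(steps_per_epoch, total_steps, epoch_at_step_50)
```

Under these assumptions an epoch is 69 steps, so the table rows at steps 69, 138, 207 and 276 fall on epoch boundaries, and step 50 corresponds to the fractional epoch reported in the log line above.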

## 4.3 Run Evals

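Evaluate the base and fine-tuned models on the validation set with evaluate_st, then load the CSV reports it writes into a single DataFrame for comparison.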

In [15]:
import pandas as pd

# Evaluate the base model and the fine-tuned model on the same validation set.
evaluate_st(val_dataset, model_id, "mxbai")
evaluate_st(val_dataset, finetuned_model_id, "finetuned_mxbai")

# Each run writes a CSV report named after the `name` argument.
df_st_base = pd.read_csv(
    "results/Information-Retrieval_evaluation_mxbai_results.csv"
)
df_st_finetuned = pd.read_csv(
    "results/Information-Retrieval_evaluation_finetuned_mxbai_results.csv"
)

df_st_base["model"] = "mxbai"
df_st_finetuned["model"] = "finetuned_mxbai"
df_st_all = pd.concat([df_st_base, df_st_finetuned])
df_st_all = df_st_all.set_index("model")
df_st_all

INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: mps
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: mixedbread-ai/mxbai-embed-xsmall-v1
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Information Retrieval Evaluation of the model on the mxbai dataset:
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Queries: 814
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Corpus: 407
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Score-Function: cosine
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Accuracy@1: 50.12%
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Accuracy@3: 72.48%

model,epoch,steps,cosine-Accuracy@1,cosine-Accuracy@3,cosine-Accuracy@5,cosine-Accuracy@10,cosine-Precision@1,cosine-Recall@1,cosine-Precision@3,cosine-Recall@3,cosine-Precision@5,cosine-Recall@5,cosine-Precision@10,cosine-Recall@10,cosine-MRR@10,cosine-NDCG@10,cosine-MAP@100
mxbai,-1,-1,0.501229,0.724816,0.791155,0.857494,0.501229,0.501229,0.241605,0.724816,0.158231,0.791155,0.085749,0.857494,0.62666,0.68294,0.632034
finetuned_mxbai,-1,-1,0.572482,0.782555,0.842752,0.895577,0.572482,0.572482,0.260852,0.782555,0.16855,0.842752,0.089558,0.895577,0.688057,0.738781,0.692054
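
Several columns in the results table above carry the same information. When every query has exactly one relevant document, as in this dataset, Accuracy@k equals Recall@k and Precision@k equals Recall@k divided by k. The check below uses values copied from the `mxbai` row; the variable names are illustrative.

```python
# Values copied from the "mxbai" row of the results table.
recall_at = {1: 0.501229, 3: 0.724816, 5: 0.791155, 10: 0.857494}
precision_at = {1: 0.501229, 3: 0.241605, 5: 0.158231, 10: 0.085749}

for k in recall_at:
    derived = recall_at[k] / k
    print(k, round(derived, 6), precision_at[k])

# Largest difference between the reported precision and recall / k.
max_gap = max(abs(recall_at[k] / k - precision_at[k]) for k in recall_at)
print(max_gap)
```

In practice this means Accuracy@k, MRR@10 and NDCG@10 are enough to compare the models here; the precision columns add nothing new.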


# 5. Finetune a proprietary IBM Slate embedding model using SentenceTransformers

## 5.1 Define location of the model artifacts

Here we refer to the local folder that contains all the model artifacts for the `slate-30m-english-rtrvr` embedding model.

In [16]:
model_id = "slate-30m-english-rtrvr"
finetuned_model_id = "fine-tuned-slate-30m-english-rtrvr"

## 5.2 Run the fine tuning process

This is exactly the same process as in section 4.2, applied to the Slate model.

In [17]:
from llama_index.finetuning import SentenceTransformersFinetuneEngine
finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id=model_id,
    model_output_path=finetuned_model_id,
    val_dataset=val_dataset,
    epochs=5,
    use_all_docs=True
)
finetune_engine.finetune()

INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: mps
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: slate-30m-english-rtrvr


Computing widget examples:   0%|          | 0/1 [00:00<?, ?example/s]

Step,Training Loss,Validation Loss,Cosine Accuracy@1,Cosine Accuracy@3,Cosine Accuracy@5,Cosine Accuracy@10,Cosine Precision@1,Cosine Precision@3,Cosine Precision@5,Cosine Precision@10,Cosine Recall@1,Cosine Recall@3,Cosine Recall@5,Cosine Recall@10,Cosine Ndcg@10,Cosine Mrr@10,Cosine Map@100
50,No log,No log,0.485258,0.675676,0.748157,0.81941,0.485258,0.225225,0.149631,0.081941,0.485258,0.675676,0.748157,0.81941,0.649847,0.595666,0.602681
69,No log,No log,0.486486,0.686732,0.761671,0.842752,0.486486,0.228911,0.152334,0.084275,0.486486,0.686732,0.761671,0.842752,0.661518,0.603693,0.60955
100,No log,No log,0.498771,0.697789,0.767813,0.855037,0.498771,0.232596,0.153563,0.085504,0.498771,0.697789,0.767813,0.855037,0.672603,0.614581,0.619844
138,No log,No log,0.502457,0.697789,0.769042,0.85258,0.502457,0.232596,0.153808,0.085258,0.502457,0.697789,0.769042,0.85258,0.673792,0.616822,0.622418
150,No log,No log,0.5,0.701474,0.77027,0.850123,0.5,0.233825,0.154054,0.085012,0.5,0.701474,0.77027,0.850123,0.672805,0.616119,0.621955
200,No log,No log,0.502457,0.707617,0.775184,0.855037,0.502457,0.235872,0.155037,0.085504,0.502457,0.707617,0.775184,0.855037,0.675904,0.618734,0.624165
207,No log,No log,0.502457,0.706388,0.776413,0.855037,0.502457,0.235463,0.155283,0.085504,0.502457,0.706388,0.776413,0.855037,0.676093,0.618945,0.62437
250,No log,No log,0.509828,0.710074,0.781327,0.858722,0.509828,0.236691,0.156265,0.085872,0.509828,0.710074,0.781327,0.858722,0.681909,0.625368,0.630597
276,No log,No log,0.5086,0.706388,0.781327,0.858722,0.5086,0.235463,0.156265,0.085872,0.5086,0.706388,0.781327,0.858722,0.681602,0.625003,0.630216
300,No log,No log,0.5086,0.706388,0.781327,0.857494,0.5086,0.235463,0.156265,0.085749,0.5086,0.706388,0.781327,0.857494,0.681541,0.625252,0.630604


INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Information Retrieval Evaluation of the model on the  dataset in epoch 0.7246376811594203 after 50 steps:
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Queries: 814
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Corpus: 407
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Score-Function: cosine
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Accuracy@1: 48.53%
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Accuracy@3: 67.57%
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Accuracy@5: 74.82%
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Accuracy@10: 81.94%

## 5.3 Run Eval

Run the same evaluation pipeline on the base and fine-tuned Slate models and compare the results.

In [18]:
# Evaluate the base Slate model and its fine-tuned version on the validation set.
evaluate_st(val_dataset, model_id, "slate")
evaluate_st(val_dataset, finetuned_model_id, "finetuned_slate")

df_st_base = pd.read_csv(
    "results/Information-Retrieval_evaluation_slate_results.csv"
)
df_st_finetuned = pd.read_csv(
    "results/Information-Retrieval_evaluation_finetuned_slate_results.csv"
)

df_st_base["model"] = "slate"
df_st_finetuned["model"] = "finetuned_slate"
df_st_all = pd.concat([df_st_base, df_st_finetuned])
df_st_all = df_st_all.set_index("model")
df_st_all

INFO:sentence_transformers.SentenceTransformer:Use pytorch device_name: mps
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: slate-30m-english-rtrvr
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Information Retrieval Evaluation of the model on the slate dataset:
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Queries: 814
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Corpus: 407
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Score-Function: cosine
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Accuracy@1: 40.66%
INFO:sentence_transformers.evaluation.InformationRetrievalEvaluator:Accuracy@3: 60.57%
INFO:sen

model,epoch,steps,cosine-Accuracy@1,cosine-Accuracy@3,cosine-Accuracy@5,cosine-Accuracy@10,cosine-Precision@1,cosine-Recall@1,cosine-Precision@3,cosine-Recall@3,cosine-Precision@5,cosine-Recall@5,cosine-Precision@10,cosine-Recall@10,cosine-MRR@10,cosine-NDCG@10,cosine-MAP@100
slate,-1,-1,0.406634,0.605651,0.665848,0.748157,0.406634,0.406634,0.201884,0.605651,0.13317,0.665848,0.074816,0.748157,0.51835,0.573785,0.527359
finetuned_slate,-1,-1,0.509828,0.710074,0.781327,0.858722,0.509828,0.509828,0.236691,0.710074,0.156265,0.781327,0.085872,0.858722,0.625368,0.681909,0.630597
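
To put the two experiments side by side, the snippet below computes the absolute and relative gains from fine-tuning. The numbers are copied from the two results tables in this tutorial; the dictionary layout and key names are illustrative.

```python
# (Accuracy@1, MRR@10) copied from the mxbai and slate results tables.
results = {
    "mxbai": {"base": (0.501229, 0.62666), "finetuned": (0.572482, 0.688057)},
    "slate": {"base": (0.406634, 0.51835), "finetuned": (0.509828, 0.625368)},
}

gains = {}
for model, rows in results.items():
    base_acc, base_mrr = rows["base"]
    ft_acc, ft_mrr = rows["finetuned"]
    gains[model] = {
        "acc1_abs": ft_acc - base_acc,
        "acc1_rel": (ft_acc - base_acc) / base_acc,
        "mrr10_abs": ft_mrr - base_mrr,
    }
    print(
        f"{model}: Accuracy@1 +{gains[model]['acc1_abs']:.3f} "
        f"({gains[model]['acc1_rel']:.1%} relative), "
        f"MRR@10 +{gains[model]['mrr10_abs']:.3f}"
    )
```

On this validation set the Slate model gains more from fine-tuning, but it starts from a lower baseline, and the fine-tuned mxbai model still has the higher absolute scores.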
