# Fine Tuning Embedding Models for Retrieval on Domain Specific Data

<img src="https://miro.medium.com/v2/resize:fit:1400/0*AjX-xfa4UvNVu9js.jpg" width=600>

Embedding models are the backbone of modern Retrieval Augmented Generation (RAG) pipelines, supplying a language model with the most similar and relevant context from a knowledgebase to aid its generation. They are commonly used for querying and finding insights in large quantities of unstructured data.

More often than not, we default to standard, general-purpose embedding models to convert our data into dense vector representations, which are then stored in a vector database and retrieved at runtime. While these models are a strong starting point, their performance suffers on domain-specific or niche content, where they often fail to retrieve the documents most relevant or useful to the end user. The error compounds when the retrieved context is passed to a language model, which will confidently answer from erroneous data.

To address this, you can fine-tune open source embedding models on your own knowledgebase data to improve the relevance of retrieved documents, with minimal data prep, using [Sentence Transformers](https://sbert.net/). In this notebook we'll walk through how I improved my embedding model's performance by roughly 47% to 68% on standard information retrieval metrics (NDCG@10, MRR@10, and MAP@100) for unseen queries through:

1. Preparing a synthetic dataset of positive question + chunk pairs
2. Manipulating and preparing the dataset for training and evaluators
3. Evaluating the base performance of the embedding model
4. Fine tuning the embedding model on our data with Matryoshka Representation Learning
5. Publishing the fine tuned model to Hugging Face
6. Evaluating the performance of our fine-tuned model

The resulting model has been published here: [AdamLucek/ModernBERT-embed-base-legal-MRL](https://huggingface.co/AdamLucek/ModernBERT-embed-base-legal-MRL)  
Along with the dataset: [AdamLucek/legal-rag-positives-synthetic](https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic)

This notebook is inspired by and pulls methodology and code snippets from Philipp Schmid's blog post: [*Fine-tune Embedding models for Retrieval Augmented Generation (RAG)*](https://www.philschmid.de/fine-tune-embedding-model-for-rag).

---
## Install Dependencies

In [None]:
%%capture
!pip install --upgrade sentence-transformers datasets transformers torch tensorboard

In [None]:
import torch

from sentence_transformers import SentenceTransformer, SentenceTransformerModelCardData, SentenceTransformerTrainingArguments, SentenceTransformerTrainer
from sentence_transformers.evaluation import InformationRetrievalEvaluator, SequentialEvaluator
from sentence_transformers.util import cos_sim
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

from datasets import load_dataset, concatenate_datasets

**Login to Hugging Face**

Used for pushing model to the Hugging Face Hub and downloading gated models or datasets

In [None]:
from huggingface_hub import login
from google.colab import userdata

login(token=userdata.get('HF_TOKEN'), add_to_git_credential=True)

---
## Dataset Preparation

Embedding model training requires a few unique dataset structures. A more detailed overview can be found in the [sentence transformers dataset documentation](https://sbert.net/docs/sentence_transformer/dataset_overview.html), but the 4 most popular formats are:

1. **Positive Pair**: A pair of related sentences.
2. **Triplets**: An anchor, positive, and negative
3. **Pair with Similarity Score**: A pair of sentences with a score indicating their similarity
4. **Texts with Classes**: A text with its corresponding class

The structure of these datasets is what guides embedding model training, which generally optimizes toward placing similar texts close together in the embedding space.

Although only triplets state it explicitly, training generally contrasts a *negative* (a dissimilar text) with the anchor and positive pair so the model also learns what separates texts. Including hard negatives, texts that are very similar to the positive yet still wrong, can push performance further. Most training setups that lack explicit negatives sample other entries in the dataset to form pseudo-triplets.
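
As a minimal sketch of the pseudo-triplet idea (not how Sentence Transformers implements it internally), each positive pair can borrow another entry's positive as its negative. The toy pairs and the helper name `make_pseudo_triplets` are illustrative assumptions, not part of the dataset or the library.

```python
import random

# Toy positive pairs (made up for illustration, not taken from the dataset)
pairs = [
    {"anchor": "Who filed the motion?", "positive": "Plaintiff filed a motion for judgment."},
    {"anchor": "Which court heard the case?", "positive": "The case was heard in the Court of Federal Claims."},
    {"anchor": "When was the order issued?", "positive": "The order was issued on March 3."},
]

def make_pseudo_triplets(pairs, seed=0):
    """Pair each (anchor, positive) with a positive borrowed from a different entry."""
    rng = random.Random(seed)
    triplets = []
    for i, pair in enumerate(pairs):
        # Any other entry's positive serves as a stand-in negative
        j = rng.choice([k for k in range(len(pairs)) if k != i])
        triplets.append({
            "anchor": pair["anchor"],
            "positive": pair["positive"],
            "negative": pairs[j]["positive"],
        })
    return triplets

triplets = make_pseudo_triplets(pairs)
```

One caveat with this kind of sampling: if two entries point at the same chunk, the borrowed "negative" is actually a correct answer. That is the reason the training arguments later in this notebook use `BatchSamplers.NO_DUPLICATES`.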

We'll be using the [AdamLucek/legal-rag-positives-synthetic](https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic) dataset, a collection of ~6,500 synthetic positive pairs across a knowledgebase of 10 legal documents, intended to simulate question and expected retrieved chunk in a RAG pipeline.

In [None]:
# Load dataset from the hub
dataset = load_dataset("AdamLucek/legal-rag-positives-synthetic", split="train")

We then need to format the dataset into a structure expected in the upcoming training: `[anchor, positive, id]`. We remove the extraneous columns, rename our `question` and `text` columns, and add in a simple `id` column.

Note we keep `global_chunk_id` to assist in mapping multiple questions to the same chunk for evaluating performance.

In [None]:
# Clean & Format Columns
dataset = dataset.rename_column("question", "anchor")
dataset = dataset.rename_column("text", "positive")
dataset = dataset.remove_columns(["chunk_id", "case_name", "date_filed", "court", "question_id", "answer_location"]) # keep global_chunk_id

# Add an id column to the dataset
dataset = dataset.add_column("id", range(len(dataset)))

Once formatted, we shuffle the entries and split into a 90/10 train/test split. These are saved briefly onto our disk for easier loading.

In [None]:
# Shuffle Dataset
dataset = dataset.shuffle()

# Split Dataset Into a 90/10 Train/Test split
dataset = dataset.train_test_split(test_size=0.1)

# Save Datasets to Disk
dataset["train"].to_json("train_dataset.json", orient="records")
dataset["test"].to_json("test_dataset.json", orient="records")

---
## Base Model Evaluation & Matryoshka Dimensions

Now that our dataset is prepped and saved, it's time to choose a candidate model. For this example we will be using [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base), an embedding model trained from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base).

modernbert-embed-base takes an input sentence and returns a 768-dimensional vector. Without going into too much detail, these semantically rich numerical representations are used to calculate similarity between texts, which is what powers vector database search.
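
To make "similarity between texts" concrete, here is a small sketch of cosine similarity, the score function used by the evaluators in this notebook. The 3-dimensional vectors are made-up stand-ins for real 768-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (illustrative values only)
query = [0.2, 0.1, 0.7]
doc_close = [0.25, 0.05, 0.65]   # points in nearly the same direction as the query
doc_far = [0.9, 0.3, 0.0]        # points somewhere else

close_score = cosine_similarity(query, doc_close)
far_score = cosine_similarity(query, doc_far)
```

A vector database search is, in essence, this comparison run between the query vector and every stored document vector, returning the highest-scoring documents.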

In [None]:
# Hugging Face model ID
model_id = "nomic-ai/modernbert-embed-base"

# Loading via SentenceTransformer
model = SentenceTransformer(
    model_id, device="cuda" if torch.cuda.is_available() else "cpu"
)

To run our base evaluations, we need to prepare the data slightly differently for the [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator). This evaluator requires three key data structures:

1. A corpus dictionary mapping IDs to documents (`{corpus_id: text_chunk}`)
2. A queries dictionary mapping IDs to questions (`{query_id: question_text}`)
3. A relevant_docs dictionary specifying which corpus documents are relevant for each query (`{query_id: [relevant_corpus_ids]}`)

To build these structures:
- We combine train and test datasets into a single corpus_dataset to ensure all text chunks are available during evaluation
- The corpus dictionary is created from the combined corpus_dataset, containing all text chunks
- The queries dictionary is created only from the test_dataset, as we want to evaluate on unseen questions
- For the relevance mapping, we use global_chunk_id as the connecting key to identify which corpus documents contain the text chunk relevant to each test query
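
The three structures can be sketched on toy data with plain dictionaries. This version indexes corpus IDs by `global_chunk_id` once rather than rescanning the corpus for every query; that is an optional variation on the approach described above, and the rows below are made-up values that only mimic the dataset's columns.

```python
from collections import defaultdict

# Toy rows mimicking the formatted dataset columns (values are made up)
train_rows = [
    {"id": 0, "anchor": "q0", "positive": "chunk A", "global_chunk_id": "A"},
    {"id": 1, "anchor": "q1", "positive": "chunk B", "global_chunk_id": "B"},
]
test_rows = [
    {"id": 2, "anchor": "q2", "positive": "chunk A", "global_chunk_id": "A"},
]
corpus_rows = train_rows + test_rows

# {corpus_id: text_chunk} over train + test, {query_id: question_text} over test only
corpus = {row["id"]: row["positive"] for row in corpus_rows}
queries = {row["id"]: row["anchor"] for row in test_rows}

# Index corpus ids by chunk once, instead of scanning the corpus per query
ids_by_chunk = defaultdict(list)
for row in corpus_rows:
    ids_by_chunk[row["global_chunk_id"]].append(row["id"])

# {query_id: [relevant_corpus_ids]}
relevant_docs = {
    row["id"]: list(ids_by_chunk[row["global_chunk_id"]]) for row in test_rows
}
```

Here query `2` is relevant to corpus IDs `0` and `2` because both rows carry chunk `A`: the corpus holds one entry per question, so the same chunk text can appear under several IDs.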

In [None]:
# Load train and test datasets from their respective JSON files
# These contain pairs of questions (anchors) and text chunks (positives)
test_dataset = load_dataset("json", data_files="test_dataset.json", split="train")
train_dataset = load_dataset("json", data_files="train_dataset.json", split="train")

# Combine train and test datasets into a single corpus
# This ensures we have all possible text chunks available for retrieval evaluation
corpus_dataset = concatenate_datasets([train_dataset, test_dataset])

# Convert datasets into dictionary format required by the InformationRetrievalEvaluator
# corpus: maps corpus IDs to their text chunks (documents)
# Format: {corpus_id: text_chunk}
corpus = dict(
    zip(corpus_dataset["id"], corpus_dataset["positive"])
)

# queries: maps query IDs to their questions
# Format: {query_id: question_text}
queries = dict(
    zip(test_dataset["id"], test_dataset["anchor"])
)

# Create a mapping between queries and their relevant documents
# This tells the evaluator which documents are correct matches for each query
relevant_docs = {}
for q_id, global_chunk_id in zip(test_dataset["id"], test_dataset["global_chunk_id"]):
    # Initialize empty list for each query if not already present
    if q_id not in relevant_docs:
        relevant_docs[q_id] = []

    # Find all corpus entries that share the same global_chunk_id
    # This handles cases where multiple questions can refer to the same text chunk
    matching_corpus_ids = [
        cid for cid, chunk in zip(corpus_dataset["id"], corpus_dataset["global_chunk_id"])
        if chunk == global_chunk_id
    ]
    # Add the matching corpus IDs to the relevant documents for this query
    relevant_docs[q_id].extend(matching_corpus_ids)

While we can use and train the base model as such, an interesting approach that's gained popularity is applying [matryoshka embedding](https://huggingface.co/blog/matryoshka) \([paper](https://arxiv.org/pdf/2205.13147)\) techniques.  

<img src="https://media.licdn.com/dms/image/v2/D4D10AQF1xd1EHCcsxQ/image-shrink_1280/image-shrink_1280/0/1732933803121?e=1737990000&v=beta&t=WSuF0sEVMUpaCE34-f9EcH6NapA-519yMqujs8P9ygo" width=600>

Matryoshka Representation Learning (MRL) is a technique for training models to encode information at different granularities within the same embedding vector, with coarser/higher-level information packed into earlier dimensions and finer details in later dimensions. Named after Russian nesting dolls, this approach allows for flexible truncation of the embedding to different sizes while maintaining comparable accuracy to independently trained models of those smaller sizes, enabling adaptive compute-vs-accuracy trade-offs during deployment.

<img src="https://weaviate.io/assets/images/hero-237ed4b707a303e4ad3353daaf4edab8.jpeg" width=400>

The main benefits of this technique are efficiency gains in vector storage and retrieval operations, as lower dimensional vectors (e.g., 64 vs 768 dimensions) require less memory and enable faster distance computations. The coarse-to-fine nature of Matryoshka representations enables particularly efficient multi-stage retrieval pipelines - for example, using low-dimensional representations (e.g., 16d) for fast initial candidate shortlisting, followed by re-ranking those candidates with higher-dimensional representations (e.g., 2048d) for better accuracy. This can achieve comparable accuracy to full-dimensional search while being up to 128x more computationally efficient.
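
The multi-stage idea can be sketched in a few lines: shortlist candidates with truncated vectors, then re-rank only the shortlist at full size. This is a toy illustration under stated assumptions, with made-up 4-dimensional vectors and a hypothetical helper named `funnel_search`; a real pipeline would take embeddings from `model.encode(...)` and use a vector index.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` values and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def funnel_search(query, docs, short_dim, shortlist_size, top_k):
    """Shortlist with truncated vectors, then re-rank the shortlist at full size."""
    q_short = truncate_and_normalize(query, short_dim)
    coarse = sorted(
        docs,
        key=lambda doc_id: dot(q_short, truncate_and_normalize(docs[doc_id], short_dim)),
        reverse=True,
    )[:shortlist_size]
    full_dim = len(query)
    q_full = truncate_and_normalize(query, full_dim)
    return sorted(
        coarse,
        key=lambda doc_id: dot(q_full, truncate_and_normalize(docs[doc_id], full_dim)),
        reverse=True,
    )[:top_k]

# Toy 4-d "embeddings" (made up)
docs = {
    "a": [0.9, 0.1, 0.3, 0.1],
    "b": [0.8, 0.2, -0.4, 0.3],
    "c": [-0.5, 0.7, 0.2, 0.1],
}
query = [0.85, 0.15, 0.35, 0.05]
result = funnel_search(query, docs, short_dim=2, shortlist_size=2, top_k=1)
```

Truncated vectors are re-normalized before comparison because cutting dimensions changes their length. The first stage touches every document but only in 2 dimensions; the second stage uses all dimensions but only on the shortlist.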



In [None]:
# Dimensions of interest
matryoshka_dimensions = [768, 512, 256, 128, 64] # Important: large to small

# Create empty list to hold evaluators
matryoshka_evaluators = []

# Create an evaluator for each above dimension
for dim in matryoshka_dimensions:
    # Define the evaluator
    ir_evaluator = InformationRetrievalEvaluator(
        queries=queries,
        corpus=corpus,
        relevant_docs=relevant_docs,
        name=f"dim_{dim}",
        truncate_dim=dim,  # Truncate the embeddings to the respective dimension
        score_functions={"cosine": cos_sim},
    )
    # Add to list
    matryoshka_evaluators.append(ir_evaluator)

# Create a sequential evaluator
# Able to run all our dimension specific InformationRetrievalEvaluators sequentially.
evaluator = SequentialEvaluator(matryoshka_evaluators)

Information retrieval systems require rigorous evaluation across multiple dimensions of performance. For our embedding model evaluation, we focus on six complementary metrics (Precision and Recall are covered together) that jointly provide a comprehensive view of retrieval quality.

---
**Accuracy@k**

Our most fundamental metric, measuring the presence of at least one relevant document in the top-k results for each query. While simple, it provides essential validation of basic retrieval capability. An Accuracy@k of 0.90 indicates that 90% of queries successfully retrieved at least one relevant document within their top k results.

Technical implementation:
```
Accuracy@k = (queries with ≥1 relevant doc in top k) / (total queries)
```
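
A small sketch of the formula on made-up rankings, where each query comes with its ranked result IDs and its set of relevant IDs:

```python
def accuracy_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any of the top-k ids is relevant, else 0.0 (single query)."""
    return 1.0 if any(doc_id in relevant_ids for doc_id in ranked_ids[:k]) else 0.0

# Made-up rankings for three queries: (ranked result ids, relevant ids)
results = [
    (["d3", "d1", "d7"], {"d1"}),        # relevant doc at rank 2
    (["d2", "d9", "d4"], {"d5"}),        # relevant doc never retrieved
    (["d6", "d8", "d5"], {"d5", "d6"}),  # relevant doc at rank 1
]
acc_at_1 = sum(accuracy_at_k(r, rel, 1) for r, rel in results) / len(results)
acc_at_3 = sum(accuracy_at_k(r, rel, 3) for r, rel in results) / len(results)
```

Only the third query has a hit at rank 1, so Accuracy@1 is 1/3; widening to k=3 also catches the first query, giving 2/3.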

---
**NDCG@k (Normalized Discounted Cumulative Gain)**

NDCG captures both the presence and positioning of relevant documents in ranked results. The key insight is that relevant documents appearing lower in the ranking contribute diminishing value to the overall score. The normalization against an ideal ranking produces a score between 0 and 1, enabling comparison across different queries and result sets.

Technical implementation:
```
DCG@k = Σ(i=1 to k) rel_i / log2(i + 1)
NDCG@k = DCG@k / IDCG@k
```

Where rel_i represents the relevance (0 or 1) of the document at position i, and IDCG represents the DCG of a perfect ranking.
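
A worked sketch with binary relevance. With positions counted from 1 the discount is log2(i + 1); code that counts positions from 0 writes the same term as log2(i + 2).

```python
import math

def dcg_at_k(relevances, k):
    # Positions are 1-indexed, so the discount for position i is log2(i + 1)
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, num_relevant, k):
    """relevances: 0/1 flags for the ranked results; num_relevant: total relevant docs."""
    ideal = [1] * min(num_relevant, k)  # a perfect ranking puts every relevant doc first
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(relevances, k) / idcg if idcg > 0 else 0.0

# One relevant document, retrieved at rank 3
score = ndcg_at_k([0, 0, 1, 0, 0], num_relevant=1, k=5)
```

Here DCG is 1/log2(4) = 0.5 and the ideal DCG is 1/log2(2) = 1, so NDCG@5 is 0.5: the document was found, but two ranks too late.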

---
**Precision@k and Recall@k**

These complementary metrics evaluate retrieval effectiveness from different perspectives:

Precision@k measures result set accuracy by calculating the fraction of relevant documents among the top k results. A Precision@5 of 0.8 indicates that 4 of the top 5 results were relevant.

Recall@k quantifies retrieval completeness by measuring the fraction of all relevant documents found within the top k results. A Recall@10 of 0.7 indicates that 70% of all relevant documents appear in the top 10 results.

Technical implementation:
```
Precision@k = (relevant docs in top k) / k
Recall@k = (relevant docs in top k) / (total relevant docs)
```
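
Both formulas on one made-up ranking:

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

ranked = ["d4", "d1", "d9", "d2", "d7"]
relevant = {"d1", "d2", "d3"}
p5 = precision_at_k(ranked, relevant, 5)  # 2 hits out of 5 results
r5 = recall_at_k(ranked, relevant, 5)     # 2 hits out of 3 relevant docs
```

Note that Precision@k has a ceiling when a query has few relevant documents: with a single relevant document, Precision@10 can be at most 0.1, so low precision at large k is not necessarily a sign of poor retrieval.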

---
**Mean Reciprocal Rank (MRR@k)**

MRR focuses specifically on the position of the first relevant document in the ranking. The reciprocal rank for a query is 1/position of the first relevant result, with the final metric averaged across all queries. This is particularly valuable for evaluating systems where the position of the first relevant result is critical.

Technical implementation:
```
MRR = (1/|Q|) Σ(i=1 to |Q|) 1/rank_i
```
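
A sketch on made-up rankings, treating a query with no relevant result in the top k as contributing 0:

```python
def reciprocal_rank(ranked_ids, relevant_ids, k):
    """1 / rank of the first relevant result within the top k, or 0.0 if none."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

# Made-up rankings: (ranked result ids, relevant ids)
results = [
    (["d3", "d1", "d7"], {"d1"}),  # first hit at rank 2 -> 0.5
    (["d2", "d9", "d4"], {"d5"}),  # no hit -> 0.0
    (["d6", "d8", "d5"], {"d6"}),  # first hit at rank 1 -> 1.0
]
mrr_at_3 = sum(reciprocal_rank(r, rel, 3) for r, rel in results) / len(results)
```

The three reciprocal ranks average to (0.5 + 0.0 + 1.0) / 3 = 0.5.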

---
**Mean Average Precision (MAP@k)**

MAP provides a comprehensive single-score assessment of ranking quality. It incorporates both the precision at each relevant document position and the total recall, making it particularly effective for evaluating overall retrieval performance.

Technical implementation:
```
AP@k = (1/min(k, R)) Σ(r=1 to k) (P@r * rel(r))
MAP@k = (1/|Q|) Σ(q=1 to |Q|) AP@k(q)
```

Where:
- R represents total relevant documents
- P@r is precision at rank r
- rel(r) is 1 for relevant results, 0 otherwise
- |Q| represents the total number of queries
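
A sketch of the formulas above on two made-up queries:

```python
def average_precision_at_k(ranked_ids, relevant_ids, k):
    """Sum of P@r over the relevant positions in the top k, divided by min(k, R)."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # P@r at this relevant position
    denom = min(k, len(relevant_ids))
    return precision_sum / denom if denom else 0.0

# Made-up rankings: (ranked result ids, relevant ids)
results = [
    (["d1", "d8", "d2", "d9"], {"d1", "d2"}),  # hits at ranks 1 and 3
    (["d5", "d3"], {"d3"}),                    # hit at rank 2
]
ap_first = average_precision_at_k(*results[0], k=4)
map_at_4 = sum(average_precision_at_k(r, rel, 4) for r, rel in results) / len(results)
```

For the first query, P@1 = 1 and P@3 = 2/3, so AP@4 = (1 + 2/3) / 2 = 5/6. The second query has AP@4 = 1/2, giving MAP@4 = 2/3.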

---

These metrics are evaluated at multiple k values (typically k=1,3,5,10 for most metrics, k=100 for MAP) to assess performance across different result depths. Together, they provide a comprehensive framework for evaluating retrieval systems across multiple dimensions: basic retrieval capability (Accuracy), ranking quality (NDCG), result set precision and completeness (Precision/Recall), first-relevant-result positioning (MRR), and overall ranking effectiveness (MAP).

In [None]:
# Evaluate the model
base_results = evaluator(model)

# Print header
print("\nBase Model Evaluation Results")
print("-" * 85)
print(f"{'Metric':15} {'768d':>12} {'512d':>12} {'256d':>12} {'128d':>12} {'64d':>12}")
print("-" * 85)

# List of metrics to display
metrics = [
    'ndcg@10',
    'mrr@10',
    'map@100',
    'accuracy@1',
    'accuracy@3',
    'accuracy@5',
    'accuracy@10',
    'precision@1',
    'precision@3',
    'precision@5',
    'precision@10',
    'recall@1',
    'recall@3',
    'recall@5',
    'recall@10'
]

# Print each metric
for metric in metrics:
    values = []
    for dim in matryoshka_dimensions:
        key = f"dim_{dim}_cosine_{metric}"
        values.append(base_results[key])

    # Highlight NDCG@10
    metric_name = f"=={metric}==" if metric == "ndcg@10" else metric
    print(f"{metric_name:15}", end="  ")
    for val in values:
        print(f"{val:12.4f}", end=" ")
    print()

# Print sequential score
print("-" * 85)
print(f"seq_score: {base_results['sequential_score']:.4f}")

For our matryoshka embedding evaluation, we track these metrics across multiple embedding dimensions: 768, 512, 256, 128, and 64. This helps us understand how retrieval quality changes as we reduce the embedding size.

---
## Training

Now with our training data prepared, our evaluation methodology ready, and our base model loaded with baseline metrics, it's time to train!

We'll continue using the Sentence Transformers [fine-tuning](https://sbert.net/docs/sentence_transformer/training_overview.html) tools; see the linked documentation for further details.

To start, let's load our base model with a few additional arguments: Scaled Dot Product Attention (SDPA) for GPU efficiency, and base model card data for uploading to Hugging Face.

In [None]:
# Load the model with PyTorch's scaled dot product attention (SDPA) implementation
model = SentenceTransformer(
    model_id,
    model_kwargs={"attn_implementation": "sdpa"},
    model_card_data=SentenceTransformerModelCardData(
        language="en",
        license="apache-2.0",
        model_name="ModernBERT Embed base Legal Matryoshka",
    ),
)

Next, we define our loss function. A loss function guides the model toward improvement at train time: it compares the model's current output with the expected output, and the resulting value is what the optimizer works to minimize.

Sentence Transformers offers [many different loss functions](https://sbert.net/docs/sentence_transformer/loss_overview.html) for various scenarios. Because we are training with MRL, we need not only a base loss function but also a wrapper around it.

Given our data structure of positive pairs, we use [`MultipleNegativesRankingLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss), which optimizes for retrieval scenarios by treating each batch as (a₁, p₁)...(aₙ, pₙ) pairs where (aᵢ, pᵢ) are positive pairs and (aᵢ, pⱼ) for i≠j become negative pairs. This effectively samples n-1 negative examples per positive pair within each batch, so performance generally improves with larger batch sizes.
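
The in-batch negatives idea can be sketched in plain Python: score every anchor against every positive in the batch, then apply cross-entropy with the matching positive as the correct class. This is a simplified illustration rather than the library's code; the function name is hypothetical, and the `scale` of 20 mirrors what I understand to be the library's default, so treat it as an assumption.

```python
import math

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def in_batch_negatives_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where the correct 'class' for anchor i is positive i."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Every positive in the batch is a candidate; all but index i act as negatives
        logits = [scale * cos_sim(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Toy 2-d embeddings (made up)
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive sits near its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive sits near the other anchor
loss_aligned = in_batch_negatives_loss(anchors, aligned)
loss_swapped = in_batch_negatives_loss(anchors, swapped)
```

The loss is near zero when each anchor is closest to its own positive and large when it is closer to someone else's, which is the signal that pulls matching question and chunk embeddings together.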

We wrap this with [`MatryoshkaLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) to enable multi-dimensional embedding training, allowing for dynamic dimensionality reduction at inference time without requiring retraining.
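
Conceptually, the wrapper applies the base loss to the embeddings truncated to each target size and adds the results up. The sketch below illustrates that idea with equal weights per dimension; it is a simplified stand-in, not the library's implementation, and the function names are hypothetical.

```python
import math

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def in_batch_negatives_loss(anchors, positives, scale=20.0):
    """Base loss: cross-entropy over in-batch candidates (simplified sketch)."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights=None):
    """Weighted sum of the base loss over embeddings truncated to each size in `dims`."""
    weights = weights or [1.0] * len(dims)
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [a[:dim] for a in anchors]
        truncated_positives = [p[:dim] for p in positives]
        total += weight * in_batch_negatives_loss(truncated_anchors, truncated_positives)
    return total

# Toy 4-d embeddings (made up), trained at sizes 4 and 2
anchors = [[1.0, 0.2, 0.0, 0.3], [0.1, 1.0, 0.4, 0.0]]
positives = [[0.9, 0.1, 0.1, 0.2], [0.2, 0.8, 0.5, 0.1]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2])
```

Because the first dimensions take part in every term of the sum, the model is pushed to pack the most useful information into the front of the vector, which is what makes truncation at inference time workable.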

In [None]:
# Initial Loss
base_loss = MultipleNegativesRankingLoss(model)

# Matryoshka Loss Wrapper
train_loss = MatryoshkaLoss(
    model, base_loss, matryoshka_dims=matryoshka_dimensions
)

Below are the training hyperparameters. These are taken directly from [Philipp Schmid's original blog post](https://www.philschmid.de/fine-tune-embedding-model-for-rag#4-fine-tune-embedding-model-with-sentencetransformerstrainer). It is worth testing various combinations of hyperparameters for optimal performance, but for the sake of this demonstration we will default to Philipp's provided arguments.

In [None]:
# Training Arguments
args = SentenceTransformerTrainingArguments(
    output_dir="modernbert-embed-base-legal-matryoshka-lucek", # output directory and hugging face model ID
    num_train_epochs=4,                                        # number of epochs
    per_device_train_batch_size=32,                            # train batch size
    gradient_accumulation_steps=16,                            # for a global batch size of 512
    per_device_eval_batch_size=16,                             # evaluation batch size
    warmup_ratio=0.1,                                          # warmup ratio
    learning_rate=2e-5,                                        # learning rate, 2e-5 is a good value
    lr_scheduler_type="cosine",                                # use cosine learning rate scheduler
    optim="adamw_torch_fused",                                 # use fused adamw optimizer
    tf32=True,                                                 # use tf32 precision
    bf16=True,                                                 # use bf16 precision
    batch_sampler=BatchSamplers.NO_DUPLICATES,                 # MultipleNegativesRankingLoss benefits from no duplicate samples in a batch
    eval_strategy="epoch",                                     # evaluate after each epoch
    save_strategy="epoch",                                     # save after each epoch
    logging_steps=10,                                          # log every 10 steps
    save_total_limit=3,                                        # save only the last 3 models
    load_best_model_at_end=True,                               # load the best model when training ends
    metric_for_best_model="eval_dim_128_cosine_ndcg@10",       # Optimizing for the best ndcg@10 score for the 128 dimension
    report_to="none"                                           # Turning off training logging for now, input 'wandb' etc. if desired.
)
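
For a sense of scale, here is some rough arithmetic on what these settings imply, assuming a single GPU and the approximate dataset size of 6,500 pairs mentioned earlier. The trainer's own rounding and the `NO_DUPLICATES` sampler may shift the exact counts, so treat the numbers as estimates.

```python
import math

num_pairs = 6500          # approximate dataset size stated earlier (assumption)
train_fraction = 0.9      # 90/10 train/test split
per_device_batch = 32
grad_accum_steps = 16
epochs = 4

train_examples = int(num_pairs * train_fraction)
effective_batch = per_device_batch * grad_accum_steps

# Forward/backward passes per epoch, each over 32 pairs
micro_batches_per_epoch = math.ceil(train_examples / per_device_batch)

# Weight updates per epoch, one for every 16 accumulated micro-batches
optimizer_steps_per_epoch = math.ceil(micro_batches_per_epoch / grad_accum_steps)
total_optimizer_steps = optimizer_steps_per_epoch * epochs
```

This works out to roughly 183 micro-batches and about 12 weight updates per epoch, or around 48 updates in total. Note that gradient accumulation increases the number of examples per update but not the number of in-batch negatives: each forward pass still contrasts an anchor against the other positives in its own batch of 32.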

Finally, package our model, training arguments, dataset, loss function and evaluator together into a `SentenceTransformerTrainer`

In [None]:
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset.select_columns(
        ["anchor", "positive"]
    ),  # training dataset; column order matters, the loss reads the first column as the anchor
    loss=train_loss,
    evaluator=evaluator,
)

Start the training run!

In [None]:
# Start training
trainer.train()

# Save the best model based on our eval_dim_128_cosine_ndcg@10 criteria
trainer.save_model()

Optionally save the model to Hugging Face

You can find my chosen run uploaded with more information here: [AdamLucek/ModernBERT-embed-base-legal-MRL](https://huggingface.co/AdamLucek/ModernBERT-embed-base-legal-MRL)

In [None]:
# Upload model to hub
trainer.model.push_to_hub("modernbert-embed-base-legal-matryoshka-2")

---
## Evaluating Trained Model

In [None]:
fine_tuned_model = SentenceTransformer(
    args.output_dir, device="cuda" if torch.cuda.is_available() else "cpu"
)

# Evaluate the model
ft_results = evaluator(fine_tuned_model)

# Print header
print("Fine Tuned Model Evaluation Results")
print("-" * 85)
print(f"{'Metric':15} {'768d':>12} {'512d':>12} {'256d':>12} {'128d':>12} {'64d':>12}")
print("-" * 85)

# List of metrics to display
metrics = [
    'ndcg@10',
    'mrr@10',
    'map@100',
    'accuracy@1',
    'accuracy@3',
    'accuracy@5',
    'accuracy@10',
    'precision@1',
    'precision@3',
    'precision@5',
    'precision@10',
    'recall@1',
    'recall@3',
    'recall@5',
    'recall@10'
]

# Print each metric
for metric in metrics:
    values = []
    for dim in matryoshka_dimensions:
        key = f"dim_{dim}_cosine_{metric}"
        values.append(ft_results[key])

    # Highlight NDCG@10
    metric_name = f"=={metric}==" if metric == "ndcg@10" else metric
    print(f"{metric_name:15}", end="  ")
    for val in values:
        print(f"{val:12.4f}", end=" ")
    print()

# Print sequential score
print("-" * 85)
print(f"seq_score: {ft_results['sequential_score']:.4f}")

---
## Base vs FT Comparison

| Metric | Dimension | Base | Fine-tuned | Abs. Improvement | % Improvement |
|---------|-----------|------|------------|-----------------|---------------|
| ndcg@10 | 768d | 0.4435 | 0.6584 | 0.2149 | 48.5% |
| ndcg@10 | 512d | 0.4308 | 0.6536 | 0.2228 | 51.7% |
| ndcg@10 | 256d | 0.4014 | 0.6244 | 0.2230 | 55.6% |
| ndcg@10 | 128d | 0.3571 | 0.5504 | 0.1933 | 54.1% |
| ndcg@10 | 64d | 0.2682 | 0.4275 | 0.1593 | 59.4% |
| mrr@10 | 768d | 0.3927 | 0.5998 | 0.2071 | 52.7% |
| mrr@10 | 512d | 0.3748 | 0.5939 | 0.2191 | 58.5% |
| mrr@10 | 256d | 0.3478 | 0.5682 | 0.2204 | 63.4% |
| mrr@10 | 128d | 0.3128 | 0.4878 | 0.1750 | 55.9% |
| mrr@10 | 64d | 0.2276 | 0.3813 | 0.1537 | 67.5% |
| map@100 | 768d | 0.4365 | 0.6401 | 0.2036 | 46.6% |
| map@100 | 512d | 0.4204 | 0.6345 | 0.2141 | 50.9% |
| map@100 | 256d | 0.3936 | 0.6093 | 0.2157 | 54.8% |
| map@100 | 128d | 0.3540 | 0.5334 | 0.1794 | 50.7% |
| map@100 | 64d | 0.2640 | 0.4235 | 0.1595 | 60.4% |

These are impressive results for our fine-tuning: the model generalizes effectively to unseen queries across the existing knowledgebase.

Further testing would have to be run to understand how well this model generalizes to unseen documents outside of the knowledgebase.
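
The improvement columns in the comparison table can be reproduced from the base and fine-tuned scores. The sketch below does so for the NDCG@10 rows, with values copied from the table:

```python
# NDCG@10 values copied from the comparison table
base = {"768d": 0.4435, "512d": 0.4308, "256d": 0.4014, "128d": 0.3571, "64d": 0.2682}
fine_tuned = {"768d": 0.6584, "512d": 0.6536, "256d": 0.6244, "128d": 0.5504, "64d": 0.4275}

def improvement(base_score, ft_score):
    """Return (absolute improvement, relative improvement in percent)."""
    absolute = ft_score - base_score
    relative_pct = absolute / base_score * 100
    return round(absolute, 4), round(relative_pct, 1)

rows = {dim: improvement(base[dim], fine_tuned[dim]) for dim in base}

# Truncated fine-tuned sizes that still beat the full-size base model
beats_full_base = [dim for dim in fine_tuned if fine_tuned[dim] > base["768d"]]
```

One detail worth drawing out of the table: the fine-tuned model at 128 dimensions (0.5504) scores higher on NDCG@10 than the base model at its full 768 dimensions (0.4435), so fine-tuning here buys both accuracy and a smaller vector.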

---
## Using the Model

The resulting model has been published here: [AdamLucek/ModernBERT-embed-base-legal-MRL](https://huggingface.co/AdamLucek/ModernBERT-embed-base-legal-MRL)  
Along with the dataset: [AdamLucek/legal-rag-positives-synthetic](https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic)

The model can now be loaded and used like any other sentence transformers model:

In [None]:
%%capture
!pip install --upgrade sentence-transformers
!pip install git+https://github.com/huggingface/transformers

In [None]:
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("AdamLucek/ModernBERT-embed-base-legal-MRL", truncate_dim=256)

In [None]:
# Run inference
sentences = [
    'Which organization is Carmody Gaba Daman associated with?',
    'Assistant General Counsel, U.S. General Services Administration, Washington, D.C.; Carmody Gaba Daman, Assistant General Counsel, U.S. General Services Administration, Washington, D.C.; Michael Blumenthal, Trial Attorney, U.S. Small Business Administration, Office of General Counsel, Washington, D.C. MEMORANDUM AND ORDER', # Corresponding Positive
    'certain Solicitation requirements violate federal procurement statutes and agency regulations governing procurements involving small business offerors. See generally SHS MJAR at 14; VCH MJAR at 14. Having considered the parties’ arguments, applicable law, and the Administrative Record, this Court GRANTS in part and DENIES in part Plaintiffs’ Motions for Judgment on the', # Random Excerpt
]

embeddings = model.encode(sentences)
print(embeddings.shape)

In [None]:
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities[0])

For comparison, output from our base model nomic-ai/modernbert-embed-base: `tensor([1.0000, 0.6490, 0.4759])`