<a href="https://colab.research.google.com/github/tuhinmallick/AI-for-Fashion/blob/main/Turning_Llama_3_into_a_Text_Embedding_Model_with_LLM2Vec.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook shows how to turn an LLM into a text embedding model with LLM2Vec. With a 7B or 8B LLM, it fits on a single GPU with 24 GB of memory.
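As the LLM2Vec authors describe it, the method converts a decoder-only LLM in up to three steps: enabling bidirectional attention, training with masked next token prediction (MNTP), and contrastive learning. This notebook covers the first two. The toy sketch below only illustrates the idea behind the first step by contrasting the causal mask a decoder uses by default with a bidirectional one. The helper names are hypothetical and this is not the library's code.

```python
# Toy sketch: causal vs. bidirectional attention masks.
# 1 = query position i may attend to key position j, 0 = blocked.
# Illustrative only; LLM2Vec makes this change inside the model itself.


def causal_mask(n: int) -> list[list[int]]:
    """Decoder default: token i sees only tokens 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> list[list[int]]:
    """Encoder-style: every token sees every other token."""
    return [[1] * n for _ in range(n)]


if __name__ == "__main__":
    for row in causal_mask(4):
        print(row)
    # [1, 0, 0, 0]
    # [1, 1, 0, 0]
    # [1, 1, 1, 0]
    # [1, 1, 1, 1]
```

With a causal mask, the first token's representation ignores the rest of the sentence; a bidirectional mask lets every position contribute context, which is what a sentence embedding needs.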

We need to install the following packages:

In [None]:
!pip install --upgrade llm2vec transformers
!pip install flash-attn --no-build-isolation

Collecting llm2vec
  Downloading llm2vec-0.1.4-py2.py3-none-any.whl (16 kB)
Collecting transformers
  Downloading transformers-4.40.1-py3-none-any.whl (9.0 MB)
Collecting peft (from llm2vec)
  Downloading peft-0.10.0-py3-none-any.whl (199 kB)
Collecting datasets (from llm2vec)
  Downloading datasets-2.19.0-py3-none-any.whl (542 kB)
Collecting evaluate (from llm2vec)
  Downloading evaluate-0.4.1-py3-none-any.whl (84 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets->llm2vec)
  Downloading dill-0.3.8

# Simple Transformation


The following cell transforms Llama 3 8B into an embedding model and saves it to a directory named "Llama-3-8B-Emb". No training is involved at this stage.

In [None]:
import torch
from llm2vec import LLM2Vec

l2v = LLM2Vec.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

l2v.save("Llama-3-8B-Emb")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
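Once the model is saved, a sentence embedding comes from pooling its per-token hidden states (the llm2vec README's examples call l2v.encode for this). To our understanding, LLM2Vec defaults to mean pooling. The sketch below shows mean pooling over a toy hidden-state matrix, skipping padding positions, followed by cosine similarity between two embeddings. The toy vectors stand in for real model outputs, and the helper names are hypothetical.

```python
import math


def mean_pool(hidden_states: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose attention_mask entry is 1 (padding is skipped)."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [t / count for t in total]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Three toy token vectors; the last one is padding (mask 0).
    hidden = [[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]]
    print(mean_pool(hidden, [1, 1, 0]))  # [2.0, 1.0]
    print(cosine_similarity([2.0, 1.0], [1.0, 2.0]))
```

Masking out padding matters: without it, the padding vector [9.0, 9.0] would pull the sentence embedding toward an arbitrary direction.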

# MNTP Training

We will use the run_mntp.py script from the llm2vec repository, so we first clone it:

In [None]:
!git clone https://github.com/McGill-NLP/llm2vec.git

Cloning into 'llm2vec'...
remote: Enumerating objects: 516, done.
remote: Counting objects: 100% (134/134), done.
remote: Compressing objects: 100% (79/79), done.
remote: Total 516 (delta 56), reused 88 (delta 39), pack-reused 382
Receiving objects: 100% (516/516), 1.30 MiB | 4.03 MiB/s, done.
Resolving deltas: 100% (270/270), done.
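Before configuring the run, here is a rough picture of what MNTP trains. A fraction of the input tokens is replaced by a placeholder, and each masked token at position i is predicted from the model's output at position i - 1, which is how a next-token-prediction head is reused for masked prediction. The sketch below is one way to express that; it is not the llm2vec collator. MASK_ID is a hypothetical placeholder id (in the real run, "mask_token_type" chooses the token), mask_prob plays the role of "mlm_probability", and replacing every selected token with the placeholder is our reading of "data_collator_type": "all_mask".

```python
import random

MASK_ID = -1   # hypothetical placeholder id, not a real vocabulary entry
IGNORE = -100  # label value skipped by PyTorch's cross-entropy loss (default ignore_index)


def mntp_example(token_ids: list[int], mask_prob: float, rng: random.Random) -> tuple[list[int], list[int]]:
    """Build (inputs, labels) for masked next token prediction.

    A masked token at position i is predicted from the output at position i - 1,
    so its label is stored one slot to the left. In this sketch, position 0 is
    never masked because it has no preceding position.
    """
    inputs = list(token_ids)
    labels = [IGNORE] * len(token_ids)
    for i in range(1, len(token_ids)):
        if rng.random() < mask_prob:
            inputs[i] = MASK_ID              # every selected token becomes the placeholder
            labels[i - 1] = token_ids[i]     # predicted from the previous position's output
    return inputs, labels


if __name__ == "__main__":
    # With mask_prob=1.0 every position after the first is masked.
    print(mntp_example([10, 11, 12, 13], 1.0, random.Random(0)))
    # ([10, -1, -1, -1], [11, 12, 13, -100])
    print(mntp_example([10, 11, 12, 13], 0.8, random.Random(0)))
```

Because the targets come from the previous position, the pretrained language-modeling head can be trained as is, while bidirectional attention lets each position also see the tokens that follow it.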


Next, we write a JSON config file with the training arguments. To train a different model, replace the value of "model_name_or_path" (and, preferably, "output_dir") in the following configuration.

In [None]:
JSON_CONFIG='''
{
    "model_name_or_path": "meta-llama/Meta-Llama-3-8B",
    "dataset_name": "wikitext",
    "dataset_config_name": "wikitext-103-raw-v1",
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "gradient_accumulation_steps": 16,
    "do_train": true,
    "do_eval": true,
    "max_seq_length": 512,
    "mask_token_type": "blank",
    "data_collator_type": "all_mask",
    "mlm_probability": 0.8,
    "overwrite_output_dir": true,
    "output_dir": "Llama-3-8B-llm2vec-MNTP-Emb",
    "evaluation_strategy": "steps",
    "eval_steps": 100,
    "save_steps": 200,
    "stop_after_n_steps": 1000,
    "lora_r": 16,
    "gradient_checkpointing": true,
    "torch_dtype": "bfloat16",
    "attn_implementation": "flash_attention_2"
}
'''

with open("mtnp_config.json", 'w') as f:
  f.write(JSON_CONFIG)
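A quick calculation shows how much data this configuration processes. The sketch below assumes a single GPU (as on Colab) and assumes "stop_after_n_steps" counts optimizer updates, as the Trainer's global step does; the token count is an upper bound of max_seq_length tokens per sequence. The helper name is hypothetical.

```python
import json

# A subset of the keys written to mtnp_config.json above.
config = json.loads("""
{
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 16,
    "max_seq_length": 512,
    "stop_after_n_steps": 1000
}
""")


def training_budget(cfg: dict, num_gpus: int = 1) -> tuple[int, int, int]:
    """Return (effective batch size, sequences seen, maximum tokens seen)."""
    effective_batch = (
        cfg["per_device_train_batch_size"] * cfg["gradient_accumulation_steps"] * num_gpus
    )
    sequences = effective_batch * cfg["stop_after_n_steps"]
    max_tokens = sequences * cfg["max_seq_length"]
    return effective_batch, sequences, max_tokens


if __name__ == "__main__":
    print(training_budget(config))  # (16, 16000, 8192000)
```

Gradient accumulation is what keeps memory low here: each forward pass holds a single 512-token sequence, yet each optimizer update averages gradients over 16 of them.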


The training script runs as a separate process and does not read the secrets stored in Colab, so a Hugging Face token saved there is not visible to it. Since Llama 3 8B is a gated model, we log in with notebook_login(), which stores the token where the script can find it.

In [None]:
from huggingface_hub import notebook_login

notebook_login()


Then, we run the training:

In [None]:
!python llm2vec/experiments/run_mntp.py mtnp_config.json