# Fine-Tuning GPT-2 on Encrypted Data with LoRA and Concrete ML

In this notebook, we fine-tune a GPT-2 model with LoRA (Low-Rank Adaptation) and Concrete ML, so that fine-tuning can run on encrypted data using fully homomorphic encryption (FHE).
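As background on why LoRA suits this setting: LoRA keeps a pretrained weight matrix W frozen and trains only a low-rank update, so a layer computes W x + (alpha / r) · B A x, where A has shape r × d_in and B has shape d_out × r. The notebook imports `LoraConfig` and `get_peft_model` from `peft`, but its exact LoRA settings are not shown in this section. The plain-Python sketch below is therefore illustrative only. It uses an assumed 768 → 2304 projection (roughly the shape of GPT-2's fused attention projection) and an assumed rank r = 8 to compare trainable parameter counts. It also checks on a tiny example that the low-rank update can be merged back into a single weight matrix.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when fine-tuning the full weight matrix (bias ignored)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters of the LoRA factors A (r x d_in) and B (d_out x r)."""
    return r * d_in + d_out * r


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def lora_forward(w, a, b, x, alpha, r):
    """Frozen path W x plus the scaled low-rank path (alpha / r) * B (A x)."""
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / r
    return [u + scale * v for u, v in zip(base, update)]


# Parameter counts for an assumed 768 -> 2304 projection with an assumed rank of 8.
print(full_params(768, 2304))     # 1769472
print(lora_params(768, 2304, 8))  # 24576

# Tiny numeric check: the LoRA output equals x multiplied by the merged weight W + (alpha / r) * B A.
w = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen, d_out=2 x d_in=3
a = [[0.5, -1.0, 0.0]]                    # trainable, r=1 x d_in=3
b = [[2.0], [1.0]]                        # trainable, d_out=2 x r=1
x = [1.0, 2.0, 3.0]
alpha, rank = 2.0, 1
merged = [
    [wij + (alpha / rank) * bij for wij, bij in zip(wrow, brow)]
    for wrow, brow in zip(w, matmul(b, a))
]
print(lora_forward(w, a, b, x, alpha, rank), matvec(merged, x))  # [1.0, -4.0] [1.0, -4.0]
```

With these assumed sizes, LoRA trains about 1.4% of the parameters of the full projection, which keeps the trainable part of the model small.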

In [1]:
# Import necessary libraries
import math
import os
import random
import shutil
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

from concrete.ml.torch.hybrid_model import HybridFHEModel

# Set random seed for reproducibility
SEED = 0
torch.manual_seed(SEED)

In [2]:
def generate_and_print(prompt, model, tokenizer, seed=None, max_new_tokens=30):
    """
    Generates text based on the provided prompt and prints both the prompt and the generated text.

    Args:
        prompt (str): The input prompt to generate text from.
        model: The pre-trained language model.
        tokenizer: The tokenizer associated with the model.
        seed (int, optional): Seed for random number generators to ensure reproducibility.
        max_new_tokens (int, optional): Maximum number of tokens to generate. Defaults to 30.
    Returns:
        str: The generated text (response only, without the prompt).
    """
    try:
        # Request deterministic cuBLAS kernels (only takes effect if set before CUDA is initialized)
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

        # Set the random seed for reproducibility
        if seed is not None:
            random.seed(seed)
            np.random.seed(seed)
            torch.manual_seed(seed)
            if torch.cuda.is_available():
                torch.cuda.manual_seed_all(seed)

        # Encode the input prompt
        inputs = tokenizer(prompt, return_tensors="pt")

        # Move inputs to the same device as the model
        device = next(model.parameters()).device
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Generate text
        with torch.no_grad():
            output = model.generate(
                input_ids=inputs["input_ids"],
                attention_mask=inputs["attention_mask"],
                max_new_tokens=max_new_tokens,
                top_p=0.9,
                temperature=0.6,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id,
            )

        # Get only the newly generated tokens
        input_length = inputs["input_ids"].shape[1]
        generated_ids = output[0, input_length:]
        generated_text = tokenizer.decode(generated_ids, skip_special_tokens=True).strip()

        # Print the prompt and generated text
        print(f"Prompt: {prompt}")
        print(f"Response: {generated_text}\n")

        return generated_text

    except Exception as e:
        print(f"Error in generation: {str(e)}")
        return None
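`generate_and_print` samples with `temperature=0.6` and `top_p=0.9`. Temperature divides the logits before the softmax, and values below 1 sharpen the distribution. Top-p (nucleus) sampling then keeps the smallest set of most-probable tokens whose cumulative probability reaches p, renormalizes those probabilities, and samples from that set. The pure-Python sketch below illustrates this idea on a toy logit vector. It is a simplified version of the technique, not the exact `transformers` implementation, which differs in details.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-probable tokens whose cumulative mass reaches top_p.

    Returns a dict {token_id: renormalized probability}.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_next_token(logits, temperature=0.6, top_p=0.9, rng=random):
    """Sample one token id: temperature scaling first, then nucleus filtering."""
    nucleus = top_p_filter(softmax(logits, temperature), top_p)
    ids = list(nucleus)
    weights = [nucleus[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits, temperature=0.6)
nucleus = top_p_filter(probs, top_p=0.9)
print(sorted(nucleus))  # [0, 1]
print(sample_next_token(logits, rng=random.Random(0)))  # either 0 or 1
```

At temperature 0.6 the top token alone holds about 78% of the mass, so only the two most likely tokens survive a 0.9 nucleus. This is why the sampled continuations stay fairly focused while still varying.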

In [3]:
# Load pre-trained GPT-2 model and tokenizer
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Ensure tokenizer has a pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = model.config.eos_token_id

# Freeze model weights
for param in model.parameters():
    param.requires_grad = False

In [4]:
_ = generate_and_print(prompt="Programming is", model=model, tokenizer=tokenizer, seed=SEED)

Prompt: Programming is
Response: a skill you need to learn to master.

Learn to code

There are a lot of different ways to learn programming.

The



In [5]:
from torch import nn

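try:
    from transformers import Conv1D as TransformerConv1D
except ImportError:  # pragma: no cover
    try:
        # Fall back to the module where recent transformers releases define Conv1D
        from transformers.pytorch_utils import Conv1D as TransformerConv1D
    except ImportError:
        TransformerConv1D = None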

# Create a tuple of linear layer classes to check against
LINEAR_LAYERS: tuple = (nn.Linear,)
if TransformerConv1D is not None:
    LINEAR_LAYERS = LINEAR_LAYERS + (TransformerConv1D,)

remote_names = []
for name, module in model.named_modules():
    # Collect every linear-like layer (nn.Linear or GPT-2's Conv1D); these will run remotely in FHE
    if isinstance(module, LINEAR_LAYERS):
        remote_names.append(name)
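For the standard 12-layer `gpt2` checkpoint, this loop should collect four `Conv1D` projections per transformer block (`attn.c_attn`, `attn.c_proj`, `mlp.c_fc`, `mlp.c_proj`) plus the `nn.Linear` `lm_head`. That gives 49 layers, which matches the count shown when compiling below. Both types must be listed because GPT-2's `Conv1D` is not an `nn.Linear` subclass. It stores its weight as (in_features, out_features) and computes `x @ W + b`. The sketch below reconstructs the expected names in plain Python, assuming Hugging Face's usual `transformer.h.<i>...` naming. The helper `expected_gpt2_remote_names` is hypothetical and is intended only as a sanity check against `remote_names`.

```python
def expected_gpt2_remote_names(n_layer: int = 12) -> list:
    """Hypothetical reconstruction of the linear-like module names in GPT-2 with an LM head.

    Assumes Hugging Face naming: blocks under `transformer.h.<i>`, each with an
    attention sub-module (c_attn, c_proj) and an MLP (c_fc, c_proj), plus `lm_head`.
    """
    names = []
    for i in range(n_layer):
        prefix = f"transformer.h.{i}"
        names += [
            f"{prefix}.attn.c_attn",
            f"{prefix}.attn.c_proj",
            f"{prefix}.mlp.c_fc",
            f"{prefix}.mlp.c_proj",
        ]
    names.append("lm_head")
    return names


names = expected_gpt2_remote_names()
print(len(names))  # 49
print(names[0])    # transformer.h.0.attn.c_attn
print(names[-1])   # lm_head
```

If the naming assumption holds, `sorted(remote_names) == sorted(expected_gpt2_remote_names())` should be true in the notebook. If the `Conv1D` import fails, only `lm_head` is collected and the check fails, which makes that problem visible.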

In [6]:
# Create the HybridFHEModel with the specified remote modules
hybrid_model = HybridFHEModel(model, module_names=remote_names)

In [7]:
BLOCK_SIZE = 32
# Calibration data: 256 random sequences of BLOCK_SIZE token ids
input_tensor = torch.randint(0, tokenizer.vocab_size, (256, BLOCK_SIZE), dtype=torch.long)

# Calibrate and compile the model
hybrid_model.compile_model(input_tensor, n_bits=8, use_dynamic_quantization=True)

Compiling FHE layers:   0%|          | 0/49 [00:00<?, ?it/s]
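`compile_model` is called with `n_bits=8` and `use_dynamic_quantization=True`. This notebook does not describe Concrete ML's exact quantization scheme, so the following is only a generic mental model. Symmetric uniform quantization maps floats to signed 8-bit integers using a scale derived from the tensor's largest absolute value. Here "dynamic" is assumed to mean that this scale is recomputed from the values seen at run time instead of being fixed once. The sketch below implements that generic scheme in plain Python. The helpers `quantize_symmetric` and `dequantize` are hypothetical and are not Concrete ML functions.

```python
def quantize_symmetric(values, n_bits=8):
    """Symmetric uniform quantization to signed n-bit integers.

    The scale comes from the largest absolute value in `values`, so it is
    recomputed for every new tensor (the 'dynamic' part of this sketch).
    """
    q_max = 2 ** (n_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    q = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integers back to approximate floats."""
    return [x * scale for x in q]


activations = [0.02, -1.2, 0.75, 3.0]
q, scale = quantize_symmetric(activations, n_bits=8)
print(q)  # [1, -51, 32, 127]
recovered = dequantize(q, scale)
# Rounding to the nearest integer bounds the round-trip error by half a quantization step.
print(max(abs(a - r) for a, r in zip(activations, recovered)) <= scale / 2)  # True
```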

Note that our goal is to showcase FHE for encrypted fine-tuning, not to train a strong model. The dataset is small, with 68 examples and 2,386 tokens in total. Although this limited size gives the model little to learn from, fine-tuning still produces interesting results.
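The cells that load and tokenize the dataset are not part of this section. For illustration only, assume a common way of preparing a small corpus for causal-LM training. All tokenized examples are concatenated, and the resulting stream is split into fixed-length blocks of `BLOCK_SIZE` tokens, dropping the incomplete tail. With the 2,386 tokens quoted above and `BLOCK_SIZE = 32`, this gives 74 full blocks (2,368 tokens) and discards 18 tokens. The helper `group_into_blocks` and the per-example lengths below are hypothetical. Only the totals match the notebook's description.

```python
def group_into_blocks(tokenized_examples, block_size):
    """Concatenate token-id lists and cut the stream into equal blocks, dropping the remainder."""
    stream = [tok for example in tokenized_examples for tok in example]
    n_full = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_full)]


# Made-up stand-in for the real dataset: 68 examples totalling 2,386 token ids.
lengths = [35] * 66 + [38, 38]
examples = [list(range(n)) for n in lengths]
blocks = group_into_blocks(examples, block_size=32)
print(len(examples), sum(lengths))   # 68 2386
print(len(blocks), len(blocks) * 32)  # 74 2368
```

For causal-LM training with Hugging Face models, the labels of each block are typically the block's own token ids. The model shifts them internally so that each position predicts the next token.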

In [None]:
# Disable FHE so that text generation runs without encryption
hybrid_model.set_fhe_mode("disable")

_ = generate_and_print(
    prompt="Programming is", model=hybrid_model.model, tokenizer=tokenizer, seed=SEED
)