## 1. Environment & Hardware Verification
To perform efficient fine-tuning of **ESM-2** using **LoRA** and **RL**, a GPU is required to handle the high-dimensional tensor operations and gradient calculations.

* **Tool:** `nvidia-smi` (NVIDIA System Management Interface)
* **Purpose:** Confirms the presence of a CUDA-enabled device and monitors VRAM availability.
* **Safety Check:** If no GPU is detected, the cell exits immediately rather than falling back to slow CPU processing or running into Out-Of-Memory (OOM) errors later on.

In [12]:
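import subprocess
import sys

print("Checking GPU availability...")
try:
    gpu_info = subprocess.check_output(['nvidia-smi'], text=True)
except (FileNotFoundError, subprocess.CalledProcessError):
    # nvidia-smi is missing or returned an error: no usable NVIDIA GPU/driver.
    print(" WARNING: No GPU detected! This notebook requires a GPU.")
    print("Go to Runtime > Change runtime type > Select 'T4 GPU'")
    sys.exit(1)

print(" GPU detected!")
# Row 8 of the default nvidia-smi table is the first device row;
# fall back to the full report if the layout is shorter than expected.
lines = gpu_info.split('\n')
print(lines[8] if len(lines) > 8 else gpu_info)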

Checking GPU availability...
 GPU detected!
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |


## 2. Dependency Installation
This project utilizes the Hugging Face ecosystem and Meta's FAIR-ESM tools to implement a parameter-efficient training pipeline.

| Library | Primary Function |
| :--- | :--- |
| **transformers** | Provides the pre-trained ESM-2 model architecture and tokenizers. |
| **peft** | Implements **LoRA**, enabling the tuning of a small fraction (under 1%; about 0.3% in this notebook) of model parameters. |
| **accelerate** | Handles device placement and distributed training optimizations. |
| **fair-esm** | Native Meta AI tools for working with Evolutionary Scale Modeling (ESM) weights. |
| **wandb** | Used for experiment tracking and visualizing multi-objective reward trade-offs. |

In [13]:

# Quote the version specifier: an unquoted ">" is treated as a shell redirect.
!pip install -q "transformers>=4.41.0" peft==0.7.1 accelerate==0.25.0
!pip install -q datasets wandb
!pip install -q fair-esm

print(" All packages installed successfully!")

 All packages installed successfully!
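
Because the install cell pins `peft` and `accelerate` but leaves the other packages unpinned, it can help to record which versions were actually resolved. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names taken from the pip commands above.
for name, ver in installed_versions(
    ["transformers", "peft", "accelerate", "datasets", "wandb", "fair-esm"]
).items():
    print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```

Running this right after installation gives a record of the environment that can be pasted into the experiment notes.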


In [3]:
import torch 
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForMaskedLM, get_cosine_schedule_with_warmup
from peft import LoraConfig, get_peft_model, TaskType
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
import seaborn as sns
from dataclasses import dataclass
from typing import List, Dict, Tuple, Optional
import wandb
from tqdm.auto import tqdm
import warnings
warnings.filterwarnings('ignore')


In [4]:
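import random

def set_seed(seed=42):
    """Seed Python, NumPy, and PyTorch (CPU and all GPUs) for reproducibility."""
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)

set_seed(42)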

In [5]:
device=torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"using device: {device}")
if torch.cuda.is_available():
    print(f"GPU:{torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")


using device: cuda
GPU:Tesla T4
Memory: 15.83 GB


In [6]:
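import os
import wandb

# Read the API key from the WANDB_API_KEY environment variable instead of hardcoding it;
# with key=None, wandb.login() falls back to its usual credential lookup or prompt.
wandb.login(key=os.environ.get("WANDB_API_KEY"))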


[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mmanivarshithpc[0m ([33mmanivarshithpc-vignan-institute-of-technology-and-science[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


True

## 3. Global Configuration & Experiment Setup
This section defines the architectural and behavioral parameters for the fine-tuning process. We utilize a structured `Config` class to ensure all hyperparameters are tracked.

### Key Components:
* **Model Backbone:** `ESM-2 (650M parameters)` - A large-scale protein language model.
* **LoRA Strategy:** Targets the **Self-Attention** projection modules (`query`, `key`, `value`) to adapt sequence generation with minimal parameter updates.
* **RL Steering:**
    * **KL Coefficient:** Controls the trade-off between exploring new sequences and staying close to the biologically-valid base model.
    * **Reward Weights:** A weighted sum approach to balance **Stability** (structural integrity), **Diversity** (novelty), and **Constraint Satisfaction**.
* **Logging:** Integrated with **Weights & Biases (WandB)** for real-time monitoring of reward convergence and sequence entropy.
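
To make the RL steering terms concrete, here is a minimal sketch of how a weighted-sum reward and a KL penalty are commonly combined. The notebook's actual reward functions and KL estimator are not shown in this section, so the function names `combined_reward` and `kl_penalized_reward`, and the use of a log-probability difference as the KL estimate, are assumptions for illustration. The default weights and coefficient mirror the values in `Config`.

```python
def combined_reward(stability, diversity, constraint,
                    w_stability=1.0, w_diversity=0.5, w_constraint=0.5):
    """Weighted sum of the three per-sequence objective scores."""
    return (w_stability * stability
            + w_diversity * diversity
            + w_constraint * constraint)


def kl_penalized_reward(reward, logprob_policy, logprob_ref, kl_coef=0.1):
    """Subtract kl_coef times a per-sample KL estimate from the reward.

    The estimate used here (an assumption) is the difference between the
    sequence log-probability under the tuned policy and under the frozen
    reference model.
    """
    kl_estimate = logprob_policy - logprob_ref
    return reward - kl_coef * kl_estimate


# Hypothetical scores for one generated sequence.
r = combined_reward(stability=0.8, diversity=0.4, constraint=1.0)
shaped = kl_penalized_reward(r, logprob_policy=-10.0, logprob_ref=-12.0)
print(f"raw reward: {r:.3f}, KL-shaped reward: {shaped:.3f}")
```

A larger `kl_coef` pulls the shaped reward down faster as the policy assigns higher probability than the reference model to its own samples, which keeps generations closer to the base model.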

In [14]:
@dataclass
class Config:
    
    model_name: str = "facebook/esm2_t33_650M_UR50D"  
    
    # LoRA configuration
    lora_r: int = 8
    lora_alpha: int = 16
    lora_dropout: float = 0.05
    lora_target_modules: Optional[List[str]] = None  # filled in by __post_init__
    
    # Generation configuration
    max_seq_length: int = 64
    min_seq_length: int = 32
    temperature: float = 1.0
    top_k: int = 50
    top_p: float = 0.9
    
    # RL training configuration
    num_epochs: int = 5
    batch_size: int = 4
    gradient_accumulation_steps: int = 4
    num_sequences_per_batch: int = 8
    learning_rate: float = 5e-5
    kl_coef: float = 0.1  # KL penalty coefficient
    clip_range: float = 0.2  
    
    # Reward weights
    stability_weight: float = 1.0
    diversity_weight: float = 0.5
    constraint_weight: float = 0.5
    
    # Optimizer
    adam_epsilon: float = 1e-8
    max_grad_norm: float = 1.0
    warmup_steps: int = 100
    
    
    log_interval: int = 10
    save_interval: int = 100
    use_wandb: bool = True
    
    def __post_init__(self):
        if self.lora_target_modules is None:
            self.lora_target_modules = ["query", "key", "value"]

config = Config()


import gc
torch.cuda.empty_cache()
gc.collect()

print(f"\n Configuration loaded")
print(f"GPU Memory Available: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")


if config.use_wandb:
    wandb.init(
        project="protein-rl-design",
        config=vars(config),
        name="esm2-rl-experiment"
    )


 Configuration loaded
GPU Memory Available: 15.83 GB


In [None]:
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
print(f"Vocabulary size: {len(tokenizer)}")


base_model = AutoModelForMaskedLM.from_pretrained(
    config.model_name,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True
)

base_model = base_model.to(device)

print(f"\nBase model parameters: {sum(p.numel() for p in base_model.parameters()) / 1e6:.2f}M")

# Configure LoRA 
lora_config = LoraConfig(
    r=config.lora_r,
    lora_alpha=config.lora_alpha,
    target_modules=config.lora_target_modules,
    lora_dropout=config.lora_dropout,
    bias="none",
    task_type=TaskType.FEATURE_EXTRACTION  
)


model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()


ref_model = AutoModelForMaskedLM.from_pretrained(
    config.model_name,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True
)
ref_model = ref_model.to(device)
ref_model.eval()
for param in ref_model.parameters():
    param.requires_grad = False

print("\n Models loaded successfully!")

Vocabulary size: 33

Base model parameters: 651.04M
trainable params: 2,027,520 || all params: 653,070,774 || trainable%: 0.3104594602483314

✓ Models loaded successfully!
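
The trainable-parameter count reported above can be checked by hand. Each adapted linear layer gains two low-rank matrices, `A` (r × in_features) and `B` (out_features × r). The sketch below assumes the ESM-2 650M checkpoint uses a hidden size of 1280 and 33 transformer layers (the `t33` in the checkpoint name), with square `query`, `key`, and `value` projections; the helper name `lora_param_count` is hypothetical.

```python
def lora_param_count(r, in_features, out_features, n_modules, n_layers):
    """Number of parameters LoRA adds across all adapted linear layers."""
    per_module = r * (in_features + out_features)  # A is r x in, B is out x r
    return per_module * n_modules * n_layers


hidden_size = 1280   # assumed hidden width of esm2_t33_650M_UR50D
n_layers = 33        # assumed layer count ("t33")
trainable = lora_param_count(
    r=8, in_features=hidden_size, out_features=hidden_size,
    n_modules=3,     # query, key, value
    n_layers=n_layers,
)
total = 653_070_774  # "all params" from the PEFT printout above
percent = 100 * trainable / total

print(f"trainable LoRA params: {trainable:,}")
print(f"trainable share: {percent:.4f}%")
```

Under these assumptions the count agrees with the `trainable params` figure printed by `print_trainable_parameters()`, which suggests that only the three attention projections in each layer are being adapted.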
