# Extract subnets from Mamba-supernet with OSF

In this tutorial, we show how to quickly extract a subnet from a Mamba-1.4B supernet optimized by OSF. The downsized model is more efficient, with significantly fewer parameters and FLOPs (about 601M weight parameters versus about 1,471M in the example below), while maintaining performance competitive with the original model.

First, import the dependency packages and define a utility function for calculating model size.

In [None]:
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
)
import torch
from ofm import OFM

In [2]:

from torch.nn import Parameter


def calculate_params(model):
    """Count the weight parameters of a model, in millions.

    Only tensors stored as a module's ``weight`` attribute are counted;
    biases and any other parameters are not included.

    Args:
        model: the model to be evaluated
    Returns:
        The number of weight parameters in the model, in millions.
    """
    millions = 1_000_000
    total_params = 0
    for _, module in model.named_modules():
        # Skip modules that have no `weight` Parameter (e.g. containers).
        if hasattr(module, "weight") and isinstance(module.weight, Parameter):
            total_params += module.weight.numel()

    return total_params / millions

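The helper above counts only tensors stored under a module's `weight` attribute. As a minimal sketch of what that choice means, the snippet below counts elements from a table of parameter shapes in plain Python. The names and shapes in `param_shapes` are hypothetical and are not taken from the Mamba checkpoint; they only illustrate how a weights-only count differs from a count over all parameters.

```python
import math

# Hypothetical parameter shapes for a tiny model (illustrative only,
# not taken from the Mamba checkpoint).
param_shapes = {
    "embed.weight": (10, 4),
    "proj.weight": (3, 4),
    "proj.bias": (3,),
}


def count_elements(shapes, weights_only):
    """Sum the number of elements over the given parameter shapes."""
    total = 0
    for name, shape in shapes.items():
        # Mirror the helper above: optionally skip anything that is not a weight.
        if weights_only and not name.endswith(".weight"):
            continue
        total += math.prod(shape)
    return total


weights_only_total = count_elements(param_shapes, weights_only=True)
all_params_total = count_elements(param_shapes, weights_only=False)
print(f"weights only: {weights_only_total}, all parameters: {all_params_total}")
```

The two totals differ by the size of the bias, so the figures reported by `calculate_params` are best read as weight counts rather than exact totals.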

Then, we load the Mamba-1.4B supernet optimized by OSF using the Hugging Face `AutoModelForCausalLM` API.

You can find our published checkpoints in [README.md](README.md).

The Mamba checkpoint used in this tutorial is [yusx-swapp/ofm-mamba-1.4b-lambda-hf](https://huggingface.co/yusx-swapp/ofm-mamba-1.4b-lambda-hf).

In [3]:
model = AutoModelForCausalLM.from_pretrained(
    "yusx-swapp/ofm-mamba-1.4b-lambda-hf"
)
tokenizer = AutoTokenizer.from_pretrained("yusx-swapp/ofm-mamba-1.4b-lambda-hf")


Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00,  1.59s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Next, we convert the Mamba-1.4B model to a supernet via the OSF supernet class `OFM`, with only one line of code.

In [4]:
supernet = OFM(model=model)


To extract a subnet from the supernet, we can simply call the `resource_aware_model` API. There are multiple ways to get a subnet, such as specifying the target model structure, getting the smallest model within the elastic space, or getting a random downsized model. More details can be found in `examples/post_training_deployment`.

In this example, we extract the smallest model within the elastic space with the `OFM.smallest_model()` API.

In [5]:
ds_model, param, arc_config = supernet.smallest_model()


Let's compare the sizes of the original model and the subnet.

In [6]:
original_model_params = calculate_params(model)
ds_model_params = calculate_params(ds_model)
print(f"Original model has {original_model_params}M parameters")
print(f"Downsized model has {ds_model_params}M parameters")
print(f"Total model size reduction: {original_model_params - ds_model_params}M")

Original model has 1471.41632M parameters
Downsized model has 601.475072M parameters
Total model size reduction: 869.9412480000001M

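To put the printed numbers in relative terms, here is a small sketch that derives the percentage reduction and the compression ratio from the two parameter counts shown in the output above. The variable names are illustrative, and the figures refer to weight parameters as counted by `calculate_params`.

```python
# Parameter counts (in millions) copied from the output above.
original_params_m = 1471.41632
subnet_params_m = 601.475072

reduction_m = original_params_m - subnet_params_m
reduction_pct = 100 * reduction_m / original_params_m
compression_ratio = original_params_m / subnet_params_m

print(f"Reduction: {reduction_m:.1f}M ({reduction_pct:.1f}%)")
print(f"Compression ratio: {compression_ratio:.2f}x")
```

In other words, the smallest subnet keeps roughly two fifths of the original weight parameters.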

Now, let's evaluate the subnet's performance on the Lambada dataset via the metric of **perplexity**.

First, we load the Lambada dataset and preprocess it.



In [7]:
dataset = load_dataset("lambada")


def tokenize_function(examples):
    return tokenizer(
        examples["text"], padding="max_length", truncation=True, max_length=1024
    )
tokenized_datasets = dataset.map(tokenize_function, batched=True)

data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)



Downloading readme: 100%|██████████| 7.32k/7.32k [00:00<00:00, 31.2MB/s]
Map: 100%|██████████| 5153/5153 [00:03<00:00, 1568.99 examples/s]


Then, we use the Hugging Face `Trainer.evaluate()` API to evaluate the subnet on the test split of the Lambada dataset.


In [8]:
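# Only evaluation is run below, so the training-specific settings
# (epochs, learning rate, warmup, weight decay) are not used.
training_args = TrainingArguments(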
    output_dir="./eval",
    num_train_epochs=20,
    per_device_train_batch_size=16,
    logging_dir="./logs",
    logging_steps=10,
    warmup_steps=100,
    weight_decay=0.01,
    learning_rate=2e-5,
    fp16=True,
    eval_steps=100,
    save_total_limit=1,
)

trainer = Trainer(
    model=ds_model,
    tokenizer=tokenizer,
    args=training_args,
    data_collator=data_collator,
    train_dataset=None,
    eval_dataset=tokenized_datasets["test"],
)

eval_metrics = trainer.evaluate()


Finally, we calculate the perplexity of the subnet from the evaluation loss, so that it can be compared with that of the original model.
For details on what perplexity is and how it is calculated, see the [Hugging Face perplexity guide](https://huggingface.co/docs/transformers/en/perplexity).

In [9]:

loss = eval_metrics["eval_loss"]

perplexity = torch.exp(torch.tensor(loss))

print(f"Perplexity: {perplexity}")

Perplexity: 4.931185245513916

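The step from evaluation loss to perplexity can be sketched in plain Python, assuming the loss is the mean negative log-likelihood per token in nats, as is the case for a cross-entropy loss. The helper `perplexity_from_nll` and the toy values are illustrative and are not part of OSF or Transformers.

```python
import math


def perplexity_from_nll(token_nlls):
    """Perplexity is the exponential of the mean per-token negative log-likelihood."""
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)


# Toy case: a model that assigns probability 1/4 to every target token
# has a perplexity of 4, i.e. it is as uncertain as a uniform 4-way choice.
uniform_nlls = [-math.log(0.25)] * 8
ppl_uniform = perplexity_from_nll(uniform_nlls)
print(f"Toy perplexity: {ppl_uniform:.2f}")

# Going the other way: the perplexity reported above implies this mean loss.
reported_ppl = 4.931185245513916
implied_loss = math.log(reported_ppl)
print(f"Implied eval loss: {implied_loss:.4f}")
```

Read this way, the reported perplexity corresponds to a mean per-token loss of roughly 1.6 nats.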