# Hackathon: Optimize and Deploy Text Summarization Model

---

## Problem Statement

- Optimize a transformer-based text summarization model for efficient deployment in a resource-constrained environment to simulate a production scenario.
- Use **knowledge distillation** and **manual pruning** techniques to reduce model size and inference latency without significantly compromising summarization quality.
- Use a subset of the CNN/DailyMail dataset and fine-tune a student model (`t5-small`).
- Apply magnitude-based weight pruning to further compress the model.
- The final model is saved in TensorFlow format, suitable for containerized inference deployment.

---

## Steps

1. **Subsampling**: Subsample the original CNN/DailyMail dataset so that training fits the limited compute resources.
2. **Distillation**: Train a compact student model to mimic the output behavior of a larger teacher model using both hard labels and soft logits.
3. **Pruning**: Apply magnitude-based weight pruning manually to remove unimportant weights from the trained student model.
4. **TensorFlow Only**: All components—models, training, pruning, and saving—must use TensorFlow (no PyTorch).
5. **No Quantization**: Skip any quantization techniques in this notebook.
6. **Deployment Ready**: Save the optimized model using TensorFlow’s `SavedModel` format for later use in Docker and ECS environments.
7. **Bonus**: Experiment with subsample size, batch size and number of epochs.

---

## Assumptions

* The dataset is a pre-subsampled version of CNN/DailyMail (\~200 training and \~50 test examples, or fewer) and is available locally in JSON format.
* The selected models (`t5-base` and `t5-small`) are fully supported by Hugging Face's TensorFlow APIs.
* Training is conducted on a CPU-only machine (e.g., AWS `t2.large`), requiring careful management of memory and runtime.
* Only basic training (2 epochs) is required to demonstrate optimization strategies, not to reach state-of-the-art performance.
* Final output will be integrated into a containerized microservice, so model size and inference efficiency are critical.

---


## CNN/DailyMail Dataset Description

The **CNN/DailyMail** dataset is a large-scale benchmark for **abstractive text summarization**, widely used in natural language processing research. It consists of news articles from CNN and the Daily Mail, paired with human-written summaries, often referred to as *highlights*.

Each example in the dataset includes:

* **`article`**: The full body of a news story (approx. 300–800 words).
* **`highlights`**: A concise, human-curated summary capturing the main points (1–3 sentences).

### Dataset Versions

* The most commonly used version is **"3.0.0"**, the non-anonymized release, which provides plain article text without entity placeholder tags.
* The dataset is split into:

  * `train`: \~287,000 examples
  * `validation`: \~13,000 examples
  * `test`: \~11,000 examples

### Use Case in This Project

In this project, we use a **subsampled version** of the dataset (e.g., 200 train, 50 test examples) to simulate summarization model training in a resource-constrained environment. The summaries serve as target outputs for fine-tuning a student model using knowledge distillation techniques.
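The subsampling step itself is not shown in this notebook. A minimal sketch of one way to do it, assuming the full split is already in memory as a list of dicts with `article` and `summary` keys (the helper name `subsample`, the seed, and the toy data are illustrative and not part of the original project):

```python
import json
import random

def subsample(examples, n, seed=42):
    """Return up to n examples chosen uniformly at random, reproducibly."""
    rng = random.Random(seed)      # local RNG, leaves the global random state alone
    n = min(n, len(examples))      # never request more items than are available
    return rng.sample(examples, n)

# Toy stand-in for the full training split (hypothetical data).
full_train = [
    {"article": f"article {i}", "summary": f"summary {i}"} for i in range(1000)
]

train_sample = subsample(full_train, n=200)

# Serialize as a JSON list, the layout the notebook later reads with json.load.
payload = json.dumps(train_sample, ensure_ascii=False)
```

Writing `payload` to a file such as `subsampled_cnn_dailymail/train_sample.json` would produce input in the shape the loading cell expects. Fixing the seed keeps runs comparable when experimenting with different subsample sizes.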

---



In [1]:

# --- Imports ---
import json
import tensorflow as tf
from transformers import TFAutoModelForSeq2SeqLM, AutoTokenizer
import numpy as np

# --- Load and prepare data ---
with open("subsampled_cnn_dailymail/train_sample.json", "r", encoding="utf-8") as f:
    train_data = json.load(f)
with open("subsampled_cnn_dailymail/test_sample.json", "r", encoding="utf-8") as f:
    test_data = json.load(f)


In [2]:

# --- Parameters ---
MODEL_NAME_TEACHER = "t5-base"
MODEL_NAME_STUDENT = "t5-small"
MAX_INPUT_LEN = 512
MAX_TARGET_LEN = 64
BATCH_SIZE = 16



In [3]:
# --- Load tokenizer ---
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME_STUDENT)

def preprocess(example):
    inputs = tokenizer(
        example["article"],
        padding="max_length",
        truncation=True,
        max_length=MAX_INPUT_LEN,
        return_tensors="np",
    )
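    # NOTE: the local JSON is assumed to store the reference summary under
    # "summary"; the original dataset names this field "highlights".
    targets = tokenizer(
        example["summary"],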
        padding="max_length",
        truncation=True,
        max_length=MAX_TARGET_LEN,
        return_tensors="np",
    )
    return {
        "input_ids": inputs["input_ids"][0],
        "attention_mask": inputs["attention_mask"][0],
        "labels": targets["input_ids"][0],
    }

train_enc = [preprocess(x) for x in train_data]
test_enc = [preprocess(x) for x in test_data]
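
The released T5 checkpoints were trained with task prefixes, and `summarize: ` is the one used for summarization. The cell above tokenizes the raw article, so prepending the prefix before tokenization is a change worth trying. A minimal sketch (the helper `with_task_prefix` is hypothetical and would be applied to each example before `preprocess`):

```python
TASK_PREFIX = "summarize: "

def with_task_prefix(example, prefix=TASK_PREFIX):
    """Return a copy of the example whose article starts with the T5 task prefix."""
    article = example["article"]
    if not article.startswith(prefix):   # avoid adding the prefix twice
        article = prefix + article
    return {**example, "article": article}

example = {"article": "Some news text.", "summary": "Short summary."}
prefixed = with_task_prefix(example)
```

With this helper, the encoding line would become `train_enc = [preprocess(with_task_prefix(x)) for x in train_data]`. The prefix uses a few tokens of the `MAX_INPUT_LEN` budget, which is negligible at 512 tokens.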




In [4]:
# --- Convert to TensorFlow Datasets ---
def convert_to_tf_dataset(encoded_data):
    def gen():
        for ex in encoded_data:
            yield {
                "input_ids": ex["input_ids"],
                "attention_mask": ex["attention_mask"],
                "labels": ex["labels"],
            }

    return tf.data.Dataset.from_generator(
        gen,
        output_signature={
            "input_ids": tf.TensorSpec(shape=(MAX_INPUT_LEN,), dtype=tf.int32),
            "attention_mask": tf.TensorSpec(shape=(MAX_INPUT_LEN,), dtype=tf.int32),
            "labels": tf.TensorSpec(shape=(MAX_TARGET_LEN,), dtype=tf.int32),
        }
    ).batch(BATCH_SIZE)

train_ds = convert_to_tf_dataset(train_enc)
test_ds = convert_to_tf_dataset(test_enc)



In [9]:
# --- Load teacher and student models ---
teacher_model = TFAutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME_TEACHER)
student_model = TFAutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME_STUDENT)

# --- Freeze teacher weights ---
teacher_model.trainable = False

# --- Custom Distillation Loss ---
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=2.0):
    hard_loss = loss_object(labels, student_logits)
    soft_loss = tf.keras.losses.KLDivergence()(tf.nn.softmax(teacher_logits / temperature),
                                                tf.nn.softmax(student_logits / temperature))
    return alpha * hard_loss + (1 - alpha) * soft_loss

# --- Training Loop ---
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)

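@tf.function
def train_step(batch):
    # The teacher is frozen, so its forward pass does not need to be
    # recorded by the gradient tape.
    teacher_outputs = teacher_model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        labels=batch["labels"],
        training=False
    )

    with tf.GradientTape() as tape:
        # Passing labels lets the model derive decoder inputs by shifting
        # the target tokens one position to the right.
        student_outputs = student_model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
            training=True
        )

        # Combine hard-label cross-entropy with the soft teacher targets.
        loss = distillation_loss(
            student_outputs.logits,
            teacher_outputs.logits,
            batch["labels"]
        )

    # Only the student is updated.
    gradients = tape.gradient(loss, student_model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, student_model.trainable_variables))
    return loss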


All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.
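
To make the arithmetic of the distillation loss in the cell above concrete, here is a NumPy sketch of the same combination of hard-label cross-entropy and temperature-softened KL divergence. NumPy is used only to show the computation on toy arrays; the notebook's training code stays in TensorFlow. Two things here are additions and not part of the cell above: the padding mask (assuming T5's pad token id of 0) and the `temperature ** 2` factor on the soft term, a common convention that keeps its gradient scale comparable across temperatures.

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

def distillation_loss_np(student_logits, teacher_logits, labels,
                         alpha=0.5, temperature=2.0, pad_id=0):
    # 1.0 for real target tokens, 0.0 for padding positions (assumed pad_id).
    mask = (labels != pad_id).astype(np.float64)
    n_tokens = max(mask.sum(), 1.0)

    # Hard loss: cross-entropy between student predictions and reference tokens.
    log_p = log_softmax(student_logits)
    hard = -np.take_along_axis(log_p, labels[..., None], axis=-1)[..., 0]

    # Soft loss: KL(teacher || student) on temperature-softened distributions.
    log_t = log_softmax(teacher_logits / temperature)
    log_s = log_softmax(student_logits / temperature)
    soft = (np.exp(log_t) * (log_t - log_s)).sum(axis=-1)

    # Average over non-padding tokens only, then mix the two terms.
    hard_mean = (hard * mask).sum() / n_tokens
    soft_mean = (soft * mask).sum() / n_tokens * temperature ** 2
    return alpha * hard_mean + (1 - alpha) * soft_mean

rng = np.random.default_rng(0)
labels = np.array([[5, 2, 1, 0, 0]])       # (batch, seq); trailing zeros are padding
student = rng.normal(size=(1, 5, 8))       # (batch, seq, vocab) toy logits
teacher = rng.normal(size=(1, 5, 8))
loss = distillation_loss_np(student, teacher, labels)
```

When the student's logits equal the teacher's, the soft term is zero and only the hard-label term remains. With the mask in place, logits at padded positions no longer influence the loss, which matters here because targets are padded to `MAX_TARGET_LEN`.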


In [10]:
# --- Run Training ---
EPOCHS = 2
for epoch in range(EPOCHS):
    print(f"\nEpoch {epoch+1}/{EPOCHS}")
    for step, batch in enumerate(train_ds):
        loss = train_step(batch)
        if step % 2 == 0:
            print(f"Step {step}: Loss = {loss.numpy():.4f}")





Epoch 1/2
Step 0: Loss = 2.5981
Step 2: Loss = 1.7052
Step 4: Loss = 2.9662
Step 6: Loss = 1.7558

Epoch 2/2
Step 0: Loss = 2.4881
Step 2: Loss = 1.6477
Step 4: Loss = 2.9328
Step 6: Loss = 1.6988
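
The notebook builds `test_ds` but never scores the student's summaries, so the goal of preserving summarization quality is not yet checked. One lightweight option is a unigram-overlap score in the spirit of ROUGE-1. The sketch below is a simplified stand-in (lowercased whitespace tokens, no stemming), not the official ROUGE implementation, and `rouge1_f1` is a hypothetical helper name:

```python
from collections import Counter

def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 between a reference and a candidate summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("the cat sat on the mat", "the cat")
```

In practice the candidate would come from decoding the output of `student_model.generate(...)` with the tokenizer, and the score could be averaged over the test examples before and after pruning to see how much quality changes.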


In [7]:
# --- Manual Pruning (Set small weights to zero) ---
def manual_prune(model, threshold=1e-3):
    pruned = 0
    total = 0
    for var in model.trainable_variables:
        mask = tf.math.abs(var) < threshold
        pruned += tf.reduce_sum(tf.cast(mask, tf.int32)).numpy()
        total += tf.size(var).numpy()
        var.assign(tf.where(mask, tf.zeros_like(var), var))
    print(f"Pruned {pruned} of {total} weights (~{100 * pruned / total:.2f}%)")

manual_prune(student_model, threshold=1e-3)

Pruned 145959 of 60506624 weights (~0.24%)
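
A fixed threshold of `1e-3` removes only about 0.24% of the weights here. If a specific sparsity level is wanted, the threshold can instead be derived from the weight magnitudes themselves. The sketch below shows the idea on a NumPy array standing in for a single weight tensor; the helper `prune_to_sparsity` and the 50% target are illustrative assumptions, not settings from the original notebook:

```python
import numpy as np

def prune_to_sparsity(weights, sparsity):
    """Zero the smallest-magnitude entries so that roughly `sparsity` of them are zero."""
    # The magnitude below which the requested fraction of entries falls.
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) < threshold
    return np.where(mask, 0.0, weights), threshold

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(100, 10))   # toy weight matrix (hypothetical)

pruned, threshold = prune_to_sparsity(w, sparsity=0.5)
achieved = float(np.mean(pruned == 0.0))     # fraction of entries that are now zero
```

The same threshold could be computed per variable from `var.numpy()` inside `manual_prune` and used in place of the constant. Higher sparsity levels usually cost some quality, so the pruned model should be re-evaluated after each change.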


In [8]:
# --- Save as TensorFlow Model ---
student_model.save_pretrained("tf_summarizer_model", saved_model=True)
print("Saved optimized summarization model to 'tf_summarizer_model/'")


INFO:tensorflow:Assets written to: tf_summarizer_model\saved_model\1\assets



Saved optimized summarization model to 'tf_summarizer_model/'
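
Because container image size matters for this deployment, it is worth measuring what was actually written to disk. Setting weights to zero does not by itself shrink a dense checkpoint, since zeros are stored like any other float, so the saved size after pruning will typically match the unpruned model unless the files are compressed or stored in a sparse format. A small sketch for measuring a directory (the helper `dir_size_mb` is hypothetical; a temporary directory stands in for `tf_summarizer_model/` so the snippet runs on its own):

```python
import os
import tempfile

def dir_size_mb(path):
    """Total size of all files under `path`, in MiB."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / (1024 * 1024)

# Stand-in directory with one 2 MiB file, so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "weights.bin"), "wb") as f:
        f.write(b"\x00" * (2 * 1024 * 1024))
    size_mb = dir_size_mb(tmp)
```

Calling `dir_size_mb("tf_summarizer_model")` after the save cell gives the figure to compare across experiments. For reference, the 60,506,624 parameters reported by the pruning cell occupy about 231 MiB as float32 (4 bytes each) for the weights alone.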
