# Continual Learning Modeling Tasks - Reducing Catastrophic Forgetting

## Introduction
### Overview and Context

Continual learning for large language models (LLMs) aims to let a model learn new tasks and adapt to new data without losing what it learned before. This project tackles catastrophic forgetting, the sharp drop in performance on earlier tasks that follows fine-tuning on new ones, by adding continual learning mechanisms to GPT-2. Potential applications include automated customer service and dynamic content creation.

### Goal

To explore methods for enabling LLMs to continually learn and adapt to new data or tasks without forgetting previously learned information, thereby addressing catastrophic forgetting.

### Objectives
1. Mitigate Catastrophic Forgetting: Implement and test Elastic Weight Consolidation (EWC) on GPT-2.
2. Adapt GPT for Continual Learning: Integrate continual learning mechanisms within the Transformer architecture.
3. Evaluate Model Performance: Use backward and forward transfer metrics to measure performance on old vs. new tasks.
4. Understand Transformer Architecture: Explore the self-attention mechanisms and their scalability in transformers.

## Dataset Description
**Dataset**: WikiText-103
- **Description**: A collection of over 100 million tokens from verified Good and Featured articles on Wikipedia.
- **Usage**: To test the model’s ability to learn continually and adapt over time.
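
One way to use a single corpus for continual learning is to cut the token stream into fixed-length blocks and treat contiguous groups of blocks as successive tasks. The sketch below assumes that setup; the helper names `make_blocks` and `split_into_tasks` and the contiguous task split are illustrative choices, not something the notebook specifies.

```python
from typing import List, Sequence


def make_blocks(token_ids: Sequence[int], block_size: int) -> List[List[int]]:
    """Cut a token stream into non-overlapping blocks; the ragged tail is dropped."""
    n_blocks = len(token_ids) // block_size
    return [list(token_ids[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]


def split_into_tasks(blocks: List[List[int]], n_tasks: int) -> List[List[List[int]]]:
    """Partition the blocks into n_tasks contiguous groups (hypothetical task split)."""
    per_task = len(blocks) // n_tasks
    return [blocks[t * per_task:(t + 1) * per_task] for t in range(n_tasks)]


# Toy stand-in for tokenized text. With the real corpus, one option (an
# assumption, not taken from the notebook) is the Hugging Face `datasets` library:
#   from datasets import load_dataset
#   raw = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")
toy_ids = list(range(50))
blocks = make_blocks(toy_ids, block_size=8)   # 6 blocks of 8 tokens, 2 tokens dropped
tasks = split_into_tasks(blocks, n_tasks=3)   # 3 tasks with 2 blocks each
print(len(blocks), [len(t) for t in tasks])
```

Splitting by position keeps each task's text contiguous; splitting by topic or article would be an equally valid design and may produce a stronger distribution shift between tasks.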


## Step 1: Elastic Weight Consolidation (EWC)

- **Model Setup**: Load the GPT-2 model.

In [1]:
!pip install transformers
!pip install torch



In [2]:
import torch
print(torch.__version__)

2.3.1


In [3]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = 'gpt2'
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

#### Text generation (just for fun!)

In [17]:
input_text = "I'm a student in EPITA,"

# GPT-2 has no pad token, so reuse the end-of-sequence token for padding
tokenizer.pad_token = tokenizer.eos_token

# Tokenize the prompt into PyTorch tensors
inputs = tokenizer(input_text, return_tensors='pt', padding=True, truncation=True)

# Generate up to 100 tokens (prompt included) with greedy decoding.
# `temperature` only takes effect when do_sample=True, so it is left out here;
# repetition_penalty=2.0 discourages repeated tokens.
outputs = model.generate(
    inputs['input_ids'],
    attention_mask=inputs['attention_mask'],
    max_length=100,
    pad_token_id=tokenizer.eos_token_id,
    repetition_penalty=2.0,
)

# Decode the generated token IDs back to text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(generated_text)

I'm a student in EPITA, and I've been working on this project for about two years now. It's really exciting to be able do something like that."
"It was very difficult because we had no idea what the future would hold," he continued of his work with The Blacklist: "We were just trying not only get it right but also make sure there are enough people who can help us out as well so if you're interested then go ahead!"


- **Fisher Information Matrix Calculation**
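
As a starting point for this step, the sketch below estimates a diagonal empirical Fisher by averaging squared loss gradients over a few batches. It is a simplification under stated assumptions: it squares the gradient of the batch-mean loss instead of per-example gradients, and it uses a small `nn.Linear` as a stand-in for GPT-2. The function name `estimate_fisher_diagonal` is hypothetical. For GPT-2, the loss could instead come from `model(input_ids, labels=input_ids).loss`.

```python
import torch
import torch.nn as nn


def estimate_fisher_diagonal(model, batches, loss_fn):
    """Diagonal empirical Fisher: mean of squared loss gradients over batches.

    Approximation: uses the gradient of the batch-mean loss, which is cheaper
    but cruder than averaging per-example squared gradients.
    """
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    model.eval()
    n_batches = 0
    for inputs, targets in batches:
        model.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        for name, param in model.named_parameters():
            if name in fisher and param.grad is not None:
                fisher[name] += param.grad.detach() ** 2
        n_batches += 1
    return {name: f / max(n_batches, 1) for name, f in fisher.items()}


# Toy stand-in for GPT-2 and its old-task data (illustration only)
torch.manual_seed(0)
toy_model = nn.Linear(4, 3)
toy_batches = [(torch.randn(8, 4), torch.randint(0, 3, (8,))) for _ in range(5)]
fisher = estimate_fisher_diagonal(toy_model, toy_batches, nn.CrossEntropyLoss())
print({name: tuple(f.shape) for name, f in fisher.items()})
```

Each entry of `fisher` has the same shape as its parameter; larger values mark weights that the old task is more sensitive to.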

- **EWC Regularization**
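
For reference, the EWC objective as formulated by Kirkpatrick et al. (2017) adds a quadratic penalty that anchors each parameter to its value after the previous task, weighted by its Fisher information. The symbols below follow that paper and are not defined in this notebook: $\theta^{*}$ are the parameters after the old task, $F_i$ is the $i$-th diagonal Fisher entry, and $\lambda$ sets how strongly old knowledge is protected.

$$
\mathcal{L}_{\text{EWC}}(\theta) = \mathcal{L}_{\text{new}}(\theta) + \frac{\lambda}{2} \sum_i F_i \left(\theta_i - \theta_i^{*}\right)^2
$$

With $\lambda = 0$ this reduces to plain fine-tuning on the new task; larger $\lambda$ trades new-task fit for retention.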

- **Training**: Implement training with EWC.
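
A minimal sketch of what this training step might look like, assuming a Fisher estimate and a snapshot of the old parameters are already available. The names `ewc_penalty` and `train_with_ewc`, the SGD optimizer, and all hyperparameters are illustrative assumptions. The demo uses a toy linear model and a placeholder Fisher of all ones, so it only shows the mechanics, not GPT-2 behavior.

```python
import torch
import torch.nn as nn


def ewc_penalty(model, fisher, old_params, lam):
    """(lam / 2) * sum_i F_i * (theta_i - theta_i_old)^2 over stored parameters."""
    terms = [
        (fisher[name] * (param - old_params[name]) ** 2).sum()
        for name, param in model.named_parameters()
        if name in fisher
    ]
    return 0.5 * lam * torch.stack(terms).sum()


def train_with_ewc(model, batches, loss_fn, fisher, old_params, lam, lr=0.05, epochs=20):
    """Fine-tune on new-task batches while penalizing drift from old_params."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for inputs, targets in batches:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets) + ewc_penalty(model, fisher, old_params, lam)
            loss.backward()
            optimizer.step()
    return model


def drift_after_training(lam, batches):
    """Train a fresh toy model and return its squared distance from the start."""
    torch.manual_seed(1)  # same initial weights for every lam
    model = nn.Linear(4, 1)
    old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
    fisher = {n: torch.ones_like(p) for n, p in old_params.items()}  # placeholder Fisher
    train_with_ewc(model, batches, nn.MSELoss(), fisher, old_params, lam=lam)
    return sum(((p.detach() - old_params[n]) ** 2).sum().item()
               for n, p in model.named_parameters())


torch.manual_seed(0)
new_task_batches = [(torch.randn(16, 4), torch.randn(16, 1))]  # toy "new task"
drift_plain = drift_after_training(lam=0.0, batches=new_task_batches)
drift_ewc = drift_after_training(lam=5.0, batches=new_task_batches)
print(f"drift without EWC: {drift_plain:.4f}, with EWC: {drift_ewc:.4f}")
```

The penalty keeps the weights closer to their old values, which is the mechanism EWC relies on to limit forgetting. Note that `lam` and `lr` interact: a large `lam` with a large learning rate can make plain SGD unstable.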

## Step 2: Progressive Prompts
- **Model Setup**: Sequential training process.
- **Training**: Code for fine-tuning with prompts.
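
The sequential process can be sketched as follows: each new task gets its own trainable soft prompt, prompts from earlier tasks are frozen, and all prompts are prepended to the input embeddings while the base model stays frozen. The class name `ProgressivePrompts`, the prompt length, the initialization scale, and the newest-first ordering are assumptions of this sketch. The embedding size of 768 matches GPT-2 small.

```python
import torch
import torch.nn as nn


class ProgressivePrompts(nn.Module):
    """One soft prompt per task; earlier prompts are frozen when a task is added."""

    def __init__(self, embed_dim, prompt_len=10):
        super().__init__()
        self.embed_dim = embed_dim
        self.prompt_len = prompt_len
        self.prompts = nn.ParameterList()

    def add_task(self):
        """Freeze all existing prompts and append a new trainable one."""
        for prompt in self.prompts:
            prompt.requires_grad_(False)
        new_prompt = torch.randn(self.prompt_len, self.embed_dim) * 0.02
        self.prompts.append(nn.Parameter(new_prompt))

    def forward(self, input_embeds):
        """Prepend all prompts (newest first) to (batch, seq, embed_dim) embeddings."""
        batch_size = input_embeds.size(0)
        stacked = torch.cat(list(self.prompts)[::-1], dim=0)
        stacked = stacked.unsqueeze(0).expand(batch_size, -1, -1)
        return torch.cat([stacked, input_embeds], dim=1)


torch.manual_seed(0)
pp = ProgressivePrompts(embed_dim=768, prompt_len=10)
pp.add_task()  # task 1
pp.add_task()  # task 2: the task-1 prompt is now frozen
token_embeds = torch.randn(2, 5, 768)  # stand-in for GPT-2 token embeddings
extended = pp(token_embeds)
print(extended.shape, [p.requires_grad for p in pp.prompts])
```

With GPT-2, the extended tensor could be passed as `inputs_embeds` to the model, with the attention mask and labels padded to cover the prompt positions; only the newest prompt would then be handed to the optimizer.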

## Step 3: Low-Rank Adaptation (LoRA)
- **Model Setup**: Apply LoRA to GPT-2.
- **Training**: Code for fine-tuning with LoRA.
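
To illustrate the idea behind this step, the sketch below wraps a frozen linear layer with a trainable low-rank update, so the effective weight becomes `W + (alpha / r) * B @ A`. The class name `LoRALinear` and the values of `r` and `alpha` are assumptions for illustration. Hugging Face GPT-2 implements its attention projections with a `Conv1D` module, not `nn.Linear`, so this wrapper shows the mechanism and is not a drop-in for the model loaded above.

```python
import math

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha / r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad_(False)  # pretrained weights stay fixed
        self.lora_A = nn.Parameter(torch.empty(r, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / r

    def forward(self, x):
        low_rank = x @ self.lora_A.T @ self.lora_B.T
        return self.base(x) + low_rank * self.scaling


torch.manual_seed(0)
base = nn.Linear(768, 768)
layer = LoRALinear(base, r=8)
x = torch.randn(4, 768)
same_at_init = torch.allclose(layer(x), base(x))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(same_at_init, trainable, total)
```

Because `lora_B` starts at zero, the wrapped layer reproduces the pretrained output before training, and only the two small matrices receive gradients. In practice, the `peft` library's `LoraConfig` and `get_peft_model` are a common way to apply this to GPT-2, typically targeting the `c_attn` module.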

## Evaluation
- **Metrics**: Define and calculate backward transfer, forward transfer, perplexity, and other metrics.
- **Results**: Present results for each method and compare.
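
A sketch of how these metrics could be computed, assuming the common definitions based on a score matrix `R`, where `R[i][j]` is the score on task `j` after training on tasks up to `i`, and higher is better. The function names and all numbers in the demo are made up for illustration; they are not results from this project.

```python
import math
from typing import List, Sequence


def backward_transfer(R: List[List[float]]) -> float:
    """Mean of R[T-1][i] - R[i][i] over every task i before the last one.

    Negative values indicate forgetting of earlier tasks.
    """
    T = len(R)
    return sum(R[T - 1][i] - R[i][i] for i in range(T - 1)) / (T - 1)


def forward_transfer(R: List[List[float]], baseline: Sequence[float]) -> float:
    """Mean of R[i-1][i] - baseline[i] over every task i after the first one.

    baseline[i] is the score of an untrained reference model on task i.
    """
    T = len(R)
    return sum(R[i - 1][i] - baseline[i] for i in range(1, T)) / (T - 1)


def perplexity(token_nlls: Sequence[float]) -> float:
    """exp of the mean per-token negative log-likelihood (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Made-up scores for three tasks (illustration only, not measured results)
R = [
    [0.80, 0.30, 0.20],
    [0.70, 0.85, 0.35],
    [0.65, 0.80, 0.90],
]
baseline = [0.25, 0.25, 0.25]
bwt = backward_transfer(R)
fwt = forward_transfer(R, baseline)
ppl = perplexity([math.log(2.0)] * 4)
print(f"BWT={bwt:.3f} FWT={fwt:.3f} PPL={ppl:.2f}")
```

Perplexity is a lower-is-better quantity, so if it fills `R` directly the signs of both transfer metrics flip; negating it (or using negative log-perplexity) keeps the usual interpretation.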