# Continual Learning Modeling Tasks - Reducing Catastrophic Forgetting

## Introduction
Overview and Context

Continual learning in large language models (LLMs) aims to let a model learn new tasks and adapt to new data without forgetting what it learned earlier. This project addresses that failure mode, known as catastrophic forgetting, by adding continual learning capabilities to GPT models. Potential applications include automated customer service and dynamic content creation.

Goal

To explore methods for enabling LLMs to continually learn and adapt to new data or tasks without forgetting previously learned information, thereby addressing catastrophic forgetting.

Objectives
1. Mitigate Catastrophic Forgetting: Implement and test Elastic Weight Consolidation (EWC) on GPT-2.
2. Adapt GPT for Continual Learning: Integrate continual learning mechanisms within the Transformer architecture.
3. Evaluate Model Performance: Use backward and forward transfer metrics to measure performance on old vs. new tasks.
4. Understand Transformer Architecture: Explore the self-attention mechanisms and their scalability in transformers.
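Objective 4 concerns self-attention and how it scales. As an illustration only, here is a minimal single-head causal self-attention sketch in plain PyTorch. The function and weight names are hypothetical, and GPT-2 itself uses a multi-head version inside each Transformer block. The `seq_len × seq_len` score matrix is what makes compute and memory grow quadratically with context length.

```python
import math

import torch


def causal_self_attention(x: torch.Tensor, w_q: torch.Tensor,
                          w_k: torch.Tensor, w_v: torch.Tensor) -> torch.Tensor:
    """Single-head causal scaled dot-product self-attention.

    x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head).
    Returns a (seq_len, d_head) tensor.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # (seq_len, seq_len) score matrix: the part that scales quadratically with seq_len.
    scores = q @ k.T / math.sqrt(q.shape[-1])
    # Causal mask: position i may attend only to positions <= i, as in GPT-2.
    seq_len = x.shape[0]
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
                        diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v


torch.manual_seed(0)
x = torch.randn(5, 16)
w_q, w_k, w_v = (torch.randn(16, 8) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # torch.Size([5, 8])
```

Because of the mask, changing the last token leaves the outputs at earlier positions unchanged, which is what allows GPT-2 to be trained as a next-token predictor.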

## Dataset Description
Dataset: WikiText-103
• Description: A collection of over 100 million tokens from verified Good and Featured articles on Wikipedia.
• Usage: To test the model’s ability to learn continually and adapt over time.
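The notebook does not specify how WikiText-103 is divided into a sequence of tasks. The sketch below assumes one simple option: contiguous chunks of the training split, treated as tasks in order. Both helpers are hypothetical, and loading assumes the Hugging Face `datasets` package and its `wikitext-103-raw-v1` configuration.

```python
from typing import List, Sequence


def load_wikitext103_train() -> List[str]:
    """Load the raw WikiText-103 training lines (assumes `pip install datasets`).

    The import is local so the splitting helper below works without `datasets`.
    Downloads the data on first call.
    """
    from datasets import load_dataset
    ds = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")
    # The raw files contain many empty lines; drop them.
    return [t for t in ds["text"] if t.strip()]


def make_sequential_tasks(texts: Sequence[str], num_tasks: int) -> List[List[str]]:
    """Hypothetical task definition: split texts into contiguous, ordered chunks.

    Any remainder lines go to the last task.
    """
    if num_tasks < 1:
        raise ValueError("num_tasks must be >= 1")
    size = len(texts) // num_tasks
    tasks = [list(texts[i * size:(i + 1) * size]) for i in range(num_tasks - 1)]
    tasks.append(list(texts[(num_tasks - 1) * size:]))
    return tasks
```

For example, `tasks = make_sequential_tasks(load_wikitext103_train(), num_tasks=3)` yields three lists of lines to train on one after another.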


## Step 1: Elastic Weight Consolidation (EWC)

- **Model Setup**: Load the pretrained GPT-2 model and tokenizer.

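```python
# Jupyter shell commands to install the dependencies
!pip install transformers
!pip install torch
```

```python
import torch
print(torch.__version__)
```

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_name = 'gpt2'  # the 124M-parameter GPT-2 checkpoint on the Hugging Face Hub
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
# GPT-2 defines no padding token; reuse EOS so that batches of text can be padded.
tokenizer.pad_token = tokenizer.eos_token
```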


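- **Fisher Information Matrix Calculation**: After training on a task, estimate the diagonal of the Fisher information matrix to measure how important each parameter is for that task.

- **EWC Regularization**: When training on the next task, add a quadratic penalty that keeps important parameters (those with large Fisher values) close to the values they had after the previous task.

- **Training**: Fine-tune on the new task using the combined loss (new-task loss plus EWC penalty).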
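The three steps above can be sketched in plain PyTorch as follows. This is a minimal sketch, not a definitive implementation: it assumes the diagonal *empirical* Fisher (squared gradients of the training loss) rather than sampling labels from the model, and the names `estimate_fisher_diagonal`, `EWC`, `ewc_training_step`, `gpt2_lm_loss` and the strength `lam` are illustrative. The penalty follows the usual EWC form, $\mathcal{L}(\theta) = \mathcal{L}_{\text{new}}(\theta) + \sum_i \frac{\lambda}{2} F_i (\theta_i - \theta^*_i)^2$, where $\theta^*$ are the parameters after the previous task.

```python
from typing import Callable, Dict, Iterable

import torch
from torch import nn

LossFn = Callable[[nn.Module, object], torch.Tensor]


def gpt2_lm_loss(model: nn.Module, batch: Dict[str, torch.Tensor]) -> torch.Tensor:
    """Causal-LM loss for a Hugging Face GPT2LMHeadModel; padding is excluded via label -100."""
    labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)
    return model(input_ids=batch["input_ids"],
                 attention_mask=batch["attention_mask"], labels=labels).loss


def estimate_fisher_diagonal(model: nn.Module, batches: Iterable,
                             loss_fn: LossFn) -> Dict[str, torch.Tensor]:
    """Diagonal empirical Fisher: squared loss gradients averaged over batches.

    Batches of size 1 give the per-example empirical Fisher; larger batches are a
    cheaper, coarser approximation.
    """
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    model.eval()  # disable dropout while measuring parameter importance
    num_batches = 0
    for batch in batches:
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if n in fisher and p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        num_batches += 1
    model.zero_grad()
    return {n: f / max(num_batches, 1) for n, f in fisher.items()}


class EWC:
    """Quadratic penalty anchoring parameters to their values after the previous task."""

    def __init__(self, model: nn.Module, fisher: Dict[str, torch.Tensor], lam: float):
        self.fisher = fisher
        self.lam = lam
        # theta*: snapshot of the parameters at the end of the previous task
        self.anchor = {n: p.detach().clone() for n, p in model.named_parameters() if n in fisher}

    def penalty(self, model: nn.Module) -> torch.Tensor:
        total = torch.zeros((), device=next(model.parameters()).device)
        for n, p in model.named_parameters():
            if n in self.fisher:
                total = total + (self.fisher[n] * (p - self.anchor[n]) ** 2).sum()
        return 0.5 * self.lam * total


def ewc_training_step(model: nn.Module, batch, loss_fn: LossFn, ewc: EWC,
                      optimizer: torch.optim.Optimizer) -> float:
    """One optimizer step on the new task's loss plus the EWC penalty."""
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model, batch) + ewc.penalty(model)
    loss.backward()
    optimizer.step()
    return loss.item()
```

With GPT-2, batches would be dicts from something like `tokenizer(texts, return_tensors="pt", padding=True, truncation=True)`. After fine-tuning on task A, compute `fisher = estimate_fisher_diagonal(model, task_a_batches, gpt2_lm_loss)`, build `ewc = EWC(model, fisher, lam=...)`, then call `ewc_training_step` on task B batches. `lam` trades stability on old tasks against plasticity on new ones and has to be tuned; for more than two tasks, one option is to keep one `EWC` object per past task and sum their penalties.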

## Step 2: Progressive Prompts
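- **Model Setup**: Train on tasks sequentially, learning a new soft prompt for each task while GPT-2 and all previously learned prompts stay frozen.
- **Training**: Fine-tune only the current task's prompt; at inference, all learned prompts are prepended to the input.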
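A minimal sketch of the Progressive Prompts idea described above, assuming a Hugging Face causal LM such as `GPT2LMHeadModel` (it only relies on `get_input_embeddings()` and the `inputs_embeds`, `attention_mask`, `labels` forward arguments). The class name, `prompt_len`, the random prompt initialization, and the newest-first prompt order are assumptions of this sketch; it also omits any prompt reparameterization.

```python
from typing import Dict

import torch
from torch import nn


class ProgressivePrompts(nn.Module):
    """Frozen causal LM plus one soft prompt per task; earlier prompts stay frozen."""

    def __init__(self, model: nn.Module, prompt_len: int = 10):
        super().__init__()
        self.model = model
        for p in self.model.parameters():
            p.requires_grad = False  # the base model is never updated
        self.prompt_len = prompt_len
        self.prompts = nn.ParameterList()

    def start_new_task(self) -> nn.Parameter:
        """Freeze all existing prompts and add a fresh trainable one for the new task."""
        for p in self.prompts:
            p.requires_grad = False
        emb = self.model.get_input_embeddings().weight  # (vocab_size, embed_dim)
        new_prompt = nn.Parameter(0.02 * torch.randn(
            self.prompt_len, emb.shape[1], device=emb.device, dtype=emb.dtype))
        self.prompts.append(new_prompt)
        return new_prompt  # give only this parameter to the optimizer

    def build_inputs(self, input_ids: torch.Tensor,
                     attention_mask: torch.Tensor) -> Dict[str, torch.Tensor]:
        """Prepend all task prompts to the token embeddings; prompt positions get no loss."""
        if len(self.prompts) == 0:
            raise RuntimeError("call start_new_task() before training or inference")
        batch = input_ids.shape[0]
        token_embeds = self.model.get_input_embeddings()(input_ids)
        # Newest prompt first (ordering is an assumption of this sketch).
        prompt = torch.cat(list(self.prompts)[::-1], dim=0)
        prompt = prompt.unsqueeze(0).expand(batch, -1, -1)
        n_prompt = prompt.shape[1]
        labels = input_ids.masked_fill(attention_mask == 0, -100)  # ignore padding
        return {
            "inputs_embeds": torch.cat([prompt, token_embeds], dim=1),
            "attention_mask": torch.cat(
                [attention_mask.new_ones((batch, n_prompt)), attention_mask], dim=1),
            "labels": torch.cat([labels.new_full((batch, n_prompt), -100), labels], dim=1),
        }

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor):
        # For GPT2LMHeadModel the returned output exposes `.loss` and `.logits`.
        return self.model(**self.build_inputs(input_ids, attention_mask))
```

For each task, call `prompt = pp.start_new_task()`, create an optimizer over `[prompt]` only, and train on `pp(input_ids, attention_mask).loss`. Keeping the base model frozen is what protects earlier tasks here; the cost is that the prompt length grows with the number of tasks and counts against GPT-2's 1024-token context.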

## Step 3: Low-Rank Adaptation (LoRA)
- **Model Setup**: Freeze GPT-2 and add trainable low-rank matrices to selected weight matrices, such as the attention projections.
- **Training**: Fine-tune only the low-rank matrices.
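A from-scratch sketch of LoRA on GPT-2, for illustration. GPT-2's attention projections are `Conv1D` layers that compute `x @ W + b` with `W` stored as `(in_features, out_features)`, the transpose of `nn.Linear`, so the wrapper follows that layout. The choice to adapt only each block's fused Q/K/V projection (`attn.c_attn`), the defaults `r=8` and `alpha=16`, and the helper names are assumptions.

```python
import torch
from torch import nn


class LoRAConv1D(nn.Module):
    """Adds a trainable low-rank update to a frozen GPT-2 style Conv1D layer.

    GPT-2's Conv1D computes y = x @ W + b with W of shape (in_features, out_features).
    """

    def __init__(self, base: nn.Module, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weight stays frozen
        in_features, out_features = base.weight.shape
        kw = dict(device=base.weight.device, dtype=base.weight.dtype)
        # Delta W = A @ B. A is random and B is zero, so the wrapped layer starts out
        # identical to the pretrained one.
        self.lora_A = nn.Parameter(0.01 * torch.randn(in_features, r, **kw))
        self.lora_B = nn.Parameter(torch.zeros(r, out_features, **kw))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A @ self.lora_B) * self.scaling


def add_lora_to_gpt2(model: nn.Module, r: int = 8, alpha: float = 16.0) -> nn.Module:
    """Freeze a GPT2LMHeadModel and wrap each block's fused Q/K/V projection (attn.c_attn)."""
    for p in model.parameters():
        p.requires_grad = False
    for block in model.transformer.h:
        block.attn.c_attn = LoRAConv1D(block.attn.c_attn, r=r, alpha=alpha)
    return model
```

After `add_lora_to_gpt2(model)`, pass only the parameters with `requires_grad=True` to the optimizer. The `peft` library packages the same technique behind `LoraConfig` and `get_peft_model`, which is usually preferable to hand-rolled wrappers in practice.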

## Evaluation
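- **Metrics**: Define and calculate backward transfer (how training on later tasks changes performance on earlier ones), forward transfer (how earlier training affects tasks not yet trained on), perplexity, and other metrics.
- **Results**: Present the results for each method (EWC, Progressive Prompts, LoRA) and compare them.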
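One common way to compute these metrics uses a matrix $R$, where $R_{i,j}$ is the score on task $j$ after training through task $i$ (higher is better), over $T$ tasks: $\text{BWT} = \frac{1}{T-1}\sum_{i=1}^{T-1}(R_{T,i} - R_{i,i})$ and $\text{FWT} = \frac{1}{T-1}\sum_{i=2}^{T}(R_{i-1,i} - b_i)$, with $b_i$ a reference score on task $i$. The sketch below is 0-indexed. Since lower perplexity is better, it assumes $R$ holds a higher-is-better score such as negative mean NLL, and for a pretrained GPT-2 it assumes the untouched pretrained model as the reference $b$ (the usual definition uses a randomly initialized model). Function names are illustrative.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple

import torch
from torch import nn


@torch.no_grad()
def total_nll(model: nn.Module, batches: Iterable[Dict[str, torch.Tensor]]) -> Tuple[float, int]:
    """Summed next-token NLL and number of predicted tokens for a Hugging Face causal LM."""
    model.eval()
    nll, n_tokens = 0.0, 0
    for batch in batches:
        labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)
        out = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"], labels=labels)
        n = int((labels[:, 1:] != -100).sum())  # the model shifts labels by one internally
        nll += out.loss.item() * n  # out.loss is the mean over those n tokens
        n_tokens += n
    return nll, n_tokens


def perplexity(nll: float, n_tokens: int) -> float:
    """Perplexity = exp(total NLL / number of predicted tokens)."""
    return math.exp(nll / n_tokens)


def backward_transfer(R: Sequence[Sequence[float]]) -> float:
    """Mean of R[T-1][i] - R[i][i] over i = 0..T-2; negative values indicate forgetting."""
    T = len(R)
    if T < 2:
        raise ValueError("need at least two tasks")
    return sum(R[T - 1][i] - R[i][i] for i in range(T - 1)) / (T - 1)


def forward_transfer(R: Sequence[Sequence[float]], baseline: Sequence[float]) -> float:
    """Mean of R[i-1][i] - baseline[i] over i = 1..T-1: score on task i before training on it."""
    T = len(R)
    if T < 2:
        raise ValueError("need at least two tasks")
    return sum(R[i - 1][i] - baseline[i] for i in range(1, T)) / (T - 1)
```

Filling $R$ means evaluating every task's held-out text after each training stage, e.g. `R[i][j] = -nll / n` from `total_nll` on task `j`'s validation batches, and the same per-task numbers give the perplexities to report for each method.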