
## Project Description

Modern deep learning hinges on variants of stochastic gradient descent. Give an overview of some of the algorithms (Adam,
RMSProp, Adagrad, momentum, etc.) and step-size schemes (learning rate decay, cosine annealing, superconvergence,
hypergradient learning rate adaptation, etc.) and perform an empirical comparison of their performance on some test problems of
your choice. Obviously, no comparison here can be anywhere close to exhaustive due to the overabundance of deep learning
optimization papers, so just go through a handful of ideas that you find interesting. If you want to do this with realistic (somewhat
large) networks, you’ll probably need access to some GPUs. If you don’t have any previous experience in deep learning, this
project will take a bit more work (but should still be doable!).
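
As a point of reference for the algorithms named above, the sketch below writes out the textbook update rules for momentum, Adagrad, RMSProp, and Adam in plain Python and runs each on a small, badly scaled quadratic. The test function, the starting point, and every hyperparameter value are illustrative assumptions, and the gradient is exact rather than stochastic, so this only shows the form of each update and says nothing about how the methods rank on real problems.

```python
import math

def momentum_step(x, g, state, lr=0.1, beta=0.9):
    """Heavy-ball momentum: v <- beta * v + g, then x <- x - lr * v."""
    v = [beta * vi + gi for vi, gi in zip(state.get("v", [0.0] * len(x)), g)]
    state["v"] = v
    return [xi - lr * vi for xi, vi in zip(x, v)]

def adagrad_step(x, g, state, lr=0.5, eps=1e-8):
    """Adagrad: divide each coordinate by the root of its summed squared gradients."""
    s = [si + gi * gi for si, gi in zip(state.get("s", [0.0] * len(x)), g)]
    state["s"] = s
    return [xi - lr * gi / (math.sqrt(si) + eps) for xi, gi, si in zip(x, g, s)]

def rmsprop_step(x, g, state, lr=0.05, rho=0.9, eps=1e-8):
    """RMSProp: like Adagrad, but with an exponential moving average of g^2."""
    s = [rho * si + (1 - rho) * gi * gi
         for si, gi in zip(state.get("s", [0.0] * len(x)), g)]
    state["s"] = s
    return [xi - lr * gi / (math.sqrt(si) + eps) for xi, gi, si in zip(x, g, s)]

def adam_step(x, g, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected moving averages of g (m) and g^2 (v)."""
    t = state.get("t", 0) + 1
    m = [beta1 * mi + (1 - beta1) * gi
         for mi, gi in zip(state.get("m", [0.0] * len(x)), g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi
         for vi, gi in zip(state.get("v", [0.0] * len(x)), g)]
    state.update(t=t, m=m, v=v)
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    return [xi - lr * mh / (math.sqrt(vh) + eps)
            for xi, mh, vh in zip(x, m_hat, v_hat)]

def f(x):
    """Illustrative test problem: a quadratic with curvatures 1 and 10."""
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def grad_f(x):
    return [x[0], 10.0 * x[1]]

def run(step, iters=200):
    """Apply one update rule for `iters` iterations from a fixed start."""
    x, state = [1.0, 1.0], {}
    for _ in range(iters):
        x = step(x, grad_f(x), state)
    return x

for step in (momentum_step, adagrad_step, rmsprop_step, adam_step):
    print(f"{step.__name__:>14}: f(x) = {f(run(step)):.3e}")
```

Each rule keeps its own small `state` dictionary, which mirrors the per-parameter buffers that library optimizers maintain.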




## Notes:
- Test problems:
  - [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html)
    - Image classification
  - [minGPT](https://github.com/karpathy/minGPT?tab=readme-ov-file)
    - Text generation (character-level, trained by predicting the next character)
  - [California Housing](https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html)
    - Regression (predict housing cost)
  - Simple low-dimensional optimization to visualize things
    - Rosenbrock function (2D)
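
For the visualization test problem, here is a minimal sketch of the Rosenbrock function in its usual two-variable form, with its gradient and a plain gradient descent loop that records the iterates. The constants `a=1`, `b=100`, the starting point, the step size, and the iteration count are assumptions chosen for illustration; the recorded `path` could be drawn over a contour plot to compare optimizers.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    """f(x, y) = (a - x)^2 + b * (y - x^2)^2, minimized at (a, a^2)."""
    return (a - x) ** 2 + b * (y - x ** 2) ** 2

def rosenbrock_grad(x, y, a=1.0, b=100.0):
    """Analytic gradient of the Rosenbrock function."""
    dx = -2.0 * (a - x) - 4.0 * b * x * (y - x ** 2)
    dy = 2.0 * b * (y - x ** 2)
    return dx, dy

def gradient_descent(start, lr=5e-4, iters=5000):
    """Fixed-step gradient descent; returns every iterate for plotting."""
    x, y = start
    path = [(x, y)]
    for _ in range(iters):
        dx, dy = rosenbrock_grad(x, y)
        x, y = x - lr * dx, y - lr * dy
        path.append((x, y))
    return path

path = gradient_descent(start=(-1.0, 1.0))
print("start f:", rosenbrock(*path[0]))
print("final f:", rosenbrock(*path[-1]))
```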




## Text Generation


### Imports

In [1]:
# Reload external files when running cells
%load_ext autoreload
%autoreload 2

# Imports
import torch
import numpy as np
import pathlib
import os
import sys

# Add custom package to import path and import it
file_dir = pathlib.Path().resolve()
pkg_dir = os.path.join(file_dir, "submodules")
sys.path.insert(0, pkg_dir)
from amath515_pkg.src import *

# Load minGPT
from mingpt.model import GPT
from mingpt.trainer import Trainer
from mingpt.utils import CharDataset, CfgNode as CN

# Load config file
pkg_config = helpers.get_config()

# Set seeds
np.random.seed(pkg_config['mingpt_np_seed'])
torch.manual_seed(pkg_config['mingpt_torch_seed'])

# Render matplotlib plots inline
%matplotlib inline

# Report whether a GPU is available and select the device accordingly
print("GPU Available?", torch.cuda.is_available())
device = "cuda" if torch.cuda.is_available() else "cpu"


GPU Available? True


### Load Tiny Shakespeare Dataset

In [2]:

mingpt_config = helpers.get_minGPT_config()
with open(os.path.join(file_dir, 'Datasets', 'tiny-shakespeare.txt'), 'r') as file:
    tiny_shakespeare = file.read()
train_dataset = CharDataset(mingpt_config.data, tiny_shakespeare)


data has 1115394 characters, 65 unique.
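
The output above reports the corpus length and the number of distinct characters, which becomes the model's vocabulary size. As a rough illustration of what a character-level dataset does, the sketch below builds a vocabulary from a short string and forms one training pair in which the target is the input shifted by one character. The helper names `build_vocab` and `make_example` and the sample text are hypothetical; this is not the `CharDataset` implementation used in this notebook.

```python
def build_vocab(text):
    """Map each distinct character to an integer id, and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

def make_example(text, stoi, start, block_size):
    """Return (input ids, target ids); the target is the input shifted by one."""
    chunk = text[start : start + block_size + 1]
    ids = [stoi[ch] for ch in chunk]
    return ids[:-1], ids[1:]

text = "to be, or not to be"
stoi, itos = build_vocab(text)
x, y = make_example(text, stoi, start=0, block_size=8)

print("vocab size:", len(stoi))
print("input :", "".join(itos[i] for i in x))
print("target:", "".join(itos[i] for i in y))
```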


### Configure model and trainer

In [3]:

# construct the model
mingpt_config.model.vocab_size = train_dataset.get_vocab_size()
mingpt_config.model.block_size = train_dataset.get_block_size()
model = GPT(mingpt_config.model)

# construct the trainer object with the configured iteration count, optimizer, and scheduler
mingpt_config.trainer.max_iters = pkg_config['mingpt_iters']
mingpt_config.trainer.optimizer_str = pkg_config['mingpt_SGD_method']
mingpt_config.trainer.scheduler_str = pkg_config['mingpt_Scheduler']
trainer = Trainer(mingpt_config.trainer, model, train_dataset)

# construct callback function
callback = lambda x: training.min_gpt_batch_end_callback(model, mingpt_config, train_dataset, x)
trainer.set_callback('on_batch_end', callback)



number of parameters: 2.71M
running on device cuda



### GPT Training


In [4]:

# run the optimization
trainer.run()


Iteration 0, lr 0.000500
---------------------------------
loss: 3.069319 [iter 10 of 500]
loss: 2.817652 [iter 20 of 500]
loss: 2.711862 [iter 30 of 500]
loss: 2.667384 [iter 40 of 500]
loss: 2.665904 [iter 50 of 500]
loss: 2.624243 [iter 60 of 500]
loss: 2.635427 [iter 70 of 500]
loss: 2.648400 [iter 80 of 500]
loss: 2.633587 [iter 90 of 500]

Iteration 100, lr 0.000452
---------------------------------
loss: 2.616637 [iter 110 of 500]
loss: 2.612584 [iter 120 of 500]
loss: 2.611892 [iter 130 of 500]
loss: 2.590376 [iter 140 of 500]
loss: 2.600641 [iter 150 of 500]
loss: 2.610029 [iter 160 of 500]
loss: 2.602420 [iter 170 of 500]
loss: 2.605690 [iter 180 of 500]
loss: 2.576747 [iter 190 of 500]

Iteration 200, lr 0.000327
---------------------------------
loss: 2.597587 [iter 210 of 500]


KeyboardInterrupt: 
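
The learning rates logged above (0.000500, 0.000452, and 0.000327 at iterations 0, 100, and 200 of 500) are consistent with cosine annealing from 5e-4 down to zero over 500 iterations. The scheduler actually used is selected by `pkg_config['mingpt_Scheduler']`, whose value is not shown here, so this is an inference rather than a statement about the configuration. The sketch below evaluates the standard cosine annealing formula under those assumed settings.

```python
import math

def cosine_lr(step, base_lr=5e-4, total_steps=500, min_lr=0.0):
    """Cosine annealing: decay from base_lr to min_lr over total_steps."""
    cos_term = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * cos_term

for step in (0, 100, 200):
    print(f"Iteration {step}, lr {cosine_lr(step):.6f}")
```

With these assumed values the printed rates round to the same six decimal places as the log above.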

In [None]:

print(trainer.losses)
print(trainer.iter_num)


[4.224329948425293, 3.7430996894836426, 3.5658071041107178, 3.5186662673950195, 3.4524800777435303, 3.3786368370056152, 3.321120500564575, 3.2594423294067383, 3.2242822647094727, 3.1626830101013184, 3.1144871711730957, 3.060476541519165, 3.0568454265594482, 3.0108633041381836, 2.9555728435516357, 2.9399607181549072, 2.918572187423706, 2.9067862033843994, 2.877592086791992, 2.8572659492492676, 2.8313610553741455, 2.809608221054077, 2.792649745941162, 2.7714765071868896, 2.784308671951294, 2.7812156677246094, 2.7605950832366943, 2.7403345108032227, 2.7056925296783447, 2.715184450149536, 2.713942289352417, 2.679215669631958, 2.7113289833068848, 2.7063956260681152, 2.6659536361694336, 2.7189600467681885, 2.7019662857055664, 2.6874468326568604, 2.69785213470459, 2.683434009552002, 2.6667754650115967, 2.669659376144409, 2.6945040225982666, 2.663917064666748, 2.6644721031188965, 2.663381338119507, 2.6398258209228516, 2.6669678688049316, 2.658080816268921, 2.6347594261169434]
50
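
Per-iteration losses like the list above are noisy, which makes curves from different optimizers hard to compare directly. One simple option is a trailing moving average plus a few summary numbers, sketched below. The helper names and the window size are assumptions, and the sample values are the first five losses printed above, rounded to four decimals.

```python
def moving_average(values, window=10):
    """Trailing mean over the last `window` values (fewer at the start)."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / min(i + 1, window))
    return out

def summarize(losses, window=10):
    """Collect a few numbers that are easy to tabulate across runs."""
    smooth = moving_average(losses, window)
    return {
        "first": losses[0],
        "last": losses[-1],
        "best": min(losses),
        "smoothed_last": smooth[-1],
    }

sample = [4.2243, 3.7431, 3.5658, 3.5187, 3.4525]
print(summarize(sample, window=3))
```

In the notebook, `summarize(trainer.losses)` would give one such row per optimizer and scheduler combination.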
