
## Project Description

Modern deep learning hinges on variants of stochastic gradient descent. Give an overview of some of the algorithms (Adam,
RMSProp, Adagrad, momentum, etc.) and step-size schemes (learning rate decay, cosine annealing, superconvergence,
hypergradient learning rate adaptation, etc.) and perform an empirical comparison of their performance on some test problems of
your choice. Obviously, no comparison here can be anywhere close to exhaustive due to the overabundance of deep learning
optimization papers, so just go through a handful of ideas that you find interesting. If you want to do this with realistic (somewhat
large) networks, you’ll probably need access to some GPUs. If you don’t have any previous experience in deep learning, this
project will take a bit more work (but should still be doable!).
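
As a rough reference for the algorithms named above, the sketch below writes out the textbook update rules for momentum, Adagrad, RMSProp, and Adam on a scalar parameter. It is not the code used in this project: the function names, the `state` dictionary, the toy objective f(x) = x^2, and all default hyperparameters are illustrative assumptions and have not been tuned.

```python
import math

# Each function takes the current gradient and a mutable state dict,
# and returns the amount to subtract from the parameter.

def sgd_momentum(grad, state, lr=0.1, beta=0.9):
    # Heavy-ball momentum: accumulate a velocity, then step along it.
    state["v"] = beta * state.get("v", 0.0) + grad
    return lr * state["v"]

def adagrad(grad, state, lr=0.5, eps=1e-8):
    # Scale by the root of the running SUM of squared gradients.
    state["s"] = state.get("s", 0.0) + grad * grad
    return lr * grad / (math.sqrt(state["s"]) + eps)

def rmsprop(grad, state, lr=0.01, alpha=0.99, eps=1e-8):
    # Scale by the root of an exponential moving AVERAGE of squared gradients.
    state["s"] = alpha * state.get("s", 0.0) + (1.0 - alpha) * grad * grad
    return lr * grad / (math.sqrt(state["s"]) + eps)

def adam(grad, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Moving averages of the gradient and its square, with bias correction.
    state["t"] = state.get("t", 0) + 1
    state["m"] = beta1 * state.get("m", 0.0) + (1.0 - beta1) * grad
    state["v"] = beta2 * state.get("v", 0.0) + (1.0 - beta2) * grad * grad
    m_hat = state["m"] / (1.0 - beta1 ** state["t"])
    v_hat = state["v"] / (1.0 - beta2 ** state["t"])
    return lr * m_hat / (math.sqrt(v_hat) + eps)

def minimize(step_fn, x0=5.0, iters=200):
    # Toy problem (an assumption for illustration): f(x) = x^2, gradient 2x.
    x, state = x0, {}
    for _ in range(iters):
        x -= step_fn(2.0 * x, state)
    return x

for name, fn in [("momentum", sgd_momentum), ("adagrad", adagrad),
                 ("rmsprop", rmsprop), ("adam", adam)]:
    print(name, minimize(fn))
```

One detail worth noticing: on the very first step Adam's bias correction makes the update magnitude roughly equal to `lr` regardless of the gradient's scale, whereas the momentum step is proportional to the gradient.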




## Notes:
- Test problems:
  - [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html)
    - Image classification
  - [minGPT](https://github.com/karpathy/minGPT?tab=readme-ov-file)
    - Text generation (trained by predicting the next character)
  - [California Housing](https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html)
    - Regression (predict housing cost)
  - Simple 2D optimization to visualize things
    - Rosenbrock function
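
For the Rosenbrock test problem, a minimal sketch of the function, its gradient, and a plain gradient-descent loop might look like the following. It assumes the standard two-variable form f(x, y) = (a - x)^2 + b (y - x^2)^2 with a = 1 and b = 100, whose minimum is at (1, 1). The starting point, step size, and iteration count are illustrative choices, not settings taken from this project.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    # Standard two-variable Rosenbrock function; minimum value 0 at (a, a^2).
    return (a - x) ** 2 + b * (y - x ** 2) ** 2

def rosenbrock_grad(x, y, a=1.0, b=100.0):
    # Analytic partial derivatives with respect to x and y.
    dx = -2.0 * (a - x) - 4.0 * b * x * (y - x ** 2)
    dy = 2.0 * b * (y - x ** 2)
    return dx, dy

def gradient_descent(x0, y0, lr=1e-3, iters=1000):
    # Record (x, y, f) at every iterate so the path can be plotted later.
    x, y = x0, y0
    path = [(x, y, rosenbrock(x, y))]
    for _ in range(iters):
        dx, dy = rosenbrock_grad(x, y)
        x, y = x - lr * dx, y - lr * dy
        path.append((x, y, rosenbrock(x, y)))
    return path

path = gradient_descent(-1.2, 1.0)
print("start:", path[0])
print("end:  ", path[-1])
```

The recorded path can be overlaid on a contour plot of the function to compare how different optimizers move through the curved valley.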




## Text Generation


### Imports

In [1]:
# Reload external files when running cells
%load_ext autoreload
%autoreload 2

# Imports
import torch
import numpy as np
import pathlib
import os
import sys

# Add custom package to import path and import it
file_dir = pathlib.Path().resolve()
pkg_dir = os.path.join(file_dir, "submodules")
print(pkg_dir)
sys.path.insert(0, pkg_dir)
sys.path.insert(0, os.path.join(pkg_dir, "minGPT"))
sys.path.insert(0, os.path.join(pkg_dir, "amath515_pkg"))
from amath515_pkg.src import *

# Load minGPT
from mingpt.model import GPT
from mingpt.trainer import Trainer
from mingpt.utils import CharDataset, CfgNode as CN

# Load config file
pkg_config = helpers.get_config()

# Set seeds
np.random.seed(pkg_config['mingpt_np_seed'])
torch.manual_seed(pkg_config['mingpt_torch_seed'])

# Render matplotlib plots inline
%matplotlib inline

# Check whether a GPU is available and select the device accordingly
print("GPU Available?", torch.cuda.is_available())
device = "cuda" if torch.cuda.is_available() else "cpu"


/home/alexey/Git/AMATH515/submodules
GPU Available? False


### Load Tiny Shakespeare Dataset

In [2]:

# To get tiny-shakespeare, download https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt
# and save it as "tiny-shakespeare.txt" in the "Datasets" folder
mingpt_config = helpers.get_minGPT_config()
with open(os.path.join(file_dir, 'Datasets', 'tiny-shakespeare.txt'), 'r') as file:
    tiny_shakespeare = file.read()
train_dataset = CharDataset(mingpt_config.data, tiny_shakespeare)


data has 1115394 characters, 65 unique.


### Configure model and trainer

In [3]:

# construct the model
mingpt_config.model.vocab_size = train_dataset.get_vocab_size()
mingpt_config.model.block_size = train_dataset.get_block_size()
model = GPT(mingpt_config.model)

# construct the trainer object; the iteration count, optimizer name, and
# scheduler name are read from the package config
mingpt_config.trainer.max_iters = pkg_config['mingpt_iters']
mingpt_config.trainer.optimizer_str = pkg_config['mingpt_SGD_method']
mingpt_config.trainer.scheduler_str = pkg_config['mingpt_Scheduler']
trainer = Trainer(mingpt_config.trainer, model, train_dataset)

# construct callback function
callback = lambda x: training.min_gpt_batch_end_callback(model, mingpt_config, train_dataset, x)
trainer.set_callback('on_batch_end', callback)



number of parameters: 2.71M
running on device cpu
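
The cell above passes an optimizer name and a scheduler name to the trainer as strings. How the modified `Trainer` resolves those strings is not shown in this notebook, so the following is only a sketch of one possible pattern: a dictionary that maps a name to a learning-rate schedule function. The keys `"constant"`, `"step"`, and `"cosine"`, the function names, and the decay parameters are all hypothetical; the base rate of 5e-4 matches the `lr 0.000500` printed in the training log.

```python
import math

def constant_lr(base_lr, step, total_steps):
    # No decay: always return the base learning rate.
    return base_lr

def step_decay_lr(base_lr, step, total_steps, drop=0.5, every=1000):
    # Multiply the rate by `drop` once every `every` iterations.
    return base_lr * drop ** (step // every)

def cosine_annealing_lr(base_lr, step, total_steps, min_lr=0.0):
    # Follow half a cosine wave from base_lr (step 0) down to min_lr.
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Hypothetical registry keyed by the scheduler string from a config file.
SCHEDULES = {
    "constant": constant_lr,
    "step": step_decay_lr,
    "cosine": cosine_annealing_lr,
}

def get_lr(name, base_lr, step, total_steps):
    # Look up the schedule by name and fail loudly on unknown names.
    if name not in SCHEDULES:
        raise ValueError(f"unknown scheduler {name!r}; choose from {sorted(SCHEDULES)}")
    return SCHEDULES[name](base_lr, step, total_steps)

for step in (0, 2500, 5000):
    print(step, get_lr("cosine", 5e-4, step, total_steps=5000))
```

A lookup table like this keeps the experiment configuration in plain strings, which makes it easy to sweep over schedules from a config file without touching the training loop.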



### GPT Training


In [4]:

# run the optimization
trainer.run()


Iteration 0, lr 0.000500
---------------------------------
Saving model to "/home/alexey/Git/AMATH515/Saved_Models/mingpt.ckpt"


In [5]:

print(trainer.losses)
print(trainer.iter_num)


[4.224494457244873]
1
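
A quick sanity check on the first recorded loss: a model that assigns equal probability to each of the 65 characters has a cross-entropy of ln(65) nats per character. The sketch below computes that baseline and compares it with the value printed above. It assumes the trainer reports mean cross-entropy in nats, which is how minGPT computes its loss.

```python
import math

vocab_size = 65                      # "65 unique" characters reported by CharDataset
observed_loss = 4.224494457244873    # first entry of trainer.losses above

# Cross-entropy of a uniform prediction over the vocabulary.
uniform_loss = math.log(vocab_size)

print(f"uniform baseline: {uniform_loss:.4f}")   # 4.1744
print(f"gap to observed:  {observed_loss - uniform_loss:.4f}")
```

The observed loss sits just above the uniform baseline, which is what one would expect from a freshly initialized model after a single iteration.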
