# nanoGPT

This notebook is based on https://github.com/karpathy/nanoGPT.

The **reproducing GPT-2** part is omitted because it requires too many resources and takes a long time.

### Preparation

In [2]:
!apt-get update && apt-get -y install gcc

Hit:1 http://archive.ubuntu.com/ubuntu focal InRelease                         
Get:2 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]      
Hit:3 http://archive.ubuntu.com/ubuntu focal-updates InRelease                 
Hit:4 http://ppa.launchpad.net/deadsnakes/ppa/ubuntu focal InRelease           
Get:5 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]      
Hit:6 https://deb.nodesource.com/node_14.x focal InRelease                     
Fetched 222 kB in 2s (94.4 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree       
Reading state information... Done
gcc is already the newest version (4:9.3.0-1ubuntu2).
0 upgraded, 0 newly installed, 0 to remove and 149 not upgraded.


In [3]:
pip install torch==2.0 numpy transformers datasets tiktoken wandb tqdm

Note: you may need to restart the kernel to use updated packages.


In [4]:
!rm -rf nanoGPT && git clone https://github.com/karpathy/nanoGPT

Cloning into 'nanoGPT'...
remote: Enumerating objects: 649, done.
remote: Total 649 (delta 0), reused 0 (delta 0), pack-reused 649
Receiving objects: 100% (649/649), 935.29 KiB | 1.47 MiB/s, done.
Resolving deltas: 100% (374/374), done.


### Training

In [5]:
cd nanoGPT/

/home/jovyan/nanoGPT


Download the data as a single (1 MB) file and convert it from raw text into one large stream of integers. This creates train.bin and val.bin in the data/shakespeare_char directory. After that, it is time to train your GPT; how large a model you can train depends very much on the computational resources of your system:

In [6]:
!python data/shakespeare_char/prepare.py

length of dataset in characters: 1,115,394
all the unique characters: 
 !$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
vocab size: 65
train has 1,003,854 tokens
val has 111,540 tokens
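
The conversion from raw text to a stream of integers can be sketched as follows. This is a minimal illustration, not the repository's prepare.py verbatim: the short `text` string stands in for input.txt, the helper names (`encode`, `decode`, `split_sizes`) are hypothetical, and the 90/10 train/val split is an assumption inferred from the token counts printed above.

```python
# Character-level tokenization sketch (illustrative, not prepare.py verbatim).
# `text` is a short stand-in for the contents of input.txt.
text = "First Citizen:\nBefore we proceed any further, hear me speak."

# The vocabulary is simply the sorted set of characters that occur in the text.
chars = sorted(set(text))
vocab_size = len(chars)
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character


def encode(s):
    """Map a string to a list of integer ids, one per character."""
    return [stoi[c] for c in s]


def decode(ids):
    """Map a list of integer ids back to the original string."""
    return "".join(itos[i] for i in ids)


def split_sizes(n, train_frac=0.9):
    """Return (train, val) lengths for n tokens; train_frac is an assumption."""
    n_train = int(n * train_frac)
    return n_train, n - n_train


ids = encode(text)
n_train, n_val = split_sizes(len(ids))
train_ids, val_ids = ids[:n_train], ids[n_train:]

print(vocab_size, len(train_ids), len(val_ids))
print(split_sizes(1_115_394))  # (1003854, 111540), the counts in the log above
```

Under the 90/10 assumption, the full dataset length of 1,115,394 characters reproduces the train and val token counts reported by the script.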


In [7]:
ls data/shakespeare_char/

input.txt  meta.pkl  prepare.py  readme.md  train.bin  val.bin


In [8]:
!python train.py config/train_shakespeare_char.py

Overriding config with config/train_shakespeare_char.py:
# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such

out_dir = 'out-shakespeare-char'
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too too often

# we expect to overfit on this small dataset, so only save when val improves
always_save_checkpoint = False

wandb_log = False # override via command line if you like
wandb_project = 'shakespeare-char'
wandb_run_name = 'mini-gpt'

dataset = 'shakespeare_char'
gradient_accumulation_steps = 1
batch_size = 64
block_size = 256 # context of up to 256 previous characters

# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2

learning_rate = 1e-3 # with baby networks can afford to go a bit higher
max_iters = 5000
lr_decay_iters = 5000 # make equal to max_iters usually
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of 
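
To see how `learning_rate`, `lr_decay_iters`, and `min_lr` interact, here is a minimal sketch of a learning-rate schedule with linear warmup followed by cosine decay. It assumes that train.py uses this kind of schedule; the function name `get_lr` and the value `warmup_iters=100` are assumptions, since the warmup setting is not visible in the config echoed above.

```python
import math


def get_lr(it, learning_rate=1e-3, min_lr=1e-4,
           lr_decay_iters=5000, warmup_iters=100):
    """Learning rate at iteration `it` (warmup_iters=100 is an assumed value)."""
    # 1) Linear warmup from 0 up to learning_rate.
    if it < warmup_iters:
        return learning_rate * it / warmup_iters
    # 2) Past the decay horizon, hold at the floor.
    if it > lr_decay_iters:
        return min_lr
    # 3) In between, cosine-decay from learning_rate down to min_lr.
    decay_ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # goes from 1 to 0
    return min_lr + coeff * (learning_rate - min_lr)


for step in (0, 50, 100, 2500, 5000):
    print(step, get_lr(step))
```

With `lr_decay_iters` equal to `max_iters`, the rate reaches `min_lr` exactly on the final iteration, which is why the config comment suggests keeping the two equal.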

### Sample/Inference

Once training finishes, we can sample from the best model by pointing the sampling script at the output directory:

In [9]:
!python sample.py --out_dir=out-shakespeare-char

Overriding: out_dir = out-shakespeare-char
number of parameters: 10.65M
Loading meta from data/shakespeare_char/meta.pkl...


ANGELO:
And cowards it be straighted but our hands.
3 KING EDWARD IV:
What, uncle? therefore thou enjoy'st us not the more
When we have finity to my foul ancient reels,
I hear thee in our soul. I know not.

Gaoler:
I am a world-husband with a cheeking here,
To seek thy foot so flesh in the demand.

KING EDWARD IV:
But we are done with us in his soul:
The rest was my brother son; which must with all of woes
Every gross corruptuous strength:
Are all comes to be in courtesy?

Son:
It is a man are 
---------------

Menenius, and graves your gatesmen have more.

AUFIDIUS:
The matter?

CORIOLANUS:
There is my wife.

MENENIUS:
I have look'd it bounds in the season.

BRUTUS:
We had as committed to come too so undoubtled as the
servant: I have sent the violence.

COMINIUS:
Nor bear the world that all to the very whole than
I have consul, come I see thee in the people's n
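
The `number of parameters: 10.65M` line in the sample output can be reproduced from the model config (`n_layer = 6`, `n_embd = 384`, vocab size 65). The sketch below is a back-of-the-envelope count under stated assumptions: no bias terms in the character-level model, token embeddings shared with the output layer, and position embeddings excluded from the reported figure. The function name `count_params` is hypothetical.

```python
def count_params(n_layer, n_embd, vocab_size, bias):
    """Approximate GPT parameter count, excluding position embeddings.

    Assumes the token embedding matrix is shared with the output layer,
    so it is counted once.
    """
    b = 1 if bias else 0
    layer_norm = n_embd + b * n_embd                   # weight (+ optional bias)
    attn = (n_embd * 3 * n_embd + b * 3 * n_embd)      # fused q, k, v projection
    attn += (n_embd * n_embd + b * n_embd)             # attention output projection
    mlp = (n_embd * 4 * n_embd + b * 4 * n_embd)       # expand to 4 * n_embd
    mlp += (4 * n_embd * n_embd + b * n_embd)          # project back to n_embd
    block = 2 * layer_norm + attn + mlp
    # All blocks + final layer norm + (tied) token embedding.
    return n_layer * block + layer_norm + vocab_size * n_embd


char_model = count_params(n_layer=6, n_embd=384, vocab_size=65, bias=False)
print(f"{char_model / 1e6:.2f}M")  # 10.65M

# GPT-2 medium: 24 layers, width 1024, vocab 50257, with bias terms.
medium = count_params(n_layer=24, n_embd=1024, vocab_size=50257, bias=True)
print(f"{medium / 1e6:.2f}M")  # 353.77M
```

The same formula with GPT-2 medium's dimensions gives 353.77M, which matches the count printed when sampling from the fine-tuned model later in this notebook.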

### Fine-tune

First, prepare the data.

In [10]:
cd data/shakespeare/

/home/jovyan/nanoGPT/data/shakespeare


In [11]:
!python prepare.py

train has 301,966 tokens
val has 36,059 tokens


In [12]:
cd ../..

/home/jovyan/nanoGPT


Fine-tuning is no different from training: we just make sure to initialize from a pretrained model and train with a smaller learning rate.

Fine-tune a GPT-2 model. If you run out of memory, the process may stop silently; in that case, try a smaller model (the options are {'gpt2', 'gpt2-medium', 'gpt2-large', 'gpt2-xl'}) by changing init_from in config/finetune_shakespeare.py. This run uses 'gpt2-medium', so the "largest GPT-2 model" comment echoed in the config below does not apply to it.

In [13]:
!python train.py config/finetune_shakespeare.py

Overriding config with config/finetune_shakespeare.py:
import time

out_dir = 'out-shakespeare'
eval_interval = 5
eval_iters = 40
wandb_log = False # feel free to turn on
wandb_project = 'shakespeare'
wandb_run_name = 'ft-' + str(time.time())

dataset = 'shakespeare'
init_from = 'gpt2-medium' # this is the largest GPT-2 model

# only save checkpoints if the validation loss improves
always_save_checkpoint = False

# the number of examples per iter:
# 1 batch_size * 32 grad_accum * 1024 tokens = 32,768 tokens/iter
# shakespeare has 301,966 tokens, so 1 epoch ~= 9.2 iters
batch_size = 1
gradient_accumulation_steps = 32
max_iters = 20

# finetune at constant LR
learning_rate = 3e-5
decay_lr = False

tokens per iteration will be: 32,768
Initializing from OpenAI GPT-2 weights: gpt2-medium
[2023-08-25 06:30:22,719] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
loading weights from pretrained gpt: gpt2-medium
forcing vocab_size=50257, block_size=
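
The arithmetic in the config comments can be checked directly. This small sketch uses only values shown above (`batch_size`, `gradient_accumulation_steps`, the 1024-token context, the 301,966 training tokens, and `max_iters`); the variable names `iters_per_epoch` and `epochs_seen` are introduced here for illustration.

```python
# Values taken from the config and the prepare.py output above.
batch_size = 1
gradient_accumulation_steps = 32
block_size = 1024            # tokens per sequence, per the config comment
train_tokens = 301_966
max_iters = 20

# Each optimizer step accumulates gradients over 32 micro-batches of 1 sequence.
tokens_per_iter = batch_size * gradient_accumulation_steps * block_size
iters_per_epoch = train_tokens / tokens_per_iter
epochs_seen = max_iters / iters_per_epoch

print(f"tokens per iteration: {tokens_per_iter:,}")    # 32,768
print(f"iterations per epoch: {iters_per_epoch:.1f}")  # 9.2
print(f"epochs in {max_iters} iterations: {epochs_seen:.2f}")
```

So the 20 iterations configured here correspond to a little over two passes through the training data.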

In [14]:
!python sample.py --out_dir=out-shakespeare

Overriding: out_dir = out-shakespeare
number of parameters: 353.77M
No meta.pkl found, assuming GPT-2 encodings...

This time, the young man came to the door of the house and asked for bread; and he, receiving it, went to work with his hands to make it.

And he struck off a piece of bread, and made into four equal portions, and put it before the king; and he, receiving it, set it upon the table; and he, having received the two pieces of bread, received the other.

And the king was greatly satisfied; and said:
'You shall have three pieces of bread, for I have two pieces of bread.

And in the first piece of bread you shall have an oat for your neighbour's daughter; in the second piece of bread ye shall have three white barley-cake cakes.

And in the second piece of bread you shall have one white bread cake; and in the third, a piece of taffy.

And in the fourth piece of bread you shall have two pieces of white bread cakes; and in the fifth, two pieces of barley-cake cakes; and in the six