- Based on the YouTube lectures by Professor 홍정모.
- Using Harry Potter text data, we train an LLM and build a model that predicts the next word.
- A simple LLM built with PyTorch alone, by stacking the basic transformer block.
- The setup is a lightweight local environment, so both the model size and the amount of training data were scaled down.
- See the notebook for the training results and model performance.
- Python 3.12.9
- Intel(R) Core(TM) Ultra 5 125U 1.30 GHz
- 32.0 GB RAM (31.5 GB usable)
- No GPU
- Windows 11 Pro with WSL2
- The data is the Harry Potter dataset provided on Kaggle.
- For ease of training, only two of the books were used.
- Following the basic structure of GPT-2, the model stacks transformer blocks. - Language Models are Unsupervised Multitask Learners
- The TransformerBlock structure is as follows (a sketch follows this list):
- MultiHeadAttention: splits the embedded input into Q, K, and V and computes attention; the input must already be embedded.
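Since VOCAB_SIZE below is 50257 (the GPT-2 BPE vocabulary), the text was presumably tokenized with a GPT-2 tokenizer. A minimal sketch of how the books could be turned into sliding-window training pairs, assuming the `tiktoken` library and a hypothetical file name:

```python
import tiktoken
import torch
from torch.utils.data import Dataset, DataLoader

class GPTDataset(Dataset):
    """Sliding-window next-token dataset: each sample is an (input, target)
    pair of token-id sequences, with the target shifted right by one."""
    def __init__(self, text, tokenizer, context_length, stride):
        token_ids = tokenizer.encode(text)
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - context_length, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + context_length]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + context_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

# "harry_potter_book2.txt" is a hypothetical file name for the Kaggle text.
tokenizer = tiktoken.get_encoding("gpt2")  # 50257-token BPE vocabulary
with open("harry_potter_book2.txt", encoding="utf-8") as f:
    text = f.read()
loader = DataLoader(GPTDataset(text, tokenizer, context_length=32, stride=32),
                    batch_size=8, shuffle=True)
```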
- Feed Forward Network
- GELU: used in place of ReLU. - Gaussian Error Linear Units (GELUs)
- LayerNorm
- LayerNorm
- Dropout
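A minimal sketch of such a block, matching the module names in the printout below. It assumes pre-norm residual connections as in GPT-2 and substitutes PyTorch's built-in `nn.LayerNorm` and `nn.GELU` for the notebook's custom modules:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Causal multi-head self-attention over embedded inputs."""
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask blocks attention to future positions.
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1).bool())

    def forward(self, x):
        b, t, _ = x.shape
        # Project, then split into heads: (b, num_heads, t, head_dim).
        q = self.W_query(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))  # causal mask
        ctx = self.dropout(torch.softmax(scores, dim=-1)) @ v
        ctx = ctx.transpose(1, 2).contiguous().view(b, t, -1)  # merge heads
        return self.out_proj(ctx)

class TransformerBlock(nn.Module):
    """Pre-norm block: LayerNorm -> attention -> residual, LayerNorm -> FFN -> residual."""
    def __init__(self, emb_dim, context_length, num_heads, drop_rate, qkv_bias=False):
        super().__init__()
        self.att = MultiHeadAttention(emb_dim, emb_dim, context_length,
                                      drop_rate, num_heads, qkv_bias)
        self.ff = nn.Sequential(  # 4x expansion, as in the printout (128 -> 512 -> 128)
            nn.Linear(emb_dim, 4 * emb_dim), nn.GELU(), nn.Linear(4 * emb_dim, emb_dim))
        self.norm1 = nn.LayerNorm(emb_dim)
        self.norm2 = nn.LayerNorm(emb_dim)
        self.drop_shortcut = nn.Dropout(drop_rate)

    def forward(self, x):
        x = x + self.drop_shortcut(self.att(self.norm1(x)))  # attention sub-layer
        x = x + self.drop_shortcut(self.ff(self.norm2(x)))   # feed-forward sub-layer
        return x
```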
- The GPT model is built by stacking TransformerBlocks.
- The GPT model has the following structure (a sketch follows this list):
- Embedding: embeds the input tokens.
- PositionalEncoding: adds learned positional embeddings to the token embeddings.
- TransformerBlock: repeated NUM_LAYERS times.
- Linear: projects the output to vocabulary-size logits.
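A sketch of the full model, following the module names and shapes in the printout below and reusing the `TransformerBlock` sketch above:

```python
class GPTModel(nn.Module):
    """Token embedding + positional embedding -> N transformer blocks -> logits."""
    def __init__(self, vocab_size, emb_dim, context_length, num_heads,
                 num_layers, drop_rate, qkv_bias=False):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)
        self.pos_emb = nn.Embedding(context_length, emb_dim)  # learned positions
        self.drop_emb = nn.Dropout(drop_rate)
        self.trf_blocks = nn.Sequential(*[
            TransformerBlock(emb_dim, context_length, num_heads, drop_rate, qkv_bias)
            for _ in range(num_layers)])
        self.final_norm = nn.LayerNorm(emb_dim)
        self.out_head = nn.Linear(emb_dim, vocab_size, bias=False)

    def forward(self, idx):
        b, t = idx.shape
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(t, device=idx.device))
        x = self.trf_blocks(self.drop_emb(x))
        return self.out_head(self.final_norm(x))  # (b, t, vocab_size) logits
```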
VOCAB_SIZE = 50257
CONTEXT_LENGTH = 32
EMB_DIM = 128
NUM_HEADS = 4
NUM_LAYERS = 4
DROP_RATE = 0.1
QKV_BIAS = False
- Total parameters: 13M (13,661,696)
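The total can be checked directly against the config constants above (reusing the `GPTModel` sketch):

```python
model = GPTModel(vocab_size=VOCAB_SIZE, emb_dim=EMB_DIM,
                 context_length=CONTEXT_LENGTH, num_heads=NUM_HEADS,
                 num_layers=NUM_LAYERS, drop_rate=DROP_RATE, qkv_bias=QKV_BIAS)
total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total:,}")  # 13,661,696
```

Most of the budget sits in the two vocabulary-sized matrices: the token embedding and the output head each hold 50257 × 128 = 6,432,896 parameters, while the four transformer blocks together contribute only about 0.8M.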
GPTModel(
(tok_emb): Embedding(50257, 128)
(pos_emb): Embedding(32, 128)
(drop_emb): Dropout(p=0.1, inplace=False)
(trf_blocks): Sequential(
(0): TransformerBlock(
(att): MultiHeadAttention(
(W_query): Linear(in_features=128, out_features=128, bias=False)
(W_key): Linear(in_features=128, out_features=128, bias=False)
(W_value): Linear(in_features=128, out_features=128, bias=False)
(out_proj): Linear(in_features=128, out_features=128, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(ff): FeedForward(
(layers): Sequential(
(0): Linear(in_features=128, out_features=512, bias=True)
(1): GELU()
(2): Linear(in_features=512, out_features=128, bias=True)
)
)
(norm1): LayerNorm()
(norm2): LayerNorm()
(drop_shortcut): Dropout(p=0.1, inplace=False)
)
(1): TransformerBlock(
(att): MultiHeadAttention(
(W_query): Linear(in_features=128, out_features=128, bias=False)
(W_key): Linear(in_features=128, out_features=128, bias=False)
(W_value): Linear(in_features=128, out_features=128, bias=False)
(out_proj): Linear(in_features=128, out_features=128, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(ff): FeedForward(
(layers): Sequential(
(0): Linear(in_features=128, out_features=512, bias=True)
(1): GELU()
(2): Linear(in_features=512, out_features=128, bias=True)
)
)
(norm1): LayerNorm()
(norm2): LayerNorm()
(drop_shortcut): Dropout(p=0.1, inplace=False)
)
(2): TransformerBlock(
(att): MultiHeadAttention(
(W_query): Linear(in_features=128, out_features=128, bias=False)
(W_key): Linear(in_features=128, out_features=128, bias=False)
(W_value): Linear(in_features=128, out_features=128, bias=False)
(out_proj): Linear(in_features=128, out_features=128, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(ff): FeedForward(
(layers): Sequential(
(0): Linear(in_features=128, out_features=512, bias=True)
(1): GELU()
(2): Linear(in_features=512, out_features=128, bias=True)
)
)
(norm1): LayerNorm()
(norm2): LayerNorm()
(drop_shortcut): Dropout(p=0.1, inplace=False)
)
(3): TransformerBlock(
(att): MultiHeadAttention(
(W_query): Linear(in_features=128, out_features=128, bias=False)
(W_key): Linear(in_features=128, out_features=128, bias=False)
(W_value): Linear(in_features=128, out_features=128, bias=False)
(out_proj): Linear(in_features=128, out_features=128, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(ff): FeedForward(
(layers): Sequential(
(0): Linear(in_features=128, out_features=512, bias=True)
(1): GELU()
(2): Linear(in_features=512, out_features=128, bias=True)
)
)
(norm1): LayerNorm()
(norm2): LayerNorm()
(drop_shortcut): Dropout(p=0.1, inplace=False)
)
)
(final_norm): LayerNorm()
(out_head): Linear(in_features=128, out_features=50257, bias=False)
)
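Training minimizes cross-entropy between the model's logits and the targets shifted one position right. A minimal sketch of such a loop; the optimizer, learning rate, and epoch count here are assumptions, not the notebook's actual settings:

```python
import torch.nn.functional as F

def train(model, loader, epochs=10, lr=4e-4):
    """Minimal next-token training loop (hyperparameters are placeholders)."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.1)
    model.train()
    for epoch in range(epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            logits = model(inputs)  # (b, t, vocab_size)
            # Flatten batch and time dims so each token is one classification target.
            loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch + 1}: loss {loss.item():.3f}")
```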
- start context: "Dobby is"
- 10 sampled continuations:
0 : Dobby is a few good compliments at dinner. Petunia, any ideas?” “Vernon tells me you’re a wonderful golfer, let out.” He exchanged dark looks with his toast. “I’ll
1 : Dobby is a wizard — a wizard fresh from his first year at Hogwarts School of Witchcraft and Wizardry. And if the Dursleys were unhappy to have to have him all. Harry had spent all summer holidays, even though Ron and he had been talking
2 : Dobby is a very mysterious past, of the reason he had been left on the Dursleys’ doorstep eleven years before. At the age of one year ago, and top to, and somehow school food. Harry had been brought up his dead mother�
3 : Dobby is a letter, but it wasn’t worth the risk. Underage wizards weren’t allowed to use magic outside of school. Harry hadn’t told the Dursleys, veins), the Dursleys, veins), and sending
4 : Dobby is a few good compliments at dinner. Petunia, any ideas?” “Vernon tells me you’re a wonderful golfer, let out.” He exchanged dark looks with his toast. “Vernon tells
5 : Dobby is a few good compliments at dinner. Petunia, any ideas?” “Vernon tells me you’re a wonderful golfer, Mr. Mason?” “Vernon tells me you, no noise and pretending
6 : Dobby is a few good compliments at dinner. Petunia, any ideas?” “Vernon tells me you’re a wonderful golfer, the family was a matter of this scar in Majorca, he knew it was talking about the
7 : Dobby is a few good compliments at dinner. Petunia, any ideas?” “Vernon tells me you’re a wonderful golfer, I wrote about you, even though Ron Weasley and Wizard to me.” said Harry tried
8 : Dobby is a very important day.” Harry looked up, hardly daring to believe it. “This could well be the day I make the biggest deal of Harry had rolled in Majorca,’s attack,” said Uncle Vernon. Harry
9 : Dobby is a very important day.” Harry looked up, hardly daring to believe it. “This could well be the day I make the biggest deal signed and sending her massive son, while Harry. “Third time to get a brilliant, let
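The continuations vary for the same prompt, which suggests sampled decoding rather than greedy decoding. A minimal sampling loop, reusing `model` and `tokenizer` from the sketches above; the temperature and top-k values are assumptions:

```python
@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=50, context_length=32,
             temperature=1.0, top_k=50):
    """Autoregressively sample next tokens, cropping to the context window."""
    model.eval()
    idx = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
    for _ in range(max_new_tokens):
        logits = model(idx[:, -context_length:])[:, -1, :]  # last-position logits
        if top_k is not None:
            # Keep only the top-k logits; mask the rest to -inf before softmax.
            v, _ = torch.topk(logits, top_k)
            logits = logits.masked_fill(logits < v[:, [-1]], float("-inf"))
        probs = torch.softmax(logits / temperature, dim=-1)
        idx = torch.cat([idx, torch.multinomial(probs, num_samples=1)], dim=1)
    return tokenizer.decode(idx[0].tolist())

print(generate(model, tokenizer, "Dobby is"))
```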
- We built a simple LLM.
- Building an LLM turned out to be less difficult than expected.
- With a modest amount of infrastructure, a larger model could be trained on more data for noticeably better performance.
- It therefore seems plausible that individual companies will be able to run their own LLMs in the future.
