This repository contains a from-scratch reimplementation of OpenAI's GPT-2 model. The goal is to understand and reproduce the core functionality of GPT-2, including tokenization, the transformer architecture, training, and inference.
GPT-2 is an autoregressive Transformer model designed for text generation. It consists of the following components (a code sketch follows the list):
- Multi-layer Transformer blocks
- Self-attention for contextual word understanding
- Layer normalization and residual connections
- Token embeddings and positional encodings
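The sketch below shows how these pieces fit together in a single pre-layer-norm Transformer block, written in PyTorch. It is a minimal illustration, not the exact code in this repository; the hyperparameter names (`n_embd`, `n_head`) and default sizes follow common GPT-2 conventions and are assumptions for illustration.

```python
# Minimal sketch of one GPT-2-style Transformer block (assumed names/sizes, not this repo's exact code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask so each token attends only to earlier tokens."""

    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # project to queries, keys, values in one pass
        self.proj = nn.Linear(n_embd, n_embd)      # output projection back to the model dimension

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (batch, heads, sequence, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # scaled dot-product attention with the causal mask applied
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)


class Block(nn.Module):
    """One Transformer block: layer norm + attention and layer norm + MLP, each wrapped in a residual connection."""

    def __init__(self, n_embd: int = 768, n_head: int = 12):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head)
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.ln_1(x))   # residual connection around attention
        x = x + self.mlp(self.ln_2(x))    # residual connection around the MLP
        return x
```

In the full model, token embeddings and learned positional encodings are summed to form the input to a stack of such blocks, followed by a final layer norm and a projection onto the vocabulary.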
We use the FineWeb-Edu dataset from Hugging Face for training. This dataset consists of high-quality web text filtered for educational content. You can find it here: [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)
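As a rough sketch, the dataset can be streamed with the Hugging Face `datasets` library. The dataset id `HuggingFaceFW/fineweb-edu` and the `sample-10BT` subset name are taken from the public Hugging Face listing and are assumptions here; adjust them to whichever subset the training run actually uses.

```python
# Hedged sketch: streaming FineWeb-Edu with the `datasets` library (subset name is an assumption).
from datasets import load_dataset

ds = load_dataset(
    "HuggingFaceFW/fineweb-edu",
    name="sample-10BT",      # smaller sampled subset; the full dump is much larger
    split="train",
    streaming=True,          # stream shards instead of downloading everything up front
)

for example in ds.take(3):
    print(example["text"][:200])  # each record carries the raw document text under "text"
```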