This repository contains a from-scratch reimplementation of OpenAI's GPT-2 model. The goal is to understand and reproduce the core functionality of GPT-2, including tokenization, the transformer architecture, training, and inference.
GPT-2 is an autoregressive Transformer model designed for text generation. It consists of the following components (a code sketch follows the list):
- Multi-layer Transformer blocks
- Self-attention for contextual word understanding
- Layer normalization and residual connections
- Token embeddings and positional encodings
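The sketch below shows how these pieces fit together in a single pre-layer-norm Transformer block, written in PyTorch. It is a minimal illustration, not the exact code in this repository; the hyperparameter names (`n_embd`, `n_head`) and default sizes follow common GPT-2 conventions and are assumptions for illustration.

```python
# Minimal sketch of one GPT-2-style Transformer block (assumed names/sizes, not this repo's exact code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask so each token attends only to earlier tokens."""

    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # project to queries, keys, values in one pass
        self.proj = nn.Linear(n_embd, n_embd)      # output projection back to the model dimension

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (batch, heads, sequence, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # scaled dot-product attention with the causal mask applied
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)


class Block(nn.Module):
    """One Transformer block: layer norm + attention and layer norm + MLP, each wrapped in a residual connection."""

    def __init__(self, n_embd: int = 768, n_head: int = 12):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head)
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.ln_1(x))   # residual connection around attention
        x = x + self.mlp(self.ln_2(x))    # residual connection around the MLP
        return x
```

In the full model, token embeddings and learned positional encodings are summed to form the input to a stack of such blocks, followed by a final layer norm and a projection onto the vocabulary.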
We use the FineWeb-Edu dataset from Hugging Face for training. This dataset consists of high-quality web text filtered for educational content. You can find it here: [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)
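As a rough sketch, the dataset can be streamed with the Hugging Face `datasets` library. The dataset id `HuggingFaceFW/fineweb-edu` and the `sample-10BT` subset name are taken from the public Hugging Face listing and are assumptions here; adjust them to whichever subset the training run actually uses.

```python
# Hedged sketch: streaming FineWeb-Edu with the `datasets` library (subset name is an assumption).
from datasets import load_dataset

ds = load_dataset(
    "HuggingFaceFW/fineweb-edu",
    name="sample-10BT",      # smaller sampled subset; the full dump is much larger
    split="train",
    streaming=True,          # stream shards instead of downloading everything up front
)

for example in ds.take(3):
    print(example["text"][:200])  # each record carries the raw document text under "text"
```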