Refactor: Decouple Core Transformer Blocks #1852

parambole · 2025-06-19T18:09:34Z

TL;DR

What: This PR moves the core DecoderLayer and its related components out of layers/models.py and into a new foundational file, layers/blocks.py.
Why: To improve the overall code architecture and break potential circular dependencies. This is a necessary prerequisite for adding new, complex modules that also need access to these core building blocks.
How: By creating layers/blocks.py to house the fundamental Decoder and DecoderLayer classes. Higher-level files such as models.py, and any future modules, import these components from this single location.

Detailed Description

This pull request introduces a structural refactoring to improve modularity and maintainability.

The primary change is the creation of MaxText/layers/blocks.py, which now serves as the source for fundamental building blocks of the Transformer architecture, such as:

DecoderLayer
Decoder

Previously, these classes were located in layers/models.py, which tightly coupled them to the high-level model code. As we add more features, this coupling would lead to circular import dependencies.
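
The failure mode described above can be reproduced in miniature. The sketch below is not MaxText code: `models_demo`, `feature_demo`, and `FeatureBlock` are hypothetical names chosen for illustration. It writes two small modules that import from each other into a temporary directory and shows that Python raises an `ImportError` when the first one is imported.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical two-module layout reproducing the coupling described above.
# These are illustrative stand-ins, not files from this PR.
SOURCES = {
    "models_demo.py": """
        from feature_demo import FeatureBlock  # high-level file uses the new feature

        class DecoderLayer:
            pass
    """,
    "feature_demo.py": """
        from models_demo import DecoderLayer  # the feature needs the core block

        class FeatureBlock:
            layer_cls = DecoderLayer
    """,
}


def import_in_temp_dir(sources, entry_point):
    """Write `sources` to a temp dir, import `entry_point`, and report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, body in sources.items():
            Path(tmp, name).write_text(textwrap.dedent(body))
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module(entry_point)
            return "ok"
        except ImportError as err:
            return f"ImportError: {err}"
        finally:
            # Clean up so repeated runs start from a fresh state.
            sys.path.remove(tmp)
            for name in sources:
                sys.modules.pop(name[:-3], None)


result = import_in_temp_dir(SOURCES, "models_demo")
print(result)
```

The import fails because `feature_demo` asks for `DecoderLayer` while `models_demo` is still only partially initialized.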

By decoupling these core components, we establish a clearer hierarchy in the codebase, where high-level modules can depend on these "building blocks" without depending on each other.
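
One way to picture this hierarchy is as a dependency graph that must stay acyclic. The sketch below assumes Python 3.9+ (for the standard-library `graphlib` module) and uses a hypothetical `feature` module as a stand-in for any future high-level module. The graphs are simplified illustrations, not an inventory of the actual imports in MaxText.

```python
from graphlib import CycleError, TopologicalSorter

# Each key imports from the modules in its set.
# "feature" is a hypothetical future module, used only for illustration.
BEFORE = {
    "models": {"feature"},  # models.py would use the new feature
    "feature": {"models"},  # the feature needs DecoderLayer from models.py
}
AFTER = {
    "blocks": set(),        # DecoderLayer and Decoder, no upward imports
    "models": {"blocks"},
    "feature": {"blocks"},
}


def import_order(deps):
    """Return one valid import order, or None if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


print(import_order(BEFORE))  # None: models and feature depend on each other
print(import_order(AFTER))   # a valid order that starts with "blocks"
```

In the decoupled layout, `blocks` always comes first, and `models` and `feature` can be loaded in either order because neither depends on the other.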

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed.

Refactor: Decouple Decoder into layers/blocks.py

2e56b95

parambole force-pushed the parambole/mtp_refactor branch from c7e43e0 to 2e56b95 June 19, 2025 18:59

parambole changed the title ~~Refactor: Decouple Decoder into layers/blocks.py~~ Refactor: Decouple Core Transformer Blocks Jun 19, 2025

parambole mentioned this pull request Jun 19, 2025

Integrate Multi-Token Prediction (MTP) Training objective #1837

Open

4 tasks

Fixing Import

6ffbbe6

parambole marked this pull request as ready for review June 19, 2025 19:48

parambole requested review from RissyRan, gagika, richjames0, gobbleturk, khatwanimohit, bvandermoon, vipannalla, shralex, yangyuwei, SurbhiJainUSC, hengtaoguo, A9isha and aireenmei as code owners June 19, 2025 19:48

parambole assigned RissyRan and gobbleturk Jun 19, 2025
