wrmedford/llm720
LLM720

LLM720 is a second-generation large language model that aims to be:

  1. Open
  2. Interpretable
  3. Energy efficient

These goals all follow from a single guiding design: make the model as sparse as possible, so that you can see exactly which weights contribute to each output while using no more compute than necessary.

We aim to accomplish this with a fine-grained Mixture of Experts architecture (He, 2024), combined with the efficient Multi-Head Latent Attention mechanism pioneered by DeepSeek (DeepSeek, 2024). The project also explores offloading expert weights into system memory while maintaining performance, building on He's Mixture of a Million Experts architecture, with the goal of making frontier models less bound by GPU memory capacity.
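To make the fine-grained MoE idea concrete, here is a minimal NumPy sketch (not this repository's implementation; all sizes, names, and the ReLU expert MLP are hypothetical choices for illustration). Each token is routed to only a small top-k subset of many tiny experts, so most weights stay untouched on any given forward pass:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only: many small experts
# rather than a few large ones is the "fine-grained" part.
d_model, n_experts, d_expert, top_k = 16, 64, 4, 2

# Each expert is a tiny two-layer MLP.
W_in = rng.standard_normal((n_experts, d_model, d_expert)) / np.sqrt(d_model)
W_out = rng.standard_normal((n_experts, d_expert, d_model)) / np.sqrt(d_expert)
router = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)

def moe_forward(x):
    """Route one token vector to its top-k experts; mix their outputs."""
    logits = x @ router                        # score every expert
    top = np.argsort(logits)[-top_k:]          # indices of the top-k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # softmax over the selected experts
    y = np.zeros_like(x)
    for g, e in zip(gates, top):
        h = np.maximum(x @ W_in[e], 0.0)       # expert MLP with ReLU
        y += g * (h @ W_out[e])
    return y, top

x = rng.standard_normal(d_model)
y, chosen = moe_forward(x)
```

Because only `top_k` of the `n_experts` experts run per token, compute scales with `top_k` while capacity scales with `n_experts`, and the routing decision records exactly which expert weights contributed to the output.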

LLM720 takes its name from LLM360, whose torch of completely open-sourced model development we intend to carry (Liu et al., 2023).

Quick Start

Join us on Discord

Special Thanks

Thank you to @lambdal for providing compute for this project.
