moe

ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts. We released a collection of ModuleFormer-based Language Models (MoLM) ranging in scale from 4 billion to 8 billion parameters.

lm moe

Updated Apr 10, 2024
Python

SkyworkAI / MoH

Star

MoH: Multi-Head Attention as Mixture-of-Head Attention

transformer moe attention vit dit mixture-of-experts llms

Updated Oct 29, 2024
Python

SkyworkAI / MoE-plus-plus

Star

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

moe mixture-of-experts large-language-models llms

Updated Oct 16, 2024
Python

simplifine-llm / Simplifine

Star

🚀 Easy, open-source LLM finetuning with one-line commands, seamless cloud integration, and popular optimization frameworks. ✨

open-source cloud ai moe llama gpt lora mistral phi fine-tuning peft large-language-models llm instruction-tuning llm-training finetuning-llms fine-tuning-llm qwen llama3

Updated Aug 14, 2024
Python

kyegomez / MoE-Mamba

Sponsor

Star

Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Zeta

ai ml moe swarms multi-modality multi-modal-fusion

Updated Nov 11, 2024
Python

xrsrke / pipegoose

Star

Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*

transformers moe data-parallelism distributed-optimizers model-parallelism megatron mixture-of-experts pipeline-parallelism huggingface-transformers megatron-lm tensor-parallelism large-scale-language-modeling 3d-parallelism zero-1 sequence-parallelism

Updated Dec 14, 2023
Python

phanirithvij / twist.moe

Star

Batch download high quality videos from https://twist.moe

anime moe anime-downloader twist-moe twist-moe-downloader

Updated Sep 30, 2023
Python

kyegomez / SwitchTransformers

Sponsor

Star

Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"

ai ml moe llama multi-modal mixture-model mixture-of-experts mixture-of-models gpt4

Updated Nov 11, 2024
Python

LINs-lab / DynMoE

Star

[Preprint] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

moe language-model mixture-of-experts adaptive-computation dynamic-neural-network multimodal-large-language-models

Updated Aug 21, 2024
Python

VITA-Group / Random-MoE-as-Dropout

Star

[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal, Shiwei Liu, Zhangyang Wang

transformer dropout moe self-slimmable

Updated Feb 28, 2023
Python

facebookresearch / AdaTT

Star

pytorch open-source library for the paper "AdaTT Adaptive Task-to-Task Fusion Network for Multitask Learning in Recommendations"

moe mtl multitask-learning

Updated Aug 9, 2024
Python

Improve this page

Add a description, image, and links to the moe topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the moe topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

moe

Here are 41 public repositories matching this topic...

hiyouga / LLaMA-Factory

sgl-project / sglang

PKU-YuanGroup / MoE-LLaVA

davidmrau / mixture-of-experts

pjlab-sys4nlp / llama-moe

open-compass / MixtralKit

sail-sg / Adan

microsoft / Tutel

ymcui / Chinese-Mixtral

IBM / ModuleFormer

SkyworkAI / MoH

SkyworkAI / MoE-plus-plus

simplifine-llm / Simplifine

kyegomez / MoE-Mamba

xrsrke / pipegoose

phanirithvij / twist.moe

kyegomez / SwitchTransformers

LINs-lab / DynMoE

VITA-Group / Random-MoE-as-Dropout

facebookresearch / AdaTT

Improve this page

Add this topic to your repo