
slime

Chinese version

slime is an LLM post-training framework aiming at RL scaling, providing two core capabilities:

  1. High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang;
  2. Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.


Architecture Overview

[Architecture diagram]

Module Descriptions:

  • training (Megatron): Responsible for the main training process, reads data from the Data Buffer, and synchronizes parameters to the rollout module after training.
  • rollout (SGLang + router): Generates new data (including rewards/verifier outputs) and stores it in the Data Buffer.
  • data buffer: A bridge module that manages prompt initialization, custom data, and rollout generation methods.

Quick Start

Environment Setup

Based on the zhuzilin/slime:latest image (pre-installed with SGLang 0.4.7 and Megatron):

docker run --rm --gpus all --ipc=host --shm-size=16g \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  -it zhuzilin/slime:latest /bin/bash

git clone https://github.com/THUDM/slime.git
cd slime
pip install -e .
  • We are collaborating with the SGLang community to merge the updates in sglang.patch into the main branch.
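
To quickly confirm the editable install succeeded (an optional check; this only assumes the install above completed and the slime package can be imported), you can run:

python -c "import slime; print(slime.__file__)"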

Examples: GLM-4-9B and Qwen3-4B

We provide examples for GLM-4-9B and Qwen3-4B; please refer to:

Checkpoint Format Conversion

Since slime trains with Megatron, which does not support loading Hugging Face checkpoints directly, we first need to convert the model to the torch_dist format that Megatron supports.

HF → Megatron torch_dist ckpt

Use mbridge for conversion:

cd slime/
PYTHONPATH=/root/Megatron-LM python tools/convert_hf_to_torch_dist.py \
    --hf-checkpoint /root/GLM-Z1-9B-0414 \
    --save /root/GLM-Z1-9B-0414_torch_dist

⚠️ If you encounter an issue where slime cannot be found, please run pip install -e . in the slime directory.

Megatron torch_dist → HF ckpt

To convert a torch_dist checkpoint saved during training back to a Hugging Face checkpoint:

cd slime/
PYTHONPATH=/root/Megatron-LM python tools/convert_torch_dist_to_hf.py \
  --input-dir /path/to/torch_dist_ckpt/iter_xxx/ \
  --output-dir /root/GLM-Z1-9B-0414-iter_xxx \
  --origin-hf-dir /root/GLM-Z1-9B-0414

⚠️ Since the torch_dist checkpoint produced by the mbridge conversion above does not currently save args, the checkpoint from the previous step cannot be converted back to HF format with this tool.

Any Megatron ckpt → HF

Applicable for custom save formats (e.g., --ckpt-format torch).

This conversion reuses the same parameter-update path that syncs weights from Megatron to SGLang during training. In practice, you reuse the training script and change the original command from:

ray job submit --address="http://127.0.0.1:8265" \
   --runtime-env-json='{
     "env_vars": { ...}
   }' \
   -- python3 train.py \
   ... # Other training args

To:

torchrun --nproc_per_node ${NUM_GPU} tools/convert_to_hf.py \
   --load /your/saved/megatron_ckpt \
   --output-dir /your/converted/hf_ckpt \
   ... # Other training args

That is, keep all other arguments the same, and:

  1. Change the task launcher from ray to torchrun, and set the number of GPUs to the minimum required for Megatron's parallelism without data parallelism (DP); for example, if you are using tp4, set it to 4 (see the example after this list).
  2. Make sure to change --load to the path of the checkpoint you want to load.
  3. Add the --output-dir argument to specify where the converted Hugging Face checkpoint should be saved.
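
For example, assuming a checkpoint trained with --tensor-model-parallel-size 4 and Megatron located at /root/Megatron-LM as in the earlier examples (the checkpoint paths below are placeholders), the converted command might look like:

PYTHONPATH=/root/Megatron-LM torchrun --nproc_per_node 4 tools/convert_to_hf.py \
   --load /root/checkpoints/my_run/ \
   --output-dir /root/checkpoints/my_run_hf \
   --tensor-model-parallel-size 4 \
   ... # Keep the remaining training args unchanged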

Starting the Training Process

The entire program is launched via Ray, so you first need to start a Ray cluster: run the head command on node 0, then have every other node join:

# Node0 (HEAD)
ray start --head --node-ip-address ${MASTER_ADDR} \
  --num-gpus 8 --disable-usage-stats

# Other Nodes
ray start --address=${MASTER_ADDR}:6379 --num-gpus 8
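
Optionally, before submitting a job you can check that the cluster sees all nodes and GPUs:

ray status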

After the Ray cluster has started, you can submit a job from node 0, for example:

ray job submit --address="http://127.0.0.1:8265" \
   --runtime-env-json='{
     "env_vars": {
        "PYTHONPATH": "/root/Megatron-LM/",
        ... # e.g., no_proxy, API variables, etc.
     }
   }' \
   -- python3 train.py \
   --... # Other Megatron/SGLang/slime arguments

Argument Descriptions

Arguments are divided into three categories:

  1. Megatron arguments: slime reads all arguments supported by the Megatron installation found on PYTHONPATH. You can configure Megatron by passing arguments such as --tensor-model-parallel-size 2.
  2. SGLang arguments: All arguments for the installed SGLang are supported. These arguments must be prefixed with --sglang-. For example, --mem-fraction-static should be passed as --sglang-mem-fraction-static.
  3. slime-specific arguments: Please refer to: slime/utils/arguments.py
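
As an illustration of how the three categories combine in one launch command (a sketch only; the values are placeholders, and --sglang-mem-fraction-static is simply SGLang's --mem-fraction-static with the --sglang- prefix):

ray job submit --address="http://127.0.0.1:8265" \
   --runtime-env-json='{"env_vars": {"PYTHONPATH": "/root/Megatron-LM/"}}' \
   -- python3 train.py \
   --tensor-model-parallel-size 2 \
   --sglang-mem-fraction-static 0.8 \
   ... # slime-specific arguments, see slime/utils/arguments.py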

For complete usage instructions, please refer to the Usage Documentation.

Developer Guide

  • Contributions are welcome! If you have suggestions for new features, performance tuning, or feedback on user experience, feel free to submit an Issue or PR 😊

  • Use pre-commit to ensure code style consistency for your commits:

    apt install pre-commit -y
    pre-commit install
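    # Optionally, run all hooks against the whole repository before committing:
    pre-commit run --all-files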
  • For debugging tips, please refer to the Debugging Guide

FAQ & Acknowledgements

  • For frequently asked questions, please see the Q&A
  • Special thanks to the following projects & communities: SGLang, Megatron-LM, mbridge, OpenRLHF, veRL, and others.
