GTAlign applies game-theoretic principles to fine-tune reasoning LLMs, encouraging them to make decisions that are not only accurate but also rational, cooperative, and transparent in dialogue settings.
```bash
# create a new environment
conda create -n gtalign python==3.10
conda activate gtalign

# install dependencies
USE_MEGATRON=0 bash scripts/install_vllm_sglang_mcore.sh
pip install -r requirements.txt

# install GTAlign
git clone https://github.com/ulab-uiuc/GTAlign.git
cd GTAlign
pip install --no-deps -e .
```

We recommend using eight NVIDIA H20 GPUs for our experiments.
```bash
## get the node's IPv6/IPv4 address
cat /etc/hosts

## run the judge model on a node different from the verl node, using IPv6
export CUDA_DEVICE_MAX_CONNECTIONS=1
export NCCL_SOCKET_IFNAME=eth0
export NCCL_IB_GID_INDEX=0
export NCCL_IB_DISABLE=1
export NCCL_NET_GDR_LEVEL=2
export NCCL_IB_QPS_PER_CONNECTION=4
export NCCL_IB_TC=160
export NCCL_IB_TIMEOUT=22
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 -m sglang.launch_server --model-path Qwen/Qwen3-32B --host '::' --mem-fraction-static 0.8 --port port_number --tp 1 --dp 8
```
```bash
# test connectivity from the verl node
curl -v "http://[ipv6]:port_number/v1/models"

# call from the verl node (IPv6 addresses must be bracketed in the URL)
curl -X POST "http://[ipv6]:port_number/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{"model":"default","messages":[{"role":"user","content":"hello"}]}'
```
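The judge server exposes an OpenAI-compatible API, so the same request can be issued programmatically. A minimal sketch (the host and port below are placeholders; the key detail is that an IPv6 host must be wrapped in brackets inside the URL):

```python
import json

def build_chat_request(host, port, content, model="default"):
    """Build the URL and JSON body for the judge server's
    /v1/chat/completions endpoint. IPv6 hosts must be bracketed."""
    url = f"http://[{host}]:{port}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return url, json.dumps(payload)

url, body = build_chat_request("fd00::1", 30000, "hello")
print(url)  # http://[fd00::1]:30000/v1/chat/completions
```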
```bash
## serve the judge model on the same node as the verl node
CUDA_VISIBLE_DEVICES=4,5,6,7 python3 -m sglang.launch_server --model-path Qwen/Qwen3-32B --mem-fraction-static 0.8 --port port_number --tp 4 --dp 1 --reasoning-parser qwen3
```

Before launching training, change the model and data paths in the scripts.
```bash
# on the verl node
bash recipe/gamellm/math.sh
bash recipe/gamellm/medium.sh
bash recipe/gamellm/adgqa.sh
bash recipe/gamellm/wildguard.sh
bash recipe/gamellm/full_zscore.sh
```

Before running the evaluation scripts, change the model and data paths in the scripts.
```bash
# transform an FSDP checkpoint into a Hugging Face checkpoint
python -m verl.model_merger merge \
    --backend fsdp \
    --local_dir xxx \
    --target_dir xxx
```
```bash
# run evaluation scripts
bash recipe/gamellm/ablation_advbench.sh
bash recipe/gamellm/ablation_coqa.sh
bash recipe/gamellm/ablation_minerva.sh
```

The datasets used in RL training and evaluation are at data/.
We provide an inference script at recipe/gamellm/inference/inference_v2.py. You can modify the payoff matrix during inference to control model behavior.
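As a toy illustration of the payoff-matrix idea (this is not the repo's actual data format; the action names and payoff values below are made up), picking the response strategy that maximizes joint user-plus-model welfare looks like:

```python
# Hypothetical payoff matrix: each model action maps to
# (user_payoff, model_payoff). Values are illustrative only.
payoff_matrix = {
    "direct_answer":       (0.9, 0.5),
    "clarifying_question": (0.7, 0.8),
    "refuse":              (0.1, 0.9),
}

def social_welfare(payoffs):
    """Joint welfare: sum of user and model payoffs for one action."""
    user, model = payoffs
    return user + model

def pick_action(matrix):
    """Choose the action that maximizes joint (social) welfare."""
    return max(matrix, key=lambda action: social_welfare(matrix[action]))

print(pick_action(payoff_matrix))  # clarifying_question under these numbers
```

Reweighting the entries (e.g. raising the model payoff for refusal on unsafe prompts) shifts which action the welfare-maximizing choice selects, which is the control knob the inference script exposes.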
This project builds upon the following open-source repositories:

- verl (Apache License 2.0)
- collabllm (MIT License)
If you find our work helpful to your research, please cite as follows:

```bibtex
@misc{zhu2025gtaligngametheoreticalignmentllm,
      title={GTAlign: Game-Theoretic Alignment of LLM Assistants for Social Welfare},
      author={Siqi Zhu and David Zhang and Pedro Cisneros-Velarde and Jiaxuan You},
      year={2025},
      eprint={2510.08872},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2510.08872},
}
```