SGLang Server Management and Testing Utilities

Overview

This suite of scripts sets up an environment for SGLang (a framework for efficient language model serving), manages language models, launches an SGLang server, tests its functionality, and benchmarks its performance. The scripts streamline working with Hugging Face language models via SGLang, primarily on Linux systems with NVIDIA GPUs.

Key features include:

Automated environment setup, including a Python virtual environment, SGLang, CUDA checks, and other dependencies such as nvitop.
Interactive model selection and download from Hugging Face.
Server launch with GPU auto-detection for tensor parallelism.
Interactive testing script with a large, customizable set of queries from queries.txt.
Parallel request batching for load testing.
Benchmark script for performance evaluation.
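
To illustrate the GPU auto-detection feature listed above, here is a minimal sketch of how a tensor-parallel size might be derived from the number of visible GPUs. The function names `count_gpus` and `tp_size_for` are hypothetical and are not taken from start-sglang.sh or scripts/detect_gpus.py. Rounding down to a power of two is an assumption, chosen because the tensor-parallel size generally has to divide the model's attention-head count evenly.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count visible NVIDIA GPUs.
# Prints 0 if nvidia-smi is not installed or reports no devices.
count_gpus() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --list-gpus 2>/dev/null | grep -c '^GPU' || true
  else
    echo 0
  fi
}

# Hypothetical helper: largest power of two that does not exceed the GPU count.
# Assumption: a power-of-two size is a safe default for tensor parallelism.
# Returns 1 when the count is 0 or 1.
tp_size_for() {
  local gpus=$1 tp=1
  while (( tp * 2 <= gpus )); do
    tp=$(( tp * 2 ))
  done
  echo "$tp"
}

# Example usage (MODEL is a placeholder for a model ID or local path):
#   python -m sglang.launch_server --model-path "$MODEL" \
#     --tp "$(tp_size_for "$(count_gpus)")" --port 30000
```

With this sketch, a machine with 6 GPUs would launch with a tensor-parallel size of 4, leaving two GPUs idle. Whether the actual scripts make the same trade-off is not stated in this README.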

[Screenshot: nvitop monitoring GPU usage while the test_sglang_model.sh script sends requests to an SGLang server.]

Key Scripts & Files

setup-sglang.sh: Handles initial environment setup, dependency installation (SGLang, nvitop, etc.), CUDA checks, and model downloads.
start-sglang.sh: Launches the SGLang server with a selected cached model and auto-detected GPU configuration.
test_sglang_model.sh: Interactively tests a running SGLang server using questions from queries.txt, with options for sequential or batched parallel requests.
benchmark.sh: Benchmarks a selected model by managing its own SGLang server instance.
model.txt: Plain text file to list Hugging Face model identifiers for download and use.
queries.txt: Plain text file containing questions for test_sglang_model.sh.
scripts/: Directory containing utility Python scripts (check_model_cached.py, detect_gpus.py).
docs/: Directory containing detailed documentation.
LICENSE: Contains the Apache License 2.0 for the project.
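
Since model.txt is described above as a plain text list of Hugging Face model identifiers, a script could read it along the following lines. This is a sketch only: the function name `read_model_ids` is hypothetical, and the handling of blank lines and `#` comments is an assumption rather than a documented feature of the file format.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print model IDs from a list file, one per line.
# Assumption: blank lines and '#' comments are ignored; the README does not
# confirm that model.txt supports comments.
read_model_ids() {
  local file=$1 line
  while IFS= read -r line || [[ -n $line ]]; do
    line="${line%%#*}"                        # drop a trailing comment
    line="${line#"${line%%[![:space:]]*}"}"   # trim leading whitespace
    line="${line%"${line##*[![:space:]]}"}"   # trim trailing whitespace
    if [[ -n $line ]]; then
      echo "$line"
    fi
  done < "$file"
  return 0
}

# Example usage:
#   read_model_ids model.txt
```

Each printed ID could then be handed to a downloader such as `huggingface-cli download`, although the README does not say which download mechanism setup-sglang.sh actually uses.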

Quick Start / Basic Workflow

Populate model.txt: Add Hugging Face model IDs you want to use.
Run Setup:
```
chmod +x *.sh 
./setup-sglang.sh
```
Follow prompts to download models and install dependencies.
Start Server:
```
./start-sglang.sh
```
Select a downloaded model to launch the server.
Test Server (in a new terminal):
```
./test_sglang_model.sh
```
Select a model and execution mode.
Benchmark Server (optional). benchmark.sh manages its own SGLang server instance, so it stops any server started by start-sglang.sh that is running on the same port:
```
./benchmark.sh
```
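
The batched parallel mode mentioned in the test step can be sketched as follows. This is not the implementation in test_sglang_model.sh; it is a minimal illustration under stated assumptions: the server listens on localhost port 30000 (SGLang's default), the OpenAI-compatible chat completions endpoint is used, and `jq` and `curl` are installed. The names `send_query`, `run_batched`, and `SGLANG_URL` are hypothetical.

```shell
#!/usr/bin/env bash

# Assumption: default SGLang port and the OpenAI-compatible chat endpoint.
SGLANG_URL="${SGLANG_URL:-http://localhost:30000/v1/chat/completions}"

# Hypothetical helper: send one question to the server.
# jq builds the JSON body so quotes inside the question are escaped correctly.
send_query() {
  local model=$1 question=$2
  jq -n --arg m "$model" --arg q "$question" \
    '{model: $m, messages: [{role: "user", content: $q}], max_tokens: 128}' |
    curl -sS -H 'Content-Type: application/json' -d @- "$SGLANG_URL"
}

# Hypothetical helper: read questions from a file (one per line) and send
# them in parallel batches of the given size.
run_batched() {
  local model=$1 file=$2 batch=$3 n=0 question
  # Read on file descriptor 3 so background jobs do not touch the input file.
  while IFS= read -r question <&3 || [[ -n $question ]]; do
    [[ -z $question ]] && continue
    send_query "$model" "$question" &
    n=$(( n + 1 ))
    if (( n % batch == 0 )); then
      wait    # let the current batch finish before starting the next one
    fi
  done 3< "$file"
  wait        # collect any requests left in a final partial batch
}

# Example usage (MODEL is a placeholder):
#   run_batched "$MODEL" queries.txt 4
```

Waiting after every batch keeps at most `batch` requests in flight, which makes GPU load easier to reason about than firing every query at once. Responses may arrive in any order within a batch.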

Detailed Documentation

For more in-depth information, please refer to the following documents in the docs/ directory:

Prerequisites: System and software requirements.
Detailed File Descriptions: In-depth explanation of each file.
Setup and Configuration Guide: Detailed setup steps.
Usage Instructions: Comprehensive guide on using each script.
Customization Guide: How to tailor scripts and queries.
Troubleshooting Guide: Solutions for common issues.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for full details.
