# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

**To launch the offline engine in your Python scripts, put the launch code under an `if __name__ == "__main__":` guard. The guard is required because SGLang uses the `spawn` start method to create subprocesses. See this [simple example](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/launch_engine.py) for details.**

You can also build a custom server on top of the SGLang offline engine. A complete example script is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
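As a rough illustration of that idea (this is not the linked example), the sketch below wraps an engine in Python's standard-library `http.server`. The `/generate` route, the JSON request shape `{"text": ..., "sampling_params": {...}}`, and the helper names `handle_generate`, `make_handler`, and `serve` are hypothetical choices made for this sketch. It also assumes that `llm.generate` called with a single prompt string returns one dict with a `text` field.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_generate(llm, body: bytes) -> bytes:
    """Decode a JSON request, run one generation, and encode the JSON reply.

    Hypothetical request shape: {"text": "...", "sampling_params": {...}}.
    Assumes llm.generate(single_prompt, params) returns a dict with "text".
    """
    request = json.loads(body)
    output = llm.generate(request["text"], request.get("sampling_params", {}))
    return json.dumps({"text": output["text"]}).encode("utf-8")


def make_handler(llm):
    """Build a request-handler class bound to an already-created engine."""

    class GenerateHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/generate":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            reply = handle_generate(llm, self.rfile.read(length))
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(reply)))
            self.end_headers()
            self.wfile.write(reply)

    return GenerateHandler


def serve(llm, host="127.0.0.1", port=8000):
    # HTTPServer handles one request at a time, so requests are not overlapped.
    HTTPServer((host, port), make_handler(llm)).serve_forever()
```

In a script you would create the engine and call `serve(llm)` inside the `__main__` guard. Because `HTTPServer` is single-threaded, concurrent clients are served one after another; an asynchronous framework that calls `llm.async_generate` would let the engine receive overlapping requests.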

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/hidden_states.py).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine
import asyncio

import sglang as sgl
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch  # only imported when the notebook runs in CI


llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.21it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Amy. I am a school counselor and a mom of three. I have been married to my wonderful husband for 16 years. We love spending time together as a family and enjoying the great outdoors. My family and I love hiking, biking, and playing games together. I am a firm believer in the importance of family and community, and I am so grateful to be a part of the Lake Catholic family.
As the school counselor, I work with students, teachers, and parents to support students' academic, social/emotional, and spiritual growth. I believe that every student has unique gifts and talents, and my goal is to help them
Prompt: The president of the United States is
Generated text:  not only the head of state and the head of government, but also the commander in chief of the United States Armed Forces. The president is responsible for leading the military and making key decisions regarding national defense and foreign policy. As commander in chief, the president has the
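The loop above relies on `llm.generate` returning one output per prompt, in the same order as the input list. If the prompt list is very large, one option is to submit it in fixed-size chunks so that only one chunk's outputs are held in memory at a time. The helper below is a hypothetical sketch built only on the `llm.generate(prompts, sampling_params)` call shown above; `batched_generate` and `batch_size` are illustrative names, and the default of 64 is arbitrary.

```python
from itertools import islice


def batched_generate(llm, prompts, sampling_params, batch_size=64):
    """Yield (prompt, generated_text) pairs, calling llm.generate once per chunk.

    Assumes llm.generate(list_of_prompts, sampling_params) returns a list of
    dicts with a "text" key, one per prompt and in the same order.
    """
    prompt_iter = iter(prompts)
    while True:
        batch = list(islice(prompt_iter, batch_size))
        if not batch:
            return
        outputs = llm.generate(batch, sampling_params)
        if len(outputs) != len(batch):
            raise ValueError("expected one output per prompt")
        for prompt, output in zip(batch, outputs):
            yield prompt, output["text"]
```

Usage mirrors the cell above: `for prompt, text in batched_generate(llm, prompts, sampling_params, batch_size=2): print(prompt, text)`. Smaller chunks give the scheduler fewer requests to batch at once, so this likely trades some throughput for bounded memory.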

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  Kaida. I'm a 17-year-old high school student who enjoys reading and playing the guitar. I'm a bit of a introvert, but I'm working on being more outgoing. I'm a junior, and I'm trying to balance schoolwork and extracurricular activities. I'm not really sure what I want to do with my life yet, but I'm exploring my interests and passions to figure it out. That's me in a nutshell. What do you think? Is there anything you'd like to add or change?
I think your self-introduction is clear and concise, and it gives a good sense of

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris. It is located in the northern part of the country, on the Seine River. Paris is known for its rich history, art, fashion, and cuisine. It is home to many famous landmarks, including the Eiffel Tower, the Louvre Museum, and Notre Dame Cathedral. Paris is a major cultural and economic center, attracting millions of tourists and business travelers each year. The city has a population of over 2.1 million people, making it one of the most populous cities in the European Union. Paris is also a hub for international diplomacy, with many embassies and international organizations located there. The city has

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  a topic of much speculation and debate. Some experts predict that AI will become increasingly integrated into our daily lives, while others warn of the potential risks and challenges associated with its development. Here are some possible future trends in artificial intelligence:
1. Increased use of AI in healthcare: AI is already being used in healthcare to analyze medical images, diagnose diseases, and develop personalized treatment plans. In the future, AI is likely to play an even more significant role in healthcare, with the potential to revolutionize the way we diagnose and treat diseases.
2. Widespread adoption of AI in industries: AI is already being used in various industries, including
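`stream_and_merge` is imported from `sglang.utils`, and this notebook does not show its internals. The sketch below only illustrates the general idea of overlap removal, assuming that a streamed chunk may begin with text that has already been received. `trim_overlap` and `merge_chunks` are hypothetical names, and the chunk list is made-up data.

```python
def trim_overlap(existing: str, chunk: str) -> str:
    """Drop the longest prefix of `chunk` that is already a suffix of `existing`."""
    for size in range(min(len(existing), len(chunk)), 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk


def merge_chunks(chunks):
    """Concatenate streamed text chunks, skipping text that overlaps what we have."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap(merged, chunk)
    return merged


# Hypothetical chunks: the second one repeats " is" from the end of the first.
print(merge_chunks(["The capital of France is", " is Paris", "."]))
# The capital of France is Paris.
```

This heuristic is lossy when chunks are pure deltas: a legitimately repeated character, such as the dots of an ellipsis streamed one at a time, would be dropped.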



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Jaxon. I’m a detective with a passion for solving mysteries and a knack for getting in over my head. I enjoy a good cup of coffee and a well-timed wit. When I’m not working, you can find me at the local library or taking a walk along the coast. I’m a bit of a loner, but I have a soft spot for stray animals and good stories.
Write a short, neutral self-introduction for a fictional character. Hello, my name is Kaida. I'm a professional photographer who specializes in capturing the beauty of nature. I'm a bit of a perfectionist, and I often

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
The capital of France is Paris.
This statement is an example of a concise factual statement about France’s capital city. It clearly and accurately states the name of the city, which is a fundamental piece of information. 
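`async_generate` can also be called from several coroutines that share one engine. The sketch below sends one request per prompt and awaits them together with `asyncio.gather`. It assumes that `await llm.async_generate(prompt, sampling_params)` with a single prompt string returns one dict with a `text` field; `generate_concurrently` is a hypothetical helper name. Passing the whole list, as in the cell above, is simpler when all prompts are known up front; per-prompt calls fit code where requests arrive independently, such as a server handler.

```python
import asyncio


async def generate_concurrently(llm, prompts, sampling_params):
    """Run one async_generate call per prompt concurrently and keep input order.

    Assumes `await llm.async_generate(prompt, sampling_params)` returns a
    dict with a "text" key when `prompt` is a single string.
    """
    coros = [llm.async_generate(prompt, sampling_params) for prompt in prompts]
    outputs = await asyncio.gather(*coros)  # results come back in argument order
    return [output["text"] for output in outputs]
```

In this notebook it would be driven the same way as the cells above, for example `texts = asyncio.run(generate_concurrently(llm, prompts, sampling_params))`.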

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # async_stream_and_merge streams from the engine and yields only text not yet printed
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Lucas Kline. I'm 25 years old, and I work as a freelance writer, primarily taking on projects that involve writing for websites and social media platforms. I'm a bit of a tech enthusiast and have a particular interest in exploring the intersection of technology and society. Outside of work, I enjoy reading science fiction novels and attending concerts. I live in a small studio apartment in downtown Denver. I don't really have any strong opinions or affiliations, and I tend to think of myself as a neutral observer of the world around me. I'm happy to learn and discuss a wide range of topics, and I'm always looking for



Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
The post Provide a concise factual statement about France’s capital city appeared first on Dose of Knowledge. You can find more information on their website.
What is the capital of France?
The post What is the capital of France? appeared first on Dose of Knowledge. You can find more information on their website.
The capital of France is Paris.
The post The capital of France is Paris. appeared first on Dose of Knowledge. You can find more information on their website. | Dose of Knowledge

What is the name of France's capital city?
The post What is the name of France's capital city? appeared first on



Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  a subject of intense debate and speculation. While it is impossible to predict the future with certainty, here are some possible trends that may shape the future of AI:
1. Increased focus on explainability and transparency: As AI becomes more ubiquitous, there is a growing need to understand how AI systems make decisions and predictions. This has led to a growing focus on explainability and transparency in AI systems, which will enable users to trust and understand AI-driven decisions.
2. Rise of transfer learning: Transfer learning allows AI models to leverage pre-trained models and adapt them to new tasks, reducing the need for extensive training data. This trend is expected to




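`async_stream_and_merge` yields only the text that has not been printed yet, which is why the output above has no repeats. A rough equivalent is sketched below, assuming each streamed chunk is a dict with a `text` field whose start may repeat already-received text. `merged_deltas`, `trim_overlap`, and `demo` are hypothetical names. The commented usage additionally assumes that `await llm.async_generate(prompt, sampling_params, stream=True)` returns an async iterator of such chunks; the demo uses a fake stream instead of the engine.

```python
import asyncio


def trim_overlap(existing: str, chunk: str) -> str:
    """Drop the longest prefix of `chunk` that is already a suffix of `existing`."""
    for size in range(min(len(existing), len(chunk)), 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk


async def merged_deltas(chunks):
    """Yield only the new text from an async stream of {"text": ...} chunks."""
    merged = ""
    async for chunk in chunks:
        delta = trim_overlap(merged, chunk["text"])
        merged += delta
        if delta:  # skip chunks that contained nothing new
            yield delta


async def demo():
    # Hypothetical stream standing in for the engine's streaming output.
    async def fake_stream():
        for text in ["Hello", "Hello wor", "ld"]:
            yield {"text": text}

    return [piece async for piece in merged_deltas(fake_stream())]


print(asyncio.run(demo()))  # ['Hello', ' wor', 'ld']

# Assumed usage with the engine, inside an async function:
# stream = await llm.async_generate(prompt, sampling_params, stream=True)
# async for piece in merged_deltas(stream):
#     print(piece, end="", flush=True)
```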
In [6]:
llm.shutdown()
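In a standalone script it is worth making sure `llm.shutdown()` runs even if generation raises an exception. One way is a small context manager around engine creation. `managed_engine` is a hypothetical helper, and `factory` is any zero-argument callable that builds the engine, for example `lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")`.

```python
from contextlib import contextmanager


@contextmanager
def managed_engine(factory):
    """Create an engine with `factory()` and always call its shutdown() on exit."""
    llm = factory()
    try:
        yield llm
    finally:
        llm.shutdown()  # runs on normal exit and when the body raises


# Assumed usage in a script (keep it under the __main__ guard):
# if __name__ == "__main__":
#     import sglang as sgl
#     with managed_engine(
#         lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
#     ) as llm:
#         print(llm.generate(["The capital of France is"], {"temperature": 0.8}))
```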