# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example in a Python script can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).



## Nest Asyncio
If you use the **Offline Engine** in IPython, Jupyter, or other code that already runs inside an event loop, add the following code:
```python
import nest_asyncio

nest_asyncio.apply()
```
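
This is needed because `asyncio.run()`, which the asynchronous examples below call, refuses to start while another event loop is already running in the same thread, and notebook kernels typically keep one running. The standard-library sketch below reproduces that error inside a coroutine that stands in for a notebook cell. `in_running_loop()` is a hypothetical helper, not part of SGLang, that you could use to decide whether `nest_asyncio.apply()` is needed.

```python
import asyncio


def in_running_loop() -> bool:
    """Hypothetical helper: True if this thread already has a running event loop."""
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False


async def answer() -> int:
    return 42


async def notebook_like_cell() -> str:
    # Inside this coroutine an event loop is running, as it is in a Jupyter cell.
    assert in_running_loop()
    coro = answer()
    try:
        asyncio.run(coro)  # a nested asyncio.run() is rejected
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a warning
        return str(exc)
    return "no error"


print(in_running_loop())  # False: no loop is running at module level
print(asyncio.run(notebook_like_cell()))  # prints the RuntimeError message
```

With `nest_asyncio.apply()` in effect, the nested call is patched to run instead of raising.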

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

```python
# launch the offline engine
import asyncio

import sglang as sgl
import sglang.test.doc_patch
from sglang.utils import async_stream_and_merge, stream_and_merge

llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")
```


### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

```
Prompt: Hello, my name is
Generated text:  Elena. I'm 13 years old and I was born in 1990. I'm a science enthusiast and I love to read science books. I'm also into cooking, and I cook my own food. My favorite hobby is drawing, and I like to draw funny drawings. How do you feel about the world around you and what does it inspire you to do? How do you feel about yourself and what do you like to do for fun?
It's a wonderful world out there, Elena. It's a beautiful place, full of all kinds of creatures and amazing things to see and do. I feel really inspired
Prompt: The president of the United States is
Generated text:  a citizen of which country?
A. United States
B. United Kingdom
C. Canada
D. Australia
Answer: A

Which of the following statements about the relationship between the proportion of each type of protein in the human body and the maintenance of normal physiological functions is correct?
A. The greater the proportion of the total protein, the more likely it is to maintain norma
```
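
The `sampling_params` dict controls decoding. `temperature` rescales the logits before the softmax (lower values sharpen the distribution), and `top_p` restricts sampling to the smallest set of most likely tokens whose cumulative probability reaches `top_p` (nucleus sampling). The pure-Python sketch below illustrates these two textbook operations on a made-up four-token vocabulary. It is not SGLang's actual sampler, and details such as tie-breaking at the cutoff vary between implementations.

```python
import math


def nucleus_distribution(logits, temperature, top_p):
    """Return {token_index: prob} after temperature scaling and top-p filtering."""
    # 1. Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 2. Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # 3. Renormalize the surviving probabilities so they sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.1, -1.0]  # made-up logits for a 4-token vocabulary
print(nucleus_distribution(toy_logits, temperature=0.8, top_p=0.95))
print(nucleus_distribution(toy_logits, temperature=0.2, top_p=0.9))
```

With `temperature=0.2, top_p=0.9` (the values used in the streaming example below), nearly all probability mass lands on the top token and only that token survives the cutoff, which is one reason low temperatures tend to produce more deterministic text.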

### Streaming Synchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    # stream the generation and merge the chunks, dropping text repeated between chunks
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
```

```
=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Name], and I'm a [job title] at [company name]. I'm excited to meet you and learn more about you. What can you tell me about yourself? I'm a [insert a short, positive description of your personality or skills]. I'm always looking for new challenges and opportunities to grow and learn. What do you do for a living? I'm a [insert a short, positive description of your job or role]. I'm always looking for ways to improve my skills and stay up-to-date with the latest trends in my field. What do you enjoy doing in your free time? I enjoy [insert a

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, also known as the City of Light. It is a historic city with a rich history dating back to the Middle Ages and is known for its iconic landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum. Paris is also a major center for art, culture, and commerce, and is home to many famous museums, theaters, and restaurants. The city is known for its vibrant nightlife and is a popular tourist destination. Paris is a city of contrasts, with its modern architecture and historical landmarks blending seamlessly together. The city is also home to many cultural institutions, including the Musée d'Or

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  likely to be characterized by a number of trends that are expected to shape the way we interact with technology and the world around us. Here are some of the potential trends that could shape the future of AI:

1. Increased automation and robotics: As AI tec
```
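
`stream_and_merge` comes from `sglang.utils`. One way to implement overlap removal is to find the longest suffix of the text merged so far that is also a prefix of the incoming chunk, and append only the remainder. The sketch below applies that idea to a plain list of chunk dicts shaped like `{"text": ...}`. `trim_overlap`, `merge_stream`, and the chunk values are illustrative; the real helper works on the engine's streamed output and may differ in detail.

```python
def trim_overlap(existing_text: str, new_chunk: str) -> str:
    """Drop the longest prefix of new_chunk that is already a suffix of existing_text."""
    for size in range(min(len(existing_text), len(new_chunk)), 0, -1):
        if existing_text.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk


def merge_stream(chunks) -> str:
    """Merge streamed chunks shaped like {"text": ...} into one string."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap(merged, chunk["text"])
    return merged


# Hypothetical chunks whose edges repeat part of the previous chunk.
fake_chunks = [
    {"text": "The capital"},
    {"text": "capital of France"},
    {"text": "France is Paris."},
]
print(merge_stream(fake_chunks))  # The capital of France is Paris.
```

This heuristic has a blind spot: a genuinely new chunk that happens to start with the characters the text ends with is trimmed too, e.g. `trim_overlap("ba", "a")` returns an empty string. If each chunk carries the full text generated so far, slicing off the previously seen length is simpler and exact.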

### Non-streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
```

```
=== Testing asynchronous batch generation ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Name], and I'm a [background information on the character]. My favorite hobby is [favorite activity], and I love [reason for doing it]. I'm passionate about [something related to my field of interest] and I enjoy [reason for pursuing that passion]. I love [something that makes me happy] and I'm always ready to learn something new. I'm a [short, negative description, if applicable].
Name: [Name]
Background: [name], [previous profession], [achievements]
Favorite Activity: [activity] (e. g., hiking, reading, playing sports)
Reason for doing it: [reason

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, the largest city and the country's political, cultural, and economic centre. It is also the world's 2nd most populous city and the 19th most populous urban agglomeration. Paris is renowned for its artistic an
```
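
Because `async_generate` is a coroutine, it composes with ordinary `asyncio` tools. For example, a custom server handling several independent requests could await them together with `asyncio.gather`, which returns results in the order the awaitables were passed. The sketch below uses a hypothetical `FakeEngine` stand-in so it runs without a GPU; it only mimics the `{"text": ...}` result shape seen above, assuming a single-prompt call returns one such dict. With the real engine you would pass `llm` instead.

```python
import asyncio


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine; mimics only the async_generate result shape."""

    async def async_generate(self, prompt, sampling_params):
        await asyncio.sleep(0.01)  # pretend to wait for the scheduler
        temp = sampling_params["temperature"]
        return {"text": f" <completion of {prompt!r} at T={temp}>"}


async def generate_concurrently(engine, prompts, sampling_params):
    # Schedule every request at once; gather preserves the order of `prompts`.
    return await asyncio.gather(
        *(engine.async_generate(p, sampling_params) for p in prompts)
    )


results = asyncio.run(
    generate_concurrently(
        FakeEngine(), ["Hello", "The capital of France is"], {"temperature": 0.8}
    )
)
for result in results:
    print(result["text"])
```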

### Streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())
```


```
=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Name] and I am [Age] years old. I have a passion for [My most exciting or favorite hobby/interest] and I love to [My favorite hobby/interest]. I have always been curious about the world around me and have always been eager to learn and explore. I am always striving to improve my skills and knowledge, and I am constantly looking for new experiences and opportunities to grow and develop as a person. I am always eager to learn and grow in my career, and I am committed to always being the best version of myself. How can I show my enthusiasm and passion for my hobby/interest to my friends

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, the city known for its history, architecture, and French culture. Paris, often referred to as "The City of Love, " is a UNESCO World Heritage site and a bustling metropolis of around 2.7 million people. It is the country's cultural, economic, and political center, with many attractions for tourists, such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral. The city is also home to the headquarters of major international organizations and landmarks that define French identity. Paris is a quintessential French capital, embodying the city's unique culture, history, and modernity. The city

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  fascinating and encompasses a variety of emerging trends, including:

1. AI will continue to become more nuanced and sophisticated. As technology improves, we will see even more sophisticated algorithms that can understand, learn and adapt to the context of human behavior.

2. AI will become more pervasive in everyday life. We will see AI systems integrated into our everyday devices and environments, such as smart home appliances, autonomous vehicles, and self-driving cars.

3. AI will continue to impact the labor market. With the rise of automation, we will see a shift towards a more technologically-advanced workforce, with a greater focus on skills that can adapt to AI
```
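
The code comment above describes `async_stream_and_merge` as an overlap-aware replacement for calling `async_generate` directly. The self-contained sketch below shows how such a helper can be written as an async generator that yields only the new part of each chunk, so it can be printed as soon as it arrives. `fake_token_stream` is a hypothetical stand-in for the engine's streamed output, `stream_without_repeats` is an illustrative name, and the real helper may differ in detail.

```python
import asyncio


def trim_overlap(existing_text: str, new_chunk: str) -> str:
    """Drop the longest prefix of new_chunk that is already a suffix of existing_text."""
    for size in range(min(len(existing_text), len(new_chunk)), 0, -1):
        if existing_text.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk


async def fake_token_stream(pieces):
    """Hypothetical async stream yielding dicts shaped like {"text": ...}."""
    for piece in pieces:
        await asyncio.sleep(0)  # hand control back to the loop, as a real stream would
        yield {"text": piece}


async def stream_without_repeats(stream):
    """Async generator yielding only the non-overlapping part of each chunk."""
    merged = ""
    async for chunk in stream:
        cleaned = trim_overlap(merged, chunk["text"])
        merged += cleaned
        yield cleaned


async def demo():
    pieces = []
    stream = fake_token_stream(["Paris", "s, the", " the city"])
    async for cleaned in stream_without_repeats(stream):
        print(cleaned, end="", flush=True)
        pieces.append(cleaned)
    print()  # new line after the streamed text
    return pieces


collected = asyncio.run(demo())
```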




```python
llm.shutdown()
```
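
In a standalone script it helps to make sure `shutdown()` runs even if generation raises, so the engine's resources are released. A minimal sketch using `contextlib.contextmanager`; `engine_session` and `FakeEngine` are hypothetical names, and with SGLang you would pass a factory such as `lambda: sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")` instead.

```python
from contextlib import contextmanager


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine that records whether shutdown() ran."""

    def __init__(self):
        self.closed = False

    def generate(self, prompt, sampling_params):
        raise RuntimeError("simulated failure during generation")

    def shutdown(self):
        self.closed = True


@contextmanager
def engine_session(factory):
    """Create an engine and guarantee shutdown() on exit, even after an exception."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


engine = FakeEngine()
try:
    with engine_session(lambda: engine) as session:
        session.generate("Hello", {"temperature": 0.8})
except RuntimeError as exc:
    print("generation failed:", exc)

print("engine closed:", engine.closed)  # engine closed: True
```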