# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example in a Python script can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
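
As a rough illustration of what "a custom server on top of the engine" can mean, the sketch below shows a framework-agnostic request handler that any web framework could wrap. It is not the linked `custom_server` example. `StubEngine`, `handle_generate`, and the JSON field names `prompts`, `sampling_params`, and `texts` are all hypothetical; the stub only mimics the `generate(prompts, sampling_params)` call shape and the `output["text"]` field used later in this document, so the snippet runs without a model.

```python
import json


class StubEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a GPU."""

    def generate(self, prompts, sampling_params):
        # Mimics the list-of-dicts-with-"text" shape used in this document.
        return [{"text": f" <completion for: {p}>"} for p in prompts]


def handle_generate(engine, raw_body: bytes):
    """Parse a JSON request body, call the engine, return (status, body bytes)."""
    error = json.dumps({"error": "expected JSON with a 'prompts' list of strings"})
    try:
        payload = json.loads(raw_body)
        prompts = payload["prompts"]
    except (ValueError, KeyError, TypeError):
        # ValueError covers malformed JSON and undecodable bytes.
        return 400, error.encode()
    if not isinstance(prompts, list) or not all(isinstance(p, str) for p in prompts):
        return 400, error.encode()

    sampling_params = payload.get("sampling_params", {})
    outputs = engine.generate(prompts, sampling_params)
    body = {"texts": [output["text"] for output in outputs]}
    return 200, json.dumps(body).encode()


status, body = handle_generate(StubEngine(), b'{"prompts": ["Hello, my name is"]}')
print(status, json.loads(body))
```

In a real server, the stub would be replaced by an engine instance created once at startup and shared across requests.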



## Nest Asyncio
Note that if you want to use the **Offline Engine** in IPython or in other code that already runs inside an event loop, you need to add the following code:
```python
import nest_asyncio

nest_asyncio.apply()

```
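
To see why this matters, the following standard-library sketch (no SGLang involved) shows what happens when `asyncio.run()` is called while an event loop is already running, which is the situation inside IPython or Jupyter. The function names `inner` and `outer` are only illustrative.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # We are already inside a running event loop here, as in a notebook cell.
    coro = inner()
    try:
        asyncio.run(coro)  # nested call into a second event loop
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


message = asyncio.run(outer())
print(message)
```

Without patching, the nested call is rejected with a `RuntimeError`. `nest_asyncio.apply()` patches asyncio so that such nested calls are allowed.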

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine
import asyncio

import sglang as sgl
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch
else:
    import nest_asyncio

    nest_asyncio.apply()


llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")

Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  5.44it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  5.43it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Maria and I am a high school student. I am from Argentina. I want to be a doctor and help the world. I am very thankful to you for this place. I hope to be a doctor because I am very kind and caring. The most important thing for me is being able to help other people. Doctors can help people who are sick. They can tell them to have a healthy life. They can give them medicine and they can make people feel better. They can also be there when someone is in trouble. Doctors can also help people who are very sick and many people who are very old. They can help them to live
Prompt: The president of the United States is
Generated text:  a (）office.
A. ceremonial
B. ceremonial
C. ceremonial
D. ceremonial
答案:
D

根据《中华人民共和国行政强制法》的规定，人民法院对行政机关强制执行的申请进行书面审查，认为行政机关的强制执行申请符合下列情形之一的，应当即时作出解除强制执行决定：
A. 强制执行标的灭失的
B. 当事人履行行政决定确有困难或者暂无履行能力的
C. 行政机关申请执行前依法进行了催告，且当事人未履行的
D. 法律、法规、规章规定应当解除强制
Prompt: The capital of France is
Generated text:  Paris.
The capital of Swi
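
A common follow-up in offline batch jobs is to persist each prompt with its completion. The sketch below assumes only what the cell above shows, namely that each output is a dict with a `"text"` field. The helper name `write_results_jsonl`, the record layout, and the stand-in `outputs` list are hypothetical, so the snippet runs without a model.

```python
import json
import os
import tempfile


def write_results_jsonl(prompts, outputs, path):
    """Write one JSON object per line, pairing each prompt with its text."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, output in zip(prompts, outputs):
            record = {"prompt": prompt, "text": output["text"]}
            # ensure_ascii=False keeps non-ASCII completions readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Stand-in data shaped like the outputs above (not real model output).
prompts = ["Hello, my name is", "The capital of France is"]
outputs = [{"text": " Maria."}, {"text": " Paris."}]

path = os.path.join(tempfile.mkdtemp(), "results.jsonl")
write_results_jsonl(prompts, outputs, path)

with open(path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
print(records)
```

With a real engine, `outputs` would come from `llm.generate(prompts, sampling_params)` instead of the stand-in list.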

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  [Name], and I'm a [job title] at [company name]. I'm excited to meet you and learn more about you. What can you tell me about yourself? I'm a [job title] at [company name], and I'm passionate about [job title] and [job title]. I enjoy [job title] because [reason for interest]. I'm always looking for new challenges and opportunities to grow and learn. What's your favorite hobby or activity? I love [hobby or activity]. I'm always looking for new ways to challenge myself and expand my knowledge. What's your favorite book or movie? I

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris. It is the largest city in France and the second-largest city in the European Union. It is known for its rich history, beautiful architecture, and vibrant culture. Paris is home to many famous landmarks such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral. It is also a major center for business, finance, and tourism. The city is known for its annual Eiffel Tower Festival and its annual fashion week. Paris is a popular tourist destination and a cultural hub for France and the world. It is home to many museums, theaters, and art galleries. The city is also known for its cuisine

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  likely to be characterized by several key trends:

1. Increased integration with human intelligence: AI is likely to become more integrated with human intelligence, allowing machines to learn from and adapt to human behavior and experiences. This could lead to more sophisticated and personalized AI systems that can better understand and respond to human needs.

2. Enhanced ethical considerations: As AI becomes more integrated with human intelligence, there will be increased scrutiny of its ethical implications. This could lead to more stringent regulations and guidelines for AI development and deployment.

3. Greater reliance on AI for decision-making: AI is likely to become more integrated with human decision-making processes, allowing machines to make
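
The banner above mentions "overlap removal". As a minimal sketch of that idea, and not the actual implementation of `sglang.utils.stream_and_merge`, the code below merges streamed chunks while dropping any prefix of a new chunk that repeats the tail of the text collected so far. The function names `trim_overlap` and `merge_chunks` and the sample chunks are hypothetical.

```python
def trim_overlap(existing: str, new_chunk: str) -> str:
    """Return the part of new_chunk that does not repeat the tail of existing."""
    max_overlap = min(len(existing), len(new_chunk))
    # Try the longest possible overlap first, then shrink.
    for size in range(max_overlap, 0, -1):
        if existing.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk


def merge_chunks(chunks):
    """Concatenate chunks, removing overlaps between consecutive pieces."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap(merged, chunk)
    return merged


chunks = ["The capital", "capital of France", " is Paris."]
merged = merge_chunks(chunks)
print(merged)
```

One caveat of this naive approach is that it cannot tell an accidental overlap from a genuine repetition in the generated text, so it may drop text that the model really did repeat.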



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Your Name] and I am a [Title] at [Company Name]. As a [Title] at [Company Name], I am dedicated to [Your Mission/Goal/Positive Impact]. I have a passion for [Your Passion/Interest], and I am committed to [Your Core Value/Value]. I have a strong work ethic and strive to [Your Goal/Innovation/Continuous Learning]. I am a [Your Character/Personality], and I believe in [Your Company's Values and Mission]. I am a [Your Personal Factor] and I am always [Your Favorite Quote, Enthusiasm, etc.]. I

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, an ancient city with a rich history that has been influenced by various cultures, including that of Greece and Rome. It has a population of over 1.2 million people and is the largest city in Europe. The city has a rich history dating back to the Roman period and has bee
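
Because `async_generate` is awaited, it can be combined with other asyncio tools. The sketch below submits several independent batches at once with `asyncio.gather`. `StubEngine` is a hypothetical stand-in that only mimics the awaitable call shape and the `output["text"]` field shown above, and `run_batches` is an illustrative helper, not part of SGLang.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a GPU."""

    async def async_generate(self, prompts, sampling_params):
        await asyncio.sleep(0.01)  # simulate inference latency
        return [{"text": f" <completion for: {p}>"} for p in prompts]


async def run_batches(engine, batches, sampling_params):
    """Await all batches concurrently; results keep the order of `batches`."""
    tasks = [engine.async_generate(batch, sampling_params) for batch in batches]
    return await asyncio.gather(*tasks)


batches = [
    ["Hello, my name is"],
    ["The capital of France is", "The future of AI is"],
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

results = asyncio.run(run_batches(StubEngine(), batches, sampling_params))
for batch, outputs in zip(batches, results):
    for prompt, output in zip(batch, outputs):
        print(f"{prompt} -> {output['text']}")
```

Whether splitting work into several concurrent calls helps with a real engine depends on the workload, since the engine already schedules the prompts within a single batch.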

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Name] and I am a [age] year old [gender] [background]. I have a passion for [field] and I am a [career level] [professional title] in this field. I am the [career level] [professional title] for [company name], and I have been working for [company name] for [number of years] years. I am a [career level] [professional title] for [company name], and I have been working for [company name] for [number of years] years. I am passionate about [field], and I believe in [reason why I love [field

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, known for its iconic landmarks such as the Eiffel Tower, Louvre Museum, Notre-Dame Cathedral, and the Marais district. It is also home to the French Parliament building, the Louvre Museum, and a rich cultural heritage spanning over 500 years of French history. Paris is a popular tourist destination and is known for its world-famous fashion, art, and food scenes. It is often referred to as the "City of Light" due to its vibrant nightlife and cosmopolitan atmosphere. The city has a diverse and multicultural population, with many French-speaking and Francophone communities. Paris is a unique and

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  likely to be characterized by a number of potential trends that could shape the way we use and interact with AI technology. Here are some possible future trends in AI:

1. Increased Transparency and Explainability: As AI systems become more advanced, we may see a greater emphasis on transparency and explainability. This means that developers will be required to provide more details about how their AI systems work and why they make decisions. This will help to build trust with users and provide an additional layer of security and accountability.

2. Emergence of More Complex AI: AI systems are likely to become more complex, with the ability to process and analyze a wider range


In [6]:
llm.shutdown()
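
In a standalone script, it can be safer to make sure `shutdown()` runs even when generation raises. The sketch below uses a plain `try`/`finally` for that. `StubEngine` and `run_job` are hypothetical; the stub only mimics the `generate` and `shutdown` calls used in this document, so the snippet runs without a model.

```python
class StubEngine:
    """Hypothetical stand-in for sgl.Engine that records whether it was shut down."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts, sampling_params):
        return [{"text": f" <completion for: {p}>"} for p in prompts]

    def shutdown(self):
        self.is_shut_down = True


def run_job(engine, prompts, sampling_params):
    """Run one batch and always release the engine afterwards."""
    try:
        return engine.generate(prompts, sampling_params)
    finally:
        # Runs on success and on error alike.
        engine.shutdown()


engine = StubEngine()
outputs = run_job(engine, ["Hello, my name is"], {"temperature": 0.8})
print(outputs[0]["text"], engine.is_shut_down)
```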