# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where an HTTP server would add unnecessary complexity or overhead. Two common use cases are:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
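
As a rough illustration of the custom-server idea, the sketch below shows only the request-handling core: decode a JSON body, call the engine, and encode the completions. It is not taken from the linked script. `StubEngine`, `handle_generate`, and the JSON field names are hypothetical; the stub only mimics the `generate(prompts, sampling_params)` call shape used later in this document, so the example runs without a model.

```python
import json


class StubEngine:
    """Hypothetical stand-in that mimics the generate() call shape used in this document."""

    def generate(self, prompts, sampling_params):
        # A real engine would run the model; the stub just echoes each prompt.
        return [{"text": f" (echo of {p!r})"} for p in prompts]


def handle_generate(engine, body: bytes) -> bytes:
    """Decode a JSON request, run batch generation, and encode the completions."""
    request = json.loads(body)
    prompts = request["prompts"]  # assumed field name
    sampling_params = request.get("sampling_params", {})  # assumed field name
    outputs = engine.generate(prompts, sampling_params)
    return json.dumps({"completions": [o["text"] for o in outputs]}).encode()


response = handle_generate(StubEngine(), b'{"prompts": ["Hello"]}')
print(response.decode())
```

Keeping the handler independent of any web framework means the same function could sit behind whichever transport the server uses.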



## Nest Asyncio
To use the **Offline Engine** in IPython, Jupyter, or any other code that already runs inside an event loop, first apply `nest_asyncio`:
```python
import nest_asyncio

nest_asyncio.apply()

```
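
The underlying problem can be reproduced with the standard library alone. The cells later in this document call `asyncio.run(main())`, and `asyncio.run` refuses to start when an event loop is already running, which is the situation inside a notebook kernel. The sketch below triggers that error on purpose; `inner` and `outer` are illustrative names, and `nest_asyncio` is deliberately not applied here.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # A loop is already running here, just as it is inside a notebook cell.
    coro = inner()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


message = asyncio.run(outer())
print(message)
```

After `nest_asyncio.apply()`, the nested call is allowed to proceed instead of raising.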

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch
else:
    import nest_asyncio

    nest_asyncio.apply()


llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")

Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  5.58it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Alan and I am a high school student from New York City. I have been asked to write a short summary on how to improve personal hygiene. Please create a brief summary of your ideas on how to improve personal hygiene. Your summary should be no more than 30 words. Use some bullet points to organize your ideas. Use the following items for your list: towel, toothbrush, soap, shower, washcloth, air freshener. Write your answer concisely. I. Shower daily to remove dirt, sweat, and oils from skin. II. Use toothbrush daily to clean teeth and gums. III. Rinse with
Prompt: The president of the United States is
Generated text:  not a member of the Federalist Party, but it is not difficult to find the president of the United States who is a member of that party. Here are some examples:

(a) The next president of the United States has been a member of the Democratic Party since his election to the presidency on January 20, 1789, and has been a member of the 
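
The `temperature` and `top_p` entries in `sampling_params` control how the next token is drawn. The sketch below shows the general technique (temperature scaling followed by nucleus filtering) on a toy list of logits. It is an illustration of the idea only, not SGLang's sampling code, and `top_p_filter` is a hypothetical helper.

```python
import math


def top_p_filter(logits, temperature=0.8, top_p=0.95):
    """Return {token_index: probability} for the nucleus kept after temperature scaling."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens; sampling would then draw from this dict.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


nucleus = top_p_filter([2.0, 1.0, 0.1, -3.0], temperature=0.8, top_p=0.95)
print(nucleus)
```

With these toy logits, the least likely token falls outside the nucleus and can never be sampled.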

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  [Name], and I'm a [Age] year old [Occupation]. I'm currently [Current Location]. I'm a [Favorite Hobby] enthusiast, and I love [Favorite Food/Drink]. I'm also a [Favorite Book/Artist/Artist] lover, and I enjoy [Favorite Activity]. I'm a [Favorite Movie/TV Show/Book/Artist] fan, and I love [Favorite Quote/Adjective/Word/Phrase]. I'm a [Favorite Sport/Activity/Travel/Book/Artist/Artist] enthusiast, and I love [Favorite Hobby/Activity/Quote/Adjective/Word/Phrase

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris, also known as the City of Light. It is a historic city with a rich history and a vibrant culture, known for its iconic landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum. Paris is also home to many famous museums, including the Louvre and the Musée d'Orsay, as well as the Notre-Dame Cathedral and the Palace of Versailles. The city is also known for its cuisine, including its famous French cuisine, and its annual festivals and events. Paris is a city of contrasts, with its modern architecture and high-tech industries, as well as its traditional

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  likely to be characterized by several key trends:

1. Increased integration with human intelligence: AI is likely to become more integrated with human intelligence, allowing machines to learn from and adapt to human behavior and decision-making processes. This could lead to more sophisticated and adaptive AI systems that can learn from experience and make better decisions.

2. Enhanced privacy and security: As AI systems become more sophisticated, there will be an increased need for privacy and security measures to protect user data. This could lead to the development of new technologies and approaches to data privacy and security, such as blockchain-based privacy-preserving AI.

3. Greater focus on ethical AI: As
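
The "overlap removal" mentioned in the cell above can be pictured as follows: if a streamed chunk begins with text that the merged output already ends with, the repeated part is dropped before appending. The sketch below is a minimal illustration of that idea under this assumption; `strip_repeated_prefix` and `merge_chunks` are hypothetical names and are not the implementation of `stream_and_merge`.

```python
def strip_repeated_prefix(existing: str, chunk: str) -> str:
    """Drop the longest prefix of `chunk` that `existing` already ends with."""
    max_overlap = min(len(existing), len(chunk))
    for size in range(max_overlap, 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk


def merge_chunks(chunks):
    """Concatenate chunks, removing the overlap between consecutive ones."""
    merged = ""
    for chunk in chunks:
        merged += strip_repeated_prefix(merged, chunk)
    return merged


merged = merge_chunks([" Paris", " Paris, also", " also known"])
print(repr(merged))
```

One caveat of this simple approach: text that legitimately repeats across a chunk boundary would also be trimmed.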



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Your Name], and I'm a [insert age range or profession] who is passionate about [insert something you enjoy, such as music, cooking, or reading]. I've always been fascinated by how different people's perspectives and experiences shape the way they see the world. I hope to use my background in [insert something about your field], and my personal experiences, to help others achieve their goals and make positive change in the world. What's your favorite book, movie, or podcast to listen to, and why?
[Your Name]: Introducing myself, [Your Name]. As a [insert age range or profession], I'm

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, which is known for its iconic landmarks such as the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum. It is also home to the French Parliament building and is known 
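
One reason to prefer the asynchronous call is that other coroutines can make progress while generation is awaited. The sketch below illustrates this with `asyncio.gather`. `fake_async_generate` is a hypothetical stand-in that sleeps instead of running a model, and `heartbeat` represents any unrelated work.

```python
import asyncio


async def fake_async_generate(prompts, sampling_params):
    """Hypothetical stand-in for an awaitable generation call."""
    await asyncio.sleep(0.01)  # pretend the model is busy
    return [{"text": f" <completion for {p!r}>"} for p in prompts]


async def heartbeat(log):
    """Unrelated work that keeps running while generation is pending."""
    for tick in range(3):
        log.append(f"tick {tick}")
        await asyncio.sleep(0)


async def main():
    log = []
    outputs, _ = await asyncio.gather(
        fake_async_generate(["The capital of France is"], {"temperature": 0.8}),
        heartbeat(log),
    )
    return outputs, log


outputs, log = asyncio.run(main())
print(outputs, log)
```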

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Stream through async_stream_and_merge, which removes overlap between chunks
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Your Name]. I'm a [Your Profession] with [Your Degree] in [Your Field of Study]. I'm a [Your Interests/Background]. I'm currently in [Your Current Position/Status]. I enjoy [Your Passion/Interest]. I'm always looking for ways to [Your Goal/Challenge]. I'm constantly learning and growing as a professional and personal. How can you best describe your character?
[Your Name] is a professional and personal journey enthusiast who is always seeking new experiences and learning opportunities. Whether it's through travel, independent reading, or pursuing a new hobby, [Your Name] is always looking

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, also known as "la Ville", which means "City" in French. It is the largest city in Europe and the third-largest city in the world. Paris is home to many landmarks, including the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum. The city is also known for its culture, art, and cuisine. Paris has a rich history dating back over 2, 000 years and is a popular tourist destination for visitors from around the world. Paris is often considered a cultural and intellectual hub of Europe and is home to many famous artists, writers, and philosophers. It is also known

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  exciting and there are several possible trends that are likely to shape the technology and its applications in the years to come. Here are some of the most likely trends:

1. Increased Personalization: AI will continue to become more personalized. As AI algorithms learn more about individual users' preferences and behaviors, they will be able to provide more tailored recommendations and solutions.

2. Autonomous Vehicles: Autonomous vehicles are expected to become more prevalent in the future. AI-powered autonomous vehicles could significantly reduce traffic accidents, improve safety, and make transportation more efficient.

3. Smart Cities: AI will be increasingly integrated into smart cities to provide better services and reduce waste.


In [6]:
llm.shutdown()
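
In a script, it can be useful to guarantee that `shutdown()` runs even when generation raises. The sketch below wraps an engine-like object in a small context manager. `StubEngine` and `managed` are hypothetical names used only for illustration; the stub fails on purpose so the cleanup path is exercised without loading a model.

```python
from contextlib import contextmanager


class StubEngine:
    """Hypothetical stand-in for an engine object; it only tracks shutdown."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts, sampling_params):
        raise RuntimeError("simulated failure during generation")

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def managed(engine):
    """Yield the engine and always call shutdown() on exit."""
    try:
        yield engine
    finally:
        engine.shutdown()


engine = StubEngine()
try:
    with managed(engine) as llm:
        llm.generate(["Hello"], {"temperature": 0.8})
except RuntimeError as exc:
    error_message = str(exc)

print(engine.is_shut_down, error_message)
```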