# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when a server would only add complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).



## Nest Asyncio
Note that if you want to use the **Offline Engine** in IPython or in other code that already runs an event loop (a nested loop), you need to add the following code:
```python
import nest_asyncio

nest_asyncio.apply()

```
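The patch is needed because a notebook kernel already runs an event loop, and `asyncio.run` refuses to start a second one inside it. The stdlib-only sketch below reproduces that error without SGLang; `inner` and `outer` are hypothetical names used only for illustration.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # We are now inside a running event loop, similar to a notebook cell.
    coro = inner()
    try:
        # Starting a second loop from inside a running one is rejected.
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "never awaited" warning for the unused coroutine
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```

After `nest_asyncio.apply()`, nested calls of this kind are permitted, which is why notebook cells can call `asyncio.run(...)` directly.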

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch
else:
    import nest_asyncio

    nest_asyncio.apply()


llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")





INFO 07-28 02:06:50 [__init__.py:247] No platform detected, vLLM is running on UnspecifiedPlatform


INFO 07-28 02:07:02 [__init__.py:247] No platform detected, vLLM is running on UnspecifiedPlatform


Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]


Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  4.62it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Jessica and I am currently studying at the University of Alberta. I'm going to share my experience and opinions on the topic of public speaking. Can you please share your thoughts on the topic? How do I get started with my first public speaking event? Absolutely, I'd be happy to share my thoughts on public speaking and how to get started with your first public speaking event! To get started, here are a few tips:

1. Choose a topic that you're passionate about and that others will be interested in. This can make your event more enjoyable and memorable.

2. Practice your speech several times to get comfortable with the content and delivery
Prompt: The president of the United States is
Generated text:  trying to decide how many military aircraft to purchase. The current budget is $100 billion, and each aircraft costs $20 billion. The president wants to buy as many aircraft as possible within the budget. However, due to geopolitical concerns, the 

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  [Name], and I'm a [job title] at [company name]. I'm excited to meet you and learn more about you. What can you tell me about yourself? I'm a [insert a short description of your job or profession]. I enjoy [insert a short description of your hobbies or interests]. What's your favorite hobby or activity? I love [insert a short description of your favorite activity]. What's your favorite book or movie? I love [insert a short description of your favorite book or movie]. What's your favorite color? I love [insert a short description of your favorite color]. What's your favorite food

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris, known for its iconic landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum. It is also a cultural and historical center with a rich history dating back to the Middle Ages. Paris is a popular tourist destination and a major economic hub, with a diverse array of restaurants, cafes, and shops. The city is known for its fashion, art, and food scene, and is home to many famous museums and galleries. Paris is a vibrant and dynamic city with a strong sense of community and a strong sense of identity. It is a city of contrasts and beauty, with a unique blend of old

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  likely to be characterized by rapid advancements in areas such as machine learning, natural language processing, and computer vision. These technologies are expected to continue to improve and become more integrated into our daily lives, from self-driving cars to personalized healthcare and financial services. Additionally, AI is likely to continue to be used for ethical and social reasons, such as improving access to education and healthcare for marginalized communities. As AI continues to evolve, it is likely to have a significant impact on the way we live, work, and interact with each other. However, it is also important to consider the potential risks and challenges associated with AI, such as job displacement and
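The "overlap removal" mentioned in the cell above can be pictured as follows: if a streamed chunk begins with text that has already been accumulated, only the new part is appended. The sketch below is a stdlib-only illustration of that idea, not necessarily how `stream_and_merge` is implemented; `strip_overlap`, `merge_chunks`, and the sample chunks are hypothetical.

```python
def strip_overlap(existing_text: str, new_chunk: str) -> str:
    """Drop the longest prefix of new_chunk that is already a suffix of existing_text."""
    max_len = min(len(existing_text), len(new_chunk))
    for size in range(max_len, 0, -1):
        if existing_text.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk


def merge_chunks(chunks):
    # Accumulate chunks, appending only the part of each chunk that is new.
    merged = ""
    for chunk in chunks:
        merged += strip_overlap(merged, chunk)
    return merged


# Hypothetical chunks in which some pieces repeat the tail of the previous text.
chunks = [" Paris", " Paris, known", " known for its", " landmarks"]
merged = merge_chunks(chunks)
print(merged)
```

One limitation of this simple approach is that it cannot tell an accidental overlap from a genuine repetition, so a chunk that legitimately repeats the preceding characters would be shortened.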



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Your Name], and I am a [Type of Career] with [Number of Years in that Career]. I’m passionate about [Your Passion for the Career], which has driven me to [How you've proven yourself in this field] and is the reason for my continued success. I love [Name of the Goal/Job/Professional Role], and I'm thrilled to be pursuing this journey. What can we expect from our conversation today? I'm excited to learn more about you and how I can help you achieve your goals. It's a pleasure to meet you and share my journey with you. What would you like to talk about

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, a historic city with a rich cultural heritage and cosmopolitan atmosphere. It serves as the nation's political and economic center, and hosts numerous attractions and events throughout the year. The city is k
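One reason to prefer the asynchronous call is that the event loop can do other work while a request is pending. The sketch below illustrates that pattern with `asyncio.gather`; `fake_async_generate` is a hypothetical stand-in that only mimics the `{"text": ...}` output shape shown above, so the snippet runs without a GPU or SGLang installed.

```python
import asyncio


async def fake_async_generate(prompt, sampling_params):
    # Hypothetical stand-in for an engine call; sleeps instead of running a model.
    await asyncio.sleep(0.01)
    return {"text": f" <completion for: {prompt}>"}


async def run_all(prompts, sampling_params):
    # Schedule one request per prompt and wait for all of them together.
    tasks = [fake_async_generate(p, sampling_params) for p in prompts]
    return await asyncio.gather(*tasks)  # results keep the order of `prompts`


prompts = ["Hello, my name is", "The capital of France is"]
outputs = asyncio.run(run_all(prompts, {"temperature": 0.8, "top_p": 0.95}))
for prompt, output in zip(prompts, outputs):
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

With the real engine, passing the whole prompt list to `llm.async_generate`, as in the cell above, already handles the batch in one call; a `gather` pattern like this one is mainly useful for mixing generation with other coroutines.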

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Name]. I'm a [job title] with [employer's name] working in [position]. I've always been passionate about [what you can say about your main occupation]. I've been learning new things and growing in my career, and I'm excited to continue doing what I love. What kind of experiences do you have that make you unique and interesting to people? That's all there is to my story. What other experiences would you like to share? [Tell your story in a few sentences. Give an example of an experience that stands out and why it was meaningful to you. Mention what you learned from it.] How

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. It is the largest city in the country, with a population of over 2.2 million people. Paris is known for its vibrant cultural scene, rich history, and stunning architecture, as well as its beautiful canal system and world-renowned museums. The city is also home to many international organizations, including the European Parliament and UNESCO. Paris is a popular tourist destination and a major center of French culture and politics. The city is known for its annual festivals, including the Eiffel Tower and Christmas markets.

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  vast, and there is no telling exactly where it will go. However, some possible trends that could occur include:

1. Increased AI development and deployment: With the rising cost and development costs of AI, there is a possibility that AI will be deployed more widely and more efficiently. This could lead to the creation of more efficient and cost-effective AI solutions.

2. Autonomous vehicles: AI is already being used in autonomous vehicles, but there is also potential for AI to improve their performance and safety. Autonomous vehicles are expected to be more efficient, safer, and have a lower carbon footprint than traditional vehicles.

3. Medical advancements: AI is being
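In the streaming case, each item yielded by the stream is a small piece of text, often a single token, and printing with `end=""` and `flush=True` makes the pieces appear on one line as they arrive. The sketch below shows that consumption pattern with a plain async generator; `fake_token_stream` and `collect` are hypothetical stand-ins, so no engine is required to run it.

```python
import asyncio


async def fake_token_stream(prompt):
    # Hypothetical stand-in for a streaming engine call: yields text piece by piece.
    for piece in [" Paris", ".", " It", " is", " the", " capital", "."]:
        await asyncio.sleep(0)  # hand control back to the event loop between pieces
        yield piece


async def collect(prompt):
    pieces = []
    async for piece in fake_token_stream(prompt):
        # Print each piece immediately, without a trailing newline.
        print(piece, end="", flush=True)
        pieces.append(piece)
    print()  # finish the line once the stream ends
    return "".join(pieces)


text = asyncio.run(collect("The capital of France is"))
```

Collecting the pieces in a list and joining them at the end gives the same full text that a non-streaming call would return in one step.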




In [6]:
llm.shutdown()