# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This suits use cases where an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).



## Nest Asyncio
If you want to use the **Offline Engine** in IPython or in any other code that already runs inside an event loop, add the following code first:
```python
import nest_asyncio

nest_asyncio.apply()
```
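
To illustrate why this is needed, here is a minimal stdlib-only sketch of the error that a nested event loop triggers. It does not use SGLang at all, and the coroutine names `inner` and `outer` are made up for the illustration. Calling `asyncio.run()` from code that is already running inside an event loop (as notebook cells typically are) raises a `RuntimeError`, which is the situation `nest_asyncio.apply()` is meant to work around.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    # We are already inside a running event loop here, similar to a notebook cell.
    coro = inner()
    try:
        # Starting a second loop from inside a running loop is rejected by asyncio.
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


result = asyncio.run(outer())
print(result)
```

In a plain Python script there is no outer loop, so the patch is usually unnecessary there.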

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

```python
# launch the offline engine
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch
else:
    import nest_asyncio

    nest_asyncio.apply()


llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")
```



### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

Prompt: Hello, my name is
Generated text:  Claire, and I’m a passionate and creative graphic designer from the UK. I am currently working as a freelancer with a focus on graphic design and web design for small to medium-sized businesses. My goal is to create designs that are both creative and well thought out. I have over 6 years of experience in graphic design and am skilled in Adobe Illustrator, InDesign, and Photoshop. I also have a wide knowledge of web design and digital marketing, which I use to help small businesses grow their online presence and improve their SEO.
I have experience working with clients such as Facebook, Etsy, and Spotify. I am open to working with a
Prompt: The president of the United States is
Generated text:  trying to decide what movie to watch on release day. The movie is either a musical or a science fiction film. He has a preference for movies that are released every 2 weeks, specifically focusing on the 15th, 25th, 35th, and 45th release days. Given this
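
For offline batch jobs, the results are often written to disk rather than printed. The following is a minimal sketch of pairing each prompt with its output and serializing the pairs as JSON Lines. `StubEngine` and `to_records` are hypothetical names introduced only so the sketch runs without a model; the stub mirrors just the output shape used above (one dict with a `"text"` key per prompt) and is not the real engine.

```python
import json
from typing import Dict, List


class StubEngine:
    """Hypothetical stand-in for the engine so the sketch runs without a GPU."""

    def generate(self, prompts: List[str], sampling_params: Dict) -> List[Dict]:
        # Assumption: one output dict per prompt, each carrying a "text" field.
        return [{"text": f" <completion for: {p}>"} for p in prompts]


def to_records(engine, prompts: List[str], sampling_params: Dict) -> List[Dict]:
    """Run one batch and pair every prompt with its generated text."""
    outputs = engine.generate(prompts, sampling_params)
    return [
        {"prompt": prompt, "text": output["text"]}
        for prompt, output in zip(prompts, outputs)
    ]


prompts = ["Hello, my name is", "The capital of France is", "The future of AI is"]
records = to_records(StubEngine(), prompts, {"temperature": 0.8, "top_p": 0.95})

# One JSON object per line, ready to be written to a .jsonl file.
jsonl = "\n".join(json.dumps(record) for record in records)
print(jsonl)
```

With the real engine, `to_records(llm, prompts, sampling_params)` would be the corresponding call, assuming the output format shown in this section.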

### Streaming Synchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
```


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  [Name], and I'm a [job title] at [company name]. I'm excited to meet you and learn more about you. What can you tell me about yourself? I'm a [age] year old, [gender] and I have [number] years of experience in [industry]. I'm a [job title] at [company name], and I enjoy [job title]ing. I'm always looking for new challenges and opportunities to grow and learn. What's your favorite hobby or activity? I love [hobby or activity], and I find it really helps me relax and recharge. What's your favorite book

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris, known for its iconic landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and Louvre Museum. It is also home to the French Parliament, the French Academy of Sciences, and the French National Library. Paris is a cultural and economic center with a rich history dating back to the Roman Empire and the French Revolution. It is also known for its fashion industry, with Paris Fashion Week being one of the largest in the world. The city is also home to the French Riviera, a popular tourist destination known for its beaches, restaurants, and nightlife. Paris is a major transportation hub, with the Eiffel Tower

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  likely to be characterized by several key trends:

1. Increased integration with human intelligence: AI is likely to become more integrated with human intelligence, allowing machines to learn and adapt to human behavior and preferences. This could lead to more personalized and efficient AI systems.

2. Enhanced ethical considerations: As AI becomes more integrated with human intelligence, there will be increased scrutiny of its ethical implications. This could lead to more stringent regulations and guidelines for AI development and deployment.

3. Greater reliance on machine learning: Machine learning is likely to become more prevalent in AI, allowing machines to learn and adapt to new situations and data. This could lead to more efficient
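
The "overlap removal" mentioned above can be pictured with a small, self-contained sketch. It assumes that a streamed chunk may repeat text that was already received, and it drops the repeated prefix before appending. The function name `merge_chunk` and the sample chunks are hypothetical; this illustrates the idea and is not necessarily how `stream_and_merge` is implemented.

```python
def merge_chunk(existing: str, chunk: str) -> str:
    """Append chunk to existing, skipping any prefix of chunk that existing already ends with."""
    max_overlap = min(len(existing), len(chunk))
    # Try the longest possible overlap first so that the most repeated text is removed.
    for size in range(max_overlap, 0, -1):
        if existing.endswith(chunk[:size]):
            return existing + chunk[size:]
    return existing + chunk


# Made-up chunks in which each one repeats part of what came before.
chunks = [" Paris", " Paris, known", " known for its", " landmarks"]

text = ""
for chunk in chunks:
    text = merge_chunk(text, chunk)

print(text)
```

One caveat of this naive approach: text that legitimately repeats across a chunk boundary would also be trimmed, so it only fits streams where overlap really indicates duplication.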



### Non-streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
```


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Your Name], and I am a [Your Profession] who is passionate about [Your Professional Interest/Interest/Challenge]. I have always been fascinated by [Your Field/Subject/Challenge], and I'm determined to [Your Goal/Challenge]. I enjoy [Your Passion/Interest/Challenge] and I am always looking for new ways to [Your Skill/Outcome/Challenge]. What excites you about your career? What motivates you to keep going? What do you hope to achieve in the coming years? What are your career goals, and where are you headed? What are you looking forward to learning from and growing with in

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, known for its iconic landmarks such as the Eiffel Tower, Notre-Dame Cathedral, Louvre Museum, and Notre-Dame du Montmartre. It is also known for its rich cultural heritage and its role as
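
Because the asynchronous call is awaitable, several independent requests can be awaited together. The sketch below shows the pattern with `asyncio.gather`. `StubAsyncEngine` is a hypothetical stand-in that only imitates an awaitable `async_generate` returning one `{"text": ...}` dict per prompt; whether the real engine benefits from this pattern depends on its scheduler and is not verified here.

```python
import asyncio
from typing import Dict, List


class StubAsyncEngine:
    """Hypothetical stand-in exposing an awaitable async_generate."""

    async def async_generate(self, prompts: List[str], sampling_params: Dict) -> List[Dict]:
        await asyncio.sleep(0.01)  # simulate inference latency
        return [{"text": f" <completion for: {p}>"} for p in prompts]


async def generate_groups(engine, groups: List[List[str]], sampling_params: Dict):
    """Issue one async_generate call per group and await all of them concurrently."""
    calls = [engine.async_generate(group, sampling_params) for group in groups]
    # gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*calls)


groups = [
    ["Hello, my name is"],
    ["The capital of France is", "The future of AI is"],
]
results = asyncio.run(generate_groups(StubAsyncEngine(), groups, {"temperature": 0.8}))

for group, outputs in zip(groups, results):
    for prompt, output in zip(group, outputs):
        print(f"{prompt} -> {output['text']}")
```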

### Streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())
```


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Name]. I’m a [job title or hobby]. I like [what makes me happy] and am always [something interesting or unique about me]. I enjoy [occupation with related hobbies]. I’m [how I see myself]. If you asked me what I do, I’d say I’m [what I’m most known for]. And if you asked me what I believe in, I’d say I believe in [my core values]. I’m [how I plan to make my life better]. I’m here for you. [Thank you]. My profile is complete. I’m ready to share more about my life. [Start

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. It is known for its iconic landmarks such as Notre-Dame Cathedral and the Eiffel Tower, as well as its rich history and cultural attractions. Paris is a popular tourist destination and a significant economic and political hub of France. It is home to a diverse range of languages and cultures and is home to countless museums, restaurants, and shops. The city has a vibrant nightlife and a strong sense of French culture and tradition. Paris is a cultural center that has a long history of artistic, architectural, and literary excellence, and it continues to be a leading cultural and tourism destination today.

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  undoubtedly going to be heavily influenced by the rapid development of new technologies and advances in our understanding of the world around us. However, there are several potential trends that could shape the future of AI:

1. Increased Integration with Human Intelligence: As AI becomes more capable, it will likely be more integrated with human intelligence, leading to more complex and nuanced AI systems. This could lead to greater collaboration between humans and machines in areas like healthcare, security, and transportation.

2. Enhanced Autonomy: AI systems are likely to become more capable of making decisions and actions on their own, with less human intervention. This could lead to more autonomous vehicles,
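
When consuming a stream, it is often useful to keep the full text in addition to printing chunks as they arrive. The following sketch shows that pattern with a plain async generator. `fake_stream` is a hypothetical stand-in for a streaming call and simply yields word-sized chunks; `collect` is an illustrative helper, not part of SGLang.

```python
import asyncio
from typing import AsyncIterator


async def fake_stream(text: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming call: yields one word-sized chunk at a time."""
    for word in text.split(" "):
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield word + " "


async def collect(stream: AsyncIterator[str]) -> str:
    """Print chunks on one line as they arrive and return the concatenated text."""
    parts = []
    async for chunk in stream:
        print(chunk, end="", flush=True)  # end="" keeps the chunks on the same line
        parts.append(chunk)
    print()
    return "".join(parts)


full_text = asyncio.run(collect(fake_stream("Paris is the capital of France.")))
```

Under the assumption that the real streaming helper yields already-cleaned chunks, the same `collect` function could wrap it unchanged.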




```python
llm.shutdown()
```
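
In a script, it can help to guarantee that `shutdown()` runs even if generation raises. One way to sketch this is a small context manager. `StubEngine` and `managed` are hypothetical names used only for this illustration; the stub just records whether `shutdown()` was called and deliberately fails on an empty prompt list.

```python
from contextlib import contextmanager


class StubEngine:
    """Hypothetical stand-in that only tracks whether shutdown() was called."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts, sampling_params):
        if not prompts:
            raise ValueError("no prompts given")
        return [{"text": " ..."} for _ in prompts]

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def managed(engine):
    """Yield the engine and always call shutdown() afterwards, even on errors."""
    try:
        yield engine
    finally:
        engine.shutdown()


engine = StubEngine()
try:
    with managed(engine) as eng:
        eng.generate([], {})  # fails on purpose to exercise the cleanup path
except ValueError as exc:
    print("generation failed:", exc)

print("shut down:", engine.is_shut_down)
```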