# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where a server would add unnecessary complexity or overhead. There are two common use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A complete example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
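
As a rough illustration of the custom-server use case, the sketch below wraps an engine-like object in a small JSON request handler using only the standard library. `EchoEngine`, `handle_request`, and the request fields `prompts` and `sampling_params` are hypothetical names chosen for this example, not SGLang APIs. The only assumption borrowed from this document is that `generate(prompts, sampling_params)` returns one dict with a `"text"` key per prompt. The linked `custom_server.py` remains the reference example.

```python
import json


class EchoEngine:
    """Hypothetical stand-in for an engine; NOT part of SGLang."""

    def generate(self, prompts, sampling_params):
        # Mimic the output shape used in this document:
        # one dict with a "text" key per prompt.
        return [{"text": f" (echo of {p!r})"} for p in prompts]


def handle_request(engine, body: str) -> str:
    """Turn a JSON request body into a JSON response body."""
    request = json.loads(body)
    prompts = request["prompts"]
    sampling_params = request.get("sampling_params", {})
    outputs = engine.generate(prompts, sampling_params)
    return json.dumps({"texts": [out["text"] for out in outputs]})


engine = EchoEngine()
request_body = json.dumps(
    {"prompts": ["Hello"], "sampling_params": {"temperature": 0.0}}
)
response = handle_request(engine, request_body)
print(response)
```

In a real server, `handle_request` would be called from whichever web framework you prefer, and `EchoEngine` would be replaced by an actual engine instance.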



## Nest Asyncio
If you use the **Offline Engine** in IPython or in any other code that already runs an event loop, apply the following patch first:
```python
import nest_asyncio

nest_asyncio.apply()

```
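
The patch is needed because `asyncio.run` raises a `RuntimeError` when an event loop is already running in the current thread, which is the situation inside IPython and Jupyter. The standard-library sketch below detects that situation; the helper name `loop_is_running` is illustrative, and the snippet is meant to be run as a plain script.

```python
import asyncio


def loop_is_running() -> bool:
    """Return True when called from inside a running asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def check_inside() -> bool:
    # Called from a coroutine, so a loop is running at this point.
    return loop_is_running()


outside = loop_is_running()  # plain script: no loop has been started yet
inside = asyncio.run(check_inside())
print(outside, inside)
```

When `loop_is_running()` returns `True` at the point where you would call `asyncio.run`, that is the case in which `nest_asyncio.apply()` is required.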

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

```python
# launch the offline engine
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch
else:
    import nest_asyncio

    nest_asyncio.apply()


llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")
```




### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

Prompt: Hello, my name is
Generated text:  Tom and I am an experienced Java Developer who was asked to learn Java since a few years ago.
I want to start a business and need to decide on what kind of business to start. Since I have no experience in business, I would appreciate it if you could tell me the process to start a business, including the steps you would take to start a business and the expenses I would have to pay for the business.
Certainly! Starting a business can be both exciting and challenging, but with a solid plan and a bit of guidance, it can be done successfully. Here’s a step-by-step guide to help you get started on your business
Prompt: The president of the United States is
Generated text:  a very important person in the government of the United States. He or she is the leader of the country. He or she has the power to make important decisions. The president is not elected by the people. The president is not the only leader in the government of the United States. The
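
The loop above relies on `llm.generate` returning one result per prompt, in prompt order, each with a `"text"` field. Under that assumption, results can be collected into records for later analysis. The sketch below uses hand-written placeholder outputs rather than a live engine, and `to_records` is a hypothetical helper, not part of SGLang.

```python
import json


def to_records(prompts, outputs):
    """Pair each prompt with its generated text."""
    if len(prompts) != len(outputs):
        raise ValueError("expected exactly one output per prompt")
    return [
        {"prompt": prompt, "completion": output["text"]}
        for prompt, output in zip(prompts, outputs)
    ]


# Placeholder data shaped like the results above: dicts with a "text" key.
example_prompts = ["The capital of France is", "The future of AI is"]
example_outputs = [{"text": " Paris."}, {"text": " bright."}]

records = to_records(example_prompts, example_outputs)
jsonl = "\n".join(json.dumps(record) for record in records)
print(jsonl)
```

Writing one JSON object per line keeps large batch results easy to append to and to stream back in later.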

### Streaming Synchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
```


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  [Name], and I'm a [job title] at [company name]. I'm excited to meet you and learn more about you. What can you tell me about yourself? I'm a [job title] at [company name], and I'm excited to meet you and learn more about you. What can you tell me about yourself? I'm a [job title] at [company name], and I'm excited to meet you and learn more about you. What can you tell me about yourself? I'm a [job title] at [company name], and I'm excited to meet you and learn more about you. What

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris, known for its iconic landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and Louvre Museum. It is also home to the French Parliament and the French National Library. Paris is a bustling city with a rich history and culture, and is a popular tourist destination. It is the capital of France and the largest city in the European Union. 

Paris is known for its beautiful architecture, including the Louvre Museum, the Eiffel Tower, and the Notre-Dame Cathedral. The city is also famous for its fashion industry, with many famous designers and boutiques located in the city. Paris is a city of

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  likely to be characterized by rapid advancements in areas such as machine learning, natural language processing, and computer vision. Some possible future trends include:

1. Increased integration of AI into everyday life: AI is already being integrated into our daily lives, from voice assistants like Siri and Alexa to self-driving cars. As AI becomes more integrated into our daily lives, we can expect to see even more widespread adoption of AI in our daily routines.

2. AI becoming more autonomous: As AI technology continues to advance, we can expect to see more autonomous vehicles on the roads. This could lead to a significant reduction in traffic accidents and a decrease in the use
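
To give an intuition for what "overlap removal" can mean when merging streamed chunks, here is a minimal sketch of one plausible approach: before appending a chunk, drop the part of it that the accumulated text already ends with. The names `drop_overlap` and `merge_chunks` are hypothetical, and this is not necessarily how `stream_and_merge` is implemented.

```python
def drop_overlap(existing: str, chunk: str) -> str:
    """Drop the longest prefix of `chunk` that `existing` already ends with."""
    max_len = min(len(existing), len(chunk))
    # Try the longest candidate overlap first, then shrink.
    for size in range(max_len, 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk


def merge_chunks(chunks) -> str:
    """Concatenate chunks, skipping text repeated at chunk boundaries."""
    merged = ""
    for chunk in chunks:
        merged += drop_overlap(merged, chunk)
    return merged


chunks = [" Paris is", " is the capital", " capital of France."]
print(merge_chunks(chunks))
```

A purely textual heuristic like this cannot tell a genuine repetition from an overlap (for example, `"ha"` followed by `"ha"` collapses to `"ha"`), so it only suits streams that are known to resend previously emitted text.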



### Non-streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
```


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Name], and I am a [Type] with [Number] years of experience in [Related field], specializing in [Primary skill or knowledge]. I am [Height], [Weight], and [Gender]. I enjoy [My hobby, interest, or passion], and I am [My level of engagement with that interest]. I am [My personality type or style]. I am [My goal or mission]. I am [My role or purpose in the world]. I am [My outlook on life and my values]. I am [My hope for the future]. I am [My future goals and aspirations]. I am [My personal values

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, a historic city located in the southern part of the country. It is the largest and most populous city in France, with a population of around 2. 3 million people. Paris is known for its rich history, artistic and cultural institutions, and major landmarks such as 
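
Asynchronous generation is most useful when several independent requests should be in flight at the same time. The sketch below shows the `asyncio.gather` pattern with a hypothetical `FakeAsyncEngine`, whose `async_generate` only mimics the call shape used above: it sleeps instead of running a model and is not SGLang code. Whether concurrent calls help with a real engine depends on the workload.

```python
import asyncio


class FakeAsyncEngine:
    """Hypothetical stand-in that mimics `async_generate`; NOT SGLang code."""

    async def async_generate(self, prompts, sampling_params):
        await asyncio.sleep(0.01)  # pretend to do some work
        return [{"text": f" <completion for: {p}>"} for p in prompts]


async def run_jobs(engine, jobs, sampling_params):
    """Run several independent prompt batches concurrently."""
    tasks = [engine.async_generate(batch, sampling_params) for batch in jobs]
    # gather returns results in the same order as `jobs`.
    return await asyncio.gather(*tasks)


jobs = [
    ["Hello, my name is"],
    ["The capital of France is", "The future of AI is"],
]
results = asyncio.run(run_jobs(FakeAsyncEngine(), jobs, {"temperature": 0.8}))

for batch, outputs in zip(jobs, results):
    for prompt, output in zip(batch, outputs):
        print(prompt, "->", output["text"])
```

Inside a notebook, this snippet needs the same `nest_asyncio.apply()` patch as the cells above, because it calls `asyncio.run`.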

### Streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())
```


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  John, I’m a modern, dynamic, and intelligent person. I'm passionate about exploring new experiences and learning new things every day. I'm not afraid to take risks and try new things, and I'm always open to new ideas and perspectives. I'm a loyal friend and co-worker, and I enjoy spending time with people of all ages and backgrounds. I value honesty, integrity, and hard work, and I strive to always do my best and be my best self. I'm confident and capable, and I love the thrill of a good challenge. I look forward to working with you, and I'm excited to learn more about

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.

France's capital, Paris, is a sprawling city located in the south of the country on the right bank of the Seine river. It is the largest city in Europe, with an estimated population of 2.3 million people as of 2021.

Paris is the second most populous city in the European Union, following Brussels. It has a rich history, a vibrant culture, and an array of cultural events and attractions, including the Louvre Museum and the Eiffel Tower.

The city is known for its beautiful architecture, iconic landmarks, and lively street life. It is also a hub for

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  looking more complex, sophisticated, and diverse than ever before. Here are some possible future trends that we can expect to see:

1. Autonomous vehicles: As the demand for transportation increases, autonomous vehicles may become more prevalent. These vehicles are equipped with sensors and cameras that allow them to navigate roads and navigate traffic, making them safer and more efficient than traditional vehicles.

2. Facial recognition technology: Facial recognition technology is rapidly becoming more sophisticated, allowing for more accurate and faster identification of people. As this technology becomes more widespread, we can expect to see applications for it in areas such as personal identification, security, and fraud detection.

3. AI
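
The streaming loop in this section prints chunks as they arrive. When the complete text is also needed afterwards, the chunks can be collected while streaming. The sketch below uses a hypothetical async generator, `fake_stream`, in place of `async_stream_and_merge`, and assumes only that the stream yields string chunks, as the `print(cleaned_chunk, ...)` call suggests.

```python
import asyncio


async def fake_stream(text: str):
    """Hypothetical stand-in for a token stream: yields word-sized chunks."""
    for word in text.split(" "):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield " " + word


async def stream_and_collect(stream) -> str:
    """Print chunks as they arrive and return the concatenated text."""
    parts = []
    async for chunk in stream:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_text = asyncio.run(stream_and_collect(fake_stream("Paris is the capital")))
```

Collecting chunks in a list and joining once at the end avoids repeatedly rebuilding a growing string during long generations.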




```python
llm.shutdown()
```
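
In a script, it can be convenient to guarantee that `shutdown()` runs even when generation fails. One way to do that is a small context manager, sketched below. `managed_engine` and `DummyEngine` are hypothetical names used only to show the control flow; the sketch assumes nothing about the engine beyond the `shutdown()` method called above.

```python
from contextlib import contextmanager


@contextmanager
def managed_engine(factory):
    """Create an engine via `factory` and always call its `shutdown()`."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


class DummyEngine:
    """Hypothetical stand-in used only to demonstrate the control flow."""

    def __init__(self):
        self.closed = False

    def shutdown(self):
        self.closed = True


try:
    with managed_engine(DummyEngine) as engine:
        raise RuntimeError("simulated failure during generation")
except RuntimeError:
    pass

print(engine.closed)
```

With a real engine, the factory could be a callable such as `lambda: sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")`, mirroring the launch cell in this document.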