# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This is useful when an additional HTTP server would add unnecessary complexity or overhead. Two general use cases are:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed, working example in a Python script can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
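The linked script is the reference implementation. As a rough, hypothetical sketch of the idea, a custom server mostly maps each incoming request onto a call to the engine's `async_generate`. Here `FakeEngine`, `DEFAULT_PARAMS`, and `handle_request` are illustrative names (not SGLang APIs); `FakeEngine` only mimics the `async_generate(prompt, sampling_params)` call shape used later in this document, so the block runs without a model.

```python
import asyncio


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine that mimics only async_generate."""

    async def async_generate(self, prompt, sampling_params):
        await asyncio.sleep(0)  # pretend to wait on the model
        return {"text": f"<reply to {prompt!r} at T={sampling_params['temperature']}>"}


# Hypothetical server-wide defaults; each request may override them.
DEFAULT_PARAMS = {"temperature": 0.8, "top_p": 0.95}


async def handle_request(engine, request):
    """Hypothetical handler: merge per-request overrides into defaults, then generate."""
    params = {**DEFAULT_PARAMS, **request.get("sampling_params", {})}
    output = await engine.async_generate(request["prompt"], params)
    return {"prompt": request["prompt"], "text": output["text"]}


async def main():
    engine = FakeEngine()
    reply = await handle_request(
        engine, {"prompt": "Hello", "sampling_params": {"temperature": 0.2}}
    )
    print(reply)
    return reply


reply = asyncio.run(main())
```

In an actual server, `handle_request` would be wired to a route in an HTTP framework of your choice and `FakeEngine()` replaced by an `sgl.Engine(...)` instance.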



## Nest Asyncio
If you use the **Offline Engine** in IPython or any other code that already runs inside an asyncio event loop, add the following code first:
```python
import nest_asyncio

nest_asyncio.apply()

```
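The reason is that IPython already runs an event loop, and the standard library's `asyncio.run()` (used, for example, by the `asyncio.run(main())` calls in the cells below) refuses to start from inside a running loop. `nest_asyncio.apply()` patches asyncio to allow that nested use. The standard-library-only snippet below (no SGLang involved) reproduces the error the patch works around.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # Simulates a notebook cell: we are already inside a running event loop.
    coro = inner()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


result = asyncio.run(outer())
print(result)  # RuntimeError: asyncio.run() cannot be called from a running event loop
```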

## Advanced Usage

The engine also supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) and [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

```python
# Launch the offline engine
import asyncio

import sglang as sgl

from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch  # patch module that exists only in the SGLang docs CI environment
else:
    import nest_asyncio

    nest_asyncio.apply()  # allow asyncio.run() inside the notebook's already-running event loop


llm = sgl.Engine(model_path="Qwen/Qwen2.5-0.5B-Instruct")
```




### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

```text
Prompt: Hello, my name is
Generated text:  Nathan, and I'm a business student in China. My Chinese name is Guo. My grandma has been very healthy. She is a vegetarian and always eats rice for breakfast and rice for lunch. Her son is a doctor. This is a very special day. She's going on a trip to Tibet with her son in 10 days. She's going on a trip to Tibet with her son in 10 days. The trip will last for 10 days. So she'll be going on a trip to Tibet for 10 days. She's going on a trip to Tibet for 10 days. The
Prompt: The president of the United States is
Generated text:  a very important person. He is like the boss of the whole country. He has a lot of important jobs to do. He also has a lot of friends in the country. He likes his job very much. The president of the United States is like a king. He can't change the country, but he can change some laws. He can also make changes in the country. His first job is to help the president of the United States. He has to work for the president of
```
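`temperature` and `top_p` in `sampling_params` control how each next token is drawn. The pure-Python sketch below illustrates the generic technique (temperature scaling followed by nucleus, or top-p, filtering) on an invented four-token distribution. It is a conceptual illustration only, not SGLang's sampling kernel; `sample_top_p` and the logit values are hypothetical.

```python
import math
import random


def sample_top_p(logits, temperature=0.8, top_p=0.95, rng=random):
    """Temperature-scale logits, keep the most likely tokens until their
    cumulative probability reaches top_p, renormalize, and sample one token.
    Returns (sampled_token, kept_tokens)."""
    # Temperature: divide logits before softmax (lower T -> sharper distribution).
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}  # stable softmax
    total = sum(weights.values())
    probs = sorted(((w / total, tok) for tok, w in weights.items()), reverse=True)

    # Nucleus (top-p): accumulate the most likely tokens until mass >= top_p.
    kept, mass = [], 0.0
    for p, tok in probs:
        kept.append((p, tok))
        mass += p
        if mass >= top_p:
            break

    # Sample from the kept tokens in proportion to their probabilities.
    r = rng.random() * mass
    kept_tokens = [tok for _, tok in kept]
    for p, tok in kept:
        r -= p
        if r <= 0:
            return tok, kept_tokens
    return kept[-1][1], kept_tokens  # guard against floating-point leftovers


logits = {"Paris": 5.0, "Lyon": 2.0, "France": 1.0, "the": 0.5}  # invented values
rng = random.Random(0)
token, kept = sample_top_p(logits, temperature=0.8, top_p=0.95, rng=rng)
print(token, kept)  # Paris ['Paris']
_, kept_hot = sample_top_p(logits, temperature=2.0, top_p=0.95, rng=rng)
print(kept_hot)  # ['Paris', 'Lyon', 'France', 'the']
```

At `temperature=0.8` one token already carries more than 95% of the mass, so the nucleus collapses to it; raising the temperature flattens the distribution and lets more candidates survive the same `top_p`.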

### Streaming Synchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
```


```text
=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Name], and I'm a [job title] at [company name]. I'm a [insert your profession or role here] with [insert your years of experience here]. I'm passionate about [insert something that reflects your personality or interests here]. I enjoy [insert something that reflects your interests or hobbies here]. I'm always looking for new challenges and opportunities to grow and learn. I'm excited to meet you and learn more about you. [Name] [Job Title] [Company Name] [Company Address] [City, State ZIP Code] [Phone Number] [Email Address] [LinkedIn Profile] [Twitter Profile]

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, known for its iconic landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum. It is also home to the French Parliament, the French Academy of Sciences, and the French National Library. Paris is a bustling metropolis with a rich cultural heritage and is a major tourist destination. It is also home to many international organizations and institutions, including the European Parliament and the United Nations. The city is known for its fashion industry, art scene, and food culture, and is a popular destination for tourists and locals alike. Paris is a city of contrasts, with its modern architecture and historical landmarks blending seamlessly

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  likely to be characterized by a number of trends that are expected to shape the way we interact with technology and the world around us. Here are some potential trends that could be expected in the future:

1. Increased automation and artificial intelligence: As AI technology continues to advance, we can expect to see more automation and artificial intelligence in our daily lives. This could include things like self-driving cars, robots in manufacturing, and even virtual assistants that can assist with tasks like answering questions and providing information.

2. Improved privacy and security: As AI becomes more advanced, we can expect to see more privacy and security concerns. This could include things like the
```
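`stream_and_merge` is imported from `sglang.utils`. The sketch below illustrates the overlap-removal idea under one assumption: a newly streamed chunk may repeat text that was already received, so the longest suffix of the accumulated text that is also a prefix of the new chunk is dropped before appending. `trim_overlap`, `merge_stream`, and `fake_chunks` are hypothetical names for illustration; the real helper may differ in detail.

```python
def trim_overlap(existing: str, chunk: str) -> str:
    """Drop the longest prefix of `chunk` that already ends `existing`."""
    for size in range(min(len(existing), len(chunk)), 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk


def merge_stream(chunks):
    """Accumulate streamed chunks, skipping any re-sent overlap."""
    text = ""
    for chunk in chunks:
        text += trim_overlap(text, chunk)
    return text


# Hypothetical stream in which each chunk re-sends part of the previous text.
fake_chunks = [" Paris", "Paris is", " is the capital", "capital."]
print(merge_stream(fake_chunks))  # " Paris is the capital." (with a leading space)
```

A caveat of this heuristic: if the model legitimately repeats the tail of its own text (for example `"ha"` followed by `"ha"`), the repeat is removed as well.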



### Non-streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
```


```text
=== Testing asynchronous batch generation ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  ____. I'm an AI assistant who was created by ____. What can you tell me about yourself? (Feel free to provide a brief biography or a summary of your personality and interests) If you have any questions or topics you'd like to discuss, feel free to ask! #Self-introduction #AI-assistant

I'm an artificial intelligence assistant created by Google. My mission is to assist people in answering their questions and providing information on various topics. I've been here for several years, and I've learned a lot along the way, from both the humans who interact with me and the users who use me. What would you

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.

What is the answer? Paris is the capital of which country? France.Tooling provides a system of mechanical devices for machining workpieces, whether they are
```
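The cell above passes the whole prompt list to a single `async_generate` call. A related pattern, sketched here under assumptions, is issuing several independent requests concurrently with `asyncio.gather`, for example one per incoming user. `FakeEngine` is a hypothetical stand-in that only mimics the `async_generate(prompt, sampling_params)` call shape used above, and its 100 ms delay is invented to show that the calls overlap in time instead of running back to back.

```python
import asyncio
import time


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine; mimics only async_generate."""

    async def async_generate(self, prompt, sampling_params):
        await asyncio.sleep(0.1)  # pretend each request takes 100 ms
        return {"text": f"[{prompt}]"}


async def run_concurrently(engine, prompts, sampling_params):
    # One coroutine per prompt; gather awaits them all and preserves input order.
    tasks = [engine.async_generate(p, sampling_params) for p in prompts]
    return await asyncio.gather(*tasks)


start = time.perf_counter()
outputs = asyncio.run(
    run_concurrently(FakeEngine(), ["a", "b", "c"], {"temperature": 0.8})
)
elapsed = time.perf_counter() - start
print([o["text"] for o in outputs], f"{elapsed:.2f}s")  # roughly 0.1 s, not 0.3 s
```

With a real engine, whether many concurrent single-prompt calls or one batched call performs better depends on the engine's scheduler; the sketch only demonstrates the asyncio mechanics.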

### Streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Use the overlap-aware streaming helper instead of calling async_generate directly
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())
```


```text
=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [insert name], and I'm a [insert profession or occupation] with [insert number of years of experience]. I've always had a passion for [insert interest or hobby] and I'm always looking to learn more about it. Whether it's a new way of cooking or a new language, I'm always eager to explore and discover more. I'm also a bit of a natural communicator, as I love to listen and share my knowledge with others. I enjoy staying up late reading books and watching movies, and I'm always looking for ways to stay updated on the latest trends and advancements in my field. I thrive on being a

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. Its location is on the Seine river and has a rich history, including the completion of the Eiffel Tower in 1889, which was the tallest building in the world at the time. The city is known for its many landmarks, including the Notre-Dame Cathedral, the Louvre Museum, and the Opera House. Paris is also famous for its fashion industry, with iconic fashion brands such as Chanel and Dior. The city is home to a diverse population with a mixture of French, German, and Italian cultures. The city is also known for its cuisine, with popular dishes such as beignets and

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  likely to involve several key trends:

1. Increased automation: AI will continue to automate repetitive tasks and improve efficiency in manufacturing, transportation, and many other sectors. Automation will also drive innovation in areas such as healthcare, finance, and education.

2. Improved privacy and security: AI will continue to improve in terms of privacy and security, but there will still be a need for oversight and regulation to ensure that AI is used responsibly.

3. Autonomous vehicles: Autonomous vehicles will continue to develop, with the goal of reducing accidents and improving traffic flow. AI will also play a role in making self-driving cars more accessible and affordable.

4. AI
```
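The cell above streams the prompts one after another. To consume several streams at the same time instead, each stream can be drained in its own task and the tasks awaited together. In this sketch `fake_stream` is a hypothetical async generator standing in for `async_stream_and_merge(llm, prompt, sampling_params)`, and `collect` is an illustrative helper; because chunks from different prompts interleave, they are collected per prompt rather than printed directly.

```python
import asyncio


async def fake_stream(prompt):
    """Hypothetical stand-in for an async chunk stream for one prompt."""
    for word in prompt.split():
        await asyncio.sleep(0)  # yield control, as a real stream would between chunks
        yield word + " "


async def collect(prompt, log):
    """Drain one stream, recording which stream produced each chunk."""
    pieces = []
    async for chunk in fake_stream(prompt):
        pieces.append(chunk)
        log.append(prompt[0])  # first letter identifies the stream
    return "".join(pieces).rstrip()


async def main(prompts):
    log = []
    texts = await asyncio.gather(*(collect(p, log) for p in prompts))
    return texts, log


texts, log = asyncio.run(main(["alpha beta gamma", "one two three"]))
print(texts)  # ['alpha beta gamma', 'one two three']
print(log)    # stream ids interleave instead of arriving one prompt at a time
```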




```python
llm.shutdown()
```
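`llm.shutdown()` releases the engine. In standalone scripts it can help to guarantee that call even when generation raises; the sketch below shows a plain `try`/`finally` pattern. `FakeEngine` and `run_batch` are hypothetical names; `FakeEngine` only reuses the `generate`/`shutdown` method names from this document so the block runs without a model, and with SGLang you would construct `sgl.Engine(...)` instead.

```python
class FakeEngine:
    """Hypothetical stand-in exposing generate() and shutdown() like the engine above."""

    def __init__(self):
        self.closed = False

    def generate(self, prompts, sampling_params):
        if any(not p for p in prompts):
            raise ValueError("empty prompt")
        return [{"text": p.upper()} for p in prompts]

    def shutdown(self):
        self.closed = True


def run_batch(engine, prompts, sampling_params):
    """Generate one batch and always shut the engine down afterwards."""
    try:
        return engine.generate(prompts, sampling_params)
    finally:
        engine.shutdown()  # runs on success and on error


engine = FakeEngine()
try:
    run_batch(engine, ["ok", ""], {"temperature": 0.8})
except ValueError as exc:
    print("generation failed:", exc)
print("engine closed:", engine.closed)  # engine closed: True
```

This version shuts the engine down after a single batch; for a workload with many batches, the `try`/`finally` would wrap the whole workload rather than each call.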