# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where a server would add unnecessary complexity or overhead. Two common use cases are:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A complete example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).



## Nest Asyncio
Note that if you want to use the **Offline Engine** in IPython or in other code that already runs an event loop (a nested loop), you need to add the following code:
```python
import nest_asyncio

nest_asyncio.apply()

```
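
The patch matters because IPython and Jupyter already run an event loop, and the standard library refuses to start a second one inside it. The sketch below uses only `asyncio` (no SGLang) to reproduce that refusal; `inner` and `outer` are illustrative names, and the assumption that the engine hits this same restriction comes from the note above rather than from inspecting its internals.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    # At this point an event loop is already running, as in a Jupyter cell.
    coro = inner()
    try:
        asyncio.run(coro)  # attempts to start a second loop in the same thread
    except RuntimeError as exc:
        coro.close()  # avoid a "never awaited" warning for the unused coroutine
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```

`nest_asyncio.apply()` patches `asyncio` so that this kind of re-entrant call is allowed, which is why it is applied once before the engine is created.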

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch
else:
    import nest_asyncio

    nest_asyncio.apply()


llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")




### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Paul, and I'm an amateur photographer. I love taking photos of birds and animals, and I've taken many portraits of birds that have captured their attention. It is important to note that I'm not a professional photographer, and I am not qualified to make any claims about the accuracy of my work. I am just taking photos that I like and that I think are beautiful. Please feel free to ask me any questions you have about photography. I'm here to help and to learn from you. I look forward to talking to you soon! Have a great day!
How can I ensure that my photography style is professional and approachable for
Prompt: The president of the United States is
Generated text:  very busy every day. He sometimes has to tell people a lot of news. Sometimes the news is important, but sometimes it isn't. So it's very important for the president to be sure that he tells all the important news in a way that will be understood by most people. In the world of the p
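
For offline batch jobs it is often convenient to keep each prompt next to its completion and write the pairs to disk. The following is a minimal sketch under stated assumptions: `FakeEngine` is a hypothetical stand-in that only mimics the list-of-dicts return shape seen above (each item exposing a `"text"` key), and `generate_records` and `to_jsonl` are illustrative helper names, not part of SGLang.

```python
import json


class FakeEngine:
    """Hypothetical stand-in that mimics the return shape of the cell above."""

    def generate(self, prompts, sampling_params):
        return [{"text": f" <completion for: {p}>"} for p in prompts]


def generate_records(engine, prompts, sampling_params):
    """Run one batched call and pair each prompt with its generated text."""
    outputs = engine.generate(prompts, sampling_params)
    return [
        {"prompt": prompt, "completion": output["text"]}
        for prompt, output in zip(prompts, outputs)
    ]


def to_jsonl(records):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


engine = FakeEngine()  # with SGLang this would be the `llm` object created earlier
records = generate_records(
    engine,
    ["Hello, my name is", "The capital of France is"],
    {"temperature": 0.8, "top_p": 0.95},
)
jsonl = to_jsonl(records)
print(jsonl)
```

Passing the whole prompt list in a single call, as the cell above does, leaves batching decisions to the engine instead of the caller.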

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  [Name], and I'm a [job title] at [company name]. I'm excited to meet you and learn more about you. What can you tell me about yourself? [Name] is a [job title] at [company name]. I'm excited to meet you and learn more about you. What can you tell me about yourself? [Name] is a [job title] at [company name]. I'm excited to meet you and learn more about you. What can you tell me about yourself? [Name] is a [job title] at [company name]. I'm excited to meet you and learn more about

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris, known for its iconic landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum. It is also a cultural and economic hub, with a rich history dating back to the Roman Empire and a modern city that has undergone significant development over the centuries. Paris is a popular tourist destination and a major center for French culture and politics. It is also home to many famous French artists, writers, and musicians. The city is known for its diverse cuisine, including French cuisine, as well as its traditional French culture and traditions. Paris is a city of contrasts, with its modern architecture and high-tech industries,

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  likely to be characterized by rapid advancements in areas such as machine learning, natural language processing, and computer vision. These technologies will continue to improve, leading to more sophisticated and accurate AI systems that can perform a wide range of tasks with increasing accuracy and efficiency. Some possible future trends in AI include:

1. Increased integration with human intelligence: As AI becomes more sophisticated, it will become more integrated with human intelligence, allowing it to perform tasks that are difficult or impossible for humans to do. This could lead to more efficient and effective use of AI in various fields, such as healthcare, finance, and transportation.

2. Greater emphasis on ethical considerations
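
The "overlap removal" mentioned in the banner suggests that streamed chunks may repeat text that was already emitted. One plausible way to merge such chunks, shown here as a sketch and not necessarily how `stream_and_merge` is implemented, is to drop the longest prefix of each new chunk that matches the tail of the text collected so far. `strip_overlap` and `merge_chunks` are hypothetical names, and the chunk list is made-up sample data.

```python
def strip_overlap(existing: str, chunk: str) -> str:
    """Return the part of `chunk` that does not repeat the tail of `existing`."""
    max_overlap = min(len(existing), len(chunk))
    # Try the longest possible overlap first, then shrink.
    for size in range(max_overlap, 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk


def merge_chunks(chunks):
    """Concatenate chunks while removing repeated boundaries."""
    merged = ""
    for chunk in chunks:
        merged += strip_overlap(merged, chunk)
    return merged


chunks = [" Paris is", " is the capital", " capital of France."]
merged = merge_chunks(chunks)
print(merged)
```

A purely textual match like this cannot tell an accidental repeat from an intended one (for example a model that really generates the same word twice), so treat it as an illustration of the idea only.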



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Sarah and I'm a computer programmer. I've always been passionate about technology and programming, and I've been coding for as long as I can remember. I enjoy creating new features and improving existing ones, and I'm always striving to stay up-to-date with the latest programming languages and trends. I'm also interested in ethical and legal issues surrounding technology, and I strive to be an advocate for transparency and accountability in the development of software. I'm excited to learn more about the future of technology and how it will impact us all. Thank you for the opportunity to introduce myself. That sounds like a great introduction. Is there anything else you

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. 

Facts about Paris:

- It is the most populous city in Europe and the second most pop
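
Because `async_generate` is awaitable, several independent batches can be in flight at once, for example when groups of prompts need different sampling parameters. The sketch below illustrates the pattern with `asyncio.gather`; `FakeAsyncEngine` is a hypothetical stand-in that only copies the call shape from the cell above (a list of prompts in, a list of dicts out), and `run_groups` is an illustrative name.

```python
import asyncio


class FakeAsyncEngine:
    """Hypothetical stand-in; only the awaitable call shape mirrors the cell above."""

    async def async_generate(self, prompts, sampling_params):
        await asyncio.sleep(0.01)  # pretend to do some work
        temperature = sampling_params["temperature"]
        return [{"text": f" <T={temperature} completion for: {p}>"} for p in prompts]


async def run_groups(engine, groups):
    """Issue one batched call per (prompts, sampling_params) group, concurrently."""
    calls = [engine.async_generate(prompts, params) for prompts, params in groups]
    # gather returns results in the same order as `groups`.
    return await asyncio.gather(*calls)


groups = [
    (["The capital of France is"], {"temperature": 0.2}),
    (["Hello, my name is", "The future of AI is"], {"temperature": 0.8}),
]
results = asyncio.run(run_groups(FakeAsyncEngine(), groups))
for (prompts, _), outputs in zip(groups, results):
    for prompt, output in zip(prompts, outputs):
        print(prompt, "->", output["text"])
```

Whether concurrent calls improve throughput with a real engine depends on its scheduler, so measure before relying on this pattern.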

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  John. I'm a writer and illustrator who has always been fascinated by the human condition. I enjoy exploring the depths of human emotions and exploring new ways to communicate them through art. My work often centers around the theme of love, whether it's a deep, intense bond, a fleeting moment, or a complicated relationship. I am always looking for new ways to bring people together and create emotional connections. As a writer and illustrator, I'm always looking for opportunities to tell stories and inspire others to connect with the world around them. Thank you for taking the time to learn more about me. What is your favorite type of art to create,

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, located in the northern part of the country. It is the largest city in the European Union and one of the world's most populous cities. Paris is home to many famous landmarks and attractions, including the Eiffel Tower, Louvre Museum, Notre-Dame Cathedral, and the Louvre Museum. The city is known for its rich history, beautiful architecture, and vibrant culture, and is a major tourist destination in France. It is also home to several international institutions and organizations, including the European Parliament and the European Central Bank. Paris is the capital of France and one of the world's most significant cities. It is often referred

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  highly dynamic, with various trends and developments expected to shape the landscape of the technology. Here are some of the potential trends in AI that are likely to shape the future:

1. Increased precision and efficiency: As AI becomes more sophisticated, its ability to perform tasks with greater accuracy and efficiency is likely to increase. This could lead to improvements in manufacturing processes, healthcare, and other industries.

2. Autonomous vehicles: AI is already making significant progress in autonomous vehicles, with companies like Tesla and Waymo focused on developing self-driving cars. As this technology advances, it is likely that more companies will enter the market, leading to an increase in the

In [6]:
llm.shutdown()
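
In a script, an exception raised during generation would skip an explicit `shutdown()` call placed at the end. One way to guard against that is a small context manager that always calls `shutdown()` on exit. This is a sketch under stated assumptions: `managed_engine` is a hypothetical helper, and `FakeEngine` is a stand-in used so the example runs without a model; only the `generate` and `shutdown` method names come from the cells above.

```python
from contextlib import contextmanager


class FakeEngine:
    """Hypothetical stand-in exposing the two methods used in this document."""

    def __init__(self):
        self.closed = False

    def generate(self, prompts, sampling_params):
        return [{"text": f" <completion for: {p}>"} for p in prompts]

    def shutdown(self):
        self.closed = True


@contextmanager
def managed_engine(engine):
    """Yield the engine and call shutdown() even if the body raises."""
    try:
        yield engine
    finally:
        engine.shutdown()


engine = FakeEngine()
error_seen = False
try:
    with managed_engine(engine) as llm:
        llm.generate(["Hello, my name is"], {"temperature": 0.8})
        raise ValueError("simulated failure mid-run")
except ValueError:
    error_seen = True

print("engine shut down:", engine.closed)
```

With a real engine, the object passed to `managed_engine` would be the one returned by `sgl.Engine(...)`.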