# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where a server would add unnecessary complexity or overhead. Two common use cases are:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A complete example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).



## Nest Asyncio
If you use the **Offline Engine** in IPython, or in any other code that already runs inside an event loop, apply `nest_asyncio` first:
```python
import nest_asyncio

nest_asyncio.apply()
```
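
To see why this is needed, the following sketch uses only the standard library: calling `asyncio.run()` while an event loop is already running raises a `RuntimeError`, which is the situation `nest_asyncio.apply()` works around. The helper names `in_running_loop` and `nested_call` are made up for this illustration.

```python
import asyncio


def in_running_loop() -> bool:
    """Return True if called from inside a running asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def nested_call() -> str:
    # Inside a coroutine a loop is already running, so a nested
    # asyncio.run() is rejected by the standard library.
    coro = asyncio.sleep(0)
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"nested asyncio.run() failed: {exc}"
    return "nested asyncio.run() succeeded"


outside = in_running_loop()
message = asyncio.run(nested_call())
print(outside)
print(message)
```

In a plain Python script `in_running_loop()` returns `False` at module level, so the patch is typically only relevant in notebook-like environments.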

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

```python
# Launch the offline engine
import asyncio

import sglang as sgl
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    # CI-only helper module; not needed when running locally
    import patch
else:
    # Allow asyncio.run() inside an already-running event loop (e.g. a notebook)
    import nest_asyncio

    nest_asyncio.apply()


llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")
```




### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

# generate() returns one output dict per prompt, in the same order
outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

Prompt: Hello, my name is
Generated text:  Sam, and I am 18 years old. I want to be a teacher. What are some things that you think should be included in the school curriculum?

As an AI language model, I don't have personal experiences or emotions, but I can suggest some things that could be included in a school curriculum for teachers considering pursuing a career in education. Here are some factors that should be considered:

1. Teaching skills: Teachers should be taught the basic skills they need to teach in a classroom, such as teaching literacy and numeracy, teaching science, teaching language arts, and teaching social studies.

2. Critical thinking and problem-solving:
Prompt: The president of the United States is
Generated text:  trying to reduce the number of cars on the roads. The president's team is studying the last 5 years of data on the number of cars on the road in the United States. They found that the number of cars on the road decreased by a certain percentage each yea
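
As the cell above shows, `llm.generate` takes a list of prompts and returns one dict per prompt, with the completion under the `"text"` key. If you want to write results out incrementally for a large prompt set, one option is to submit the prompts in fixed-size chunks. The sketch below is illustrative only: `generate_in_chunks` and `EchoEngine` are hypothetical names, and `EchoEngine` merely stands in for `sgl.Engine` so that the snippet runs without a model. It mimics only the `generate(prompts, sampling_params)` call shape used above.

```python
from typing import Any, Dict, Iterator, List, Tuple


class EchoEngine:
    """Hypothetical stand-in for sgl.Engine; mimics only generate()."""

    def generate(
        self, prompts: List[str], sampling_params: Dict[str, Any]
    ) -> List[Dict[str, str]]:
        # A real engine would return model completions here.
        return [{"text": f" <completion for: {p}>"} for p in prompts]


def generate_in_chunks(
    engine, prompts: List[str], sampling_params: Dict[str, Any], chunk_size: int = 2
) -> Iterator[Tuple[str, str]]:
    """Yield (prompt, generated_text) pairs, submitting chunk_size prompts per call."""
    for start in range(0, len(prompts), chunk_size):
        chunk = prompts[start : start + chunk_size]
        outputs = engine.generate(chunk, sampling_params)
        for prompt, output in zip(chunk, outputs):
            yield prompt, output["text"]


prompts = ["Hello, my name is", "The capital of France is", "The future of AI is"]
results = list(
    generate_in_chunks(EchoEngine(), prompts, {"temperature": 0.8, "top_p": 0.95})
)
for prompt, text in results:
    print(f"Prompt: {prompt}\nGenerated text: {text}")
```

Passing the whole list in a single `generate` call, as in the cell above, leaves scheduling to the engine; chunking is a client-side convenience, not a requirement.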

### Streaming Synchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    # Stream the response and merge the chunks into one string
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
```


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  [Name], and I'm a [job title] at [company name]. I'm excited to meet you and learn more about you. What can you tell me about yourself? [Name] is a [job title] at [company name]. I'm excited to meet you and learn more about you. What can you tell me about yourself? [Name] is a [job title] at [company name]. I'm excited to meet you and learn more about you. What can you tell me about yourself? [Name] is a [job title] at [company name]. I'm excited to meet you and learn more about

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris. 

The statement is concise and accurately describes the capital city of France. It is a factual statement that provides a clear and specific answer to the question asked. The statement is concise because it only contains the essential information needed to answer the question, and it is accurate because it accurately describes the capital city of France. The statement is concise because it is a single sentence that does not require any additional information to be understood. It is accurate because it accurately describes the capital city of France, which is Paris. The statement is concise because it is a single sentence that does not require any additional information to be understood. It is accurate because it

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  likely to be characterized by rapid advancements in several key areas, including:

1. Increased automation: AI is expected to become more prevalent in various industries, including manufacturing, healthcare, and transportation. Automation will likely lead to increased efficiency and productivity, but it will also create new challenges, such as job displacement and the need for human oversight.

2. Improved natural language processing: AI will continue to improve its ability to understand and interpret human language, leading to more natural and intuitive interactions with machines. This will enable more sophisticated forms of communication and collaboration.

3. Enhanced privacy and security: As AI becomes more integrated into our daily lives, there will
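
"Overlap removal" in the cell above refers to the case where a streamed chunk begins with text that the previously received output already ends with. The following sketch illustrates the idea with plain strings. It is not the source of `stream_and_merge`; the names `drop_overlap` and `merge_chunks` are chosen for this illustration only.

```python
from typing import Iterable


def drop_overlap(existing: str, chunk: str) -> str:
    """Return chunk without the longest prefix that duplicates a suffix of existing."""
    max_len = min(len(existing), len(chunk))
    # Try the longest possible overlap first, then shrink.
    for size in range(max_len, 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk


def merge_chunks(chunks: Iterable[str]) -> str:
    """Concatenate chunks, skipping text already present at the end of the result."""
    merged = ""
    for chunk in chunks:
        merged += drop_overlap(merged, chunk)
    return merged


chunks = [" Paris", " Paris. It", " It is the capital"]
merged = merge_chunks(chunks)
print(merged)
```

One caveat of this simple approach is that text the model legitimately repeats at a chunk boundary would also be dropped.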



### Non-streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    # Await the whole batch; outputs come back in prompt order
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
```


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Your Name], a [Job Title] with over [Number of Years] years of experience in [Specific field of work]. I'm passionate about [mention a project or accomplishment that reflects your personality]. I'm always looking for challenges, different perspectives, and fresh ideas, and I'm eager to learn and grow in all areas of my career. Thank you for considering me for this role.
Hello, my name is [Your Name], a [Job Title] with over [Number of Years] years of experience in [Specific field of work]. I'm passionate about [mention a project or accomplishment that reflects your personality]. I'm always

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, known for its iconic Eiffel Tower, romantic cafes, and the annual烟花（焰火）display. Its history, which dates back to ancient times, is rich with legends and mysteries. The
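
Because `async_generate` is awaited, it can be combined with other coroutines. The sketch below uses `asyncio.gather` to issue one call per temperature setting and collect the results together. `StubEngine` and `compare_temperatures` are hypothetical names: the stub stands in for `sgl.Engine` so the snippet runs without a model, and it mimics only the `async_generate(prompts, sampling_params)` call shape used above. Whether concurrent calls suit your workload is something to verify against the real engine.

```python
import asyncio
from typing import Any, Dict, List


class StubEngine:
    """Hypothetical stand-in for sgl.Engine; mimics only async_generate()."""

    async def async_generate(
        self, prompts: List[str], sampling_params: Dict[str, Any]
    ) -> List[Dict[str, str]]:
        await asyncio.sleep(0)  # yield control to the event loop once
        temperature = sampling_params["temperature"]
        return [{"text": f" <T={temperature}: {p}>"} for p in prompts]


async def compare_temperatures(engine, prompts: List[str], temperatures: List[float]):
    """Run one async_generate call per temperature concurrently."""
    calls = [
        engine.async_generate(prompts, {"temperature": t, "top_p": 0.95})
        for t in temperatures
    ]
    # gather() returns results in the same order as `calls`
    results = await asyncio.gather(*calls)
    return dict(zip(temperatures, results))


by_temperature = asyncio.run(
    compare_temperatures(StubEngine(), ["The capital of France is"], [0.2, 0.8])
)
for temperature, outputs in by_temperature.items():
    print(temperature, outputs[0]["text"])
```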

### Streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Use the overlap-aware helper instead of calling async_generate directly
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())
```


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Name], and I'm a [Type of professional], [Industry]. I've been working for [Company] for [Number of Years] years, and I've gained a lot of experience in [Area of expertise]. Currently, I'm [State of my job], and I'm excited to bring my [Skill or Expertise] to [Company]. Looking forward to the opportunity to learn and grow with you!

(Repeat the text a few more times to emphasize how much I appreciate my current role and the value it brings to the company)

I look forward to being part of your team and contributing to the success of [

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. It is the largest and most populous city in the country, located on the Seine River. The city is known for its historical landmarks, including the Louvre Museum, Notre-Dame Cathedral, and the Arc de Triomphe. Paris is also famous for its fashion industry, food culture, and annual festivals such as the Eiffel Tower Parade. It is a cultural and economic hub that plays a crucial role in French society and foreign policy. As of 2021, the population of Paris is approximately 2.3 million. The city is also home to several other important international institutions such as the International Monetary

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  highly uncertain, and there are many potential trends and innovations that could shape the technology's direction. Here are some possible future trends in artificial intelligence:

1. Increased use of AI in healthcare: AI can help healthcare providers diagnose diseases, recommend treatments, and even personalize medication recommendations. This could lead to more accurate diagnoses and improved treatment outcomes.

2. Greater integration of AI into education: AI can help educators personalize learning experiences, track student progress, and provide personalized feedback. This could revolutionize the way we learn and教人。

3. AI in the financial industry: AI can help with fraud detection, risk management, and decision-making. This

```python
# Shut the engine down once generation is finished
llm.shutdown()
```
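
The last step above calls `llm.shutdown()` explicitly. In a standalone script, an exception raised during generation could skip that call, so one option is to wrap the engine in a context manager. The sketch below shows the pattern with the standard library only. `managed_engine` and `StubEngine` are hypothetical names, and the stub stands in for `sgl.Engine` so the snippet runs without a model.

```python
from contextlib import contextmanager


class StubEngine:
    """Hypothetical stand-in for sgl.Engine; mimics only generate() and shutdown()."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts, sampling_params):
        return [{"text": " ..."} for _ in prompts]

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def managed_engine(engine):
    """Yield engine and call engine.shutdown() on exit, even if an error occurred."""
    try:
        yield engine
    finally:
        engine.shutdown()


engine = StubEngine()
try:
    with managed_engine(engine) as eng:
        eng.generate(["Hello, my name is"], {"temperature": 0.8})
        raise ValueError("simulated failure mid-run")
except ValueError:
    pass

print(engine.is_shut_down)
```

With a real engine, the same wrapper would be used as `with managed_engine(sgl.Engine(model_path=...)) as llm:`, assuming only the `shutdown()` method shown above.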