# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This is useful when a separate HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A complete example in a Python script is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
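The linked `custom_server.py` is the reference for a real server. The following is only a rough sketch of the idea. It uses Python's standard-library `http.server` and a hypothetical `StubEngine` in place of `sgl.Engine`. The `/generate` route and the JSON fields `text` and `sampling_params` are assumptions chosen for illustration, not a documented SGLang HTTP API.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubEngine:
    """Hypothetical stand-in for sgl.Engine; mimics only generate() -> {"text": ...}."""

    def generate(self, prompt, sampling_params):
        return {"text": f" <completion of {prompt!r}>"}


ENGINE = StubEngine()  # in a real server this would be sgl.Engine(model_path=...)


class GenerateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Hypothetical route and payload: {"text": ..., "sampling_params": {...}}
        if self.path != "/generate":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        output = ENGINE.generate(request.get("text", ""), request.get("sampling_params", {}))
        payload = json.dumps({"text": output["text"]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def query_once(prompt, sampling_params):
    """Start the server on a free port, send one request, then stop the server."""
    server = HTTPServer(("127.0.0.1", 0), GenerateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        request = urllib.request.Request(
            f"http://127.0.0.1:{server.server_port}/generate",
            data=json.dumps({"text": prompt, "sampling_params": sampling_params}).encode(),
            headers={"Content-Type": "application/json"},
        )
        # Bypass any proxy settings so the request stays on localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request) as response:
            return json.loads(response.read())
    finally:
        server.shutdown()
        server.server_close()


print(query_once("The capital of France is", {"temperature": 0.8}))
```

This single-threaded server handles one request at a time. A server that needs concurrency would more likely call `async_generate` from an async web framework.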

## Special Warning

**To launch the offline engine from a Python script, you must put the launch code under an** `if __name__ == "__main__":` **guard, because SGLang uses** `spawn` **mode to create subprocesses. See this minimal example:**

https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/launch_engine.py
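The linked `launch_engine.py` is the authoritative SGLang example. This standard-library sketch is not SGLang code. It only shows why the guard matters for any program that starts subprocesses with the `spawn` start method: each child re-imports the main module, so unguarded top-level launch code would run again inside every child.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def square(x: int) -> int:
    # Executed in a child process. With "spawn", the child re-imports this
    # module to find square(), so it must be defined at module level.
    return x * x


def main() -> list:
    ctx = mp.get_context("spawn")  # the start method the warning refers to
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        return list(pool.map(square, [1, 2, 3]))


# The guard stops the re-import in each spawned child from calling main()
# again. Without it, multiprocessing raises a RuntimeError about starting a
# new process before the bootstrapping phase has finished.
if __name__ == "__main__":
    print(main())  # [1, 4, 9]
```

In an SGLang script, the `sgl.Engine(...)` construction and the generation calls would sit where `main()` is called here.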

## Advanced Usage

The engine supports [VLM (vision-language model) inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine accepts a list of prompts and supports batch inference with efficient scheduling.

```python
# Launch the offline engine
import asyncio

import sglang as sgl
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch  # only used when these docs are executed in CI


llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
```

```text
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:01<00:03,  1.03s/it]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:01<00:01,  1.49it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:02<00:00,  1.29it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.16it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.19it/s]
```


### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

```text
Prompt: Hello, my name is
Generated text:  Dorothea (Dorie) Tuck and I am a volunteer for the University of Michigan-Dearborn's Department of Psychology. I am a retired professor of psychology with many years of experience teaching and mentoring students at the university. I have a strong commitment to the mission of the University of Michigan-Dearborn and to the Department of Psychology.
My role as a volunteer involves tutoring and mentoring students in the psychology program, providing one-on-one academic support and guidance, and helping students to develop their critical thinking, writing, and research skills. I am passionate about helping students to succeed and reach their full potential, and I am committed
Prompt: The president of the United States is
Generated text:  often the most powerful person in the world, but did you know that there have been instances when a president was impeached or removed from office? In this article, we'll explore the history of presidential impeach
```

### Streaming Synchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
```

```text
=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Kaida. I'm a 25-year-old freelance writer and editor. I live in a small apartment in the city with my cat, Luna. I enjoy reading, hiking, and trying out new restaurants. I'm currently working on a novel and trying to learn more about the world of publishing. That's me in a nutshell.
This is a good example of a neutral self-introduction because it doesn't reveal too much about Kaida's personality, interests, or motivations. It simply states the facts about her life and what she does. This can be helpful for a character who is still developing or for a story where the focus is

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
The capital of France is Paris. The city is known for its iconic landmarks such as the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum. Paris is also famous for its fashion, cuisine, and romantic atmosphere. It is a popular tourist destination and a hub for international business and culture. The city has a rich history dating back to the Roman era and has been the center of French politics, economy, and culture for centuries. Today, Paris is a vibrant and diverse city with a population of over 2.1 million people. It is a city that seamlessly blends tradition and modernity, making it one

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  expected to be shaped by various factors, including advancements in machine learning, natural language processing, and computer vision. Here are some possible future trends in AI:
1. Increased use of AI in healthcare: AI is expected to play a larger role in healthcare, including diagnosis, treatment, and patient care. AI-powered systems may be able to analyze medical images, identify patterns in patient data, and provide personalized treatment recommendations.
2. Widespread adoption of AI in industries: AI is expected to be adopted in various industries, including finance, transportation, and education. AI-powered systems may be able to automate tasks, improve efficiency, and enhance decision
```
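`stream_and_merge` is imported from `sglang.utils`, and its source is the authority on its exact behavior. The general idea behind "overlap removal" can be sketched as follows, assuming each streamed chunk may begin with text that repeats the end of what has already been received. `trim_overlap` and `merge_stream` are illustrative names for this sketch, not documented SGLang APIs.

```python
def trim_overlap(existing: str, chunk: str) -> str:
    """Drop the longest prefix of `chunk` that is already a suffix of `existing`."""
    # Try the largest possible overlap first, then shrink it one character at a time.
    for size in range(min(len(existing), len(chunk)), 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk  # no overlap: keep the whole chunk


def merge_stream(chunks) -> str:
    """Concatenate streamed chunks, skipping text repeated from the previous tail."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap(merged, chunk)
    return merged


print(merge_stream(["Paris", "Paris is", " is the capital."]))  # Paris is the capital.
```

The heuristic cannot tell an accidental repeat from a genuine one. For example, merging `["ha", "ha"]` yields `"ha"`, not `"haha"`, so the approach suits streams whose chunks are known to overlap.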



### Non-streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
```

```text
=== Testing asynchronous batch generation ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Kaida Yamato. I'm a 22-year-old university student who's been living in Tokyo for the past five years. I'm a history major, and I enjoy reading about the Meiji period. When I'm not studying, you can usually find me exploring the city, trying out new restaurants, or practicing taiko drumming. That's me in a nutshell! Feel free to ask me anything. Kaida's self-introduction is neutral and focuses on the facts about herself. It mentions her interests and hobbies, but doesn't reveal too much about her personality or emotions. This type of self-introduction is good for a

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. Paris is located in the northern part of the country in an area known as the Île-de-France.
The post Which of the following best describes the capital of France? appeared first on Superb Profes
```
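Because `async_generate` is awaited, it composes with ordinary asyncio tools. The following minimal sketch awaits two batches concurrently with `asyncio.gather`. It assumes a hypothetical `StubEngine` whose `async_generate` mirrors only the list-in, list-of-dicts-out shape used in the cell above. The sketch does not show whether the real engine schedules such concurrent calls into a shared batch.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in for sgl.Engine; mimics only the async_generate shape used above."""

    async def async_generate(self, prompts, sampling_params):
        await asyncio.sleep(0.01)  # pretend to wait on the model
        return [{"text": f" <completion of {p!r}>"} for p in prompts]


async def run_two_batches(engine, batch_a, batch_b, sampling_params):
    # Both coroutines are in flight at once; gather returns results in argument order.
    outputs_a, outputs_b = await asyncio.gather(
        engine.async_generate(batch_a, sampling_params),
        engine.async_generate(batch_b, sampling_params),
    )
    return [output["text"] for output in outputs_a + outputs_b]


texts = asyncio.run(
    run_two_batches(
        StubEngine(),
        ["The capital of France is"],
        ["The future of AI is"],
        {"temperature": 0.8, "top_p": 0.95},
    )
)
for text in texts:
    print(text)
```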

### Streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # async_stream_and_merge streams the completion and yields each chunk
        # with any text that overlaps what was already printed removed
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())
```

```text
=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Elara Frost. I'm a 25-year-old journalist who's been living in New Haven, Connecticut for the past five years. I enjoy hiking and reading, and I'm currently working on a story about the local politics. That's me in a nutshell.
Not bad, but you could add a bit more depth and interest to your character. Here are some suggestions:
1. Use more descriptive language to paint a picture of your character in the reader's mind. For example, instead of saying "I'm a 25-year-old journalist", you could say "I'm a driven 25-year-old journalist with a sharp mind and

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
Provide a concise factual statement about the climate in France. France has a temperate climate with mild winters and warm summers.
Provide a concise factual statement about the population of France. As of 2020, the estimated population of France is approximately 67 million.
Provide a concise factual statement about the official language of France. The official language of France is French.
Provide a concise factual statement about the currency of France. The official currency of France is the Euro.
Provide a concise factual statement about the European country that shares the largest border with France. France shares its largest border with Belgium.  1. The capital of France is

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  more likely to be integrated into our daily lives than a takeover of the human race.
There is no one “right” path for AI development, and the future is uncertain. However, experts and organizations have proposed several potential trends in AI development:
1. AI as a Utility: AI is likely to be viewed as a utility, like electricity or water, that is integral to our daily lives. It will be used to streamline and improve various aspects of life, such as healthcare, transportation, and education.
2. AI as a Service: AI will be offered as a service, allowing businesses and individuals to access AI capabilities without having to build
```
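`async_stream_and_merge` applies this kind of overlap removal to an async stream, so each printed piece is new text. The following minimal sketch shows that pattern as an async generator. It assumes a hypothetical `fake_stream` in place of the engine's stream and repeats a small `trim_overlap` helper so the block runs on its own. Neither name is an SGLang API.

```python
import asyncio


def trim_overlap(existing: str, chunk: str) -> str:
    """Drop the longest prefix of `chunk` that is already a suffix of `existing`."""
    for size in range(min(len(existing), len(chunk)), 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk


async def fake_stream(chunks):
    """Hypothetical stand-in for an engine's async stream of text chunks."""
    for chunk in chunks:
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield chunk


async def stream_without_repeats(stream):
    """Yield only the part of each chunk that has not been emitted yet."""
    emitted = ""
    async for chunk in stream:
        new_text = trim_overlap(emitted, chunk)
        emitted += new_text
        if new_text:  # skip chunks that were entirely overlap
            yield new_text


async def collect(chunks):
    """Gather the de-duplicated pieces into a list (handy for checking results)."""
    return [piece async for piece in stream_without_repeats(fake_stream(chunks))]


async def main():
    async for piece in stream_without_repeats(fake_stream(["Hello", "Hello,", " world"])):
        print(piece, end="", flush=True)
    print()


asyncio.run(main())  # prints: Hello, world
```

Skipping empty pieces is a choice made in this sketch. It avoids printing nothing when a chunk only repeats earlier text.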




```python
# Shut down the engine and its subprocesses when you are done
llm.shutdown()
```
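A `shutdown()` call at the end of a script can be skipped if an exception interrupts the run. One way to guarantee the call is a small context manager built with `contextlib.contextmanager`. This sketch uses a hypothetical `StubEngine` that only records whether `shutdown()` ran. `managed_engine` is an illustrative name, not an SGLang API.

```python
from contextlib import contextmanager


class StubEngine:
    """Hypothetical stand-in for sgl.Engine that records whether shutdown() ran."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts, sampling_params):
        return [{"text": f" <completion of {p!r}>"} for p in prompts]

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def managed_engine(engine):
    """Yield the engine and call shutdown() on exit, even if the body raises."""
    try:
        yield engine
    finally:
        engine.shutdown()


engine = StubEngine()  # with SGLang this would be sgl.Engine(model_path=...)
with managed_engine(engine) as llm:
    outputs = llm.generate(["The capital of France is"], {"temperature": 0.8})
    print(outputs[0]["text"])
print(engine.is_shut_down)  # True
```

In a script, this block would still sit under the `if __name__ == "__main__":` guard described in the Special Warning section.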