# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, for use cases where an HTTP server adds unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## SPECIAL WARNING!!!!

**To launch the offline engine in a Python script, you must create it under an** `if __name__ == "__main__":` **guard, because SGLang uses** `spawn` **mode to create subprocesses. Please refer to this simple example**:

https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/launch_engine.py
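
The shape of such a script can be sketched as follows. This is a minimal illustration rather than a copy of the linked file: the helper names `build_prompts` and `main` are made up for this sketch, the model path is the one used later in this document, and the `ImportError` fallback is only there so the file can be loaded on a machine without SGLang installed.

```python
def build_prompts():
    # Illustrative prompts; replace with your own.
    return ["Hello, my name is", "The capital of France is"]


def main():
    # Imported lazily so the module can be loaded without SGLang.
    # The important part is that the engine is only created when main()
    # is called from under the __main__ guard below.
    try:
        import sglang as sgl
    except ImportError:
        print("sglang is not installed; nothing to run.")
        return None

    prompts = build_prompts()
    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
    try:
        outputs = llm.generate(prompts, {"temperature": 0.8, "top_p": 0.95})
        for prompt, output in zip(prompts, outputs):
            print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
        return outputs
    finally:
        # Always release the engine's subprocesses, even if generation fails.
        llm.shutdown()


# With the `spawn` start method, child processes re-import this module.
# Without the guard, each child would run the module-level code again
# and try to launch another engine.
if __name__ == "__main__":
    main()
```

Keeping all engine work inside a function that is only called under the guard means the module stays safe to import, which is what `spawn` relies on.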

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# Launch the offline engine.
import asyncio

import sglang as sgl

from sglang.test.test_utils import is_in_ci

# Helpers that stream chunks from the engine and merge them without repeated text.
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch


llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.17it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Mary, and I am a Christmas fanatic.  I love everything about the holiday season: the twinkling lights, the festive music, the cozy atmosphere, the delicious treats, and of course, the presents!  While I enjoy the holiday season, I am also a bit of a planner, and I like to be prepared for all the festivities.  In this post, I will share my top 10 Christmas traditions that I love and enjoy with my family and friends.

### 1. Decorating the Tree
The first tradition on my list is decorating the Christmas tree.  I love selecting a tree that is just the right size for
Prompt: The president of the United States is
Generated text:  the most powerful person on Earth, with a broad range of authorities and responsibilities. However, the president's powers are limited in several ways. The Constitution outlines the president's role and the powers of the executive branch, while also providing checks and balances on the president's authority. Here are some o
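
In an offline batch job the generated text is usually saved rather than printed. The sketch below assumes only what the cell above shows, namely that each element of `outputs` is a dict with a `"text"` key. The helper `save_results`, the output file name, and the stand-in data are illustrative and are not part of SGLang.

```python
import json
import tempfile
from pathlib import Path


def save_results(prompts, outputs, path):
    """Write one JSON object per line, pairing each prompt with its text."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, output in zip(prompts, outputs):
            record = {"prompt": prompt, "text": output["text"]}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Stand-in data shaped like the result of llm.generate(prompts, sampling_params).
demo_prompts = ["The capital of France is"]
demo_outputs = [{"text": " Paris."}]

out_path = Path(tempfile.gettempdir()) / "sglang_batch_results.jsonl"
save_results(demo_prompts, demo_outputs, out_path)
print(out_path.read_text(encoding="utf-8"))
```

With a real engine, pass the `prompts` list and the `outputs` returned by `llm.generate` instead of the stand-in lists.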

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  Kaida. I'm a 17-year-old high school student. I'm a bit of a bookworm and enjoy reading about history and science. I'm also a member of the school's debate team and enjoy arguing about current events. In my free time, I like to draw and paint. I'm a bit of a perfectionist, which can sometimes make it difficult for me to relax and have fun. I'm working on finding a balance between my academic and personal life. That's me in a nutshell. What do you think? Is there anything you'd like to add or change?
Here are a few suggestions to make your

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris.
Provide a concise factual statement about the population of France’s capital city. The population of Paris is approximately 2.1 million people.
Provide a concise factual statement about the location of France’s capital city. Paris is located in the northern part of France, in the Île-de-France region.
Provide a concise factual statement about the economy of France’s capital city. The economy of Paris is driven by a diverse range of industries, including finance, fashion, and tourism.
Provide a concise factual statement about the culture of France’s capital city. Paris is known for its rich cultural heritage, including its art museums, historic landmarks,

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  likely to be shaped by various factors, including technological advancements, societal needs, and ethical considerations. Here are some possible future trends in artificial intelligence:
1. Increased use of AI in healthcare: AI is likely to play a larger role in healthcare, including diagnosis, treatment, and patient care. AI-powered systems may be able to analyze medical images, identify patterns in patient data, and provide personalized treatment recommendations.
2. Advancements in natural language processing: AI-powered chatbots and virtual assistants are becoming increasingly sophisticated, and may soon be able to understand and respond to human language in a more natural and intuitive way.
3. Rise of explainable
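
The "overlap removal" mentioned in the cell above can be pictured with a small pure-Python sketch. It assumes that a streamed chunk may repeat text that has already been received, and it is not necessarily how `stream_and_merge` is implemented. The names `strip_overlap` and `merge_chunks` are illustrative.

```python
def strip_overlap(existing_text, new_chunk):
    """Drop the longest prefix of new_chunk that already ends existing_text."""
    max_possible = min(len(existing_text), len(new_chunk))
    for size in range(max_possible, 0, -1):
        if existing_text.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk


def merge_chunks(chunks):
    """Concatenate streamed chunks, skipping text repeated at chunk boundaries."""
    merged = ""
    for chunk in chunks:
        merged += strip_overlap(merged, chunk)
    return merged


# Hypothetical stream in which each chunk repeats the tail of the previous one.
chunks = ["The capital", " capital of France", " France is Paris."]
print(merge_chunks(chunks))
```

A suffix/prefix heuristic like this one cannot tell a repeated chunk boundary from text that legitimately repeats (for example `"ha"` followed by `"ha"`), so it only suits streams that are known to overlap.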



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Echo Wing. I'm a 22-year-old freelance artist living in a small town in the Pacific Northwest. I spend most of my free time drawing and painting, and I'm currently working on building a portfolio of my work.
Write a short, neutral self-introduction for a fictional character. Hello, my name is Rowan Flynn. I'm a 25-year-old geologist who's currently working as a field researcher in the Australian Outback. I'm passionate about understanding the geological history of the region and enjoy hiking and camping in my spare time.
Write a short, neutral self-introduction for a fictional character. Hello, my

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. Paris is located in the northern part of the country, in the Île-de-France region. Provide the following details: Geography and Climate: Paris is situated in th
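
Because the asynchronous call is awaitable, it composes with the rest of `asyncio`. The sketch below shows the general pattern of issuing one request per prompt and collecting the results with `asyncio.gather`. `StubEngine` is a stand-in so the example runs without a GPU. Its single-prompt `async_generate` signature is an assumption made for this sketch, not a statement about the real engine.

```python
import asyncio


class StubEngine:
    """Stand-in for an engine; it only mimics an awaitable generate call."""

    async def async_generate(self, prompt, sampling_params):
        await asyncio.sleep(0.01)  # Simulate waiting on the model.
        return {"text": f" <completion for: {prompt}>"}


async def generate_all(engine, prompts, sampling_params):
    # Create one coroutine per prompt and wait for all of them.
    # gather() returns results in the same order as its inputs.
    requests = [engine.async_generate(p, sampling_params) for p in prompts]
    return await asyncio.gather(*requests)


prompts = ["Hello, my name is", "The capital of France is"]
outputs = asyncio.run(generate_all(StubEngine(), prompts, {"temperature": 0.8}))
for prompt, output in zip(prompts, outputs):
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```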

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  August. I have short, dark hair and a bit of stubble. I work as a librarian at a local library and enjoy reading classic literature. I like to spend my free time exploring the outdoors and hiking in nearby parks.
This is a good start, but I think it could be improved a bit. Here's why:
First, it sounds a bit too formal. August is introducing himself, not writing a resume.
Second, the sentence about his physical appearance is a bit too descriptive. While it's nice to have some idea of what August looks like, it's not necessary to include details like his hair length and facial hair.

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
This is a straightforward, factual statement that can be used to answer a simple question about France’s capital city. There is no need for further analysis or opinion as the statement provides a concise and accurate piece of information.
Next: Provide a concise factual statement about Germany’s capital city. The capital of Germany is Berlin. Previous: Provide a concise factual statement about Italy’s capital city. The capital of Italy is Rome. Next: Provide a concise factual statement about the United Kingdom’s capital city. The capital of the United Kingdom is London. Previous: Provide a concise factual statement about China’s capital city. The capital of China is Beijing

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  a complex and multifaceted topic, with various experts and researchers offering different perspectives and predictions. However, some possible future trends in artificial intelligence include:

1. **Increased Adoption in Everyday Life**: AI is expected to become increasingly integrated into our daily lives, from personal assistants like Alexa and Google Assistant to more sophisticated applications in healthcare, education, and transportation.

2. **Advancements in Machine Learning and Deep Learning**: The development of more sophisticated machine learning and deep learning algorithms will enable AI systems to better understand and interact with the world around them. This could lead to significant advancements in areas like computer vision, natural language processing, and
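
The consumption pattern used in the cell above, `async for` over a stream of chunks with each chunk printed as soon as it arrives, can be tried without a model. In this sketch `fake_token_stream` is a hypothetical stand-in for the engine's stream, and `collect` is an illustrative helper that also keeps the full text.

```python
import asyncio


async def fake_token_stream(chunks):
    """Stand-in async generator that yields a completion chunk by chunk."""
    for chunk in chunks:
        await asyncio.sleep(0)  # Yield control to the event loop between chunks.
        yield chunk


async def collect(stream):
    # Print chunks as they arrive and also keep them to build the full text.
    parts = []
    async for chunk in stream:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # New line once the stream is exhausted.
    return "".join(parts)


demo_chunks = [" Paris", ".", " This", " is", " a", " fact", "."]
full_text = asyncio.run(collect(fake_token_stream(demo_chunks)))
```

Printing with `end=""` and `flush=True` is what makes the text appear incrementally on one line instead of one chunk per line.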




In [6]:
llm.shutdown()