# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when an extra HTTP server would only add complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

**To launch the offline engine in a Python script, you must guard the entry point with `if __name__ == "__main__":`, because the engine uses `spawn` mode to create subprocesses. See this [simple example](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/launch_engine.py) for details.**
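
A minimal sketch of that entry-point guard follows. It uses only the `sgl.Engine`, `generate`, and `shutdown` calls that appear later in this document. The `format_result` helper and the `find_spec` check (which skips the launch when `sglang` is not installed) are illustrative additions, not part of SGLang.

```python
import importlib.util


def format_result(prompt, output):
    # Hypothetical helper: formats one prompt/completion pair for printing.
    return f"Prompt: {prompt}\nGenerated text: {output['text']}"


def main():
    # Assumption for this sketch: skip gracefully when sglang is not installed.
    if importlib.util.find_spec("sglang") is None:
        print("sglang is not installed; skipping engine launch")
        return

    import sglang as sgl

    prompts = ["The capital of France is"]
    sampling_params = {"temperature": 0.8, "top_p": 0.95}

    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
    try:
        outputs = llm.generate(prompts, sampling_params)
        for prompt, output in zip(prompts, outputs):
            print(format_result(prompt, output))
    finally:
        # Release the engine's subprocesses even if generation fails.
        llm.shutdown()


# With the spawn start method, child processes re-import this module.
# The guard keeps them from running main() again.
if __name__ == "__main__":
    main()
```

Keeping the engine construction inside `main()` means that importing the module has no side effects. This matters when subprocesses are started with `spawn`.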

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a standalone Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/hidden_states.py).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch


llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.31it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Erin and I am the creator of this blog. I am a mom of two adorable kids, and I am passionate about sharing my experiences, ideas, and tips on parenting, motherhood, and life.
My goal is to create a community where mothers can come together, share their stories, and support one another. I believe that motherhood can be challenging, but it is also incredibly rewarding. I want to inspire and motivate mothers to be the best version of themselves, to love themselves, and to find joy in the journey of motherhood.
I will be sharing a variety of topics on this blog, including parenting tips, kid-friendly recipes
Prompt: The president of the United States is
Generated text:  the head of state and head of government of the United States, and is the highest-ranking official in the federal government. The president serves a four-year term and is elected through the Electoral College system.
The president is responsible for executing the laws of the United

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  Kaida. I'm a 17-year-old high school student who enjoys reading and playing the guitar. I'm a bit of a introvert and prefer to spend my free time alone, but I'm not antisocial. I'm a bit of a perfectionist and can get anxious when things don't go as planned. I'm a bit of a dreamer and I love to imagine all the possibilities of the world. I'm still figuring out who I am and what I want to do with my life, but I'm excited to see where life takes me. I'm a bit of a hopeless romantic, and I believe in the

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris. Paris is located in the northern part of the country, along the Seine River. It is the largest city in France and is known for its rich history, art, fashion, and cuisine. Paris is home to many famous landmarks, including the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. The city has a population of over 2.1 million people and is a major hub for international business, culture, and tourism. Paris is also known for its romantic atmosphere and is often referred to as the "City of Love." The city has a diverse range of neighborhoods, each with its own unique

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  expected to be shaped by several factors, including advancements in machine learning, natural language processing, and computer vision. Here are some possible future trends in AI:
1. Increased use of AI in healthcare: AI is expected to play a larger role in healthcare, including diagnosis, treatment, and patient care. AI-powered systems will be able to analyze medical data, identify patterns, and make predictions about patient outcomes.
2. Rise of Explainable AI: As AI becomes more pervasive, there is a growing need for transparency and explainability in AI decision-making. Explainable AI (XAI) will become increasingly important to ensure that AI systems are fair
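
The "overlap removal" mentioned in the banner above can be pictured with a small standalone sketch. It assumes that a streamed chunk may repeat the tail of the text already received. The `trim_overlap_sketch` and `merge_chunks` helpers are hypothetical names used for illustration and are not claimed to match how `stream_and_merge` is implemented.

```python
def trim_overlap_sketch(existing_text, new_chunk):
    """Drop the longest prefix of new_chunk that is already a suffix of existing_text."""
    max_possible = min(len(existing_text), len(new_chunk))
    # Try the longest candidate overlap first and shrink until one matches.
    for size in range(max_possible, 0, -1):
        if existing_text.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk  # no overlap found


def merge_chunks(chunks):
    """Concatenate chunks, trimming any overlap with the text merged so far."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap_sketch(merged, chunk)
    return merged


if __name__ == "__main__":
    print(merge_chunks([" Paris is", " is the capital", " capital of France."]))
```

This naive version would also drop text that is genuinely repeated (for example, two identical consecutive chunks), so it only fits cases where overlaps are known to be streaming artifacts.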



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Enna. I'm a 17-year-old high school student from a small town in the Midwest.
Here are some things you might consider including in your self-introduction:
Your name
Your age
Your occupation or student status
Your location or hometown
A brief description of your personality or interests
A few words about your background or upbringing
Here are some tips for writing a neutral self-introduction:
Avoid mentioning any sensitive or controversial topics, such as politics or religion.
Don't include any potentially embarrassing or personal details, such as your weight or medical history.
Keep your self-introduction brief and to the point. Aim for a

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. It is located in the north-central part of the country. Paris is often called the most romantic city in the world due 
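
Because `async_generate` is awaited, it can be combined with standard `asyncio` tools. The sketch below submits several batches together with `asyncio.gather`. To stay runnable without a model it uses a hypothetical `StubEngine` whose `async_generate` mimics the list-of-dicts return shape shown above. Whether the real engine gains throughput from this pattern is not claimed here.

```python
import asyncio


class StubEngine:
    """Stand-in for the engine so the sketch runs without a model; NOT part of SGLang."""

    async def async_generate(self, prompts, sampling_params):
        await asyncio.sleep(0.01)  # simulate inference latency
        return [{"text": f" <completion for {p!r}>"} for p in prompts]


async def generate_batches(engine, batches, sampling_params):
    # Create one awaitable per batch and wait for all of them together.
    tasks = [engine.async_generate(batch, sampling_params) for batch in batches]
    # gather returns results in the same order as the submitted batches.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    batches = [
        ["Hello, my name is"],
        ["The capital of France is", "The future of AI is"],
    ]
    params = {"temperature": 0.8, "top_p": 0.95}
    results = asyncio.run(generate_batches(StubEngine(), batches, params))
    for batch, outputs in zip(batches, results):
        for prompt, output in zip(batch, outputs):
            print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```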

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Aurora Vex, but my friends call me Rory. I'm a 17-year-old student who spends most of my free time playing video games, reading fantasy novels, and listening to indie music. I'm a bit of an introvert, but I'm always up for a good conversation about anything from the latest sci-fi movie to my favorite book series. I'm a bit of a daydreamer, and I often get lost in my own thoughts, but I'm always happy to share my ideas with others.
I've always been fascinated by the world of fantasy and science fiction, and I love exploring the possibilities of what could be

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
In which century did France’s capital city become a major cultural center? The 12th century.
What event made the city of Paris a major cultural center? The construction of Notre Dame Cathedral.
Who was the leader who oversaw the construction of Notre Dame Cathedral? King Louis VII.
What was the primary purpose of the Notre Dame Cathedral? A place of worship.
What is the name of the famous street in Paris where artists and writers gathered? The Rue de Rivoli.
Who was the famous writer who lived in Paris and helped to establish the city as a major cultural center? Victor Hugo.
What is the name of the famous

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  a topic that is always being discussed in various fields of science and technology. The current AI technology is based on machine learning and deep learning. However, future AI technology will be based on hybrid intelligence and the combination of human and artificial intelligence. In the future, AI will not only be used in various industries but also in our daily life. Here are some possible future trends in AI:
1. Hybrid Intelligence: The future AI technology will be based on hybrid intelligence, which is a combination of human intelligence and artificial intelligence. This technology will enable humans and AI systems to work together to solve complex problems.
2. Edge AI: The future AI

In [6]:
llm.shutdown()
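
The asynchronous streaming loop shown earlier prints each chunk as it arrives. If the complete text is also needed afterwards, the chunks can be accumulated while streaming. The sketch below uses a hypothetical `stub_stream` async generator in place of `async_stream_and_merge(llm, prompt, sampling_params)` so that it runs without a model. `collect_stream` is likewise an illustrative helper, not an SGLang function.

```python
import asyncio


async def stub_stream(chunks):
    """Hypothetical stand-in for a streaming source: yields text chunks one by one."""
    for chunk in chunks:
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield chunk


async def collect_stream(stream):
    """Print chunks as they arrive and return the concatenated text."""
    parts = []
    async for chunk in stream:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # newline after the stream ends
    return "".join(parts)


if __name__ == "__main__":
    full_text = asyncio.run(collect_stream(stub_stream([" Paris", ".", " It", " is"])))
    print(f"Collected {len(full_text)} characters")
```

Collecting the parts in a list and joining them once avoids repeated string concatenation on long generations.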