# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

**To launch the offline engine in a Python script, you must guard the entry point with `if __name__ == "__main__":`, because SGLang uses the `spawn` start method to create subprocesses. See this [simple example](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/launch_engine.py) for more details.**
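
A minimal sketch of such a script layout follows. It uses only calls that appear in this document (`sgl.Engine`, `generate`, `shutdown`). The `find_spec` check is an addition for illustration, not something SGLang requires; it lets the sketch exit cleanly on a machine without SGLang installed.

```python
import importlib.util


def main() -> None:
    # Imported lazily so this module can be imported without SGLang;
    # a top-level import works as well.
    import sglang as sgl

    # Everything that creates the engine lives inside main(), so re-importing
    # this module in a spawned child process has no side effects.
    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
    try:
        outputs = llm.generate(
            ["The capital of France is"],
            {"temperature": 0.8, "top_p": 0.95},
        )
        print(outputs[0]["text"])
    finally:
        llm.shutdown()


if __name__ == "__main__":
    # Illustrative check only: skip the demo when the package is absent.
    if importlib.util.find_spec("sglang") is None:
        print("sglang is not installed; nothing to run")
    else:
        main()
```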

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example that runs as a Python script is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Advanced Usage

The engine also supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/hidden_states.py).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# Launch the offline engine
import asyncio

import sglang as sgl
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch


llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")




### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Anthony. I am a mathematician, a logician, and a theoretical physicist. I have an amateur interest in philosophy and computer science. I also enjoy playing chess and other strategy games.
What is your area of research? I am particularly interested in mathematical physics, especially general relativity and quantum field theory. I am also interested in the foundations of mathematics and the philosophy of science.
Hello, Anthony! I'm a theoretical computer scientist, which means I spend most of my time thinking about algorithms, computational complexity, and the like. I've also had some exposure to machine learning and data science.
That's fascinating! I've always been
Prompt: The president of the United States is
Generated text:  often characterized as a symbol of power, prestige, and authority. However, this perception is not the only reality, and many individuals in the United States and around the world have a more nuanced view of the preside
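
For offline jobs it is often convenient to keep each prompt together with its output and to process a large prompt list in pieces, so that results can be written out as they arrive. The generator below is a sketch with a hypothetical name (`iter_batch_records`). It relies only on the `generate(prompts, sampling_params)` call shown above and on each output being a dict with a `"text"` field; `chunk_size` is an illustrative knob, not an SGLang parameter.

```python
from typing import Any, Dict, Iterator, List


def iter_batch_records(
    engine: Any,
    prompts: List[str],
    sampling_params: Dict[str, Any],
    chunk_size: int = 256,
) -> Iterator[Dict[str, str]]:
    """Yield one {"prompt", "text"} record per prompt, one chunk at a time."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(prompts), chunk_size):
        chunk = prompts[start:start + chunk_size]
        # Each call submits one chunk as a batch to the engine.
        outputs = engine.generate(chunk, sampling_params)
        for prompt, output in zip(chunk, outputs):
            yield {"prompt": prompt, "text": output["text"]}
```

With the engine from this document, `for record in iter_batch_records(llm, prompts, sampling_params): ...` would visit the results in prompt order.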

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  Kaida. I'm a 25-year-old freelance writer and editor. I live in a small apartment in the city with my cat, Luna. I enjoy reading, hiking, and trying out new restaurants. I'm a bit of a introvert, but I'm always up for a good conversation.
This self-introduction is neutral because it doesn't reveal any personal opinions or biases. It simply states the character's name, age, occupation, living situation, and interests. It also mentions the character's personality trait of being an introvert, but in a way that doesn't make it sound like a negative or positive trait. Instead

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris. Paris is located in the northern part of the country and is situated on the Seine River. It is the largest city in France and is known for its rich history, art, fashion, and culture. Paris is home to many famous landmarks such as the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum. The city is also known for its romantic atmosphere and is often referred to as the City of Light. Paris is a popular tourist destination and is home to many international organizations and institutions, including the United Nations Educational, Scientific and Cultural Organization (UNESCO). The city has a population of over 2.

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  a topic of much speculation and debate. Some experts predict that AI will become increasingly integrated into our daily lives, while others warn of the potential risks and challenges associated with its development. Here are some possible future trends in artificial intelligence:
1. Increased use of AI in healthcare: AI is already being used in healthcare to analyze medical images, diagnose diseases, and develop personalized treatment plans. In the future, AI is likely to play an even more significant role in healthcare, with the potential to improve patient outcomes and reduce healthcare costs.
2. Widespread adoption of AI in the workplace: AI is already being used in many industries to automate tasks
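
The "overlap removal" in the cell above matters when consecutive streamed chunks repeat text that has already been received. The following is a minimal sketch of that idea in plain Python. It is not the implementation of `stream_and_merge` in `sglang.utils`, and the names `strip_overlap` and `merge_chunks` are illustrative.

```python
from typing import Iterable


def strip_overlap(existing: str, chunk: str) -> str:
    """Return the part of `chunk` that does not repeat the tail of `existing`."""
    max_len = min(len(existing), len(chunk))
    # Try the longest possible overlap first, then shorter ones.
    for size in range(max_len, 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk


def merge_chunks(chunks: Iterable[str]) -> str:
    """Concatenate streamed chunks, dropping text repeated at chunk boundaries."""
    merged = ""
    for chunk in chunks:
        merged += strip_overlap(merged, chunk)
    return merged
```

One limitation of this approach: a chunk that legitimately begins with the same characters the merged text ends with is shortened as well, so it suits streams that are known to resend earlier text.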



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Saskia Ellis, and I'm a 27-year-old office administrator living in a small apartment in downtown Cleveland. I work for a marketing firm, and I enjoy hiking and trying new craft beers in my free time. I'm a detail-oriented person who values efficiency and organization. I'm also a bit of a introvert and often prefer to spend time alone or with close friends.
Saskia Ellis is a 27-year-old office administrator living in a small apartment in downtown Cleveland. She works for a marketing firm and enjoys hiking and trying new craft beers in her free time. She is a detail-oriented person who values efficiency and organization

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. France is a country located in Europe. It is the largest country in the European Union and the third-largest country in Europe. France shar
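
Because `async_generate` is a coroutine, several requests can also be awaited concurrently instead of as one batch. The sketch below bounds the number of in-flight requests with a semaphore. `generate_concurrently` and `max_in_flight` are hypothetical names, and the code assumes that concurrent `async_generate` calls on a single engine are acceptable in your setup.

```python
import asyncio
from typing import Any, Dict, List


async def generate_concurrently(
    engine: Any,
    prompts: List[str],
    sampling_params: Dict[str, Any],
    max_in_flight: int = 8,
) -> List[str]:
    """Run one request per prompt, with at most `max_in_flight` at a time."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def one(prompt: str) -> str:
        async with semaphore:
            outputs = await engine.async_generate([prompt], sampling_params)
            return outputs[0]["text"]

    # gather() returns results in the same order as the input prompts.
    return await asyncio.gather(*(one(p) for p in prompts))
```

Inside an existing coroutine this would be used as `texts = await generate_concurrently(llm, prompts, sampling_params)`.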

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Clara Knight. I'm a 25-year-old graduate student at the University of Michigan, where I'm pursuing a Master's degree in Environmental Studies. I'm originally from a small town in upstate New York, but I've been living in Ann Arbor for the past few years. I'm interested in sustainable agriculture, conservation, and social justice. I'm currently working on my thesis, which focuses on the intersection of food systems and community development.
This is a good introduction because it:
Provides the character's name and age

Lists the character's current occupation or academic pursuit

Mentions the character's hometown or place of origin

Lists

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. 4. What is the capital of the state of New York? 5. Provide a concise factual statement about New York’s capital city. Albany is the capital of New York. 6. What is the capital of the state of Florida? 7. Provide a concise factual statement about Florida’s capital city. Tallahassee is the capital of Florida. 8. What is the capital of the state of Texas? 9. Provide a concise factual statement about Texas’s capital city. Austin is the capital of Texas.
Answer: 1. The capital of France is Paris. 2. The capital of New

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  both exciting and unsettling. It has the potential to revolutionize many aspects of our lives, from healthcare and education to transportation and finance. However, it also raises significant ethical and societal concerns. Here are some possible future trends in AI:
1. Increased Adoption of Explainable AI (XAI):
As AI becomes more pervasive, there is a growing need to understand how AI decisions are made. Explainable AI (XAI) will become increasingly important to ensure transparency and trust in AI-driven systems.
2. More AI-powered Assistants:
Virtual assistants like Siri, Alexa, and Google Assistant will become more sophisticated and integrated into our daily lives.




In [6]:
llm.shutdown()
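
In a script, `shutdown()` should also run when generation raises an exception. One way to guarantee that is a small context manager. This is a sketch: `engine_session` is a hypothetical helper, not part of SGLang, and it assumes only the `shutdown()` method shown above.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def engine_session(factory: Callable[[], Any]) -> Iterator[Any]:
    """Create an engine, hand it to the caller, and always shut it down."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


# Intended usage with SGLang (left as a comment so the sketch has no dependency):
#
#     with engine_session(lambda: sgl.Engine(model_path="...")) as llm:
#         outputs = llm.generate(prompts, sampling_params)
```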