# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A complete example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Special Warning

**When you launch the offline engine from a Python script, wrap the launch code in an** `if __name__ == "__main__":` **guard. SGLang creates its subprocesses in** `spawn` **mode, which re-imports the main module in every child process. See this simple example**:

https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/launch_engine.py
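
The guard is a general requirement of Python's `spawn` start method rather than something specific to SGLang. The following is a minimal standard-library sketch of the pattern, with no SGLang involved; `child_task` and `launch_child` are illustrative names that do not come from the SGLang codebase.

```python
import multiprocessing as mp


def child_task(name: str) -> None:
    """Runs inside the spawned child process."""
    print(f"child process started for {name}")


def launch_child() -> int:
    # "spawn" starts a fresh interpreter that re-imports this module, so any
    # unguarded top-level launch code would run again in every child.
    ctx = mp.get_context("spawn")
    proc = ctx.Process(target=child_task, args=("demo",))
    proc.start()
    proc.join()
    return proc.exitcode


if __name__ == "__main__":
    # Only the parent process reaches this branch. In the re-imported child the
    # module name is not "__main__", so nothing is launched a second time.
    launch_child()
```

In an SGLang script, the engine construction and the generation calls would presumably sit where `launch_child()` is called here, as the linked `launch_engine.py` example shows.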

## Advanced Usage

The engine also supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) and [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/hidden_states.py).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# Launch the offline engine
import asyncio

import sglang as sgl
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch  # imported only when running in CI


# Load the model once; every example below reuses this engine instance
llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.19it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Diana Schuler. I am a freshman in high school and my dream is to become an astronaut. As you know, the likelihood of becoming an astronaut is very low, but I am willing to put in the hard work and dedication necessary to make my dream a reality.
In the short term, I plan to focus on excelling in my studies, particularly in mathematics and science. I have already taken advanced math and science classes and I plan to continue to challenge myself by taking AP and honors classes.
In the long term, I plan to pursue a degree in a STEM field (science, technology, engineering, and mathematics) from a top
Prompt: The president of the United States is
Generated text:  required by law to submit a budget to Congress by the first Monday in February of each year. The president's budget outlines his or her proposals for government spending and taxation for the upcoming fiscal year. Congress is then responsible for reviewing and approving the budget.
The pres

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  Kaida. I'm a 25-year-old freelance writer and editor living in Tokyo. I enjoy reading, trying new foods, and practicing yoga. I'm currently working on a novel and experimenting with Japanese poetry forms. I'm a bit of a introvert, but I love meeting new people and learning about their experiences.
This self-introduction is neutral because it doesn't reveal any personal opinions or biases. It simply presents Kaida's background, interests, and personality in a straightforward way. The tone is friendly and approachable, making it suitable for a variety of social situations.
Here are a few key points to consider when writing a

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris. Paris is located in the northern part of the country and is situated on the Seine River. It is the most populous city in France and is known for its rich history, art, fashion, and culture. Paris is home to many famous landmarks such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. The city is also a major hub for international business, finance, and tourism. Paris is a popular destination for visitors from around the world, attracting over 23 million tourists each year. The city is divided into 20 arrondissements, or districts, each with its own unique character

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  a topic of much speculation and debate. Some experts predict that AI will become increasingly integrated into our daily lives, while others warn of the potential risks and challenges associated with its development. Here are some possible future trends in artificial intelligence:
1. Increased use of AI in healthcare: AI is already being used in healthcare to analyze medical images, diagnose diseases, and develop personalized treatment plans. In the future, AI is likely to play an even more significant role in healthcare, with the potential to revolutionize the way we diagnose and treat diseases.
2. Widespread adoption of AI in industries: AI is already being used in various industries, including
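
The "overlap removal" mentioned above can be pictured as follows: when a newly streamed chunk begins with text that the merged output already ends with, the repeated part is dropped before appending. The sketch below illustrates that idea in plain Python. It is an assumption about the general technique, not the actual implementation in `sglang.utils`, and `strip_overlap` and `merge_chunks` are hypothetical names.

```python
def strip_overlap(existing_text: str, new_chunk: str) -> str:
    """Return the part of new_chunk that does not repeat the end of existing_text."""
    max_overlap = min(len(existing_text), len(new_chunk))
    # Try the longest possible overlap first, then shrink.
    for size in range(max_overlap, 0, -1):
        if existing_text.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk


def merge_chunks(chunks) -> str:
    """Concatenate streamed chunks, skipping text repeated at chunk boundaries."""
    merged = ""
    for chunk in chunks:
        merged += strip_overlap(merged, chunk)
    return merged


if __name__ == "__main__":
    # "capital" appears at the end of the first chunk and the start of the second.
    print(merge_chunks(["The capital", "capital of France", " is Paris"]))
    # prints: The capital of France is Paris
```

A purely textual check like this cannot tell an accidental repeat from an intended one: merging `"ha"` followed by `"ha"` yields `"ha"`, so the approach only fits streams where chunks may genuinely overlap.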



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Zara Stone. I work as an archivist in a small, private library. I spend most of my days organizing and cataloging dusty old books and documents. In my free time, I enjoy reading and learning new things. I'm a bit of a quiet and observant person, but I'm always happy to chat with those I meet.
From this introduction, we can infer a few things about Zara's personality and life:
She's likely a bit introverted and prefers to observe rather than take the lead in social situations.
She values knowledge and learning, which is reflected in her job and hobbies.
She's detail-oriented and

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
Provide a list of the most prominent landmarks and historical sites in Paris, including the Eiffel Tower, the Louvre Museum, the Arc de Triomphe, the Notre Dame Cathedral, the Sain

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Stream through async_stream_and_merge (the overlap-aware helper) instead of calling async_generate directly
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Aurora Nightshade. I'm a 20-year-old student majoring in environmental science at the University of Oregon. I'm interested in conservation biology and sustainable resource management. In my free time, I enjoy hiking, gardening, and playing the guitar. I'm a bit of a homebody and tend to prefer quieter, more low-key settings, but I'm always up for a good conversation or adventure. I'm a pretty laid-back and easy-going person, but I can be fiercely passionate about the things that matter to me. I'm looking forward to getting to know you.
This self-introduction is neutral because it:
Avoids any

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. Provide the most significant feature or attraction of the city. The most significant feature or attraction of Paris is the Eiffel Tower. Provide a brief historical context of the city. Paris has a rich history dating back to the Roman era, and has played a significant role in European history, art, literature, and fashion. Discuss the significance of the Eiffel Tower in the city’s history and culture. The Eiffel Tower was built for the 1889 World’s Fair and has become an iconic symbol of Paris and France, representing the city’s engineering, artistry, and cultural achievements. Describe the city’s culture and

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  certain to be influenced by breakthroughs in computing, data science, and machine learning. The following are some possible future trends in artificial intelligence:
1. Increased usage of AI in consumer products: As AI technology becomes more accessible and affordable, we can anticipate a rise in AI-powered consumer products. Virtual assistants, smart homes, and personalized recommendations are all examples of AI-driven products that will continue to become more prevalent in our daily lives.
2. AI-powered medicine and healthcare: AI has the ability to analyze medical data, identify patterns, and make predictions about patient outcomes. Future breakthroughs in AI may lead to the development of AI-powered diagnostic tools




In [6]:
llm.shutdown()
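
The streaming asynchronous loop shown earlier prints each chunk as it arrives and then discards it. If the full text is also needed afterwards, the chunks can be collected while printing. The sketch below shows that pattern with a stand-in async generator so that it runs without a model; `fake_stream` and `collect_stream` are hypothetical names, and in real use the stream argument would presumably be `async_stream_and_merge(llm, prompt, sampling_params)`.

```python
import asyncio


async def fake_stream(text: str, chunk_size: int = 4):
    """Hypothetical stand-in for a streaming engine call: yields pieces of text."""
    for start in range(0, len(text), chunk_size):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield text[start:start + chunk_size]


async def collect_stream(stream) -> str:
    """Print chunks as they arrive and return the concatenated text."""
    parts = []
    async for chunk in stream:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # new line once the stream is exhausted
    return "".join(parts)


if __name__ == "__main__":
    asyncio.run(collect_stream(fake_stream(" Paris is the capital of France.")))
```

Collecting the chunks in a list and joining once at the end avoids repeated string concatenation on long generations.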