# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example in a Python script can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
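
As a rough, framework-agnostic sketch of that idea (the linked script remains the reference example), a request handler can validate a JSON-like payload and forward it to the engine's `async_generate`. The `handle_generate` function, the payload keys, the status codes, and the `EchoEngine` stand-in are hypothetical names chosen for illustration. In a real server, the handler would be registered with your web framework of choice and given an `sgl.Engine` instance.

```python
import asyncio
from typing import Any


async def handle_generate(engine: Any, payload: dict) -> dict:
    """Turn one request payload into one response dict."""
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        return {"status": 400, "error": "'prompt' must be a non-empty string"}
    sampling_params = payload.get("sampling_params", {})
    # Send a one-element batch: a list of prompts in, a list of outputs back.
    outputs = await engine.async_generate([prompt], sampling_params)
    return {"status": 200, "text": outputs[0]["text"]}


class EchoEngine:
    """Hypothetical stand-in so the sketch runs without loading a model."""

    async def async_generate(self, prompts, sampling_params):
        return [{"text": f" (echo) {p}"} for p in prompts]


response = asyncio.run(handle_generate(EchoEngine(), {"prompt": "Hello"}))
print(response)  # {'status': 200, 'text': ' (echo) Hello'}
```

Keeping the handler independent of any web framework makes it easy to exercise with a stand-in engine before wiring it to real routes.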

## SPECIAL WARNING!!!!

**To launch the offline engine in your Python scripts, the** `__main__` **guard is necessary, since we use** `spawn` **mode to create subprocesses. Please refer to this simple example**:

https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/launch_engine.py
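
As a minimal sketch of the pattern (not the linked script itself), engine creation can live inside a function that is only called under the `__main__` guard. Processes started with `spawn` re-import the launching script, so the guard keeps them from re-running engine creation. The `run_demo` helper and the `importlib` availability check are illustrative additions; the model path and sampling parameters are the ones used elsewhere in this document.

```python
import importlib.util


def run_demo() -> list[str]:
    """Create the engine, run one small batch, and always shut down."""
    import sglang as sgl  # imported lazily so importing this module stays cheap

    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
    try:
        outputs = llm.generate(
            ["The capital of France is"],
            {"temperature": 0.8, "top_p": 0.95},
        )
        return [output["text"] for output in outputs]
    finally:
        llm.shutdown()


# Only the process that was started as a script enters this branch.
# Spawned subprocesses re-import the module under a different name and skip it.
if __name__ == "__main__":
    if importlib.util.find_spec("sglang") is None:
        print("sglang is not installed; nothing to run")
    else:
        for text in run_demo():
            print(text)
```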

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/hidden_states.py).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch


llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.15it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Maria. I was a foster home for a few little girls and I am now a mommy to my own little one, Samantha. I am so excited to share my experiences and learn from other mamas on this journey. My family and I currently reside in a beautiful small town in the Midwest. We enjoy simple pleasures in life and love to explore the outdoors together. I am a stay-at-home mom and I take pride in taking care of my little family. My goal is to share my joys, struggles, and lessons learned with other mamas and create a community that supports and uplifts each other.
When did you first become a mom
Prompt: The president of the United States is
Generated text:  the head of state and head of government of the United States. The president leads the executive branch of the federal government and is the commander-in-chief of the United States Armed Forces. The president is indirectly elected by the people through the Electoral College. The office of the president of t
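
For offline jobs, the results are often written to disk rather than printed. The following is a small sketch that relies only on what the cell above shows: `generate` takes a list of prompts and returns one dict per prompt, in order, each with a `text` key. The `run_batch_to_jsonl` helper and the output file layout are hypothetical and not part of SGLang.

```python
import json
from pathlib import Path
from typing import Any


def run_batch_to_jsonl(
    engine: Any, prompts: list[str], sampling_params: dict, path: Path
) -> int:
    """Generate for all prompts in one call and write one JSON object per line."""
    outputs = engine.generate(prompts, sampling_params)
    with path.open("w", encoding="utf-8") as f:
        for prompt, output in zip(prompts, outputs):
            record = {"prompt": prompt, "text": output["text"]}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(outputs)


# Usage with the engine created above (hypothetical output path):
#   run_batch_to_jsonl(llm, prompts, sampling_params, Path("outputs.jsonl"))
```

Passing the whole prompt list in a single call leaves batching decisions to the engine's scheduler instead of looping over prompts in user code.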

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  Kaida. I'm a 25-year-old freelance writer and editor. I live in a small apartment in the city with my cat, Luna. I enjoy reading, hiking, and trying out new recipes in my free time. I'm a bit of a introvert, but I'm always up for a good conversation.
This self-introduction is neutral because it doesn't reveal any personal biases or opinions. It simply states the character's name, occupation, and interests, without expressing any enthusiasm or passion. This can be useful for a character who is still getting to know themselves or who is trying to keep a low profile.
Here are

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris.
The capital of France is Paris. This is a factual statement that provides a concise and accurate answer to the question. It does not include any additional information or opinions, making it a suitable choice for a factual statement. The statement is also neutral and does not contain any bias, which is another characteristic of a good factual statement. Overall, the statement is clear, concise, and accurate, making it a good example of a factual statement. 
Here are some other examples of factual statements about Paris:
- The Eiffel Tower is located in Paris.
- The Louvre Museum is located in Paris.
- The Seine River runs

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  likely to be shaped by various factors, including technological advancements, societal needs, and ethical considerations. Here are some possible future trends in artificial intelligence:
1. Increased use of AI in healthcare: AI is likely to play a larger role in healthcare, including diagnosis, treatment, and patient care. AI-powered systems may be able to analyze large amounts of medical data, identify patterns, and make predictions about patient outcomes.
2. Advancements in natural language processing: Natural language processing (NLP) is a key area of AI research, and future advancements in this area may enable AI systems to better understand and generate human language. This could lead to
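
The "overlap removal" mentioned in the cell above can be pictured as follows. This is an illustrative sketch of the general idea, not necessarily how `stream_and_merge` is implemented: when a streamed chunk begins with text that the accumulated output already ends with, only the new tail is appended. The function names `trim_overlap` and `merge_chunks` are used here purely for illustration.

```python
def trim_overlap(existing: str, new_chunk: str) -> str:
    """Return the part of new_chunk that does not repeat the end of existing."""
    max_overlap = min(len(existing), len(new_chunk))
    # Try the longest possible overlap first, then shrink.
    for size in range(max_overlap, 0, -1):
        if existing.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk


def merge_chunks(chunks: list[str]) -> str:
    """Concatenate chunks while dropping text repeated across chunk boundaries."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap(merged, chunk)
    return merged


print(merge_chunks([" Paris", " Paris is", " is the capital"]))
# prints " Paris is the capital"
```

A suffix/prefix match like this is a heuristic: it can also trim text that the model legitimately repeated at a chunk boundary.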



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Ethan Wells. I'm a freelance writer and part-time barista. I enjoy reading classic literature and taking long walks in the city. That's me in a nutshell.
In this example, the introduction is short and to the point, providing a basic summary of who the character is and what they do. The final phrase, "That's me in a nutshell," is a casual, conversational way to end the introduction, implying that there's more to the character but that this is a good starting point.
To write a short, neutral self-introduction for a fictional character, you could try the following:
1. Start with the character

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  located on the Seine River in the northern part of the country, near the English Channel. The city is known for its famous landmarks such as the Eiffel Tower, the Arc de Triom
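
Because `async_generate` is awaited, several batches can share one event loop. The sketch below submits named prompt groups concurrently with `asyncio.gather`, assuming the engine accepts overlapping `async_generate` calls. The `generate_groups` helper and the group names are hypothetical.

```python
import asyncio
from typing import Any


async def generate_groups(
    engine: Any, groups: dict[str, list[str]], sampling_params: dict
) -> dict[str, list[str]]:
    """Run one async_generate call per group concurrently, keyed by group name."""

    async def run_group(name: str, prompts: list[str]) -> tuple[str, list[str]]:
        outputs = await engine.async_generate(prompts, sampling_params)
        return name, [output["text"] for output in outputs]

    # gather preserves the order of the awaitables it was given.
    pairs = await asyncio.gather(
        *(run_group(name, prompts) for name, prompts in groups.items())
    )
    return dict(pairs)


# Usage with the engine created above (hypothetical group names):
#   texts = asyncio.run(
#       generate_groups(llm, {"intro": prompts[:1], "facts": prompts[1:]}, sampling_params)
#   )
```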

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Zara, but my friends call me Zee. I'm a 22-year-old who has just graduated from college with a degree in environmental science. I enjoy hiking and reading books about history and science. I'm currently looking for a job that allows me to work outdoors and make a positive impact on my community.
Next, let's write a short, positive self-introduction for a fictional character. Hi, I'm Sofia, but my loved ones call me Sofi! I'm a 28-year-old who's passionate about empowering others and spreading kindness wherever I go. After working as a social worker for several years, I've



Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
Capital cities are located in the political center of their respective countries. Describe the significance of the capital city of France. Paris, the capital of France, is a symbol of the country’s rich history, culture, and architecture. It serves as the center of French politics, economy, and society, hosting numerous national institutions, such as the Élysée Palace, the National Assembly, and the Senate. The city is also home to many museums, art galleries, and cultural institutions, showcasing the country’s contributions to art, literature, and science.
What are some key attractions in Paris that tourists commonly visit? Some popular attractions in Paris


Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  a topic of much debate and speculation. Some experts predict that AI will become increasingly integrated into our daily lives, while others believe that its impact will be more limited. Here are some possible future trends in artificial intelligence:
1. AI assistants: Virtual assistants like Siri, Alexa, and Google Assistant will become even more advanced, allowing us to control our homes, cars, and other devices with our voices.
2. Augmented reality: AI will be used to create more sophisticated augmented reality experiences, making it possible for us to interact with virtual objects and environments in a more natural way.
3. Autonomous vehicles: Self-driving cars will become more common



In [6]:
llm.shutdown()
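
In a script, an exception during generation could skip the `llm.shutdown()` call. One way to guard against that, shown here as a sketch, is a small context manager that always calls `shutdown()` on exit. The `managed_engine` helper is hypothetical and not part of SGLang; it only assumes the engine object has the `shutdown()` method used above.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def managed_engine(factory: Callable[[], Any]) -> Iterator[Any]:
    """Create an engine via factory and shut it down even if the body raises."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


# Usage with the real engine (assumes sglang is installed and a GPU is available):
#   import sglang as sgl
#   with managed_engine(
#       lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
#   ) as llm:
#       outputs = llm.generate(prompts, sampling_params)
```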