# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example in the form of a Python script can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## SPECIAL WARNING!!!!

**To launch the offline engine in your Python scripts, the** `if __name__ == "__main__":` **guard is necessary, since we use** `spawn` **mode to create subprocesses. Please refer to this simple example:**

https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/launch_engine.py
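
To illustrate why the guard matters, here is a minimal sketch that uses only the Python standard library rather than SGLang. With the `spawn` start method, each child process re-imports the main script, so any code that starts subprocesses has to sit behind the `__main__` check. The names `factorials` and `main` are hypothetical and chosen for this illustration only.

```python
import math
import multiprocessing as mp


def factorials(values, workers=2):
    """Compute factorials in worker processes started with the spawn method."""
    ctx = mp.get_context("spawn")  # explicit spawn context, independent of the OS default
    with ctx.Pool(processes=workers) as pool:
        return pool.map(math.factorial, values)


def main():
    # Anything that starts subprocesses belongs here, not at module top level.
    print(factorials([3, 4, 5]))


if __name__ == "__main__":
    # Spawned children re-import this file under a different module name,
    # so this branch runs only in the parent process.
    main()
```

Without the guard, the top-level call would run again inside every spawned child, and Python typically aborts with a `RuntimeError` about starting a process during bootstrapping. Creating the engine inside `main()` follows the same pattern.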

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch


llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")


### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Ray.
I'm a sophomore in high school.
I love playing video games and watching YouTube videos.
My favorite subjects in school are math and science.
I'm not very good at English, but I try my best.
I have a pet cat named Whiskers.
I'm a bit shy, but I like to talk to people online.
I'm not very good at sports, but I like to play basketball sometimes.
I'm a bit of a nerd, but I don't mind.
I like to listen to music, especially K-pop.
I'm not very good at cooking, but I like to eat pizza and ramen.
That
Prompt: The president of the United States is
Generated text:  the head of state and head of government for the United States. The president is elected by the Electoral College every four years. The president is also commander-in-chief of the U.S. military. The president is also responsible for signing bills into law.
The president of the United States is also a ceremonial head of state. This means that the president has many formal duties, such as 

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  Kaida. I'm a 25-year-old freelance writer and artist living in a small town in the Pacific Northwest. I enjoy hiking, reading, and trying out new recipes in my spare time. I'm a bit of a introvert, but I love meeting new people and hearing their stories. I'm currently working on a novel and a few art projects that I'm excited to share with the world someday. That's me in a nutshell! How would you describe Kaida? What are her personality traits and characteristics? Use evidence from the text to support your answer. Kaida appears to be a creative and introspective person. She

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris. Paris is located in the northern part of the country, near the Seine River. It is the largest city in France and is known for its rich history, art, fashion, and culture. Paris is home to many famous landmarks, including the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. The city has a population of over 2.1 million people and is a major hub for international business, education, and tourism. Paris is also known for its romantic atmosphere, with its charming streets, cafes, and parks. Overall, Paris is a vibrant and iconic city that is steeped in history

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  a topic of much speculation and debate. Some experts predict that AI will become increasingly integrated into our daily lives, while others warn of the potential risks and challenges associated with its development. Here are some possible future trends in artificial intelligence:
1. Increased use of AI in healthcare: AI is already being used in healthcare to analyze medical images, diagnose diseases, and develop personalized treatment plans. In the future, AI is likely to play an even more significant role in healthcare, potentially leading to improved patient outcomes and more efficient healthcare systems.
2. Widespread adoption of AI in education: AI has the potential to revolutionize the way we learn,
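
The "overlap removal" mentioned in the cell above can be pictured with a small, self-contained sketch. It assumes that consecutive streamed chunks may repeat text that was already received and keeps only the new suffix of each chunk. The helpers `strip_repeated_prefix` and `merge_chunks` are hypothetical names for this illustration and are not presented as the implementation of `stream_and_merge`.

```python
def strip_repeated_prefix(existing_text: str, new_chunk: str) -> str:
    """Return the part of new_chunk that does not repeat the end of existing_text."""
    max_len = min(len(existing_text), len(new_chunk))
    # Try the longest possible overlap first, then shrink.
    for size in range(max_len, 0, -1):
        if existing_text.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk


def merge_chunks(chunks):
    """Concatenate chunks while dropping text that overlaps with what came before."""
    merged = ""
    for chunk in chunks:
        merged += strip_repeated_prefix(merged, chunk)
    return merged


if __name__ == "__main__":
    print(repr(merge_chunks([" Paris", " Paris is", " is the capital"])))
```

A purely textual overlap check like this one cannot tell a repeated chunk apart from a genuine repetition in the generated text (for example `"ha"` followed by `"ha"`), so it is a heuristic rather than an exact reconstruction.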



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Yasmin al- Rashid. I’m a 24-year-old librarian from a small town in Saudi Arabia. I love reading, writing, and learning about history and culture. I’m here to pursue a graduate degree in library science. My hobbies include taking long walks, practicing calligraphy, and playing the oud.
This self-introduction is neutral because it doesn’t reveal any significant details about Yasmin’s personality, values, or motivations. It simply presents some basic facts about her life and interests. The introduction is also concise and clear, making it easy to understand. However, it’s worth noting that the introduction could be more engaging

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. It is located in the northern part of the country and is situated on the Seine River. Paris is known for its beautiful architectur
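
One reason to use the asynchronous API is that a generation call can share an event loop with other coroutines. The sketch below shows that general `asyncio` pattern without SGLang: `fake_async_generate` is a hypothetical stand-in that only mimics the shape of the result (a dict with a `text` key), and `generate_concurrently` is likewise an illustrative name.

```python
import asyncio


async def fake_async_generate(prompt: str, delay: float = 0.01) -> dict:
    """Hypothetical stand-in for an awaitable generation call."""
    await asyncio.sleep(delay)  # simulates waiting on the engine
    return {"text": f" <completion for: {prompt}>"}


async def generate_concurrently(prompts):
    # gather runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_async_generate(p) for p in prompts))


async def main():
    prompts = ["Hello, my name is", "The capital of France is"]
    outputs = await generate_concurrently(prompts)
    for prompt, output in zip(prompts, outputs):
        print(f"Prompt: {prompt}\nGenerated text: {output['text']}")


if __name__ == "__main__":
    asyncio.run(main())
```

Because `asyncio.gather` preserves input order, pairing results with prompts via `zip` works the same way as in the cell above.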

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Linnea Greenleaf. I'm a 25-year-old botanist currently living in a small, rural town surrounded by vast forests. I enjoy spending my free time exploring the local flora and learning about its unique properties. I'm not one for grand adventures, but I do appreciate the simple joys of discovering hidden streams and secret glades.
I like how Linnea is just a quiet, unassuming character. She's not trying to draw attention to herself, but she's content with her simple life in the woods. She's got a passion for botany, which is a nice subtle detail that could be developed into a larger interest

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
Provide a concise factual statement about the official language of France. The official language of France is French.
Provide a concise factual statement about the currency of France. The currency of France is the Euro.
Provide a concise factual statement about France’s highest mountain peak. The highest mountain peak in France is Mont Blanc, located in the Alps on the border with Italy.
Provide a concise factual statement about France’s coastline. France has a coastline along the Atlantic Ocean to the west and the Mediterranean Sea to the south, east, and northwest.
Provide a concise factual statement about the population of France. The population of France is approximately 67 million people

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  a topic of much debate, with opinions ranging from utopian to dystopian. However, based on current technological advancements and trends, here are some possible future trends in artificial intelligence:
1. Increased Adoption in Various Industries: AI is already being used in various industries such as healthcare, finance, transportation, and education. In the future, we can expect to see AI adoption increase in other industries such as manufacturing, logistics, and customer service.
2. Advancements in Natural Language Processing (NLP): NLP has made significant progress in recent years, enabling AI systems to understand and generate human-like language. In the future, we can expect
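
The streaming loop in this section prints each chunk as it arrives. If the full text is also needed afterwards, the chunks can be accumulated while they are displayed. The following sketch shows that pattern with a plain `asyncio` async generator; `fake_token_stream` and `collect_stream` are hypothetical helpers for illustration and do not call SGLang.

```python
import asyncio


async def fake_token_stream(tokens, delay: float = 0.0):
    """Hypothetical stand-in for a streaming source that yields text chunks."""
    for token in tokens:
        await asyncio.sleep(delay)  # simulates waiting for the next chunk
        yield token


async def collect_stream(stream, on_chunk=None):
    """Consume an async stream, optionally reporting each chunk, and return the full text."""
    parts = []
    async for chunk in stream:
        if on_chunk is not None:
            on_chunk(chunk)
        parts.append(chunk)
    return "".join(parts)


async def main():
    tokens = [" Lin", "nea", " Green", "leaf", "."]
    text = await collect_stream(
        fake_token_stream(tokens),
        on_chunk=lambda c: print(c, end="", flush=True),
    )
    print()  # newline after the streamed output
    print(repr(text))


if __name__ == "__main__":
    asyncio.run(main())
```

Passing `end=""` and `flush=True` to `print`, as the cell above does, keeps the chunks on one line and makes them appear immediately.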




In [6]:
llm.shutdown()