# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## SPECIAL WARNING!!!!

**To launch the offline engine from a Python script, you must create it under an `if __name__ == "__main__":` guard, because SGLang uses `spawn` mode to create subprocesses.** Please refer to this simple example:

https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/launch_engine.py
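
The pattern can be sketched as follows. This is a minimal illustration, not the contents of the linked script: it uses only the `sgl.Engine`, `generate`, and `shutdown` calls that appear in the cells of this document. The `find_spec` check is an addition made here, so that the script exits cleanly on a machine where SGLang is not installed.

```python
import importlib.util


def main() -> None:
    # Assumption for this sketch: skip gracefully when SGLang is not installed.
    if importlib.util.find_spec("sglang") is None:
        print("sglang is not installed; nothing to run")
        return

    import sglang as sgl

    # Creating the engine starts subprocesses, so it should only happen when
    # this file is executed as a script, never at import time.
    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
    try:
        outputs = llm.generate(
            ["The capital of France is"],
            {"temperature": 0.8, "top_p": 0.95},
        )
        print(outputs[0]["text"])
    finally:
        llm.shutdown()


# With the `spawn` start method, each child process re-imports this module.
# The guard keeps those children from running main() again.
if __name__ == "__main__":
    main()
```

Keeping all engine work inside `main()` means that importing the file, which is what a spawned child does, has no side effects.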

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# Launch the offline engine.
import asyncio

import sglang as sgl

from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch


llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.16it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Tatiana and I’m excited to be joining the team at Sugarfish. As a sushi chef, I’m passionate about creating traditional yet innovative sushi experiences that showcase the beauty of fresh, seasonal ingredients. I was born and raised in San Francisco, but my love for Japanese cuisine was inspired by my childhood trips to the city’s Japantown, where I was introduced to the art of sushi-making. I went on to study culinary arts at the San Francisco Cooking School and honed my skills at a number of local Japanese restaurants. I’m thrilled to be a part of the Sugarfish team, where I can share my passion for sushi with
Prompt: The president of the United States is
Generated text:  a part of the executive branch of government. The executive branch is led by the president, and it includes the vice president and the members of the president's cabinet. The president serves as both the head of state and the head of government, and is responsible for carryi

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  Kaida. I'm a 25-year-old freelance writer and artist living in a small town in the Pacific Northwest. I enjoy hiking and exploring the outdoors, and I'm passionate about environmental conservation. I'm currently working on a novel and a collection of short stories, and I'm excited to see where my creative projects take me. I'm a bit of a introvert, but I'm always up for a good conversation or a friendly debate. I'm looking forward to meeting new people and learning more about their interests and experiences.
This self-introduction is neutral because it doesn't reveal too much about Kaida's personality, background,

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris. Paris is located in the northern part of the country and is situated on the Seine River. It is the largest city in France and is known for its rich history, art, fashion, and culture. Paris is home to many famous landmarks such as the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum. The city is also a major hub for international business, finance, and tourism. Paris is a popular destination for visitors from around the world, attracting over 23 million tourists each year. The city has a population of over 2.1 million people and is a center for education, research, and

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  a topic of much speculation and debate. Here are some possible future trends in artificial intelligence:
1. Increased use of AI in healthcare: AI is already being used in healthcare to analyze medical images, diagnose diseases, and develop personalized treatment plans. In the future, AI is likely to play an even more significant role in healthcare, with applications such as:
a. Predictive analytics: AI will be used to analyze large amounts of medical data to predict patient outcomes and identify high-risk patients.
b. Personalized medicine: AI will be used to develop personalized treatment plans based on a patient's genetic profile, medical history, and lifestyle.
c.
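
The banner printed by this cell mentions overlap removal. `stream_and_merge` is imported from `sglang.utils` and its internals are not shown in this document, so the following is only a sketch of one way such merging could work. It assumes that consecutive chunks may repeat the tail of the text already received. The names `strip_overlap` and `merge_chunks` are illustrative and are not claimed to match SGLang's implementation.

```python
def strip_overlap(existing: str, chunk: str) -> str:
    """Return the part of `chunk` that does not repeat the tail of `existing`."""
    max_overlap = min(len(existing), len(chunk))
    # Try the longest possible overlap first, then shrink.
    for size in range(max_overlap, 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk


def merge_chunks(chunks: list[str]) -> str:
    """Concatenate streamed chunks, dropping any repeated overlap."""
    merged = ""
    for chunk in chunks:
        merged += strip_overlap(merged, chunk)
    return merged


print(repr(merge_chunks([" Paris", " Paris is", " is located"])))
# prints ' Paris is located'
```

A suffix/prefix heuristic like this one would also drop text that is repeated on purpose, for example a chunk `"ha"` arriving after text that already ends in `"ha"`, so it is only appropriate when chunks are known to overlap.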



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Aurora Blackwood. I am a 25-year-old freelance writer living in a small town in the Pacific Northwest. I have a degree in creative writing from the University of Oregon and have published a few articles in local publications. I enjoy hiking and playing guitar in my free time. I am currently working on a novel, a coming-of-age story set in the same small town where I live. In the evenings, you can often find me sipping coffee at the local café, observing the world around me and jotting down ideas for my writing. I am a bit of a quiet and introspective person, but I appreciate the beauty of

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, a city known for its fashion, art, and cuisine. It is located in the northern part of the country, along the Seine River, and is home to many famous landmarks such as th
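
One reason to prefer the asynchronous API is that the event loop can do other work while a request is in flight. The sketch below illustrates that pattern with a stand-in engine. `FakeEngine`, its canned reply, and the `heartbeat` and `run` helpers are hypothetical and exist only so the example runs without a model. The only thing assumed about the real engine is the `async_generate(prompts, sampling_params)` call used in the cell for this section.

```python
import asyncio


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine; returns canned text after a short delay."""

    async def async_generate(self, prompts, sampling_params):
        await asyncio.sleep(0.01)  # pretend the model is busy
        return [{"text": f" <reply to {p!r}>"} for p in prompts]


async def heartbeat(log: list) -> None:
    # Other work that shares the event loop with the generation request.
    for i in range(3):
        log.append(f"tick {i}")
        await asyncio.sleep(0.001)


async def run(engine, prompts, sampling_params):
    log = []
    # gather() runs both coroutines concurrently and returns results in argument order.
    outputs, _ = await asyncio.gather(
        engine.async_generate(prompts, sampling_params),
        heartbeat(log),
    )
    return outputs, log


outputs, log = asyncio.run(
    run(FakeEngine(), ["The capital of France is"], {"temperature": 0.8})
)
print(outputs[0]["text"], log)
```

Because `run` only calls `engine.async_generate`, the same coroutine should accept a real engine in place of `FakeEngine`, under the assumption stated above.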

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Hazel. I am a 20-year-old student studying anthropology at a university in the United States. I’m a bit of a bookworm and enjoy reading and writing in my free time. I’m also quite interested in cultural diversity and travel, which has led me to explore different parts of the world and learn about various customs and traditions. I’m still figuring out my career path, but for now, I’m focusing on gaining as much knowledge and experience as possible through my studies and extracurricular activities. What do you think of my self-introduction? I think it’s a good starting point, but I can definitely make some



Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. Paris is the largest city in France and is located in the north-central part of the country. It has a population of over 2.1 million people and is known for its beautiful architecture, art museums, and historical landmarks. Some of the most famous attractions in Paris include the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum. Paris is also a major cultural and economic center, and is home to many universities, hospitals, and other institutions. The city has a rich history dating back to the 3rd century, and has been influenced by various cultures throughout its history, including the Romans, the Gaul



Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  expected to be shaped by several factors, including advancements in machine learning, the Internet of Things (IoT), and natural language processing. Here are some possible future trends in AI:
1. Increased use of AI in healthcare: AI is expected to play a more significant role in healthcare, from diagnosis to treatment and patient care. AI-assisted diagnosis and personalized medicine are likely to become more prevalent.
2. Advancements in robotics: Robotics will continue to evolve, with more advanced robots that can learn, adapt, and interact with humans. Robots may become more common in industries such as manufacturing, logistics, and healthcare.
3. Rise of edge




In [6]:
llm.shutdown()
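
If an earlier cell raises an exception, the `llm.shutdown()` call is never reached. A small context manager is one way to make the cleanup explicit. The sketch below is an illustration under stated assumptions: `engine_session` is a hypothetical helper, not part of SGLang, and `FakeEngine` is a stand-in so the example runs without a model. The helper relies only on the engine exposing `shutdown()`.

```python
from contextlib import contextmanager


@contextmanager
def engine_session(engine):
    """Yield `engine` and always call its shutdown() on exit (hypothetical helper)."""
    try:
        yield engine
    finally:
        engine.shutdown()


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine that records whether shutdown() ran."""

    def __init__(self):
        self.closed = False

    def shutdown(self):
        self.closed = True


fake = FakeEngine()
try:
    with engine_session(fake) as llm:
        raise RuntimeError("generation failed")
except RuntimeError:
    pass

print(fake.closed)  # True: shutdown() ran despite the exception
```

With a real engine, the `with` line would wrap the `sgl.Engine(...)` instance created at the top of this document instead of `fake`.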