# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where an HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## SPECIAL WARNING!!!!

**To launch the offline engine in a Python script, you must guard the entry point with** `if __name__ == "__main__":` **because the engine uses** `spawn` **mode to create subprocesses. Please refer to this simple example**:

https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/launch_engine.py
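
As a rough illustration of the guard described above, here is a minimal sketch of a standalone script. It uses only the `sgl.Engine`, `generate`, and `shutdown` calls that appear in this document. The `format_result` helper and the `ImportError` fallback are assumptions added for illustration and are not part of the linked example.

```python
def format_result(prompt: str, output: dict) -> str:
    """Format one prompt/completion pair the way the cells in this document print it."""
    return f"Prompt: {prompt}\nGenerated text: {output['text']}"


def main() -> None:
    try:
        # Imported inside main() so the sketch exits cleanly when sglang is absent.
        import sglang as sgl
    except ImportError:
        print("sglang is not installed; skipping the engine launch.")
        return

    prompts = ["Hello, my name is", "The capital of France is"]
    sampling_params = {"temperature": 0.8, "top_p": 0.95}

    llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
    try:
        outputs = llm.generate(prompts, sampling_params)
        for prompt, output in zip(prompts, outputs):
            print(format_result(prompt, output))
    finally:
        llm.shutdown()  # always release the engine's subprocesses


# With the spawn start method, child processes re-import this module.
# The guard keeps that re-import from launching a second engine.
if __name__ == "__main__":
    main()
```

Keeping all engine work inside `main()` means that importing the module, which is what a spawned subprocess does, has no side effects.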

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch


llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.23it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Ashley, and I'm a travel enthusiast. I've been all over the world, and I'm always on the lookout for my next adventure. I've traveled to over 30 countries, and I've seen so many incredible things that I'm still trying to process. My favorite destinations include Japan, New Zealand, and Costa Rica, but I'm always open to new experiences and exploring different cultures.
I love trying new foods, drinks, and activities whenever I travel. I'm also a big fan of hiking, scuba diving, and trying local transportation. I believe that the best way to experience a new place is to immerse yourself
Prompt: The president of the United States is
Generated text:  not a monarch, nor does he have the power to create laws. Congress, composed of the Senate and House of Representatives, is responsible for creating and passing laws. The president can, however, veto laws passed by Congress, although Congress can override his veto with a two-thirds majority vote in b

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  Kaida. I'm a 17-year-old high school student living in Tokyo. I'm interested in photography and music, and I spend most of my free time taking pictures of the city and listening to J-pop. I'm a bit of a introvert, but I enjoy meeting new people and trying new things. I'm a student at a local high school, and I'm currently in my second year. I'm not really sure what I want to do with my life, but I'm taking things one step at a time. I'm a bit of a daydreamer, and I often get lost in my own thoughts.

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris. Paris is located in the northern part of the country. It is situated on the Seine River. The city has a population of over 2.1 million people. Paris is known for its cultural and historical landmarks, such as the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum. The city is also a major center for fashion, art, and cuisine. Paris is a popular tourist destination and a hub for international business and finance. The city has a rich history dating back to the 3rd century BC, with various empires and dynasties having ruled over it throughout the centuries. Today,

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  a topic of much speculation and debate. While it is difficult to predict exactly what the future will hold, here are some possible future trends in artificial intelligence:
1. Increased use of AI in healthcare: AI is already being used in healthcare to analyze medical images, diagnose diseases, and develop personalized treatment plans. In the future, AI is likely to play an even larger role in healthcare, with applications such as AI-powered robots that can assist with surgeries and AI-driven chatbots that can help patients manage their health.
2. Widespread adoption of AI in industries: AI is already being used in various industries such as finance, transportation, and
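
To make the idea of "overlap removal" concrete, the sketch below shows one way streamed chunks could be merged when a chunk repeats the tail of the text received so far. The `trim_overlap` and `merge_chunks` functions and the sample chunks are illustrative assumptions; this is not necessarily how `stream_and_merge` is implemented.

```python
def trim_overlap(existing_text: str, new_chunk: str) -> str:
    """Return new_chunk without the longest prefix that already ends existing_text."""
    max_possible = min(len(existing_text), len(new_chunk))
    # Try the longest candidate overlap first so the maximal repeat is removed.
    for size in range(max_possible, 0, -1):
        if existing_text.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk


def merge_chunks(chunks) -> str:
    """Merge streamed chunks into one string, dropping repeated overlaps."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap(merged, chunk)
    return merged


if __name__ == "__main__":
    # Hypothetical chunks: each one repeats the tail of the previous one.
    chunks = [" Paris. It is", " It is the capital", " capital of France."]
    print(merge_chunks(chunks))
```

One trade-off of this approach is that a genuine repetition in the model output (for example `"ha"` followed by `"ha"`) is indistinguishable from an overlap and would be collapsed.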



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Camille and I'm a 27-year-old freelance writer. I live in Portland, Oregon, and I spend most of my free time reading, hiking, and exploring the city's food scene. I'm not a morning person, but I do enjoy a good cup of coffee and a quiet moment to myself before the day gets busy. I'm a bit of a hopeless romantic, but I've learned to be practical and take things one step at a time. I'm currently working on a novel, and I'm excited to see where this journey takes me.
This is a good start, but there are a few things you could do

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. It is located in the northern part of the country, in the Île-de-France region. Paris is a large and densely populated city, with a population of over 2.1 million people within the city limits and over 12 million people in the metropo
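
Because `async_generate` is awaited, several calls can in principle be in flight at the same time. The sketch below illustrates that pattern with `asyncio.gather`. `StubEngine` is a hypothetical stand-in that mimics only the call shape used in this section, so the example runs without a model; `generate_batches` is likewise an illustrative helper, not part of SGLang.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in for the engine so the sketch runs without a GPU."""

    async def async_generate(self, prompts, sampling_params):
        await asyncio.sleep(0.01)  # pretend to do inference
        return [{"text": f" <completion for: {p}>"} for p in prompts]


async def generate_batches(engine, batches, sampling_params):
    """Run several prompt batches concurrently; results keep the input order."""
    tasks = [engine.async_generate(batch, sampling_params) for batch in batches]
    return await asyncio.gather(*tasks)


async def demo():
    engine = StubEngine()
    batches = [
        ["Hello, my name is"],
        ["The capital of France is", "The future of AI is"],
    ]
    sampling_params = {"temperature": 0.8, "top_p": 0.95}
    results = await generate_batches(engine, batches, sampling_params)
    for batch, outputs in zip(batches, results):
        for prompt, output in zip(batch, outputs):
            print(f"Prompt: {prompt}\nGenerated text: {output['text']}")


if __name__ == "__main__":
    asyncio.run(demo())
```

With a real engine, the object created by `sgl.Engine(...)` would take the place of `StubEngine()`; whether concurrent calls improve throughput for a given workload is something to measure rather than assume.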

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Elianore Quasar and I'm a 22-year-old astrophysicist-in-training at the prestigious Celestial University. I have a strong passion for understanding the mysteries of the cosmos and a knack for problem-solving. I'm currently working on my master's thesis, focusing on the application of quantum mechanics to black hole research. When I'm not buried in textbooks or laboratory equipment, you can find me playing the cello or practicing yoga. I'm excited to meet new people and collaborate on innovative projects.
I love how Elianore's self-introduction is neutral, which makes it easy to imagine them fitting into different stories

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. It is the second most populous city in the European Union. Paris is situated in the north-central part of the country and is known for its romantic atmosphere, iconic landmarks, and rich cultural history. The city is home to many museums, galleries, and theaters, as well as famous landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum. France’s capital city has been a major center of politics, finance, and culture for centuries, and its influence extends far beyond its borders. Paris is a global city, known for its fashion, cuisine, and art, and is one of the most

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  being shaped by a number of factors, including advancements in machine learning, natural language processing, computer vision, and robotics. As AI technology continues to evolve, we can expect to see a number of exciting and potentially transformative trends emerge.
Improved Efficiency: AI will continue to automate routine and mundane tasks, freeing up humans to focus on higher-level creative and strategic work. This will lead to increased productivity and efficiency across industries.
Increased Personalization: AI-powered systems will be able to analyze vast amounts of data to provide personalized recommendations and experiences for individuals. This will be particularly evident in fields like marketing, healthcare, and finance.
Enhanced Collaboration: AI

In [6]:
llm.shutdown()