# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example written as a Python script can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## SPECIAL WARNING!!!!

**To launch the offline engine in a Python script, the launch code must be placed under an** `if __name__ == "__main__":` **guard, because the engine uses** `spawn` **mode to create subprocesses. Please refer to this simple example**:

https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/launch_engine.py
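
The reason for the guard can be illustrated without SGLang. Under the `spawn` start method, each child process starts a fresh interpreter and re-imports the launching script, so unguarded top-level code (such as constructing an engine) would run again in every child. The following is a minimal standard-library sketch, not the engine's actual launch code: `run_child` is a hypothetical helper name, and the child only calls `print` as a stand-in for real work.

```python
import multiprocessing as mp


def run_child(message: str) -> int:
    """Start one child with the `spawn` method and return its exit code."""
    ctx = mp.get_context("spawn")
    # `print` is a builtin, so the child can locate it without this script's help.
    child = ctx.Process(target=print, args=(message,))
    child.start()
    child.join()
    return child.exitcode


if __name__ == "__main__":
    # Only the parent process executes this branch. When a spawned child
    # re-imports this file, __name__ is not "__main__", so nothing is relaunched.
    print("child exit code:", run_child("hello from a spawned child"))
```

In a real script, the engine would be created inside the guarded branch (or in a function called from it) in the same way.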

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# Launch the offline engine.
import asyncio

import sglang as sgl
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch


llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.17it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Melissa. I’m a California-based artist and art educator. I’ve been creating art since I was a child and I’m passionate about sharing my love of art with others. I have a BA in Studio Art from California State University, Long Beach and an MA in Art Education from the University of California, Irvine. I have taught art in various settings, including public and private schools, community centers, and non-profit organizations. I believe that art is a powerful tool for self-expression and social change, and I strive to create inclusive and engaging learning experiences that promote creativity, critical thinking, and empathy.
What I love about teaching art is the
Prompt: The president of the United States is
Generated text:  doing everything possible to divide the nation, writes a Times of Israel contributor.
Despite its multitude of flaws and issues, the United States is one of the greatest nations in history. It has been a beacon of freedom, demo

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  Kaida. I'm a 25-year-old freelance writer and artist living in a small town in the Pacific Northwest. I enjoy hiking, reading, and trying out new recipes in my free time. I'm a bit of a introvert, but I'm always up for a good conversation. I'm currently working on a novel and a few art projects that I'm excited to share with the world someday. That's me in a nutshell.
This is a good example of a neutral self-introduction because it doesn't reveal too much about Kaida's personality or background. It simply states her name, occupation, and a few interests.

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris.
The Eiffel Tower is a famous landmark in Paris, France. It was built for the 1889 World's Fair and was intended to be a temporary structure. However, it has become an iconic symbol of Paris and one of the most recognizable landmarks in the world. The Eiffel Tower stands at 324 meters (1,063 feet) tall and is made of iron. It has been repurposed over the years and now offers stunning views of the city from its observation decks. Visitors can take the elevator or stairs to the top for a breathtaking view of the City of Light.
The Eiffel Tower is

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  a topic of much speculation and debate. While it is difficult to predict exactly what the future will hold, here are some possible future trends in artificial intelligence:
1. Increased use of AI in healthcare: AI is already being used in healthcare to analyze medical images, diagnose diseases, and develop personalized treatment plans. In the future, AI is likely to play an even more significant role in healthcare, with the potential to revolutionize the way we diagnose and treat diseases.
2. Widespread adoption of AI in industries: AI is already being used in various industries such as finance, transportation, and customer service. In the future, AI is likely
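
The `stream_and_merge` helper used above hides the streaming loop. As a rough sketch of what an overlap-aware merge could look like, assume each streamed chunk is a dict with a `"text"` field that may repeat the tail of what has already been received. The names `trim_overlap` and `merge_stream` are hypothetical, and a plain list of dicts stands in for a real stream from the engine; the library's own implementation may differ.

```python
from typing import Dict, Iterable


def trim_overlap(existing: str, chunk: str) -> str:
    """Drop the longest prefix of `chunk` that already ends `existing`."""
    for size in range(min(len(existing), len(chunk)), 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk


def merge_stream(chunks: Iterable[Dict[str, str]]) -> str:
    """Concatenate streamed chunks, skipping text that was already received."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap(merged, chunk["text"])
    return merged


# Stand-in for a real stream: the second chunk repeats " Paris".
fake_stream = [{"text": " Paris"}, {"text": " Paris is"}, {"text": " large."}]
print(merge_stream(fake_stream))
```

A suffix/prefix heuristic like this cannot tell a repeated chunk from text that legitimately repeats (for example, `"ha"` followed by `"ha"`), so it suits streams that are known to resend earlier text.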



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Ellis Blackwood. I'm a 30-year-old freelance writer living in a small town in the Pacific Northwest. I spend most of my time writing, hiking, and exploring the local outdoors.
Write a short, neutral self-introduction for a fictional character. Hello, my name is Piper Grey. I'm a 25-year-old art student living in the city, currently studying fine art and ceramics. I work part-time at a local gallery and enjoy spending my free time reading, painting, and making music. How do you feel about having an art degree? Do you think it's a valuable investment of time and money?
Write a

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
Paris is a major city in France. What is one notable landmark in Paris? The Eiffel Tower is a notable landmark in Paris.
The Eiffel Tower was built in the late 1800s. What was its ori
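
Because the asynchronous call is awaited, several requests can share one event loop. The sketch below shows the general `asyncio.gather` pattern using a hypothetical stand-in coroutine, `fake_async_generate`, rather than a real engine. Whether the engine handles concurrent calls in exactly this way is an assumption here, so check the examples directory before relying on it.

```python
import asyncio
from typing import Dict, List


async def fake_async_generate(prompt: str, sampling_params: Dict) -> Dict[str, str]:
    """Hypothetical stand-in for an awaitable generation call."""
    await asyncio.sleep(0.01)  # simulate time spent waiting on the engine
    return {"text": f" <completion for {prompt!r}>"}


async def generate_concurrently(
    prompts: List[str], sampling_params: Dict
) -> List[Dict[str, str]]:
    """Issue one request per prompt and wait for all of them together."""
    tasks = [fake_async_generate(p, sampling_params) for p in prompts]
    # gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    demo_prompts = ["The capital of France is", "The future of AI is"]
    results = asyncio.run(generate_concurrently(demo_prompts, {"temperature": 0.8}))
    for prompt, result in zip(demo_prompts, results):
        print(f"Prompt: {prompt}\nGenerated text:{result['text']}")
```

Since `gather` preserves input order, the results can be zipped back onto the prompts just as in the batch call above.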

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Use the overlap-aware helper instead of calling async_generate directly
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Kai Terauchi. I'm a 20-year-old Japanese-Canadian student at the University of British Columbia. I'm currently studying environmental science and geography.
Short, neutral, and informative, this self-introduction provides a brief overview of the character's identity and background. It's suitable for various social or professional settings. However, to add a bit of personality and depth, here are some suggestions:

*   Add a personal interest or hobby: "When I'm not studying, I enjoy hiking and exploring the Pacific coast."
*   Mention a career goal: "My long-term goal is to work with a government agency or non-profit

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. The city is located in the Île-de-France region. It is situated in the north-central part of the country, where the Seine River flows. Paris is a global hub for culture, fashion, cuisine, and art. It is home to world-renowned landmarks like the Eiffel Tower and the Louvre Museum. Paris has a rich history dating back to the 3rd century and is a popular tourist destination, attracting millions of visitors each year.
The following is a detailed analysis of the statement:
1. Geographical Location: The statement correctly identifies Paris as the capital of France and situates it in

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  a subject of much speculation and debate. Here are some possible future trends in AI:
1. **Increased Adoption of Edge AI**: With the proliferation of IoT devices, edge AI is becoming increasingly important. Edge AI allows AI algorithms to run on devices at the edge of the network, reducing latency and improving real-time processing.
2. **Advances in Explainability and Transparency**: As AI becomes more pervasive, there is a growing need to understand how AI systems make decisions. Explainability and transparency will become increasingly important to build trust in AI systems.
3. **Growing Importance of Human-AI Collaboration**: AI systems will become more effective when they

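
The asynchronous streaming loop above consumes cleaned chunks one at a time. As a rough sketch of how such an overlap-aware asynchronous generator could be written, the example below uses a hypothetical `fake_async_stream` in place of a real engine stream; `trim_overlap`, `merge_async_stream`, and `collect` are likewise illustrative names, and the library helper may be implemented differently.

```python
import asyncio
from typing import AsyncIterator, Dict


def trim_overlap(existing: str, chunk: str) -> str:
    """Drop the longest prefix of `chunk` that already ends `existing`."""
    for size in range(min(len(existing), len(chunk)), 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk


async def fake_async_stream() -> AsyncIterator[Dict[str, str]]:
    """Hypothetical stand-in for an engine's asynchronous chunk stream."""
    for text in [" Kai", " Kai Tera", "uchi."]:
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield {"text": text}


async def merge_async_stream(
    stream: AsyncIterator[Dict[str, str]],
) -> AsyncIterator[str]:
    """Yield only the new part of each chunk as it arrives."""
    seen = ""
    async for chunk in stream:
        fresh = trim_overlap(seen, chunk["text"])
        seen += fresh
        yield fresh


async def collect() -> str:
    """Join the cleaned pieces into the final text."""
    pieces = [piece async for piece in merge_async_stream(fake_async_stream())]
    return "".join(pieces)


if __name__ == "__main__":
    print(asyncio.run(collect()))
```

Yielding each cleaned piece, rather than returning one merged string, lets the caller print text as soon as it arrives.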



In [6]:
llm.shutdown()