# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where an HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a standalone Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
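As a rough illustration of the custom-server idea, the sketch below wraps an engine in a tiny JSON endpoint using only the Python standard library. It is not the linked `custom_server.py` example: `handle_generate`, `make_handler`, `serve`, the port, and the JSON field names are all hypothetical. The only engine call it assumes is `generate(prompts, sampling_params)` returning a list of dicts with a `"text"` key, as used in the batch examples in this document.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_generate(engine, body: bytes) -> tuple[int, bytes]:
    """Turn a JSON request body into (status, JSON response) via engine.generate()."""
    try:
        payload = json.loads(body)
        prompt = payload["prompt"]
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON object with a 'prompt' field"}
        return 400, json.dumps(error).encode()
    sampling_params = payload.get("sampling_params", {})
    # Assumed call shape: a list of prompts in, a list of dicts with "text" out.
    outputs = engine.generate([prompt], sampling_params)
    return 200, json.dumps({"text": outputs[0]["text"]}).encode()


def make_handler(engine):
    """Build a request handler class bound to one engine (hypothetical helper)."""

    class GenerateHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status, response = handle_generate(engine, self.rfile.read(length))
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(response)))
            self.end_headers()
            self.wfile.write(response)

    return GenerateHandler


def serve(engine, host="127.0.0.1", port=8000):
    """Serve requests one at a time in the calling thread until interrupted."""
    HTTPServer((host, port), make_handler(engine)).serve_forever()
```

Keeping the request logic in `handle_generate` makes it possible to exercise it with a stand-in object before a model is loaded. With a real engine, the server would presumably be started as `serve(llm)`; that path has not been verified here, and a production server would more likely use an async framework, as the linked example does.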



## Nest Asyncio
If you use the **Offline Engine** in IPython (for example, a Jupyter notebook) or in any other code that already runs inside an event loop, add the following code first:
```python
import nest_asyncio

nest_asyncio.apply()

```
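The patch is needed because of a general `asyncio` constraint, not one specific to SGLang: by default, a thread with a running event loop cannot start a second one. The standard-library sketch below reproduces that failure without any engine. `inner`, `outer`, and `demo` are illustrative names only.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # A loop is already running here, much like inside a notebook cell.
    coro = inner()
    try:
        asyncio.run(coro)  # tries to start a second loop in the same thread
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a warning
        return f"RuntimeError: {exc}"
    return "no error"


def demo():
    return asyncio.run(outer())


if __name__ == "__main__":
    print(demo())
```

Without `nest_asyncio`, `demo()` returns a string starting with `RuntimeError:`. Calling `nest_asyncio.apply()` first patches `asyncio` so that such nested calls are allowed.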

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

```python
# launch the offline engine
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch
else:
    import nest_asyncio

    nest_asyncio.apply()


llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
```


### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

Prompt: Hello, my name is
Generated text:  Clare and I am a 37-year-old, wife, mother of two, and a passionate Christian. I am a blogger and a writer, and my blog is called 'Faith and Family Matters'. On my blog, I share my experiences as a mother, wife, and Christian, and I also share my insights and reflections on various topics that interest me, such as family, relationships, faith, and parenting.
I am a stay-at-home mom and I have two beautiful children, a son who is 7 years old, and a daughter who is 4 years old. My husband and I are both passionate about our faith and
Prompt: The president of the United States is
Generated text:  responsible for overseeing the federal government, which is divided into three branches: the legislative, executive, and judicial. The president is the head of the executive branch, which is responsible for enforcing the laws passed by Congress.
Some of the key responsibilities of the president include:
Providing leadership to the federal government
Appo

### Streaming Synchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
```


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  Kaida. I'm a 17-year-old high school student who enjoys reading and playing video games in my free time. I'm a bit of a introvert and prefer to keep to myself, but I'm always up for a good conversation when I feel comfortable enough. I'm a bit of a perfectionist, which can sometimes make me come across as a bit too critical or uptight, but I'm working on that. I'm a bit of a bookworm, and I love getting lost in a good story. I'm not really sure what I want to do with my life yet, but I'm taking things one step

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris.
The capital of France is Paris. This statement is a concise factual statement about France’s capital city. It provides a clear and direct answer to the question, without any additional information or context. The statement is also accurate and verifiable, making it a reliable source of information. Overall, this statement meets the criteria for a concise factual statement. The statement is also neutral and does not express any opinion or bias, which is another important characteristic of a concise factual statement. The statement is also written in a clear and concise manner, making it easy to understand. The statement is also free from any grammatical or spelling errors, which adds

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  expected to be shaped by several factors, including advancements in machine learning, natural language processing, and computer vision. Here are some possible future trends in AI:
1. Increased use of AI in healthcare: AI is expected to play a larger role in healthcare, including diagnosis, treatment, and patient care. AI-powered systems can analyze medical images, identify patterns, and make predictions about patient outcomes.
2. Rise of explainable AI: As AI becomes more pervasive, there is a growing need for transparency and explainability in AI decision-making. Explainable AI (XAI) aims to provide insights into how AI models make decisions, which can help
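The "overlap removal" mentioned in the cell above can be pictured with a small pure-Python sketch. It assumes that a streamed chunk may begin with text that has already been received. The helpers `strip_overlap` and `merge_chunks` are hypothetical illustrations and are not necessarily how `stream_and_merge` is implemented in SGLang.

```python
def strip_overlap(existing: str, chunk: str) -> str:
    """Return the part of `chunk` that does not repeat the end of `existing`."""
    max_len = min(len(existing), len(chunk))
    # Try the longest possible overlap first, then progressively shorter ones.
    for size in range(max_len, 0, -1):
        if existing.endswith(chunk[:size]):
            return chunk[size:]
    return chunk


def merge_chunks(chunks) -> str:
    """Concatenate chunks, dropping any prefix that duplicates the merged tail."""
    merged = ""
    for chunk in chunks:
        merged += strip_overlap(merged, chunk)
    return merged


if __name__ == "__main__":
    print(repr(merge_chunks([" Paris", " Paris is", " is the capital"])))
    # -> ' Paris is the capital'
```

A suffix/prefix heuristic like this cannot tell an overlapping chunk from a legitimate repetition (for example, two consecutive chunks that are both `"ha"`), so it only makes sense when chunks are known to overlap.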



### Non-streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
```


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Aisha Elwes. I'm a 23-year-old, freelance writer and editor from a small town in the Pacific Northwest. I'm fairly new to the city and have been working remotely for the past year. I have a passion for storytelling and enjoy exploring the intersection of technology and society. When I'm not working, you can find me trying out new coffee shops or hiking in the nearby woods. I'm looking forward to meeting new people and collaborating on projects. I'm friendly, organized, and always up for a challenge. How to Write a Good Self-Introduction: Here are some tips to help you craft a compelling and

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
The capital of France is Paris.
Paris is the capital and largest city of France, situated in the northern part of the country along the Seine River. It is the center o
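One reason to prefer the asynchronous call is that other coroutines can make progress while generation is awaited. The sketch below illustrates this with a stand-in object rather than a real engine. `StandInEngine`, `heartbeat`, `generate_and_log`, and `run_demo` are hypothetical names, and the stand-in copies only the call shape of `async_generate` used above.

```python
import asyncio


class StandInEngine:
    """Hypothetical stand-in that mimics only the call shape of async_generate."""

    async def async_generate(self, prompts, sampling_params):
        await asyncio.sleep(0.05)  # pretend to be busy generating
        return [{"text": f" <completion for {p!r}>"} for p in prompts]


async def heartbeat(events, ticks=3):
    # Unrelated work that should keep running while generation is pending.
    for i in range(ticks):
        events.append(f"tick {i}")
        await asyncio.sleep(0.01)


async def generate_and_log(engine, prompts, sampling_params, events):
    events.append("generation started")
    outputs = await engine.async_generate(prompts, sampling_params)
    events.append("generation finished")
    return outputs


async def run_demo(engine):
    events = []
    # gather() runs both coroutines concurrently on the same event loop.
    outputs, _ = await asyncio.gather(
        generate_and_log(engine, ["a", "b"], {"temperature": 0.8}, events),
        heartbeat(events),
    )
    return outputs, events


if __name__ == "__main__":
    outputs, events = asyncio.run(run_demo(StandInEngine()))
    print(events)
```

In the recorded events, at least the first heartbeat tick lands between "generation started" and "generation finished", which is the interleaving a blocking `generate` call would not allow.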

### Streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())
```


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Eve Caulfield. I'm a junior at Stonebrook High School, and I enjoy reading, hiking, and spending time with my friends. That's me in a nutshell. ( Character Analysis: Eve is a fairly laid-back and easy-going person who doesn't like to draw attention to herself. She's a bit of a bookworm and values her relationships with her friends. She's not really the type to seek the spotlight or take risks.) If you were to write this introduction, what characteristics of the character should you include to make it more believable? Some possible suggestions include: * Details about her personality: - Laid-back and easy


Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
Paris is the capital and largest city of France, located in the north-central part of the country. It is situated on the Seine River and is home to many famous landmarks such as the Eiffel Tower, the Louvre Museum, and Notre Dame Cathedral. The city has a population of over 2.1 million people and is a major center for culture, fashion, and cuisine. Paris has a rich history, dating back to the Roman Empire, and has been a major influence on art, literature, and politics throughout the centuries. Today, it is a popular tourist destination and a hub for international business and finance.



Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  vast and exciting, and there are numerous potential trends that could shape the field in the coming years. Here are some possible future trends in AI:
1. Increased Integration with the Internet of Things (IoT)
As the Internet of Things (IoT) continues to grow, we can expect AI to play a larger role in connecting and controlling these devices. AI will be able to learn from the data generated by these devices, making it possible to create more personalized and efficient systems.
2. Advancements in Natural Language Processing (NLP)
Natural Language Processing (NLP) is a key area of AI research that focuses on enabling computers to


```python
# Shut down the engine when finished
llm.shutdown()
```