# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. It is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A complete, working example as a Python script is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).



## Nest Asyncio
If you use the **Offline Engine** in IPython, Jupyter, or any other code that already runs an event loop, add the following code:
```python
import nest_asyncio

nest_asyncio.apply()

```
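The patch is only needed when the code already runs inside an event loop, which is the situation where `asyncio.run()` fails. A minimal sketch that applies it only in that case, assuming `nest_asyncio` is installed whenever that branch runs; `event_loop_is_running` is a hypothetical helper built on the standard `asyncio.get_running_loop()`:

```python
import asyncio


def event_loop_is_running() -> bool:
    """Return True when called from code that already runs an asyncio loop (e.g. Jupyter)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() works without patching.
        return False
    return True


if event_loop_is_running():
    # Only here would asyncio.run() raise "cannot be called from a running event loop".
    import nest_asyncio

    nest_asyncio.apply()
```

In a plain Python script the check returns `False` and nothing is patched; inside a notebook cell it returns `True`.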

## Advanced Usage

The engine also supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) and [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

```python
# Launch the offline engine
import asyncio

import sglang as sgl
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch  # CI environment only
else:
    import nest_asyncio

    nest_asyncio.apply()


llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
```

### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

```
Prompt: Hello, my name is
Generated text:  Rafeeat Aliyu, and I am an artist, graphic designer, and writer from Abuja, Nigeria. I was born in Kaduna but raised in Abuja, the capital city of Nigeria. My interest in the arts began at a very young age, and I was fortunate to have parents who encouraged my creativity. I started drawing and painting when I was about 7 years old, and I never looked back. My love for art, design, and storytelling has only grown stronger over the years.
As an artist, I am inspired by the beauty of nature, the complexity of human emotions, and the diversity of cultures. My
Prompt: The president of the United States is
Generated text:  often referred to as the “Commander-in-Chief.” This term comes from the President’s role as the commander of the armed forces. The President is the ultimate authority over the military, and as such, has the power to make strategic decisions and give orders to military leaders.
In addition to this, the President also serves as a sy
```
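The `sampling_params` dict above sets `temperature` and `top_p`. To illustrate what these two knobs conventionally mean (temperature scaling of the logits, followed by nucleus or top-p filtering), here is a minimal pure-Python sketch over a made-up logit table. It is not SGLang's sampling kernel; the function name, tokens, and logits are hypothetical.

```python
import math


def temperature_top_p(logits: dict, temperature: float, top_p: float) -> dict:
    """Apply temperature scaling, then nucleus (top-p) filtering; return renormalized probs."""
    # Temperature: divide logits before the softmax; < 1 sharpens, > 1 flattens.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    max_logit = max(scaled.values())  # subtract the max for numerical stability
    exp = {tok: math.exp(v - max_logit) for tok, v in scaled.items()}
    total = sum(exp.values())
    probs = {tok: v / total for tok, v in exp.items()}

    # Top-p: keep the most likely tokens until their cumulative probability reaches top_p.
    kept = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving tokens form a proper distribution to sample from.
    norm = sum(kept.values())
    return {tok: p / norm for tok, p in kept.items()}


toy_logits = {"Paris": 5.0, "Lyon": 2.0, "Nice": 1.0, "Lille": 0.0}
print(temperature_top_p(toy_logits, temperature=1.0, top_p=0.9))  # only "Paris" survives
print(temperature_top_p(toy_logits, temperature=2.0, top_p=0.9))  # flatter: three tokens survive
```

With a higher temperature the distribution flattens, so more tokens are needed to reach the `top_p` mass and the output becomes more varied.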

### Streaming Synchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
```


```
=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Kaida. I'm a 22-year-old student at the University of Tokyo, studying environmental science. I'm originally from a small town in Hokkaido, where I grew up surrounded by nature and developed a strong interest in conservation. I'm currently working on a research project focused on sustainable forestry practices in Japan. When I'm not studying or working, you can find me hiking or practicing yoga. I'm a bit of a introvert, but I enjoy meeting new people and learning about their perspectives on the world. I'm excited to learn and grow as a person, and I'm looking forward to seeing where life takes me

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
Provide a concise factual statement about France’s capital city.
The capital of France is Paris. The city is located in the northern part of the country, along the Seine River. Paris is known for its rich history, cultural landmarks, and romantic atmosphere. It is home to famous institutions such as the Louvre Museum, Notre-Dame Cathedral, and the Eiffel Tower. The city is a major hub for international business, fashion, and tourism. Paris is also a center for education, with several prestigious universities and research institutions. The city has a diverse population of over 2.1 million people, with a strong sense

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  a topic of much speculation and debate. While it's difficult to predict exactly what the future holds, here are some possible future trends in artificial intelligence:
1. Increased Adoption in Everyday Life: AI is becoming increasingly integrated into our daily lives, from virtual assistants like Siri and Alexa to self-driving cars and personalized product recommendations. As AI technology improves, we can expect to see even more widespread adoption in various industries and aspects of life.
2. Advancements in Natural Language Processing (NLP): NLP is a key area of AI research, enabling machines to understand and generate human language. Future advancements in NLP could lead to more sophisticated chat
```
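The output header refers to "overlap removal": `stream_and_merge` from `sglang.utils` combines streamed chunks into one string without repeating text. A minimal sketch of that idea in plain Python, assuming a chunk may start by repeating the tail of the text merged so far; `trim_overlap` and `merge_chunks` are hypothetical names, and the real helper may be implemented differently.

```python
def trim_overlap(existing: str, new_chunk: str) -> str:
    """Drop the longest prefix of new_chunk that repeats the end of existing."""
    max_overlap = 0
    # Try every possible overlap length and keep the largest one that matches.
    for size in range(1, min(len(existing), len(new_chunk)) + 1):
        if existing.endswith(new_chunk[:size]):
            max_overlap = size
    return new_chunk[max_overlap:]


def merge_chunks(chunks):
    """Concatenate streamed chunks, skipping text that was already emitted."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap(merged, chunk)
    return merged


print(merge_chunks(["Hello", "lo world", "world!"]))  # Hello world!
```

The trade-off of this heuristic is that genuine repetition across a chunk boundary is also dropped: `merge_chunks(["ha", "ha"])` returns `"ha"`, not `"haha"`.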



### Non-streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
```


```
=== Testing asynchronous batch generation ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Emilia Welles. I'm a 22-year-old senior at the University of Washington, studying environmental science and policy. I've grown up in the Pacific Northwest, and the natural beauty of this region has inspired my passion for sustainability and conservation. I'm excited to explore the intersection of science, policy, and community engagement to make a positive impact on the environment. That's me in a nutshell.
I'll make some changes to this self-introduction to make it more conversational and friendly. Here is the revised version:
Hi, I'm Emilia Welles. I'm a senior at the University of Washington, and I'm

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. Paris is located on the Seine River in northern France. It is the center of France’s economy, culture, and government.
Provide a concise factual statement
```
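Passing the whole list to `async_generate`, as above, is the simplest option when all prompts are known up front. When requests arrive independently (for example in a custom server), one common pattern is to issue one call per prompt and await them together with `asyncio.gather`. A minimal sketch, assuming `async_generate` also accepts a single prompt string and returns a dict with a `"text"` key; `FakeEngine` is a hypothetical stand-in so the pattern runs without a model.

```python
import asyncio


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine; only mimics the shape of async_generate."""

    async def async_generate(self, prompt, sampling_params):
        await asyncio.sleep(0.01)  # pretend to wait on the model
        return {"text": f" completion for {prompt}"}


async def generate_concurrently(llm, prompts, sampling_params):
    # Start one request per prompt and wait for all of them;
    # asyncio.gather returns results in the same order as the prompts.
    requests = [llm.async_generate(prompt, sampling_params) for prompt in prompts]
    return await asyncio.gather(*requests)


if __name__ == "__main__":
    prompts = ["The capital of France is", "The future of AI is"]
    outputs = asyncio.run(generate_concurrently(FakeEngine(), prompts, {"temperature": 0.8}))
    for prompt, output in zip(prompts, outputs):
        print(prompt, "->", output["text"])
```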

### Streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Use the overlap-aware helper instead of calling async_generate directly,
        # so text repeated across streamed chunks is printed only once
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())
```


```
=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Kaelin Darkshadow. I'm a 19-year-old skilled huntress from the village of Ravenshire. I've been trained in the art of tracking and combat since childhood, and I've grown accustomed to living off the land. I've heard rumors of a dark force threatening the nearby forest, and I'm considering joining a group to investigate. That's all for now. I'll let my actions speak for themselves.
This text begins with a direct and simple greeting, establishing the character's name and age. It then provides a brief description of the character's background and skills, highlighting their ability as a huntress and their experience

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
The Eiffel Tower is one of the most iconic landmarks in Paris.
The city of Paris is the capital of France and is home to many historical sites and landmarks, such as the Eiffel Tower and Notre Dame Cathedral.
The population of Paris is approximately 2.1 million people, however the greater metropolitan area is much larger.
The climate in Paris is temperate, with warm summers and cold winters.
Provide a concise factual statement about France’s capital city. The capital of France is Paris.
Paris is a city located in the northern part of France, and it is the country's largest city.
The city of

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  likely to be shaped by a combination of technological advancements, societal needs, and economic factors. Here are some possible future trends in AI:
1. More widespread adoption of AI in industries: AI will continue to be adopted in various industries, including healthcare, finance, transportation, and education, leading to increased efficiency, productivity, and innovation.
2. Increased use of Explainable AI (XAI): As AI becomes more pervasive, there will be a growing need to understand how AI systems make decisions, leading to the development of XAI techniques that provide insights into AI decision-making processes.
3. Rise of Edge AI: With the increasing demand for
```
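The loop above streams one prompt at a time, so the second prompt waits until the first finishes. A minimal sketch of consuming several streams concurrently with `asyncio.gather`; `fake_stream` is a hypothetical stand-in for a streaming call such as `async_stream_and_merge(llm, prompt, sampling_params)`, so the sketch runs without a model. The `received` list records the order in which chunks arrive, to show that the streams interleave.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Tuple

# (prompt, chunk) pairs in the order they are received, to show interleaving.
received: List[Tuple[str, str]] = []


async def fake_stream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming generation call; yields text chunks."""
    for i in range(3):
        await asyncio.sleep(0)  # hand control back to the event loop, as a real stream would
        yield f" chunk{i}"


async def collect(prompt: str) -> str:
    # Consume one stream to completion and return its concatenated text.
    parts = []
    async for chunk in fake_stream(prompt):
        received.append((prompt, chunk))
        parts.append(chunk)
    return "".join(parts)


async def stream_concurrently(prompts: List[str]) -> Dict[str, str]:
    # One task per prompt, so all streams make progress at the same time.
    texts = await asyncio.gather(*(collect(p) for p in prompts))
    return dict(zip(prompts, texts))


if __name__ == "__main__":
    print(asyncio.run(stream_concurrently(["A", "B"])))
    print(received)  # chunks from "A" and "B" alternate
```

Collecting each stream into its own string avoids the garbled console output that printing interleaved chunks directly would produce.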




```python
llm.shutdown()
```
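In a script, an exception raised during generation would skip a trailing `llm.shutdown()` call. A minimal sketch that guarantees the call with a context manager, assuming only that the engine object exposes `shutdown()`; `managed_engine` and `DummyEngine` are hypothetical names, and `DummyEngine` stands in for `sgl.Engine` so the sketch runs without a GPU.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def managed_engine(factory: Callable[[], Any]) -> Iterator[Any]:
    """Create an engine with factory() and call its shutdown() when the block exits."""
    engine = factory()
    try:
        yield engine
    finally:
        # Runs on normal exit and when an exception propagates out of the block.
        engine.shutdown()


class DummyEngine:
    """Hypothetical stand-in exposing only shutdown()."""

    def __init__(self) -> None:
        self.closed = False

    def shutdown(self) -> None:
        self.closed = True


with managed_engine(DummyEngine) as engine:
    print("engine running:", not engine.closed)
print("engine closed:", engine.closed)
```

With the real engine, the same pattern would read `with managed_engine(lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")) as llm:` followed by the generation calls.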