# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a standalone Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).



## Nest Asyncio
Note that if you want to use the **Offline Engine** in IPython or any other environment that already runs an event loop (nested event loops), you need to add the following code:
```python
import nest_asyncio

nest_asyncio.apply()

```
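
To see why this is needed: notebook kernels such as IPython already run an event loop, and standard `asyncio` refuses to start a second one inside it. The sketch below reproduces that failure using only the standard library, with no SGLang involved; the function names `inner` and `outer` are purely illustrative.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # We are already inside a running event loop here, much like a notebook cell.
    coro = inner()
    try:
        asyncio.run(coro)  # tries to start a second loop inside the first
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a warning
        return f"RuntimeError: {exc}"
    return "no error"


result = asyncio.run(outer())
print(result)
```

`nest_asyncio.apply()` patches `asyncio` so that such nested calls are permitted, which is why it is applied once before the engine is used in a notebook.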

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch
else:
    import nest_asyncio

    nest_asyncio.apply()


llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.39it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Marley, and I am a member of the relatively new Lycan pack based in the city of Ashwood. My pack is led by my brother, Ryder, who has been doing an amazing job of rebuilding and protecting our pack.
As you may know, the Lycans have had a bit of a tough time in recent years. Many of our packs were destroyed, and our people were forced to flee to other packs or to live in secret. But Ryder has been working hard to rebuild our pack and create a safe and welcoming community for all Lycans.
I am proud to be a part of this pack, and I
Prompt: The president of the United States is
Generated text:  reportedly willing to extend a hand of peace to the new prime minister of Canada, Justin Trudeau.
The two leaders are expected to meet for the first time since Trudeau's election in the coming weeks.
A source close to the White House told CBC News that President Barack Obama has been briefed on the situation and is looking forward to making a strong impress
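
For long offline jobs it may be convenient to submit prompts in smaller groups, for example to write results to disk as they arrive. The following is a minimal sketch under that assumption; it is not required for throughput, since the engine already schedules a batch internally. `generate_in_chunks` and `EchoEngine` are hypothetical names introduced here: the helper only relies on the `generate(prompts, sampling_params)` call shape shown in the cell above, and `EchoEngine` is a stand-in so the sketch runs without a model.

```python
from typing import Any, Dict, Iterator, List, Tuple


def generate_in_chunks(
    engine: Any,
    prompts: List[str],
    sampling_params: Dict[str, Any],
    chunk_size: int = 2,
) -> Iterator[Tuple[str, str]]:
    """Yield (prompt, generated_text) pairs, submitting prompts chunk by chunk."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(prompts), chunk_size):
        chunk = prompts[start : start + chunk_size]
        # A list of prompts goes in; a list of dicts with a "text" key comes out.
        outputs = engine.generate(chunk, sampling_params)
        for prompt, output in zip(chunk, outputs):
            yield prompt, output["text"]


class EchoEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a GPU."""

    def generate(self, prompts, sampling_params):
        return [{"text": f" <completion of {p!r}>"} for p in prompts]


demo_prompts = ["Hello, my name is", "The capital of France is", "The future of AI is"]
results = list(generate_in_chunks(EchoEngine(), demo_prompts, {"temperature": 0.8}))
for prompt, text in results:
    print(f"Prompt: {prompt}\nGenerated text: {text}")
```

With a real engine, `EchoEngine()` would be replaced by the `llm` object created earlier.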

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  Kaida. I'm a 17-year-old high school student. I'm a bit of a bookworm and enjoy reading about history and science. I'm also a member of the school's debate team. I'm a bit shy, but I'm working on being more outgoing. I'm a junior, and I'm looking forward to the rest of the school year. That's me in a nutshell.
This is a good start, but it's a bit too straightforward. You might want to add a bit more depth or personality to the introduction. Here are a few suggestions:
* Instead of saying "I'm a bit of a

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris.
Provide a concise factual statement about France’s capital city.
The capital of France is Paris.  Paris is a city located in the northern part of France, along the Seine River. It is known for its iconic landmarks such as the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum. Paris is a major cultural and economic center, and it is home to many international organizations, including the United Nations Educational, Scientific and Cultural Organization (UNESCO). The city has a population of over 2.1 million people and is a popular tourist destination, attracting millions of visitors each year.  Paris is also known

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  expected to be shaped by several factors, including advancements in machine learning, natural language processing, and computer vision. Here are some possible future trends in AI:
1. Increased use of AI in healthcare: AI is expected to play a larger role in healthcare, including diagnosis, treatment, and patient care. AI-powered systems can analyze medical data, identify patterns, and make predictions, leading to more accurate diagnoses and personalized treatment plans.
2. Rise of Explainable AI (XAI): As AI becomes more pervasive, there is a growing need for transparency and accountability. XAI aims to provide insights into how AI models make decisions, enabling users to
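
The "overlap removal" mentioned above refers to merging streamed chunks without repeating text that has already been emitted. One way to picture the idea is the sketch below. It is an illustration only and not necessarily how `stream_and_merge` is implemented; `trim_overlap_sketch` and `merge_chunks_sketch` are hypothetical names.

```python
from typing import Iterable


def trim_overlap_sketch(existing: str, new_chunk: str) -> str:
    """Return new_chunk minus the longest prefix that already ends `existing`."""
    max_len = min(len(existing), len(new_chunk))
    for size in range(max_len, 0, -1):
        if existing.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk


def merge_chunks_sketch(chunks: Iterable[str]) -> str:
    """Concatenate chunks, dropping text that repeats the current tail."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap_sketch(merged, chunk)
    return merged


# Each chunk repeats the end of the previous one.
overlapping = [" Paris is", " is the capital", " capital of France."]
merged_text = merge_chunks_sketch(overlapping)
print(merged_text)
```

A suffix/prefix match like this cannot tell an accidental repeat from an intentional one (for example `"ha"` followed by `"ha"`), so it is only appropriate when chunks are known to overlap.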



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Elara Vex, and I'm a skilled archaeologist and explorer with a passion for uncovering the secrets of the ancient world. I've spent years studying the mysteries of lost civilizations and have a keen eye for detail, which often proves invaluable in deciphering obscure artifacts and ruins.
I'm currently leading an expedition to explore the long-abandoned city of Zha'thik, hidden deep within the scorching desert of Aridian. My team and I are driven by a sense of discovery and a desire to unravel the enigmas of this forgotten place. We've already encountered some intriguing clues, and I'm eager to see

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
What is the name of the river that runs through France’s capital city? The Seine River.
What is the name of the famous tower in France’s capital city? The Eiffel
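
Asynchronous generation also makes it possible to issue several independent requests and await them together. The sketch below shows that pattern with `asyncio.gather`. It assumes that `async_generate` called with a single prompt string returns a single dict with a `text` key; `generate_concurrently` and `StubAsyncEngine` are hypothetical names, and the stub only mimics the call shape so the example runs without a model.

```python
import asyncio
from typing import Any, Dict, List


async def generate_concurrently(
    engine: Any, prompts: List[str], sampling_params: Dict[str, Any]
) -> List[str]:
    """Issue one async_generate call per prompt and await them together."""
    tasks = [engine.async_generate(p, sampling_params) for p in prompts]
    # gather returns results in the same order as its inputs.
    outputs = await asyncio.gather(*tasks)
    return [out["text"] for out in outputs]


class StubAsyncEngine:
    """Hypothetical stand-in for sgl.Engine; it only mimics the call shape."""

    async def async_generate(self, prompt, sampling_params):
        await asyncio.sleep(0.01)  # pretend to do some work
        return {"text": f" <completion of {prompt!r}>"}


demo_prompts = ["The capital of France is", "The future of AI is"]
texts = asyncio.run(
    generate_concurrently(StubAsyncEngine(), demo_prompts, {"temperature": 0.8})
)
for prompt, text in zip(demo_prompts, texts):
    print(f"Prompt: {prompt}\nGenerated text: {text}")
```

Passing the whole list to a single `async_generate` call, as in the cell above, is the simpler option when all prompts are known up front; separate calls are mainly useful when requests arrive at different times.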

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Aster Lumen, and I'm a 17-year-old high school student from a family of modest means. I enjoy reading and writing in my free time, and I'm a bit of a quiet, introspective person. I'm not particularly outgoing or athletic, but I'm working on developing my interests and skills. How can I improve this introduction?
1. Add a personal detail that reveals character: You could describe a hobby or interest that reveals something about Aster's personality or background. For example, "I'm a bit of a quiet, introspective person who loves to read fantasy novels and write her own stories."
2. Use

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. The Eiffel Tower is one of Paris's most famous landmarks and is located in the city. Source: Eiffel Tower Official Website (Source verified by Snopes)
The Eiffel Tower Official Website is a reliable source that provides information on the Eiffel Tower. Snopes is a fact-checking website that verifies the accuracy of the information provided. Therefore, the statement that the Eiffel Tower is located in Paris is supported by a reliable source.
Here's why this statement should be used as evidence:
The statement is concise and factual, providing a clear answer to the question of where the Eiffel Tower

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  all but certain to be shaped by the rapid progress of machine learning, deep learning, and natural language processing. AI will continue to improve in the areas of visual perception, robotics, and decision-making, leading to the development of more advanced and sophisticated AI systems. Artificial intelligence is rapidly advancing and will continue to change the world in the coming years. It will be shaped by the rapid progress of machine learning, deep learning, and natural language processing.
Possible Future Trends in AI:
1. **Increased Use of Edge AI:** As AI becomes more prevalent, edge AI will become more important, allowing devices to perform AI tasks locally without relying on
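
When streaming asynchronously, it is often useful to display chunks as they arrive while also keeping the complete text for later use. The sketch below shows that consumption pattern with a plain async generator. `fake_token_stream` is a hypothetical stand-in for a real streaming source and `collect_stream` is an illustrative helper; neither is part of SGLang.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_token_stream(text: str, delay: float = 0.0) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming source: yields one word at a time."""
    for i, word in enumerate(text.split(" ")):
        await asyncio.sleep(delay)
        yield word if i == 0 else " " + word


async def collect_stream(stream: AsyncIterator[str]) -> str:
    """Print chunks as they arrive and return the concatenated text."""
    pieces: List[str] = []
    async for chunk in stream:
        print(chunk, end="", flush=True)
        pieces.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(pieces)


full_text = asyncio.run(
    collect_stream(fake_token_stream("Paris is the capital of France."))
)
```

The same `collect_stream` helper should work with any async iterator of string chunks, including the one returned by `async_stream_and_merge` in the cell above, assuming it yields plain strings as that cell suggests.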




In [6]:
llm.shutdown()