# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A complete example script is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
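The linked script is the reference for building a server. As a rough illustration of the idea using only the Python standard library (not necessarily the framework the linked example uses), the sketch below puts any object with a `generate(prompt, sampling_params)` method, which is how `llm.generate` is called later on this page, behind a JSON endpoint. The `/generate` route, the `{"text": ..., "sampling_params": ...}` request shape, and the `handle_generate`/`make_handler` helpers are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler


def handle_generate(engine, body: bytes) -> bytes:
    """Decode a JSON request, run one prompt through the engine, encode the reply.

    Hypothetical request shape: {"text": "...", "sampling_params": {...}}.
    """
    request = json.loads(body)
    output = engine.generate(request["text"], request.get("sampling_params", {}))
    return json.dumps({"text": output["text"]}).encode("utf-8")


def make_handler(engine):
    """Build a request-handler class bound to one already-loaded engine."""

    class GenerateHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/generate":  # hypothetical route name
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            payload = handle_generate(engine, self.rfile.read(length))
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return GenerateHandler


# Usage (not run here): load the engine once, then serve it.
# from http.server import HTTPServer
# import sglang as sgl
# llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
# HTTPServer(("127.0.0.1", 8000), make_handler(llm)).serve_forever()
```

The single-threaded `HTTPServer` in the usage comment is the conservative choice: whether the engine tolerates `generate` calls from several threads at once is not established here.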



## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

```python
# Launch the offline engine
import asyncio

import sglang as sgl
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch  # only imported when running in SGLang's CI

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
```




### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

```text
Prompt: Hello, my name is
Generated text:  Chris Rizzo. I'm a 25 year old software developer with a passion for building applications and solving real world problems. I'm a 2018 graduate of the University of Delaware, where I earned a Bachelor of Science in Computer Science.
My experience spans a variety of languages and technologies, including Java, Python, JavaScript, and C++. I've worked on a range of projects, from simple scripts and tools to full-fledged web applications. I'm confident in my ability to learn and adapt to new technologies and frameworks, and I'm always eager to take on new challenges.
When I'm not coding, you can find me hiking
Prompt: The president of the United States is
Generated text:  also the commander-in-chief of the armed forces. However, the president is not a member of the military. The president is responsible for determining the size, structure, and organization of the military. The president also has the authority to use military force to protect natio
```
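The `temperature` and `top_p` entries in `sampling_params` are conventionally defined as follows: logits are divided by the temperature before the softmax, and top-p (nucleus) sampling keeps only the smallest set of most likely tokens whose cumulative probability reaches `top_p`, then renormalizes. The pure-Python sketch below illustrates those textbook definitions on a toy logit vector; `apply_temperature_top_p` is a hypothetical helper, not SGLang's sampling kernel, and details such as tie-breaking may differ.

```python
import math


def apply_temperature_top_p(logits, temperature, top_p):
    """Return {token_index: probability} after temperature scaling and top-p filtering.

    Assumes temperature > 0 and 0 < top_p <= 1.
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus filtering: walk tokens from most to least likely until the
    # cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.0, -1.0]
print(apply_temperature_top_p(logits, temperature=1.0, top_p=0.5))
print(apply_temperature_top_p(logits, temperature=1.0, top_p=0.9))
```

Lower temperatures concentrate probability on the top token, and lower `top_p` values cut off more of the tail, which is why the streaming examples below, using `temperature=0.2`, tend to produce more repetitive text.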

### Streaming Synchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
```

```text
=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Kaida. I'm a 25-year-old freelance writer and artist living in a small town in the Pacific Northwest. I enjoy hiking, reading, and trying out new recipes in my tiny kitchen. I'm a bit of a introvert, but I love meeting new people and hearing their stories. I'm currently working on a novel and a collection of short stories, and I'm excited to see where my creative projects take me. I'm looking forward to connecting with like-minded individuals and learning from their experiences.
This self-introduction is neutral because it doesn't reveal too much about Kaida's personality, background, or motivations. It

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
The Eiffel Tower is a famous landmark in Paris. The Eiffel Tower is a famous landmark in Paris.
The Louvre Museum is a famous museum in Paris. The Louvre Museum is a famous museum in Paris.
The Seine River runs through the heart of Paris. The Seine River runs through the heart of Paris.
The city of Paris is known for its fashion and cuisine. The city of Paris is known for its fashion and cuisine.
The city of Paris is a popular tourist destination. The city of Paris is a popular tourist destination.
The city of Paris is a major cultural and economic center. The city

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  expected to be shaped by several factors, including advancements in machine learning, natural language processing, and computer vision. Here are some possible future trends in AI:
1. Increased use of AI in healthcare: AI is expected to play a larger role in healthcare, from diagnosing diseases to developing personalized treatment plans. AI-powered chatbots and virtual assistants may become more common in healthcare settings, helping patients navigate the healthcare system and providing support for patients with chronic conditions.
2. Widespread adoption of AI in education: AI is expected to transform the education sector, from personalized learning to automated grading. AI-powered adaptive learning systems may become more prevalent,
```
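This page does not show how `stream_and_merge` works internally. As an illustration of what "overlap removal" can mean when a streamed chunk repeats part of the text already received, here is a standalone sketch; `trim_overlap` and `merge_stream` are hypothetical helpers and not necessarily the logic in `sglang.utils`.

```python
def trim_overlap(existing_text: str, new_chunk: str) -> str:
    """Return new_chunk minus its longest prefix that already ends existing_text."""
    # Try the longest possible overlap first, then shrink.
    for size in range(min(len(existing_text), len(new_chunk)), 0, -1):
        if existing_text.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk  # no overlap: the whole chunk is new


def merge_stream(chunks):
    """Concatenate streamed chunks, dropping text a chunk repeats from the tail."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap(merged, chunk)
    return merged


print(merge_stream(["The capital", "capital of France", "France is Paris."]))
print(repr(trim_overlap("I had", " had")))
```

The second call shows the trade-off of this heuristic: a chunk that legitimately repeats the end of the text so far (a doubled word, for example) is silently dropped, so it only makes sense when consecutive chunks are expected to overlap.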



### Non-streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
```

```text
=== Testing asynchronous batch generation ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Jack Harris. I work as a freelance writer and spend most of my time working on my novels and short stories. I have a passion for writing and enjoy expressing my thoughts and ideas through my writing. I am not one to seek the spotlight, preferring to keep a low profile and let my work speak for itself. I find joy in the quiet moments, lost in the world of my imagination, and I am content with that. That is who I am. In this self-introduction, I have used the following neutral language: * I have avoided using overly positive or negative adjectives to describe myself. * I have used simple and straightforward

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
Use the following sentence as a starting point and revise it to include a specific detail about the city. The city of Paris is the capital of France and
```
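The call above submits the whole prompt list in a single `async_generate` call. When prompts arrive one at a time instead, for example from separate requests in a custom server, a common asyncio pattern is to start one coroutine per prompt and await them together with `asyncio.gather`. The sketch below uses a hypothetical `FakeAsyncEngine` stand-in whose `async_generate(prompt, sampling_params)` returns a dict with a `"text"` key, mirroring the outputs printed above; how the real engine schedules such concurrent calls is an assumption left open here.

```python
import asyncio


class FakeAsyncEngine:
    """Hypothetical stand-in for an engine; only mimics the call shape used on this page."""

    async def async_generate(self, prompt, sampling_params):
        await asyncio.sleep(0.01)  # pretend to wait on the model
        return {"text": f" <completion of {prompt!r}>"}


async def generate_concurrently(engine, prompts, sampling_params):
    # One coroutine per prompt; gather runs them concurrently and
    # returns the results in the same order as `prompts`.
    return await asyncio.gather(
        *(engine.async_generate(p, sampling_params) for p in prompts)
    )


outputs = asyncio.run(
    generate_concurrently(FakeAsyncEngine(), ["Hello", "Bonjour"], {"temperature": 0.8})
)
for output in outputs:
    print(output["text"])
```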

### Streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Use the overlap-aware helper instead of calling llm.async_generate directly
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # Newline after each prompt


asyncio.run(main())
```


```text
=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Kaida Katsuragi, and I'm a 17-year-old high school student. I'm a bit of a bookworm and like to spend my free time reading and writing. I enjoy learning about different cultures and trying new foods. I'm not really sure what I want to do with my life yet, but I'm hoping to figure that out soon.

## Step 1: Identify the key elements of a neutral self-introduction.
A neutral self-introduction should include basic personal information, such as name and age, without revealing too much about personal preferences or opinions. It should also mention any relevant hobbies or interests but avoid

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
https://www.history.com/topics/france/paris

Paris is the capital city of France. It is situated in the northern part of the country and is located on the Seine River. Paris is known for its rich history, cultural landmarks, and artistic achievements. The city has been a major center of learning and intellectual inquiry since the Middle Ages and has been the hub of the French Revolution.
Some of the most famous landmarks in Paris include the Eiffel Tower, Notre Dame Cathedral, the Louvre Museum, and the Arc de Triomphe. The city is also home to many beautiful parks and gardens, such as

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  vast and holds many possibilities, but also potential risks. What could be the benefits and drawbacks of emerging technologies like deep learning, robotics, and autonomous vehicles?
As we look to the future, artificial intelligence (AI) is poised to transform numerous aspects of our lives, from healthcare and transportation to finance and education. Emerging technologies like deep learning, robotics, and autonomous vehicles are likely to drive significant advancements in AI. However, these technologies also raise concerns about job displacement, bias, and accountability. Here are some potential future trends in AI and their associated benefits and drawbacks:
Benefits and Drawbacks of Emerging AI Technologies:
1. **Deep Learning:
```
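The loop above streams the prompts one after another. To consume several streams at the same time, one option is a task per prompt that buffers its own chunks, because printing chunks from different prompts as they arrive would interleave them on screen. In the sketch below, `fake_stream` is a hypothetical async generator standing in for the cleaned chunks that `async_stream_and_merge` yields, and `collect`/`stream_many` are illustrative names.

```python
import asyncio


async def fake_stream(prompt):
    """Hypothetical stand-in for an async stream of text chunks."""
    for word in prompt.split():
        await asyncio.sleep(0)  # hand control back to the event loop, as a real stream would
        yield " " + word.upper()


async def collect(prompt, make_stream):
    """Consume one stream to completion and return (prompt, full_text)."""
    parts = []
    async for chunk in make_stream(prompt):
        parts.append(chunk)
    return prompt, "".join(parts)


async def stream_many(prompts, make_stream):
    # One collector per prompt runs concurrently; buffering per prompt
    # keeps each prompt's text together.
    results = await asyncio.gather(*(collect(p, make_stream) for p in prompts))
    return dict(results)


texts = asyncio.run(stream_many(["the capital", "of france"], fake_stream))
for prompt, text in texts.items():
    print(f"{prompt!r} ->{text}")

# With the engine on this page, something like
# lambda p: async_stream_and_merge(llm, p, sampling_params)
# could be passed as make_stream (untested assumption).
```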




When you are done, shut down the engine:

```python
llm.shutdown()
```
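`llm.shutdown()` is easy to skip when a cell or script fails partway through. A minimal sketch, assuming only that the engine exposes the `shutdown()` method used above, wraps creation and shutdown in a `contextlib.contextmanager` so cleanup runs even when an error is raised; `engine_session` and `FakeEngine` are hypothetical names.

```python
from contextlib import contextmanager


@contextmanager
def engine_session(make_engine):
    """Create an engine, hand it to the with-block, and always shut it down afterwards."""
    engine = make_engine()
    try:
        yield engine
    finally:
        engine.shutdown()  # runs even if the with-block raises


class FakeEngine:
    """Hypothetical stand-in that records whether shutdown() was called."""

    def __init__(self):
        self.closed = False

    def shutdown(self):
        self.closed = True


try:
    with engine_session(FakeEngine) as engine:
        raise RuntimeError("simulated failure mid-batch")
except RuntimeError:
    pass
print(engine.closed)

# Real use (not run here):
# with engine_session(lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")) as llm:
#     outputs = llm.generate(prompts, sampling_params)
```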