# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when an additional HTTP server would only add complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).



## Nest Asyncio
If you want to use the **Offline Engine** in IPython or in any other code that already runs inside an event loop (a nested loop), add the following code first:
```python
import nest_asyncio

nest_asyncio.apply()

```
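
The reason for this patch can be illustrated with the standard library alone. The sketch below is not part of SGLang, and the names `inner` and `outer` are hypothetical. It shows that calling `asyncio.run()` from code that is already executing inside an event loop (which is the situation in IPython and Jupyter) raises a `RuntimeError`; `nest_asyncio.apply()` is intended to lift that restriction.

```python
import asyncio


async def inner():
    # Hypothetical coroutine standing in for any async work.
    return "done"


async def outer():
    # We are already inside a running event loop here, so a second
    # asyncio.run() call is rejected by the standard library.
    coro = inner()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


result = asyncio.run(outer())
print(result)
```

In a plain Python script with no outer event loop, `asyncio.run()` works as usual and the patch is not required.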

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch
else:
    import nest_asyncio

    nest_asyncio.apply()


llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")

Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  5.52it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  5.51it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  James and I am a computer science student at the University of the People in the Philippines. My major is Computer Science and I am currently pursuing my Bachelor of Science in Computer Science and Information Technology. My academic career is in the undergraduate level and I am doing my research in the area of Bioinformatics. Here are a few of my favorites.

  1. I have been involved in a research project on the cDNA assembly and annotation. My project was to assemble the entire CDS of all the hGVSc sequences from the Human Genome Project into a single genome assembly. This project was carried out on the University of the Philippines at the
Prompt: The president of the United States is
Generated text:  a political office. He is the leader of the country. He makes decisions for the country and is in charge of a lot of important things. So, he is very important. He is an important person. What's the most important person in the world? The presi

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  [Name] and I'm a [job title] at [company name]. I'm excited to meet you and learn more about your career. What can you tell me about yourself? I'm a [insert a short description of your personality or skills that you would like to share]. I'm always looking for new opportunities to grow and learn, and I'm eager to contribute to [company name] and help make a positive impact. Thank you for taking the time to meet me. [Name] [Company Name] [Date] [Name] [Company Name] [Date] [Name] [Company Name] [Date] [

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris, the city known for its iconic landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum. It is also a major cultural and economic center, hosting numerous museums, theaters, and festivals throughout the year. Paris is a popular tourist destination, known for its rich history, beautiful architecture, and vibrant nightlife. The city is also home to many international organizations and institutions, including the European Parliament and the United Nations. Paris is a city of contrasts, with its modern architecture and historical landmarks blending seamlessly into the urban landscape. The city is also known for its diverse cuisine, including French cuisine and international dishes

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  likely to be characterized by rapid advancements in several key areas, including:

1. Increased integration with human intelligence: AI is likely to become more integrated with human intelligence, allowing for more complex and nuanced interactions between machines and humans. This could lead to more sophisticated forms of artificial intelligence, such as those that can understand and adapt to human emotions and behaviors.

2. Enhanced machine learning capabilities: AI is likely to become even more powerful and capable, with the ability to learn from vast amounts of data and make more accurate predictions and decisions. This could lead to more efficient and effective use of resources, as well as more personalized and targeted services.

3
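
The cell above relies on `stream_and_merge` to remove overlap between streamed chunks. As a rough illustration of what overlap removal can mean, here is a minimal, self-contained sketch. The helpers `trim_overlap_sketch` and `merge_chunks` are hypothetical names written for this example, and they are not necessarily how `sglang.utils` implements the feature. The idea is to drop the longest prefix of each new chunk that already appears at the end of the accumulated text.

```python
def trim_overlap_sketch(existing_text: str, new_chunk: str) -> str:
    """Return new_chunk without the prefix that repeats the end of existing_text."""
    max_possible = min(len(existing_text), len(new_chunk))
    # Try the longest candidate overlap first, then shrink.
    for size in range(max_possible, 0, -1):
        if existing_text.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk  # no overlap found


def merge_chunks(chunks):
    """Concatenate chunks while skipping text that was already emitted."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap_sketch(merged, chunk)
    return merged


# Made-up chunks: the second repeats the first, the third repeats ", the".
demo_chunks = [" Paris", " Paris, the", ", the city"]
merged_demo = merge_chunks(demo_chunks)
print(repr(merged_demo))
```

One caveat of this approach is that legitimately repeated text at a chunk boundary (for example a chunk ending in "the" followed by a chunk starting with "the") would also be trimmed, so it is a heuristic rather than an exact merge.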



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Name], and I am a skilled professional in the field of [Field]. I have been in this industry for [number of years] years, and I have a passion for [why you like your job]. My goal is to ensure that every project I work on is completed to the highest standard and that I am providing the best possible service to my clients. I am confident in my abilities to handle any challenge that comes my way and I am always looking for opportunities to learn and grow within the industry. Thank you for considering me for a role. [Name] [Company Name] [Company Address] [Company Website] [Company

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, the city with the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum. 

The largest city in France is not Paris, but Lyon. 

Lyon is the largest city in France by area, at
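
The example above awaits a single batched call. When requests arrive separately, one common pattern is to schedule several awaitable calls concurrently with `asyncio.gather`. The sketch below shows only that pattern: `fake_async_generate` is a hypothetical stand-in that mimics the result shape used in this document (a dict with a `"text"` key) and is not an SGLang function.

```python
import asyncio


async def fake_async_generate(prompt: str, delay: float) -> dict:
    # Stand-in for an engine call (NOT SGLang's API): wait, then return
    # a dict shaped like the outputs printed in this document.
    await asyncio.sleep(delay)
    return {"text": f" <completion for: {prompt}>"}


async def run_concurrently(prompts):
    # Earlier prompts get longer delays, so they finish last; gather
    # still returns results in the order the awaitables were passed in.
    calls = [
        fake_async_generate(p, delay=0.01 * (len(prompts) - i))
        for i, p in enumerate(prompts)
    ]
    return await asyncio.gather(*calls)


demo_prompts = ["Hello, my name is", "The capital of France is"]
demo_outputs = asyncio.run(run_concurrently(demo_prompts))
for p, o in zip(demo_prompts, demo_outputs):
    print(f"Prompt: {p}\nGenerated text: {o['text']}")
```

Because `asyncio.gather` preserves input order, zipping prompts with outputs stays correct even when completion order differs.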

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [name], and I am a [character type] at [character agency]. I am passionate about [mention a specific hobby, interest, or area of expertise]. My goal is to help people be more [describe how I can help them in the chosen area]. I am always looking for new opportunities to grow my skills and meet new people. What can you tell me about your background and how it has shaped your character? [Provide a brief personal background and a career highlight to start the introduction.]

[Your introduction]

Your introduction should be concise, to the point, and reflective of your character. Use the information you have gathered from



Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, also known as "La France" or "Le Mans".

This is a widely known fact. What do you think? Would you like to add anything else to make it even more accurate? Let me know if you have any other questions. Paris is the capital of France, and it's the city where most people live. France is a country in western Europe, and Paris is the capital of that country. It's the largest city in France by population, and it's also the center of many important French institutions and attractions. Paris is a world-renowned center for art, culture, and fashion, and it's home



Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  expected to be a rapidly evolving and rapidly changing field, with many different trends and possibilities emerging.

One of the primary trends in AI is the increasing integration of artificial intelligence into everyday life. This is likely to lead to a more seamless integration of AI into consumer products and services, such as smartphones, appliances, and vehicles. AI will also be increasingly integrated into healthcare, education, and finance, driving improvements in patient care, academic outcomes, and financial decision-making.

Another trend is the increasing use of AI in developing new technologies and products. AI will likely become an increasingly important component of the development of new technologies, such as virtual reality, augmented
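
The streaming cell above consumes an asynchronous stream chunk by chunk with `async for`. The following self-contained sketch shows that consumption pattern without an engine. `fake_stream` is a hypothetical stand-in, not SGLang's API, and it assumes that every chunk carries the cumulative text generated so far; under that assumption, the newly generated part is simply the suffix that has not been seen yet.

```python
import asyncio


async def fake_stream(pieces):
    # Stand-in async generator (NOT SGLang's API). Assumption: each
    # yielded chunk contains the cumulative text, not just the new piece.
    text = ""
    for piece in pieces:
        await asyncio.sleep(0)  # hand control back to the event loop
        text += piece
        yield {"text": text}


async def collect_deltas(stream):
    # Keep only the part of each chunk that has not been seen before.
    seen = ""
    deltas = []
    async for chunk in stream:
        full = chunk["text"]
        deltas.append(full[len(seen):])
        seen = full
    return deltas


deltas = asyncio.run(collect_deltas(fake_stream([" Paris", ",", " also", " known"])))
print(deltas)
print("".join(deltas))
```

If a real stream yields incremental pieces instead of cumulative text, the slicing step is unnecessary and the chunks can be printed directly.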




In [6]:
llm.shutdown()