# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when a server would only add complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).



## Nest Asyncio
Note that if you want to use the **Offline Engine** in IPython or in other code that already runs an event loop, you need to add the following code:
```python
import nest_asyncio

nest_asyncio.apply()
```
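To illustrate why this patch is needed, the following minimal sketch uses only the standard library to show that `asyncio.run` refuses to start when an event loop is already running, which is the situation inside environments such as a Jupyter kernel. The function names `inner` and `outer` are illustrative and are not part of SGLang.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # We are already inside a running event loop here, so a nested
    # asyncio.run() call is rejected with RuntimeError.
    coro = inner()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # close the unused coroutine to avoid a "never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


nested_result = asyncio.run(outer())
print(nested_result)
```

After `nest_asyncio.apply()`, nested calls of this kind are permitted, which is why the notebook cells in this document can call `asyncio.run(main())` directly.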

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

```python
# Launch the offline engine.
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch
else:
    import nest_asyncio

    nest_asyncio.apply()


llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")
```




### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

Prompt: Hello, my name is
Generated text:  Sarah, I'm 14 years old, I am living in Brazil, Brazil is a country in South America, it is a democratic country, the capital city is Brasilia, Brazil is mostly Christian, with many religions people believe in, but also has a significant Muslim community. I am a student, I live in the city of Sao Paulo, I study in a middle school, I am an English teacher, my name is Sarah.

Can you tell me about Brazil and your own experiences in the country? Sure, I can tell you about Brazil. It's a vast country with an amazing diversity of cultures, landscapes, and ecosystems
Prompt: The president of the United States is
Generated text:  very busy these days, and the president is in charge of ______.
A. the army
B. the army and the navy
C. the government
D. the government and the army
Answer:

C

In the context of financial management, the principle of prudence is known as ____
A. Dividend policy
B. Cost of capital
C. Liquidity ratio
D. Capital structure
Ans

### Streaming Synchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()
```


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  [Name], and I'm a [job title] at [company name]. I'm excited to meet you and learn more about your career. What can you tell me about yourself? I'm a [insert a characteristic or skill that you're proud of, such as "outgoing", "hardworking", "team player", etc.]. I'm always looking for ways to grow and improve, and I'm always eager to learn new things. What's your background and what do you hope to achieve in your career? I'm always looking for new opportunities to grow and learn, and I hope to achieve success in my career. What

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris, known for its iconic landmarks such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral. It is also home to the French Parliament and the French National Library. Paris is a bustling city with a rich cultural heritage and is a major tourist destination. The city is known for its fashion, art, and cuisine, and is a popular destination for tourists and locals alike. The French capital is a vibrant and dynamic city with a rich history and culture. Paris is a city of contrasts, with its modern architecture and historical landmarks blending seamlessly together. The city is also known for its diverse population, with many different cultures

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  likely to be characterized by several key trends:

1. Increased automation: AI is likely to become more prevalent in various industries, from manufacturing to healthcare to customer service. Automation will likely lead to increased efficiency and productivity, but it will also lead to job displacement for some workers.

2. Enhanced personalization: AI will continue to improve the ability of machines to understand and respond to human needs and preferences. This will lead to more personalized experiences, such as personalized recommendations for products and services.

3. AI will become more ethical: As AI becomes more prevalent, there will be a growing concern about its ethical implications. This will lead to increased regulation
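The banner printed above mentions "overlap removal". As a rough illustration of what such a step can look like, here is a minimal, self-contained sketch that merges streamed chunks while dropping any prefix of a new chunk that already ends the accumulated text. The helper names `strip_overlap` and `merge_chunks` and the sample chunks are hypothetical; this is not necessarily how `stream_and_merge` is implemented.

```python
def strip_overlap(existing_text: str, new_chunk: str) -> str:
    """Drop the longest prefix of new_chunk that already ends existing_text."""
    max_possible = min(len(existing_text), len(new_chunk))
    for size in range(max_possible, 0, -1):
        if existing_text.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk


def merge_chunks(chunks) -> str:
    """Concatenate chunks in order, removing overlap at each boundary."""
    merged = ""
    for chunk in chunks:
        merged += strip_overlap(merged, chunk)
    return merged


# Hypothetical chunks whose edges repeat text that was already emitted.
demo_chunks = [" Paris", "Paris, known", "known for its", " landmarks"]
merged_text = merge_chunks(demo_chunks)
print(merged_text)
```

A suffix/prefix heuristic like this cannot tell a repeated boundary apart from text that legitimately repeats (for example "very very"), so it only suits streams that are known to re-send overlapping text.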



### Non-streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
```


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Name]. I am a [X] person who has been [X] for [X] years. I currently live [City], [Country], and I am married to [配偶姓名]. I have [X] years of experience in [职业] and I enjoy [X]. I am always looking for new opportunities and always willing to learn and grow. What's your name and what kind of experience do you have? [Name] [Personal Information]
I look forward to discussing my experiences and aspirations with you. [Name]
Thank you for your time. I look forward to discussing your experiences and aspirations with you.
-Does

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. It is known for its vibrant culture, rich history, and stunning architecture. The city is home to various attractions such as the Louvre Museum, Eiffel Tower, and the Notre-Dame Cathedral. Paris is also a cultural hub for many of the world
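One reason to use an awaitable API is that application code can await several requests concurrently and still receive results in submission order. The sketch below shows that pattern with `asyncio.gather`. It assumes a stand-in coroutine, `fake_async_generate`, which only sleeps and returns a dict with a `"text"` key; it is hypothetical and does not call SGLang.

```python
import asyncio


async def fake_async_generate(prompt: str, delay: float) -> dict:
    # Hypothetical stand-in for an awaitable generation call: it sleeps
    # instead of running a model and returns a dict with a "text" key.
    await asyncio.sleep(delay)
    return {"text": f" <completion for: {prompt}>"}


async def run_concurrently(prompts):
    # Earlier prompts get longer delays, so they finish last; gather()
    # still returns the results in the order the awaitables were passed in.
    tasks = [
        fake_async_generate(prompt, delay=0.01 * (len(prompts) - index))
        for index, prompt in enumerate(prompts)
    ]
    return await asyncio.gather(*tasks)


demo_prompts = ["Hello, my name is", "The capital of France is"]
demo_outputs = asyncio.run(run_concurrently(demo_prompts))
for prompt, output in zip(demo_prompts, demo_outputs):
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

The call to `asyncio.run` here assumes a plain Python script. In a notebook, apply `nest_asyncio` first, as described in the Nest Asyncio section.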

### Streaming Asynchronous Generation

```python
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())
```


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Name] and I am a versatile, energetic, and passionate writer who has a knack for capturing the essence of life with my words. I'm a creative, artistically inclined individual who thrives in a dynamic, ever-changing environment. My writing often delves into the human experience, exploring themes of love, relationships, and personal growth. I'm not just a writer, I'm a storyteller who puts my heart into the craft and turns my ideas into unforgettable stories. If you're looking for a writer who can make your words come alive, I'm your guy! Can't wait to hear your thoughts on what you'd like

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.

1. Please provide me with a list of words that describe the capital city.
2. Can you give an example of a famous landmark or monument located in Paris?
3. How would you describe the temperature and weather in Paris during the summer months?
4. What is the capital city's official language?
5. Who is the current mayor of Paris?
6. What is the weather like during the winter months in Paris?
7. How is the French Riviera known for its beaches?
8. Is Paris known for its culinary traditions?
9. What is the famous landmark at the heart of the city?
10.

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  likely to be shaped by several trends, including:

1. Increased complexity: As AI becomes more sophisticated, it will require more resources and capabilities to learn and adapt to new situations. This will likely lead to increased complexity and the emergence of more sophisticated and powerful AI systems.

2. Personalization: AI will become more personalized, with the ability to learn from individual users and provide personalized recommendations and responses. This will require more sophisticated data analysis and machine learning techniques to understand and interpret human behavior.

3. Ethical considerations: As AI becomes more prevalent in various industries, there will be a growing need for ethical considerations and regulations to ensure that AI
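The `async for` loop in the cell above prints each piece as soon as it arrives. The same consumption pattern can be sketched without a model by using a stand-in async generator. `fake_token_stream` and `collect_stream` are hypothetical names for illustration, and the token list is made up.

```python
import asyncio


async def fake_token_stream(tokens, delay: float = 0.0):
    # Hypothetical stand-in for a streaming generation call:
    # yields one piece of text at a time.
    for token in tokens:
        await asyncio.sleep(delay)
        yield token


async def collect_stream(tokens) -> str:
    pieces = []
    async for piece in fake_token_stream(tokens):
        # Print each piece immediately, without a trailing newline,
        # so the text appears incrementally on a single line.
        print(piece, end="", flush=True)
        pieces.append(piece)
    print()  # finish the line once the stream ends
    return "".join(pieces)


streamed_text = asyncio.run(collect_stream([" Paris", ".", " It", " is"]))
```

Passing `end=""` and `flush=True` to `print` is what keeps the streamed pieces on one line and makes them visible right away.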




```python
llm.shutdown()
```
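In a standalone script, it can be worth making sure the shutdown call runs even when generation raises. A minimal sketch of that pattern with `try`/`finally` follows. `StandInEngine` is a hypothetical class used only so the example runs without a model; it is not the SGLang `Engine`.

```python
class StandInEngine:
    """Hypothetical stand-in with generate() and shutdown() methods."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompt: str) -> dict:
        if self.is_shut_down:
            raise RuntimeError("engine has been shut down")
        return {"text": f" <completion for: {prompt}>"}

    def shutdown(self) -> None:
        self.is_shut_down = True


engine = StandInEngine()
try:
    result = engine.generate("Hello, my name is")
    print(result["text"])
finally:
    # Runs whether or not generate() raised.
    engine.shutdown()
```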