# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A complete example, written as a standalone Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).



## Nest Asyncio
Note that if you want to use the **Offline Engine** in IPython or in other code that already runs an event loop, you need to add the following code:
```python
import nest_asyncio

nest_asyncio.apply()
```
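The patch is needed because of how `asyncio` treats nested loops: calling `asyncio.run()` while another event loop is already running, as is the case inside IPython or Jupyter, raises `RuntimeError`. The sketch below reproduces that failure with the standard library only. The coroutine names `inner` and `outer` are illustrative and not part of SGLang.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # We are already inside a running event loop here, so a nested
    # asyncio.run() call is rejected by the standard library.
    coro = inner()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


result = asyncio.run(outer())
print(result)
```

With `nest_asyncio.apply()` in effect, the nested call is expected to succeed instead of raising.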

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch
else:
    import nest_asyncio

    nest_asyncio.apply()


llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.51it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Matthew. I am a
31-year-old software engineer, and I'm here today to talk to you about
something that's been on my mind a lot lately. I've been in the software industry
for over a decade now, and I've seen a lot of things change over the years.
The way we develop software, the tools we use, the languages we code in,
the methodologies we follow – everything is constantly evolving.
However, one thing that I've noticed that's changed a lot over the past few
years is the way we think about and approach the concept of "done."
In the past, when we said
Prompt: The president of the United States is
Generated text:  not a monarch, but an elected representative of the people. (Image by Freepik)
The U.S. Constitution is clear: The president is not above the law
By Jennifer E. Daskal, Professor of Law, Georgetown University Law Center
This article was originally published on The Conversation and is republished here with permission.
As the 2020 presidenti

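For offline batch jobs it is often useful to persist results rather than print them. The following is a minimal sketch, assuming only what the cell above shows: `llm.generate` returns one dict per prompt with a `"text"` key. The helper name `save_generations`, the JSONL layout, and the stand-in data are illustrative choices, not part of the SGLang API.

```python
import json
import os
import tempfile


def save_generations(prompts, outputs, path):
    """Write one JSON object per line, pairing each prompt with its generated text."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    with open(path, "w", encoding="utf-8") as f:
        for prompt, output in zip(prompts, outputs):
            record = {"prompt": prompt, "text": output["text"]}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Stand-in data shaped like the result of llm.generate(prompts, sampling_params).
demo_prompts = ["The capital of France is"]
demo_outputs = [{"text": " Paris."}]

demo_path = os.path.join(tempfile.mkdtemp(), "generations.jsonl")
save_generations(demo_prompts, demo_outputs, demo_path)

# Read the file back to confirm the round trip.
with open(demo_path, encoding="utf-8") as f:
    saved = [json.loads(line) for line in f]
print(saved)
```

With a live engine, the stand-in lists would be replaced by `prompts` and `outputs` from the cell above.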
### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  Kaida. I'm a 25-year-old freelance writer and editor living in a small town in the Pacific Northwest. I enjoy hiking, reading, and trying out new coffee shops. I'm a bit of a introvert, but I'm always up for a good conversation.
This self-introduction is neutral because it doesn't reveal any personal opinions or biases. It simply states the character's name, age, occupation, and interests. It also gives a sense of the character's personality, but in a subtle way. For example, the fact that Kaida enjoys hiking and reading suggests that they are someone who values nature and learning,

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris. Paris is located in the northern part of the country and is situated on the Seine River. It is the largest city in France and is known for its rich history, art, fashion, and culture. Paris is home to many famous landmarks such as the Eiffel Tower, the Louvre Museum, and Notre Dame Cathedral. The city has a population of over 2.1 million people and is a major hub for international business, finance, and tourism. Paris is also known for its romantic atmosphere and is often referred to as the City of Light. The city has a long history dating back to the 3rd century

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  expected to be shaped by various factors, including technological advancements, societal needs, and ethical considerations. Here are some possible future trends in artificial intelligence:
1. Increased use of AI in healthcare: AI is expected to play a larger role in healthcare, including diagnosis, treatment, and patient care. AI-powered systems will be able to analyze large amounts of medical data, identify patterns, and make predictions about patient outcomes.
2. Widespread adoption of AI in industries: AI is expected to be adopted in various industries, including finance, transportation, and education. AI-powered systems will be able to automate tasks, improve efficiency, and enhance decision-making


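The cell above relies on `stream_and_merge` to remove overlap between streamed chunks. The sketch below shows one way such overlap removal can work, under the assumption that "overlap" means the start of a new chunk repeats the end of the text accumulated so far. The function names and the sample chunks are illustrative; the actual helper in `sglang.utils` may be implemented differently.

```python
def trim_overlap(existing: str, new_chunk: str) -> str:
    """Drop the longest prefix of new_chunk that duplicates a suffix of existing."""
    max_overlap = min(len(existing), len(new_chunk))
    # Try the longest candidate overlap first, then shrink.
    for size in range(max_overlap, 0, -1):
        if existing.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk


def merge_chunks(chunks):
    """Concatenate chunks, removing duplicated text at each boundary."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap(merged, chunk)
    return merged


merged_text = merge_chunks([" Paris is", " is the capital", " capital of France."])
print(repr(merged_text))
```

A suffix/prefix match like this cannot tell a genuine repetition from a streaming artifact, so it is only appropriate when chunks are known to overlap.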

### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Lena Murphy, and I'm a 28-year-old freelance writer and editor living in Chicago. I have a degree in English from the University of Illinois and have worked on various publications and websites. I enjoy writing about social justice and cultural topics, as well as trying out new restaurants and breweries in the city. When I'm not working, you can find me volunteering at a local community garden or practicing yoga. I'm a bit of a coffee snob and always up for a good conversation. That's me in a nutshell. I'd love to get to know you better. Lena Murphy
Lena Murphy is a 28-year-old

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. Paris is a city located in the northern part of the country and is known for its iconic landmarks, including the Eiffel Tower and the Louvre Museum.
Next, provide a statement that 

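One reason to prefer the asynchronous interface is that several requests can be awaited together. The sketch below illustrates the pattern with `asyncio.gather`. It uses a hypothetical `FakeEngine` stand-in that mimics the call shape of `llm.async_generate` from the cell above, so it runs without a model; whether a real engine benefits from issuing batches concurrently is an assumption to verify for your setup.

```python
import asyncio


class FakeEngine:
    """Hypothetical stand-in with the same async_generate call shape as the cell above."""

    async def async_generate(self, prompts, sampling_params):
        await asyncio.sleep(0)  # yield control, as a real request would while waiting
        return [{"text": f" <completion for: {p}>"} for p in prompts]


async def generate_batches(engine, batches, sampling_params):
    """Start one request per batch and wait for all of them; order is preserved."""
    tasks = [engine.async_generate(batch, sampling_params) for batch in batches]
    return await asyncio.gather(*tasks)


batches = [
    ["Hello, my name is"],
    ["The capital of France is", "The future of AI is"],
]
results = asyncio.run(generate_batches(FakeEngine(), batches, {"temperature": 0.8}))

for batch, outputs in zip(batches, results):
    for prompt, output in zip(batch, outputs):
        print(f"{prompt} -> {output['text']}")
```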
### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Elianore Quasar. I'm a 21-year-old astronomer-in-training at the Celestial Observatory. I'm currently pursuing my master's degree, focusing on galactic evolution and planetary formation. I'm quite interested in the mysteries of the cosmos and the role of dark matter within our universe. When I'm not studying or stargazing, I enjoy reading classic literature and practicing yoga. I'm a bit of a introvert, but I appreciate meeting new people who share my passions. That's me in a nutshell. By the way, do you have any interesting stories or knowledge about the cosmos?
This self-introduction has

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris.
Provide a concise factual statement about France’s capital city. The capital of France is Paris. The city is often called the most beautiful in the world, due to its beauty, architecture and rich history. It is a very popular tourist destination, attracting millions of visitors every year. Paris is the second most visited city in the world, after Bangkok, according to a survey in 2007. The city is home to many famous landmarks, including the Eiffel Tower, Notre Dame Cathedral and the Louvre Museum, which houses the Mona Lisa. The city is also famous for its fashion and cuisine, with many high-end fashion designers

Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  likely to be shaped by the intersection of several factors, including technological advancements, societal needs, and regulatory frameworks.
The future of artificial intelligence (AI) is likely to be shaped by the intersection of several factors, including technological advancements, societal needs, and regulatory frameworks. Here are some possible future trends in AI:

1.  **Increased Adoption in Various Industries:** AI is expected to become ubiquitous across various industries, including healthcare, finance, education, and transportation. This will lead to increased efficiency, productivity, and innovation in these sectors.

2.  **Enhanced Autonomy:** AI systems are likely to become more autonomous, capable of making

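When streaming, it is common to display chunks as they arrive and still keep the full text for later use. The sketch below shows that pattern for any async iterator of string chunks. `fake_stream` is a hypothetical stand-in for a streaming source such as the `async_stream_and_merge(...)` call above, and `collect_stream` is an illustrative helper, not an SGLang function.

```python
import asyncio


async def fake_stream():
    """Hypothetical chunk source standing in for a streaming generation call."""
    for chunk in [" Paris", ".", " The", " city"]:
        await asyncio.sleep(0)  # simulate waiting for the next chunk
        yield chunk


async def collect_stream(stream, on_chunk=None):
    """Consume an async iterator of text chunks and return the joined text."""
    parts = []
    async for chunk in stream:
        if on_chunk is not None:
            on_chunk(chunk)  # e.g. print incrementally
        parts.append(chunk)
    return "".join(parts)


def show(chunk):
    print(chunk, end="", flush=True)


full_text = asyncio.run(collect_stream(fake_stream(), on_chunk=show))
print()  # newline after the streamed output
```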
In [6]:
llm.shutdown()
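
In a script, an exception raised between engine creation and `llm.shutdown()` would skip the shutdown call. One way to guard against that is a small context manager, sketched below. `engine_session` and `FakeEngine` are hypothetical names used for illustration; the only engine method assumed is `shutdown()`, as called above.

```python
from contextlib import contextmanager


@contextmanager
def engine_session(make_engine):
    """Create an engine via make_engine() and always call shutdown() on exit."""
    engine = make_engine()
    try:
        yield engine
    finally:
        engine.shutdown()


class FakeEngine:
    """Hypothetical stand-in so the sketch runs without loading a model."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts, sampling_params):
        return [{"text": ""} for _ in prompts]

    def shutdown(self):
        self.is_shut_down = True


with engine_session(FakeEngine) as engine:
    outputs = engine.generate(["Hello, my name is"], {"temperature": 0.8})

print(engine.is_shut_down)
```

With the real engine, the factory could be something like `lambda: sgl.Engine(model_path=...)`, using the same arguments as the launch cell.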