# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A complete example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
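As a rough illustration of what the core of such a server might look like, here is a minimal, framework-agnostic sketch of a request handler. It is not taken from the linked script: `FakeEngine`, `handle_generate`, and the JSON field names (`prompts`, `sampling_params`, `results`) are hypothetical, and `FakeEngine` only mimics the `generate(prompts, sampling_params)` call and the `output["text"]` shape used in this document so the sketch runs without a model.

```python
import json


class FakeEngine:
    """Hypothetical stand-in for the engine so the sketch runs without a GPU."""

    def generate(self, prompts, sampling_params):
        # Mimic a list of dicts with a "text" key, as used in this document.
        return [{"text": f" <completion for {p!r}>"} for p in prompts]


def handle_generate(engine, request_body: str) -> str:
    """Turn a JSON request body into a JSON response body (assumed schema)."""
    payload = json.loads(request_body)
    prompts = payload["prompts"]
    # Fall back to the sampling parameters used elsewhere in this document.
    sampling_params = payload.get(
        "sampling_params", {"temperature": 0.8, "top_p": 0.95}
    )
    outputs = engine.generate(prompts, sampling_params)
    results = [{"prompt": p, "text": o["text"]} for p, o in zip(prompts, outputs)]
    return json.dumps({"results": results})


body = json.dumps({"prompts": ["The capital of France is"]})
response = json.loads(handle_generate(FakeEngine(), body))
print(response["results"][0]["text"])
```

Keeping the handler independent of any web framework means the same function could be called from whichever HTTP layer you choose, with the real engine object passed in place of `FakeEngine`.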

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

INFO 11-30 08:56:14 weight_utils.py:243] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:00<00:02,  1.15it/s]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:01<00:01,  1.08it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:02<00:00,  1.09it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.47it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.31it/s]

### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Pete, and I am a recovering shopaholic.
I know what you’re thinking, “Shopaholic? That’s not a thing!” But let me tell you, I was a shopaholic in every sense of the word. I would spend hours browsing through malls, scrolling through online shopping sites, and dreaming up justifications for buying things I didn’t need. My wallet, my credit score, and my relationship with my significant other all suffered as a result.
But in 2019, I hit rock bottom. I maxed out my credit cards, had to take out a loan to pay off my debts, and was
Prompt: The president of the United States is
Generated text:  responsible for many things, but his main job is to enforce the laws of the country. He is supposed to uphold the Constitution and protect the country from threats both foreign and domestic. However, sometimes the president can overstep his authority and become more of a dictator than a leader. This is what has been happening with President Trump.
President T
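
For offline batch jobs it is often convenient to persist each prompt together with its generated text. The following is a minimal sketch, assuming each output is a dict with a `"text"` key as in the cell above. `write_jsonl`, `demo_prompts`, and `demo_outputs` are hypothetical names introduced for illustration; the demo outputs stand in for the result of `llm.generate(prompts, sampling_params)`.

```python
import io
import json


def write_jsonl(prompts, outputs, fp):
    """Write one JSON object per line, pairing each prompt with its text."""
    count = 0
    for prompt, output in zip(prompts, outputs):
        record = {"prompt": prompt, "text": output["text"]}
        # ensure_ascii=False keeps non-ASCII characters such as ’ readable.
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


# Hypothetical data standing in for real prompts and engine outputs.
demo_prompts = ["Hello, my name is", "The capital of France is"]
demo_outputs = [{"text": " Pete."}, {"text": " Paris."}]

# An in-memory buffer keeps the sketch side-effect free; a real job would
# pass a file opened with open(path, "w", encoding="utf-8") instead.
buffer = io.StringIO()
written = write_jsonl(demo_prompts, demo_outputs, buffer)
lines = buffer.getvalue().splitlines()
print(written, lines[0])
```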

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Danya and I am a second-year student at the University of Ottawa. I am a student in the Faculty of Arts, majoring in Anthropology and minoring in Indigenous Studies. My research interests include the role of traditional knowledge in contemporary societies, Indigenous-settler relations, and the representation of Indigenous peoples in media.
I am the President of the University of Ottawa’s Indigenous Students Association (ISA), and I work closely with the Carleton University’s Indigenous Students Association to support Indigenous students across the university system. My goal is to provide a platform for Indigenous students to have their voices heard and to advocate for their rights and interests.
Outside

Prompt: The capital of France is
Generated text:  Paris, the City of Light, a world-renowned center of fashion, art, and culture. Our team is well-versed in the nuances of French property law, which can be very different from other jurisdictions. We provide a wide range of services to individuals and companies looking to acquire, develop, or dispose of real estate in France.
Our services for French property include:
Advising on French real estate law and regulations, including the purchase and sale of property, rentals, and mortgages
Assisting with the acquisition of property through French corporate structures or trusts
Guiding clients through the French registration process, including the registration of property ownership

Prompt: The future of AI is
Generated text:  built on the shoulders of giants. This is especially true when it comes to computer vision, where researchers have been pouring their knowledge and expertise into developing the foundational models and techniques that power today's AI systems.
One of the pioneers in the field of computer vision is Andrew Zisserman, a British computer scientist and professor at the University of Oxford. In the 1990s and early 2000s, Zisserman, along with his colleagues, developed the theory of the "interest point detectors" that would later become the foundation of many modern computer vision algorithms.
Another influential figure in the field is Pietro Perona, an Italian
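
The streaming loop in this section prints each chunk as it arrives. To keep the full text as well, the chunks can be joined. The following is a minimal sketch, assuming each chunk is a dict with a `"text"` key as in the cell above. `merge_stream` and its `cumulative` flag are hypothetical helpers, not part of SGLang. The output shown here suggests that each chunk carries only newly generated text; whether that holds may depend on the installed version, so the flag also covers the assumed alternative where each chunk repeats everything generated so far.

```python
def merge_stream(chunks, cumulative=False):
    """Join streamed chunks into the full generated text.

    cumulative=False: each chunk["text"] is a new piece of text.
    cumulative=True: each chunk["text"] repeats the text so far, so only
    the part not seen yet is appended (assumed alternative behavior).
    """
    pieces = []
    seen = 0  # number of characters already collected in cumulative mode
    for chunk in chunks:
        text = chunk["text"]
        if cumulative:
            pieces.append(text[seen:])
            seen = len(text)
        else:
            pieces.append(text)
    return "".join(pieces)


# Hypothetical chunk sequences standing in for a real stream.
incremental = [{"text": " Paris"}, {"text": ","}, {"text": " the"}]
running = [{"text": " Paris"}, {"text": " Paris,"}, {"text": " Paris, the"}]

full_a = merge_stream(incremental)
full_b = merge_stream(running, cumulative=True)
print(repr(full_a), repr(full_b))
```

With a real engine, the iterable returned by `llm.generate(prompt, sampling_params, stream=True)` would be passed as `chunks`.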




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Makoto and I am the 8th generation of a Japanese family.
As a young child, I was fascinated by the beauty of traditional Japanese clothing called KIMONO, which my mother and grandmother wore on special occasions. I loved watching them carefully select fabrics, carefully fold, tie, and adjust the obi to create a harmonious balance of colors, patterns, and textures.
As I grew older, my interest in KIMONO only deepened, and I began to study the history, craftsmanship, and cultural significance of these beautiful garments. I learned about the different types of KIMONO, such as Kimono, Yukata

Prompt: The capital of France is
Generated text:  Paris.
The capital of France is Paris.
The capital of France is Paris.
The capital of France is Paris.
The capital of France is Paris.
The capital of France is Paris.
The capital of France is Paris.
The capital of France is Paris.
The capital of France is Paris.
The capital of France is Paris.
The capital of 
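
When requests arrive independently rather than as one list, the asynchronous interface can be driven with `asyncio.gather`. The following is a minimal sketch under stated assumptions: `FakeAsyncEngine`, `generate_all`, and `max_in_flight` are hypothetical names, and the stand-in engine assumes that a single-prompt `async_generate` call returns one dict with a `"text"` key. It only illustrates the asyncio pattern of capping the number of in-flight requests and does not claim anything about how the engine schedules work internally.

```python
import asyncio


class FakeAsyncEngine:
    """Hypothetical stand-in so the sketch runs without a model."""

    async def async_generate(self, prompt, sampling_params):
        await asyncio.sleep(0.01)  # pretend to do some work
        return {"text": f" <completion for {prompt!r}>"}


async def generate_all(engine, prompts, sampling_params, max_in_flight=2):
    """Run one request per prompt, with at most max_in_flight at a time."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def one(prompt):
        async with semaphore:
            return await engine.async_generate(prompt, sampling_params)

    # gather returns results in the same order as the prompts.
    return await asyncio.gather(*(one(p) for p in prompts))


demo_prompts = ["Hello, my name is", "The capital of France is", "The future of AI is"]
results = asyncio.run(
    generate_all(FakeAsyncEngine(), demo_prompts, {"temperature": 0.8, "top_p": 0.95})
)
for prompt, output in zip(demo_prompts, results):
    print(prompt, "->", output["text"])
```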

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # With stream=True, awaiting async_generate returns an async generator of chunks.
        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Eric Lui
I am a 20-year-old international student from the Philippines studying abroad in the US. I am currently in my second year of undergraduate studies, pursuing a degree in Marketing. I am very excited to be a part of this community and I look forward to making new friends and connections.
As a marketing student, I am interested in exploring the different facets of the field, including brand management, digital marketing, and consumer behavior. I am also passionate about social media and its impact on society. In my free time, I enjoy playing basketball, watching movies, and trying out new foods.
I am looking forward to engaging with

Prompt: The capital of France is
Generated text:  a city of grandeur and beauty, with famous landmarks like the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum. However, the city also has a rich and complex history, with periods of revolution, war, and occupation. Here are some of the most interesting facts about Paris:
1. Paris is the most visited city in the world, with over 23 million tourists per year.
2. The city has a long and complex history, with archaeological evidence showing human habitation dating back to the 3rd millennium BC.
3. The famous Eiffel Tower was originally intended to be a temporary structure, built

Prompt: The future of AI is
Generated text:  being developed and refined by tech giants, startups, and research institutions worldwide. Here are the top 10 AI trends that will shape the future of AI:
1. Explainability and Transparency:
As AI becomes increasingly ubiquitous, the need for explainability and transparency in AI decision-making processes will grow. This trend involves developing AI systems that can provide clear explanations for their actions, making it easier for humans to understand and trust AI-driven decisions.
2. Edge AI:
The proliferation of IoT devices has led to a growing need for AI to be processed at the edge, closer to the source of the data. Edge AI involves deploying AI models on devices
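
The loop in this section consumes one stream at a time. Several streams can also be consumed concurrently, each collected into its own result. The following is a minimal sketch, assuming each chunk is a dict whose `"text"` holds only newly generated text, as the output above suggests. `fake_stream`, `collect`, and `collect_many` are hypothetical names; `fake_stream` stands in for the async generator obtained from `await llm.async_generate(prompt, sampling_params, stream=True)`.

```python
import asyncio


async def fake_stream(prompt):
    """Hypothetical stand-in for an async stream of {"text": ...} chunks."""
    for piece in (" one", " two", " three"):
        await asyncio.sleep(0)  # yield control so streams can interleave
        yield {"text": piece}


async def collect(prompt, stream):
    """Consume one stream and return the prompt with its joined text."""
    parts = []
    async for chunk in stream:
        parts.append(chunk["text"])
    return prompt, "".join(parts)


async def collect_many(prompts):
    """Consume one stream per prompt concurrently and map prompt to text."""
    tasks = [collect(p, fake_stream(p)) for p in prompts]
    return dict(await asyncio.gather(*tasks))


collected = asyncio.run(collect_many(["A", "B"]))
print(collected)
```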




In [6]:
llm.shutdown()