# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
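
As a rough illustration of the custom-server idea, the sketch below wraps a generate function in a tiny HTTP endpoint using only Python's standard library. It is not the linked `custom_server.py` example: the `/generate` path, the JSON request and response shapes, and the `fake_generate` stand-in (which mimics the list-of-dicts return shape of `llm.generate` used in this document) are all assumptions made for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_generate(prompts, sampling_params):
    """Hypothetical stand-in for llm.generate: one {"text": ...} dict per prompt."""
    return [{"text": f" <completion for {p!r}>"} for p in prompts]


def make_handler(generate_fn):
    """Build a request handler class that forwards POST /generate to generate_fn."""

    class GenerateHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/generate":  # assumed route name
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length))
            outputs = generate_fn(body["prompts"], body.get("sampling_params", {}))
            payload = json.dumps({"texts": [o["text"] for o in outputs]}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, *args):
            pass  # silence per-request logging

    return GenerateHandler


def demo():
    """Start the server on a free local port, send one request, return the reply."""
    server = HTTPServer(("127.0.0.1", 0), make_handler(fake_generate))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/generate"
        data = json.dumps({"prompts": ["The capital of France is"]}).encode()
        request = urllib.request.Request(
            url, data=data, headers={"Content-Type": "application/json"}
        )
        # Ignore any proxy settings for this local call.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request) as response:
            return json.loads(response.read())
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

With a real engine you would pass `llm.generate` in place of `fake_generate`. Whether the engine can safely be called from a server thread is not covered here, so treat this as a starting point rather than a deployment recipe.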

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

INFO 11-29 17:48:05 weight_utils.py:243] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:00<00:02,  1.22it/s]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:01<00:01,  1.12it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:02<00:00,  1.11it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.51it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.35it/s]

### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Emma and I’m a junior at Linfield College, majoring in English with a minor in Psychology. I’m originally from the Pacific Northwest and I love hiking, reading, and writing. I’m excited to be working with the Lindblad Expeditions-National Geographic team as a blogging intern and I hope to share my experiences with you from the Galapagos Islands.
As I prepare to embark on this journey, I’m filled with excitement and a little bit of nervousness. I’ve always been fascinated by the unique wildlife and ecosystems of the Galapagos and I’m eager to learn more about the islands and their incredible biodiversity.
Prompt: The president of the United States is
Generated text:  scheduled to visit South Korea today to discuss the ongoing conflict with North Korea and the state of US-Korea relations. In anticipation of the visit, President Trump tweeted that the US is ready to discuss a "peaceful solution" with North Korea, but also emphasized that the US w
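
For very large prompt sets, you may want to submit prompts in fixed-size groups so that partial results can be saved as you go. The sketch below shows one way to do that. The helper names `chunked` and `run_in_batches` are hypothetical, and `fake_generate` is a stand-in that only mimics the call shape of `llm.generate` (a list of prompts in, a list of dicts with a `"text"` key out) so the example runs without a model.

```python
import json
from typing import Callable, Dict, Iterator, List

batch_sizes_seen: List[int] = []  # records how the stand-in was called


def fake_generate(prompts: List[str], sampling_params: Dict) -> List[Dict]:
    """Hypothetical stand-in with the same call shape as llm.generate."""
    batch_sizes_seen.append(len(prompts))
    return [{"text": f" <completion for {p!r}>"} for p in prompts]


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(
    generate_fn: Callable[[List[str], Dict], List[Dict]],
    prompts: List[str],
    sampling_params: Dict,
    batch_size: int,
) -> List[Dict]:
    """Call generate_fn once per group and pair each prompt with its output."""
    records = []
    for batch in chunked(prompts, batch_size):
        outputs = generate_fn(batch, sampling_params)
        for prompt, output in zip(batch, outputs):
            records.append({"prompt": prompt, "text": output["text"]})
    return records


prompts = [f"Question {i}:" for i in range(5)]
records = run_in_batches(
    fake_generate, prompts, {"temperature": 0.8, "top_p": 0.95}, batch_size=2
)

# One JSON object per line is a convenient format for later analysis.
jsonl = "\n".join(json.dumps(record) for record in records)
print(jsonl)
```

Because the engine already schedules a batch internally, grouping like this is mainly useful for checkpointing results, not for speed.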

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Nicole, and I have been a hairstylist for over 10 years. I specialize in cutting and coloring hair. I love helping my clients achieve their desired look and making them feel confident and beautiful. I am always continuing my education to stay up to date on the latest techniques and trends. I am also a mom of two and love spending time with my family. In my free time, I enjoy reading, hiking and trying out new restaurants.
I am a master stylist with a passion for creating unique and personalized looks for each of my clients. I have a wide range of experience with various hair types and textures and am always eager to

Prompt: The capital of France is
Generated text:  a city that needs no introduction. The City of Light, the most visited city in the world, the Eiffel Tower, Notre Dame, Louvre, Montmartre... the list goes on and on. It's a city that's steeped in history, art, culture, and romance. In this article, we'll explore the top 10 things to do in Paris, from iconic landmarks to hidden gems.
1. The Eiffel Tower (Tour Eiffel)
No trip to Paris is complete without visiting the Eiffel Tower, the iconic iron lattice tower that's become synonymous with the city. Take the elevator

Prompt: The future of AI is
Generated text:  in the hands of creatives

Artificial intelligence (AI) has the potential to revolutionize various industries, from healthcare and finance to education and entertainment. However, the development of AI is largely dependent on the creativity of humans. Here are some reasons why creatives are instrumental in shaping the future of AI:
1. AI requires human input for training data: To develop AI models, large datasets are required to train the algorithms. Creatives, such as writers, designers, and musicians, can help generate the diverse and rich data needed to train AI models.
2. Creatives can develop AI-generated content: AI algorithms can be used to
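
The streaming loop above prints each chunk as it arrives. If you also need the complete text afterwards, one option is to accumulate the chunks while printing them. The sketch below assumes each chunk's `"text"` holds only the newly generated piece, as the output above suggests. If your SGLang version emits cumulative text instead, keep the last chunk and skip the join. `collect_stream` and `fake_stream` are hypothetical names, and `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional


def fake_stream() -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for a streaming generate call."""
    for piece in [" Paris", ".", " Paris", " is", " a", " city"]:
        yield {"text": piece}


def collect_stream(
    chunks: Iterable[Dict[str, str]],
    on_piece: Optional[Callable[[str], None]] = None,
) -> str:
    """Consume a chunk iterator, optionally echoing each piece, and return the full text.

    Assumes every chunk carries only the new piece of text (incremental chunks).
    """
    parts = []
    for chunk in chunks:
        piece = chunk["text"]
        if on_piece is not None:
            on_piece(piece)
        parts.append(piece)
    return "".join(parts)


# Echo pieces as they arrive, then keep the joined result for later use.
full_text = collect_stream(
    fake_stream(), on_piece=lambda piece: print(piece, end="", flush=True)
)
print()
```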




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Michael and I'm a passionate about traveling and photography. I'm from the United States, but I've had the opportunity to travel to many countries and experience different cultures. My favorite type of photography is landscape and cityscape photography, but I also enjoy taking portraits and street photography. I'm always looking for new and interesting subjects to photograph, and I love the challenge of capturing the perfect shot.
What kind of photography do you enjoy most?
I enjoy a variety of photography styles, but my favorite type of photography is landscape and cityscape photography. There's something about capturing the beauty of a vast landscape or the energy of a bustling city that

Prompt: The capital of France is
Generated text:  home to some of the world's most famous art museums, including the Louvre and Orsay. This program allows you to explore the works of the masters, from the Renaissance to the Impressionists, in a world-class
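
One practical benefit of the awaitable form is that it composes with standard `asyncio` tools. The sketch below bounds the wait with `asyncio.wait_for`. The helper name `generate_with_timeout` is hypothetical, and `fake_async_generate` is a stand-in that mimics the call shape of `llm.async_generate` with an artificial delay. Note that timing out the awaiting coroutine does not necessarily stop work already running inside a real engine.

```python
import asyncio


async def fake_async_generate(prompts, sampling_params, delay_s=0.0):
    """Hypothetical stand-in for llm.async_generate with a configurable delay."""
    await asyncio.sleep(delay_s)
    return [{"text": " <completion>"} for _ in prompts]


async def slow_generate(prompts, sampling_params):
    """Same stand-in, but slow enough to trigger the timeout below."""
    return await fake_async_generate(prompts, sampling_params, delay_s=0.5)


async def generate_with_timeout(async_generate_fn, prompts, sampling_params, timeout_s):
    """Await a batch, returning None if it does not finish within timeout_s seconds."""
    try:
        return await asyncio.wait_for(
            async_generate_fn(prompts, sampling_params), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        return None


fast = asyncio.run(
    generate_with_timeout(fake_async_generate, ["a", "b"], {}, timeout_s=1.0)
)
slow = asyncio.run(generate_with_timeout(slow_generate, ["a"], {}, timeout_s=0.01))
print(fast, slow)
```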

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Anna and I am a 32-year-old woman from the UK. I was diagnosed with a rare condition called Neuroendocrine Tumour (NET) in 2017. I am writing this blog to share my experiences and raise awareness about this often-misunderstood condition.

NET is a type of cancer that forms in the neuroendocrine system, which includes the adrenal glands, pancreas, thyroid, and other hormone-producing tissues in the body. Symptoms can vary widely depending on the location and size of the tumour, but common symptoms include flushing, sweating, diarrhea, and abdominal pain.

My journey with NET began in

Prompt: The capital of France is
Generated text:  Paris. Paris is a city in the Île-de-France region of France. It is one of the largest cities in Europe, and it has been the capital of France since 987.
Paris is known as the "City of Light" due to its history as a center of learning and culture in the 17th century, when many famous intellectuals lived there. It is also famous for its art and fashion, including the Louvre Museum and the Eiffel Tower, which was built in 1889 for the World's Fair. Paris is also known for its romantic atmosphere, with its beautiful rivers, parks, and gardens,

Prompt: The future of AI is
Generated text:  here, and it’s all about collaboration

The future of Artificial Intelligence (AI) is no longer about a single, all-powerful AI system, but rather about a diverse array of AI models working together to achieve common goals.
As AI continues to evolve, we're witnessing a shift from the traditional notion of a single, monolithic AI system to a collaborative AI ecosystem. This is often referred to as “hybrid AI” or “multi-agent AI.”
The benefits of collaborative AI are numerous:
Improved problem-solving: By combining the strengths of multiple AI models, organizations can tackle complex challenges that might be too difficult for a single AI system
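
The loop above streams one prompt at a time. If you would rather consume several streams concurrently, one possible pattern is a coroutine per prompt combined with `asyncio.gather`. This is a sketch under assumptions: `stream_one` and `stream_all` are hypothetical helpers, and `fake_async_generate` is a stand-in that copies the call shape used above (awaiting it returns an async generator of chunks). How a real engine interleaves concurrent streams is not shown in this document.

```python
import asyncio


async def fake_async_generate(prompt, sampling_params, stream=True):
    """Hypothetical stand-in: awaiting it returns an async generator of chunks."""

    async def chunk_generator():
        for piece in [" one", " two", " three"]:
            await asyncio.sleep(0)  # yield control, as real token arrival would
            yield {"text": piece}

    return chunk_generator()


async def stream_one(async_generate_fn, prompt, sampling_params):
    """Consume one prompt's stream and return (prompt, joined text)."""
    generator = await async_generate_fn(prompt, sampling_params, stream=True)
    parts = []
    async for chunk in generator:
        parts.append(chunk["text"])  # assumes incremental chunks
    return prompt, "".join(parts)


async def stream_all(async_generate_fn, prompts, sampling_params):
    """Run one consumer per prompt; results come back in prompt order."""
    tasks = (stream_one(async_generate_fn, p, sampling_params) for p in prompts)
    return await asyncio.gather(*tasks)


results = asyncio.run(
    stream_all(fake_async_generate, ["Prompt A", "Prompt B"], {"temperature": 0.8})
)
for prompt, text in results:
    print(f"{prompt} ->{text}")
```

Printing chunks live from several concurrent streams would interleave their output, so this version collects each stream and prints once at the end.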




In [6]:
llm.shutdown()
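
In a script, as opposed to a notebook, it can be useful to guarantee that `shutdown()` runs even when generation raises. The sketch below wraps that in a small context manager. `engine_session` is a hypothetical helper, not part of SGLang, and `FakeEngine` is a stand-in exposing only the `generate` and `shutdown` methods used in this document.

```python
from contextlib import contextmanager


class FakeEngine:
    """Hypothetical stand-in exposing generate() and shutdown() like the engine above."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts, sampling_params):
        return [{"text": " <completion>"} for _ in prompts]

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def engine_session(engine):
    """Yield the engine and always call its shutdown() on exit, even after an error."""
    try:
        yield engine
    finally:
        engine.shutdown()


engine = FakeEngine()
try:
    with engine_session(engine) as llm:
        llm.generate(["Hello, my name is"], {"temperature": 0.8})
        raise RuntimeError("simulated failure mid-run")
except RuntimeError:
    pass

print(engine.is_shut_down)
```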