# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where a server would add unnecessary complexity or overhead. Two common use cases are:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed, working example is available as a Python script in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# Launch the offline engine.

import asyncio

import sglang as sgl

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Shawn Perry. I am a husband, father, and passionate about photography. I have a deep love for the ocean and the beauty it holds. I am also a fan of the outdoors, especially when it's paired with adventure and good company. My photography style is a mix of fine art, documentary, and commercial photography. I love telling stories through my images and capturing the essence of a moment in time.
Currently, I am based in Naples, Florida, where I can frequently be found capturing the beauty of the Gulf Coast and the surrounding waters. I am also available for travel and will shoot anywhere that the passion for adventure and photography takes
Prompt: The president of the United States is
Generated text:  required by law to give Congress a detailed report every two years on the activities of the US National Security Agency (NSA).
The report, also known as the "SIGINT Senates Report", is a classified document that provides information on the intelligen
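
For offline batch jobs it is often convenient to store each prompt alongside its completion. The following is a minimal sketch, not part of SGLang: `fake_generate` and `write_jsonl` are hypothetical names, and `fake_generate` only mimics the return shape used above (a list of dicts with a `"text"` key) so that the snippet runs without a model. In practice you would pass the result of `llm.generate(prompts, sampling_params)` instead.

```python
import io
import json


def fake_generate(prompts, sampling_params):
    """Hypothetical stand-in for llm.generate: one dict with a "text" key per prompt."""
    return [{"text": f" <completion for: {p}>"} for p in prompts]


def write_jsonl(prompts, outputs, fp):
    """Write one JSON object per line, pairing each prompt with its generated text."""
    for prompt, output in zip(prompts, outputs):
        fp.write(json.dumps({"prompt": prompt, "text": output["text"]}) + "\n")


prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
outputs = fake_generate(prompts, sampling_params)

buffer = io.StringIO()  # swap for open("results.jsonl", "w") to write to disk
write_jsonl(prompts, outputs, buffer)

# Read the records back to check that the round trip preserves the pairing.
records = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(records[0]["prompt"])
```

Writing one record per line keeps the file appendable, so a long batch job can be resumed or inspected while it is still running.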

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Deanna and I am the new manager of the Great Barrier Reef Foundation. I'm excited to be here and to be a part of the team that is working to protect the Great Barrier Reef.
I've been in the conservation sector for over 20 years, with a focus on marine conservation. Before joining the Great Barrier Reef Foundation, I was the Director of Conservation at the Australian Marine Conservation Society, where I led the development of our conservation programs and policy work.
I'm passionate about protecting the Great Barrier Reef because it's one of the most biologically diverse ecosystems on the planet and it's under threat. The Reef is not just an

Prompt: The capital of France is
Generated text:  a city of unparalleled history and beauty, with landmarks like the Eiffel Tower and the Louvre Museum drawing millions of visitors each year. But there's more to Paris than just its famous sights – the city has a rich cultural scene, with world-class museums, galleries, and performance venues.
In addition to its cultural attractions, Paris is also famous for its cuisine, with haute cuisine restaurants and charming cafes serving up everything from classic dishes like escargots and steak tartare to modern creations using fresh, seasonal ingredients. Visitors can explore the city's many markets, including the famous Marché aux Puces de Saint-Ouen (a flea

Prompt: The future of AI is
Generated text:  being shaped by human ingenuity and innovation

Artificial Intelligence (AI) is a rapidly evolving field that is transforming industries and revolutionizing the way we live and work. As AI continues to advance, it is being shaped by human ingenuity and innovation, leading to exciting new developments and applications.
One of the key areas of focus in AI research is the development of more sophisticated and human-like intelligence. Researchers are working to create AI systems that can learn, reason, and interact with humans in more natural and intuitive ways. This includes the development of more advanced natural language processing (NLP) capabilities, which enable AI systems to understand and generate
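
When streaming, you may want the complete text as well as the live output. The sketch below shows one way to do both. It assumes each chunk carries only newly generated text, as the output above suggests; if your SGLang version returns cumulative text in each chunk, the join would need adjusting. `fake_stream` and `collect_stream` are hypothetical names, and `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)` so the snippet runs without a model.

```python
def fake_stream(prompt, sampling_params):
    """Hypothetical stand-in for llm.generate(..., stream=True): yields incremental chunks."""
    for piece in [" a", " city", " of", " light", "."]:
        yield {"text": piece}


def collect_stream(chunks, on_chunk=None):
    """Join incremental chunks into the full completion, optionally echoing each one."""
    parts = []
    for chunk in chunks:
        if on_chunk is not None:
            on_chunk(chunk["text"])
        parts.append(chunk["text"])
    return "".join(parts)


full_text = collect_stream(
    fake_stream("The capital of France is", {"temperature": 0.8}),
    on_chunk=lambda text: print(text, end="", flush=True),  # live echo, as in the cell above
)
print()
```

Collecting the pieces in a list and joining once at the end avoids repeated string concatenation on long generations.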




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Ann, and I'm excited to be here to share my experience with you. I'm a teacher, a wife, a mother, and a friend. I love learning and growing, and I enjoy sharing my knowledge and enthusiasm with others.
As a teacher, I have had the privilege of working with students of all ages and backgrounds. I've taught a wide range of subjects, from elementary school math and science to college-level literature and composition. I've also had the opportunity to work with students who are struggling with learning difficulties, and I've found that with the right support and accommodations, even the most challenging students can succeed.
But my

Prompt: The capital of France is
Generated text:  known for its beauty, romance, and fashion. Paris has been a popular destination for centuries, attracting visitors from all over the world with its stunning architecture, world-class museums, and charming streets. Here are some of the top things to see and do in Paris:
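
Because `async_generate` is awaitable, several calls can be awaited together. The following sketch illustrates the pattern with `asyncio.gather`. `fake_async_generate` and `run_batches` are hypothetical names, the stand-in only mimics the return shape shown above, and whether concurrent submission helps throughput with the real engine is an assumption this document does not cover.

```python
import asyncio


async def fake_async_generate(prompts, sampling_params):
    """Hypothetical stand-in for llm.async_generate: returns one dict per prompt."""
    await asyncio.sleep(0.01)  # simulate inference latency
    return [{"text": f" <completion for: {p}>"} for p in prompts]


async def run_batches(batches, sampling_params):
    """Submit every batch at once and wait for all of them; results keep batch order."""
    tasks = [fake_async_generate(batch, sampling_params) for batch in batches]
    return await asyncio.gather(*tasks)


batches = [["Hello, my name is"], ["The capital of France is", "The future of AI is"]]
results = asyncio.run(run_batches(batches, {"temperature": 0.8, "top_p": 0.95}))

for batch, outputs in zip(batches, results):
    for prompt, output in zip(batch, outputs):
        print(f"{prompt!r} -> {output['text']!r}")
```

`asyncio.gather` returns results in the order the awaitables were passed, so each result list can be zipped back to its batch.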

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Tanya Howard and I am a makeup artist with over 10 years of experience. I specialize in creating a wide range of looks from natural and elegant to dramatic and glamorous. I provide makeup services for weddings, special occasions, and everyday use. I am committed to making sure you feel confident and beautiful in your own skin.
I am a member of the Professional Makeup Artists and Hairstylists Guild and am certified in sanitation and safety. I am also a certified MAC and Nars artist.
I am available for consultations and bookings and would be happy to discuss your needs and preferences. Please contact me at 831-612-4226 or

Prompt: The capital of France is
Generated text:  not just a city, it’s an experience. Paris, also known as the City of Light, is famous for its stunning architecture, world-class art museums, fashion, cuisine, and romantic atmosphere. Whether you’re looking for history, culture, entertainment, or simply a chance to relax and unwind, Paris has something for everyone.
One of the most famous landmarks in Paris is the Eiffel Tower, which was built in 1889 for the World’s Fair. You can take the elevator to the top for breathtaking views of the city. Another iconic landmark is the Arc de Triomphe, a monumental arch that honors the soldiers who

Prompt: The future of AI is
Generated text:  not just about automation, but about augmenting human capabilities and enhancing the human experience.
Artificial Intelligence (AI) is no longer just a futuristic concept; it's a reality that's transforming industries, societies, and our daily lives. As AI continues to evolve, its impact will become even more profound. But what does the future of AI look like, and how will it shape our world?
The Future of AI: Beyond Automation

While AI has made significant strides in automating repetitive and mundane tasks, the future of AI is not just about automation. It's about augmenting human capabilities, enhancing the human experience, and solving some of
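
The cell above consumes one stream at a time. As a sketch of an alternative, the streams can be drained concurrently, with the text collected per prompt instead of printed so the outputs do not interleave. `fake_async_generate`, `consume`, and `stream_all` are hypothetical names. The stand-in mirrors the calling pattern used above (await the call, then iterate the returned async generator) and assumes chunks carry only new text.

```python
import asyncio


async def fake_async_generate(prompt, sampling_params, stream=True):
    """Hypothetical stand-in for llm.async_generate(..., stream=True)."""

    async def chunks():
        for piece in [" one", " two", " three"]:
            await asyncio.sleep(0)  # yield control so the streams can interleave
            yield {"text": piece}

    return chunks()


async def consume(prompt, sampling_params):
    """Drain one stream and return the prompt together with its joined text."""
    generator = await fake_async_generate(prompt, sampling_params, stream=True)
    parts = []
    async for chunk in generator:
        parts.append(chunk["text"])
    return prompt, "".join(parts)


async def stream_all(prompts, sampling_params):
    """Consume one stream per prompt concurrently; results keep prompt order."""
    return await asyncio.gather(*(consume(p, sampling_params) for p in prompts))


stream_results = asyncio.run(
    stream_all(["Hello, my name is", "The future of AI is"], {"temperature": 0.8})
)
for prompt, text in stream_results:
    print(f"{prompt!r} -> {text!r}")
```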




In [6]:
llm.shutdown()
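
In a script, an exception raised during generation would skip the final `shutdown()` call. One way to guard against that is a small context manager built on `try`/`finally`, sketched below. `FakeEngine` and `managed` are hypothetical names used only for illustration; `FakeEngine` stands in for the real engine and exposes just the `shutdown()` method used above.

```python
from contextlib import contextmanager


class FakeEngine:
    """Hypothetical stand-in exposing only the shutdown() method used above."""

    def __init__(self):
        self.is_shut_down = False

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def managed(engine):
    """Yield the engine and always call shutdown(), even if the body raises."""
    try:
        yield engine
    finally:
        engine.shutdown()


engine = FakeEngine()
try:
    with managed(engine) as llm:
        raise RuntimeError("simulated failure during generation")
except RuntimeError:
    pass

print(engine.is_shut_down)
```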