# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This suits use cases where an additional HTTP server would add unnecessary complexity or overhead. Two common use cases are:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
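As a rough illustration of the custom-server idea, the sketch below shows a framework-agnostic request handler that wraps any callable shaped like `llm.generate(prompts, sampling_params)`. It is not the linked example's code: the JSON field names (`prompts`, `sampling_params`, `texts`) and the helpers `handle_generate_request` and `fake_generate` are hypothetical, and a stand-in replaces the engine so the snippet runs without a model.

```python
import json
from typing import Any, Callable, Dict, List

# Shape of the callable we wrap. llm.generate(prompts, sampling_params),
# as used later in this document, fits this shape.
GenerateFn = Callable[[List[str], Dict[str, Any]], List[Dict[str, Any]]]


def handle_generate_request(body: bytes, generate: GenerateFn) -> bytes:
    """Turn a JSON request body into a JSON response body (hypothetical schema)."""
    request = json.loads(body)
    prompts = request["prompts"]
    sampling_params = request.get("sampling_params", {})
    outputs = generate(prompts, sampling_params)
    response = {"texts": [output["text"] for output in outputs]}
    return json.dumps(response).encode("utf-8")


def fake_generate(prompts: List[str], sampling_params: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Hypothetical stand-in for the engine, so the sketch runs without a GPU.
    return [{"text": f" <completion for: {p}>"} for p in prompts]


request_body = json.dumps({"prompts": ["Hello, my name is"]}).encode("utf-8")
response_body = handle_generate_request(request_body, fake_generate)
print(response_body.decode("utf-8"))
```

Keeping the handler independent of any web framework means the same function could be called from whichever server library you choose.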

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.28it/s]
100%|██████████| 23/23 [00:05<00:00,  4.30it/s]


### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  T. Michael Phillips and I am a professional singer, songwriter, and music producer. I have been in the music industry for over 20 years, and I have worked with some of the biggest names in the industry. My goal is to create music that inspires, uplifts, and brings people together.
As a singer, I have had the privilege of performing on some of the biggest stages in the world, including the Grand Ole Opry, the Ryman Auditorium, and the iconic Wembley Stadium. My voice has been described as "powerful," "heartfelt," and "soulful," and I have been praised
Prompt: The president of the United States is
Generated text:  the head of state and government of the United States. He is the chief executive of the federal government and has a number of key responsibilities and powers.
1. The President is the head of state and government of the United States.
2. He is the commander-in-chief of the armed forces.
3. He has the power to negotiate treaties with fo
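In offline batch jobs it is often convenient to persist results rather than print them. The sketch below pairs each prompt with the `text` field of its output dict, as in the loop above, and writes one JSON object per line. The helper names `to_records` and `write_jsonl` and the record layout are assumptions made for illustration; a hard-coded list stands in for the return value of `llm.generate`.

```python
import json
from typing import Any, Dict, Iterable, List


def to_records(prompts: List[str], outputs: Iterable[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Pair each prompt with the 'text' field of its output dict."""
    return [{"prompt": p, "text": o["text"]} for p, o in zip(prompts, outputs)]


def write_jsonl(records: List[Dict[str, str]], path: str) -> None:
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Stand-in for the list returned by llm.generate(prompts, sampling_params).
example_prompts = ["The capital of France is"]
example_outputs = [{"text": " Paris."}]
records = to_records(example_prompts, example_outputs)
print(records)
```

With a live engine, you would pass `prompts` and `outputs` from the cell above to `to_records` and then call `write_jsonl(records, "outputs.jsonl")`.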

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Piper Redd and I am the newest member of the HitFilm Pro team! I'm stoked to be joining such an awesome team and contributing to the next generation of visual effects software.
A little bit about me: I come from a film school background and have a strong foundation in traditional animation, 3D modeling, and visual effects. Prior to joining HitFilm Pro, I worked as a freelance VFX artist and compositor, working on various projects ranging from indie shorts to commercial campaigns.
My passion for filmmaking started at a young age, and I have always been fascinated by the art of storytelling through motion pictures. I'm

Prompt: The capital of France is
Generated text:  home to some of the world's most famous landmarks, including the Eiffel Tower and the Louvre. But there's more to Paris than just its famous sights.
Here are some of the top things to do in Paris, from museums and art galleries to parks and gardens, as well as the best neighborhoods to explore and the city's most famous markets.
Neighborhoods to Explore

Paris is a city of neighborhoods, each with its own unique character and charm. Here are some of the top neighborhoods to explore:
1. Le Marais: This historic neighborhood is known for its trendy boutiques, art galleries, and restaurants. It

Prompt: The future of AI is
Generated text:  not just about machines thinking like humans, but about machines making decisions that complement human abilities, says Yann LeCun, Director of AI Research at Facebook and Silver Professor of Computer Science at New York University. LeCun, who is also a co-creator of the convolutional neural network, will be speaking at the upcoming Strata Data Conference in San Francisco. He spoke with Data Science Times about the future of AI and his thoughts on the industry.
Data Science Times: You were one of the pioneers of convolutional neural networks (CNNs). What motivated you to work on CNNs and what significance do they have in the field
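If you need the complete text after streaming, the chunks can be accumulated as they arrive. The sketch below assumes each chunk's `text` field holds only the newly generated piece, which is what the output above suggests. If your installed version yields cumulative text instead, keeping only the last chunk would be the equivalent step. `collect_stream` and `fake_stream` are hypothetical names, and the fake generator stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
from typing import Any, Dict, Iterable, Iterator


def collect_stream(chunks: Iterable[Dict[str, Any]]) -> str:
    """Concatenate the 'text' field of each streamed chunk."""
    return "".join(chunk["text"] for chunk in chunks)


def fake_stream() -> Iterator[Dict[str, Any]]:
    # Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True).
    for piece in [" Paris", " is", " lovely", "."]:
        yield {"text": piece}


full_text = collect_stream(fake_stream())
print(full_text)
```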




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Emily and I'm a writer and a dog mom. I'm excited to share my writing adventures with you here on this blog.
I'm a bit of a goofball and love to tell stories that make people laugh. My humor is often a bit offbeat and quirky, but I like to think it's also relatable and authentic. I'm also passionate about exploring the world around us, whether that's through travel, trying new foods, or meeting new people.
As a dog mom, I'm also passionate about all things furry and four-legged. My pup, Luna, is a constant source of inspiration and joy in my life,

Prompt: The capital of France is
Generated text:  a must-see destination for anyone who loves history, art, fashion, and food. Here are some top tips for visiting Paris:
1. Learn some French: While many Parisians speak some English, learning a few basic French phrases can go a long way in making your trip more enjoyable. Try to learn how to say "bonjour" (hello), "merci" (thank you), and "excusez-m
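When prompts arrive at different times or need different sampling parameters, one option is to issue a separate awaitable call per prompt and wait for all of them with `asyncio.gather`. The sketch below shows only that pattern: `fake_async_generate` is a hypothetical stand-in for `llm.async_generate`, and whether separate concurrent calls are scheduled as efficiently as a single batched call is not something this document establishes.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

AsyncGenerate = Callable[[str, Dict[str, Any]], Awaitable[Dict[str, Any]]]


async def generate_concurrently(
    generate: AsyncGenerate, prompts: List[str], sampling_params: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Start one request per prompt and wait for all of them."""
    tasks = [generate(prompt, sampling_params) for prompt in prompts]
    # gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*tasks)


async def fake_async_generate(prompt: str, sampling_params: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for llm.async_generate; yields control once.
    await asyncio.sleep(0)
    return {"text": f" <completion for: {prompt}>"}


results = asyncio.run(
    generate_concurrently(fake_async_generate, ["a", "b"], {"temperature": 0.8})
)
print([r["text"] for r in results])
```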

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Nicole and I am a 30 something year old mom of two young children. I've been struggling with anxiety for a few years now and I've recently started to learn more about mindfulness and meditation. I'm excited to share my journey with you and learn from others as well.
I'm not sure where to start with mindfulness and meditation. I've tried a few different apps and online resources but I'm not sure what works best for me. Do you have any recommendations?
Great to hear that you're taking the first steps towards exploring mindfulness and meditation! It's completely normal to feel unsure about where to start. One great place to begin

Prompt: The capital of France is
Generated text:  known for its rich history, vibrant culture, and stunning architecture. From the iconic Eiffel Tower to the world-famous Louvre Museum, Paris has something for everyone. Here are some of the top things to do and see in Paris:
1. Explore the Eiffel Tower: The Eiffel Tower is an iconic symbol of Paris and one of the most recognizable landmarks in the world. Visitors can take a lift to the top for breathtaking views of the city.
2. Visit the Louvre Museum: The Louvre is home to some of the world's most famous artworks, including the Mona Lisa. The museum's vast collection

Prompt: The future of AI is
Generated text:  not in the robots, but in the data

The future of AI is not in the robots, but in the data

What is artificial intelligence? In recent years, the term has become synonymous with robots, drones, and other advanced machines. However, AI is not about the robots themselves, but rather the data that drives them.
Data is the new oil

In the 20th century, oil was the lifeblood of the economy. It powered our cars, fueled our industries, and drove economic growth. Today, we are entering a new era where data is the new oil. Just as oil was extracted, refined, and used
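The loop above consumes one stream at a time. As a sketch of an alternative, several async streams can be consumed concurrently, with each one collected into its own string. `fake_async_stream` is a hypothetical stand-in for the async generator returned by `llm.async_generate(prompt, sampling_params, stream=True)`, and the code assumes each chunk's `text` field holds only newly generated text, as the output above suggests.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Tuple


async def collect_async_stream(prompt: str, chunks: AsyncIterator[Dict[str, Any]]) -> Tuple[str, str]:
    """Consume one async stream and return (prompt, full generated text)."""
    parts: List[str] = []
    async for chunk in chunks:
        parts.append(chunk["text"])
    return prompt, "".join(parts)


async def fake_async_stream(prompt: str) -> AsyncIterator[Dict[str, Any]]:
    # Hypothetical stand-in for an engine stream; yields control between chunks.
    for piece in [" reply", " to", f" '{prompt}'"]:
        await asyncio.sleep(0)
        yield {"text": piece}


async def stream_all(prompts: List[str]) -> Dict[str, str]:
    """Consume one stream per prompt concurrently and map prompt -> text."""
    pairs = await asyncio.gather(
        *(collect_async_stream(p, fake_async_stream(p)) for p in prompts)
    )
    return dict(pairs)


texts = asyncio.run(stream_all(["a", "b"]))
print(texts)
```

Printing chunks from interleaved streams directly would mix their text on screen, which is why each stream is buffered into its own string here.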




In [6]:
llm.shutdown()
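In a script, an exception raised during generation would skip a trailing `llm.shutdown()` call. One way to guard against that is a small context manager that calls `shutdown()` in a `finally` block. `engine_session` and `FakeEngine` are hypothetical names introduced for this sketch, and the fake class stands in for the engine so the snippet runs without a model.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def engine_session(make_engine: Callable[[], Any]) -> Iterator[Any]:
    """Create an engine and call its shutdown() even if the body raises."""
    engine = make_engine()
    try:
        yield engine
    finally:
        engine.shutdown()


class FakeEngine:
    """Hypothetical stand-in exposing only the shutdown() method used here."""

    def __init__(self) -> None:
        self.is_shut_down = False

    def shutdown(self) -> None:
        self.is_shut_down = True


fake = FakeEngine()
try:
    with engine_session(lambda: fake) as engine:
        raise ValueError("simulated failure during generation")
except ValueError:
    pass

print(fake.is_shut_down)  # True: shutdown() ran despite the exception
```

With the real engine, the factory would be something like `lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")`, matching the launch cell at the top of this document.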