# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This suits use cases where an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
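
As a rough illustration of what sits at the core of such a custom server, the sketch below shows a transport-agnostic request handler. It is not taken from the linked script. The request schema (`prompt`, `sampling_params`), the name `handle_generate_request`, and the stand-in `fake_generate` are all hypothetical, and the sketch assumes that calling the engine with a single prompt returns a single dict with a `"text"` key.

```python
import json
from typing import Callable, Dict


def handle_generate_request(body: bytes, generate: Callable[[str, Dict], Dict]) -> bytes:
    """Decode a JSON request, call the engine, and encode a JSON response.

    The request schema used here is a hypothetical one chosen for illustration.
    """
    request = json.loads(body)
    prompt = request["prompt"]
    # Fall back to empty sampling parameters when the client sends none.
    sampling_params = request.get("sampling_params", {})
    output = generate(prompt, sampling_params)
    return json.dumps({"text": output["text"]}).encode("utf-8")


def fake_generate(prompt: str, sampling_params: Dict) -> Dict:
    # Hypothetical stand-in for the engine so the sketch runs without a model.
    return {"text": f" <completion of {prompt!r}>"}


if __name__ == "__main__":
    payload = json.dumps({"prompt": "The capital of France is"}).encode("utf-8")
    print(handle_generate_request(payload, fake_generate).decode("utf-8"))
```

With a real engine, the second argument would be the engine's `generate` method, and the handler would be called from whatever web framework or RPC layer you choose.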

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.40it/s]

### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Paul and I am the owner and operator of PPR Contracting. I am a licensed and insured, general contractor with over 25 years of experience in the construction industry. I have a strong background in all aspects of home construction and renovation, including design, planning, estimating, and project management.
I have a passion for providing quality workmanship, excellent customer service, and exceeding client expectations. My goal is to help homeowners achieve their dream home, whether it's a small renovation or a major overhaul.
I am a member of the National Association of the Remodeling Industry (NARI) and have completed the Certified Graduate Remodeler
Prompt: The president of the United States is
Generated text:  the head of state and head of government of the United States of America. The president serves a four-year term and is limited to two terms. The president is elected by the people through the Electoral College system.
The president
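
For prompt sets too large to submit comfortably in one call, one option is to feed the engine fixed-size batches and consume results as they arrive. The sketch below is a minimal illustration of that idea, not part of SGLang. `generate_in_batches` and `fake_generate` are hypothetical names, and the sketch assumes, as the `zip` in the cell above does, that outputs come back in prompt order with one dict per prompt.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def generate_in_batches(
    generate: Callable[[List[str], Dict], List[Dict]],
    prompts: Sequence[str],
    sampling_params: Dict,
    batch_size: int = 2,
) -> Iterator[Tuple[str, str]]:
    """Yield (prompt, generated_text) pairs, submitting prompts in fixed-size batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(prompts), batch_size):
        batch = list(prompts[start : start + batch_size])
        outputs = generate(batch, sampling_params)
        # Assumption: one output dict per prompt, in the same order as the batch.
        for prompt, output in zip(batch, outputs):
            yield prompt, output["text"]


def fake_generate(prompts: List[str], sampling_params: Dict) -> List[Dict]:
    # Hypothetical stand-in for the engine so the sketch runs without a model.
    return [{"text": f" <completion of {p!r}>"} for p in prompts]


if __name__ == "__main__":
    for prompt, text in generate_in_batches(fake_generate, ["a", "b", "c"], {"temperature": 0.8}):
        print(f"Prompt: {prompt}\nGenerated text: {text}")
```

With the engine from this notebook, `llm.generate` would be passed as the first argument. Since the engine already schedules requests within a call, batching like this mainly helps when you want to save or inspect partial results along the way.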

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Steven and I am a 17-year-old student with a passion for photography. I love capturing life’s special moments and the world around me. My friends would describe me as a bit quirky, always looking for the next best shot. When I’m not behind the lens, I enjoy playing guitar and spending time with my family and friends.
Here are some of my favorite photos from my recent shoots. I hope you enjoy them! I am always looking for new opportunities to shoot and learn. Feel free to reach out if you need any photos taken.
Before I started taking photography seriously, I was just a curious kid with a camera. I

Prompt: The capital of France is
Generated text:  known for its stunning architecture, rich history, and world-class museums. But did you know that Paris is also a hub for international fashion? The city is home to some of the world's most famous fashion designers, including Chanel, Dior, and Louis Vuitton.
In this article, we'll explore the best fashion destinations in Paris and provide you with a guide on how to navigate the city's fashion scene like a pro.
1. Champs-Élysées

Start your fashion journey at the iconic Champs-Élysées, one of the world's most famous shopping streets. This 1.9-kilometer

Prompt: The future of AI is
Generated text:  in the hands of humans.
The future of AI is in the hands of humans.
The future of AI is in the hands of humans.
The future of AI is in the hands of humans.
The future of AI is in the hands of humans.
The future of AI is in the hands of humans.
As artificial intelligence continues to advance and transform industries, it's clear that the future of AI is in the hands of humans. While AI has the potential to revolutionize many aspects of our lives, it is only as good as the data and instructions it receives. Human oversight, guidance, and decision-making are essential to ensuring that AI is
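
When the full completion is needed after streaming finishes, the chunks can be accumulated while they are displayed. The sketch below is a minimal illustration with hypothetical names (`collect_stream`, `fake_stream`). It assumes each chunk's `"text"` holds only the newly generated piece, which is what the output above suggests. If your SGLang version yields the cumulative text in each chunk instead, keep only the last chunk.

```python
from typing import Dict, Iterable, Iterator, List


def collect_stream(chunks: Iterable[Dict]) -> str:
    """Concatenate the incremental chunk["text"] pieces into the full completion."""
    parts: List[str] = []
    for chunk in chunks:
        # Assumption: each chunk carries only the new text, not the running total.
        parts.append(chunk["text"])
    return "".join(parts)


def fake_stream(pieces: List[str]) -> Iterator[Dict]:
    # Hypothetical stand-in for a streaming engine call so the sketch runs without a model.
    for piece in pieces:
        yield {"text": piece}


if __name__ == "__main__":
    print(collect_stream(fake_stream([" Paris", " is", " lovely", "."])))
```

With the engine from this notebook, the argument would be the generator returned by `llm.generate(prompt, sampling_params, stream=True)`.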




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Shelbie and I am a non-profit writer and volunteer for a animal rescue group in North Carolina. My current project is helping with social media and fundraising efforts for the group. I was wondering if you could help me with some writing tips, as well as general advice on how to effectively run a fundraising campaign on social media.

Thank you,
Shelbie

Dear Shelbie,

Congratulations on taking on a new project with the animal rescue group! Your passion for writing and helping animals will surely make a difference. I'd be happy to offer some writing tips and advice on running an effective fundraising campaign on social media.

**Writing Tips:**



Prompt: The capital of France is
Generated text:  known for its fashion, art, and cuisine, but it’s also home to a wide range of unique and quirky museums that showcase everything from street art to science. Here are some of the most unusual museums in Paris:
1. Musée des Égouts de Paris (Paris Sewe
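
Asynchronous generation also makes it possible to issue several independent requests and wait for all of them together. The sketch below illustrates one way to do that with `asyncio.gather` and a concurrency cap. `generate_concurrently` and `fake_async_generate` are hypothetical names, and the sketch assumes that awaiting the engine's async call with a single prompt string returns a single dict with a `"text"` key.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Sequence


async def generate_concurrently(
    async_generate: Callable[[str, Dict], Awaitable[Dict]],
    prompts: Sequence[str],
    sampling_params: Dict,
    max_concurrency: int = 2,
) -> List[Dict]:
    """Run one request per prompt, with at most max_concurrency in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(prompt: str) -> Dict:
        async with semaphore:
            return await async_generate(prompt, sampling_params)

    # asyncio.gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(run_one(p) for p in prompts))


async def fake_async_generate(prompt: str, sampling_params: Dict) -> Dict:
    # Hypothetical stand-in for the engine's async call so the sketch runs without a model.
    await asyncio.sleep(0)
    return {"text": f" <completion of {prompt!r}>"}


if __name__ == "__main__":
    results = asyncio.run(generate_concurrently(fake_async_generate, ["a", "b", "c"], {}))
    for result in results:
        print(result["text"])
```

With the engine from this notebook, `llm.async_generate` would be passed as the first argument. Passing the whole prompt list in one call, as the cell above does, is simpler when all prompts are known up front. Per-prompt tasks are more useful when requests arrive at different times.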

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  LK and I'm a junior majoring in Information Technology and Management. I'm a fourth-year student from out of state, so I'm not from the local area. I'm really interested in the entrepreneurship program, but I'm not sure if it's right for me.
Hi LK, nice to meet you! Welcome to the community. The entrepreneurship program here is excellent, and it's great that you're considering it. Can you tell me a bit more about what's drawing you to it? What specific aspects of entrepreneurship interest you the most? Was it something you're passionate about before coming to college, or has it developed

Prompt: The capital of France is
Generated text:  filled with grandeur, romance, and history. Paris is one of the world’s most iconic cities, and for good reason. From the Eiffel Tower to the Louvre Museum, there’s no shortage of iconic landmarks to explore. Here are some of the top things to see and do in Paris:
1. Visit the Eiffel Tower

The Eiffel Tower is an absolute must-see attraction in Paris. This iron lattice tower stands 324 meters tall and offers breathtaking views of the city from its observation decks. You can take the stairs or elevator to the top for a panoramic view of the city.
2. Explore

Prompt: The future of AI is
Generated text:  here: New approach brings human-like intelligence to robots

The future of AI is here: New approach brings human-like intelligence to robots

A breakthrough in artificial intelligence (AI) research is set to bring human-like intelligence to robots, enabling them to learn, adapt and interact with humans more effectively.
Researchers from the University of Edinburgh's School of Informatics, in collaboration with the University of California, Berkeley, and the University of California, Los Angeles (UCLA), have developed a new approach to AI that is inspired by human cognition.
The new approach, known as the "Hybrid Lifelong Learning" framework, combines the strengths of deep



In [6]:
llm.shutdown()
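
In a standalone script, an exception raised before the final cell would skip the `shutdown()` call. One way to guard against that is a small context manager, sketched below. `engine_session` and `FakeEngine` are hypothetical names, and the only assumption about the engine is that it exposes `shutdown()` as used above.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def engine_session(make_engine: Callable[[], object]) -> Iterator[object]:
    """Create an engine and guarantee that shutdown() runs, even if the body raises."""
    engine = make_engine()
    try:
        yield engine
    finally:
        engine.shutdown()


class FakeEngine:
    # Hypothetical stand-in that only records whether shutdown() was called.
    def __init__(self) -> None:
        self.closed = False

    def shutdown(self) -> None:
        self.closed = True


if __name__ == "__main__":
    with engine_session(FakeEngine) as engine:
        print("engine is open:", not engine.closed)
    print("engine is closed:", engine.closed)
```

With a real engine, `make_engine` could be a `lambda` that builds the engine the same way as the first cell of this notebook.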