# Offline Engine API

SGLang provides a direct inference engine that does not require an HTTP server. This is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A complete working example in a Python script is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
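To give a rough idea of what "a server on top of the engine" means, here is a minimal, standard-library-only sketch. It is not the linked `custom_server.py`: the `/generate` route, the JSON field names, and `EchoEngine` are all hypothetical. `EchoEngine` only mimics the `generate(prompt, sampling_params)` call and the `{"text": ...}` output shape used in this document, so in practice you would pass an `sgl.Engine` instance instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoEngine:
    """Hypothetical stand-in for sgl.Engine; replace with the real engine in practice."""

    def generate(self, prompt, sampling_params):
        # Mimics the {"text": ...} output shape shown in this document.
        return {"text": f" [completion for: {prompt}]"}


def handle_generate(engine, body: bytes) -> bytes:
    """Parse a JSON request, run the engine on it, and return a JSON response."""
    request = json.loads(body)
    output = engine.generate(request["text"], request.get("sampling_params", {}))
    return json.dumps({"text": output["text"]}).encode("utf-8")


def make_handler(engine):
    """Build a request handler class bound to a single shared engine."""

    class GenerateHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/generate":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            payload = handle_generate(engine, self.rfile.read(length))
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return GenerateHandler


def serve(engine, host="127.0.0.1", port=8000):
    # Blocks until interrupted; every request reuses the same engine.
    HTTPServer((host, port), make_handler(engine)).serve_forever()
```

`HTTPServer` handles one request at a time, so this sketch serializes requests. A production server would more likely use an async framework together with `async_generate`, so that concurrent requests can reach the engine together.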

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

INFO 11-28 06:53:36 weight_utils.py:243] Using model weights format ['*.safetensors']

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.40it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Zane. I am a 4-year-old Maltese mix. I'm a gentle soul with a coat as white as snow. My fur is long and silky, which makes me look like I'm wearing a beautiful cloak. I'm a small dog, but I have a big personality. I love to play and go on adventures with my humans. I'm a good listener and will sit and stay when commanded, but I'm not perfect and sometimes I get distracted by squirrels.
I'm a bit shy at first, but once I get to know you, I'll become your best friend. I love to cuddle and
Prompt: The president of the United States is
Generated text:  the head of state and head of government of the United States, and is the commander-in-chief of the United States Armed Forces. The president is also the leader of the executive branch of the federal government and is responsible for executing the laws of the land. The president is elected by the people through the Electoral College and serves a four-year term.
The current president of the United St
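The cells in this document pass `temperature` and `top_p` in `sampling_params`. As a framework-independent illustration of what these two knobs control, the sketch below applies temperature scaling and then nucleus (top-p) sampling to a toy list of logits. This is the textbook technique, not SGLang's actual sampling kernel, which runs batched on the GPU. The function names are hypothetical.

```python
import math
import random


def top_p_candidates(logits, temperature=0.8, top_p=0.95):
    """Return [(token_index, prob), ...] for the smallest high-probability set whose mass reaches top_p."""
    # Temperature < 1 sharpens the distribution; temperature > 1 flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least likely until the kept mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append((i, probs[i]))
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


def sample_top_p(logits, temperature=0.8, top_p=0.95, rng=random):
    """Draw one token index from the renormalized top-p candidate set."""
    kept = top_p_candidates(logits, temperature, top_p)
    r = rng.random() * sum(p for _, p in kept)  # renormalize by scaling the draw
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]  # guard against floating-point round-off
```

With logits `[2.0, 1.0, 0.1, -5.0]` at temperature 1.0, the top token alone carries about 66% of the mass, so `top_p=0.5` always picks it, while `top_p=0.95` keeps the first three tokens and drops the very unlikely fourth.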

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Leah and I am a senior studying Communications at the University of Wisconsin-Madison. I am a dedicated and motivated individual who is passionate about using my skills to make a positive impact in my community. I am eager to learn and grow with an organization that shares my values and is committed to making a difference.
I have had the opportunity to gain valuable experience through various internships and volunteer work. My most recent internship was at a local non-profit organization where I worked as a social media coordinator. I was responsible for creating and scheduling content, managing the organization’s social media presence, and analyzing engagement metrics. I also assisted with events and campaigns

Prompt: The capital of France is
Generated text:  Paris, but have you ever heard of a city in France called La Ciotat? This small coastal town in Provence, France is famous for being the first location in the world where motion pictures were publicly screened.
In 1895, the Lumière brothers, Louis and Auguste, held the first public screening of a motion picture in La Ciotat. The screening took place at the Salon Indien du Café Gymnase, where a group of people gathered to watch a series of short films, including workers leaving a factory, a train arriving at a station, and a garden scene. The screening was a huge success,

Prompt: The future of AI is
Generated text:  one of promise and uncertainty

Artificial intelligence (AI) is a broad field that encompasses everything from simple rule-based systems to complex machine learning models. AI has made significant strides in recent years, with applications in areas such as image and speech recognition, natural language processing, and predictive modeling. However, the future of AI is uncertain, with many challenges and opportunities on the horizon. In this article, we will discuss some of the key developments that will shape the future of AI.
The future of AI is a topic of much debate and speculation. Some experts predict that AI will continue to advance at an exponential rate, leading to significant breakthroughs
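In the streaming run above, each chunk's `"text"` appears to hold only the newly generated piece, so the full completion is the concatenation of all chunks. The helper below echoes and accumulates the pieces in one pass. It assumes delta-style chunks as observed here; whether `text` is a delta or the cumulative text so far may differ between SGLang versions, so check against the output you actually get. `fake_stream` is a hypothetical stand-in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
from typing import Callable, Dict, Iterable, Optional


def collect_stream(
    chunks: Iterable[Dict[str, str]],
    on_piece: Optional[Callable[[str], None]] = None,
) -> str:
    """Concatenate incremental stream chunks into the full text, optionally echoing each piece."""
    pieces = []
    for chunk in chunks:
        piece = chunk["text"]  # assumed to be the delta since the previous chunk
        if on_piece is not None:
            on_piece(piece)
        pieces.append(piece)
    return "".join(pieces)


def fake_stream(pieces):
    """Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True)."""
    for p in pieces:
        yield {"text": p}
```

With the real engine, `collect_stream(llm.generate(prompt, sampling_params, stream=True), on_piece=lambda s: print(s, end="", flush=True))` would print the text as it arrives and also return the complete string.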




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Jessica and I am a 34-year-old mother of two who is passionate about cooking, baking, and sharing delicious recipes with others. I am excited to start this blog and share my favorite recipes with you!
I have been cooking and baking for as long as I can remember, but it wasn't until I became a mother that I really started to experiment with new recipes and techniques. My children, Emily and Jackson, are my biggest taste testers and they are always eager to try new foods. My husband, Mike, is a big fan of my cooking and is always asking me to make his favorite dishes.
On this blog, I will

Prompt: The capital of France is
Generated text:  famous for its beauty, history, and culture. Paris, the City of Light, attracts millions of visitors every year. As the center of European politics, finance, and entertainment, Paris is a must-visit destination for any traveler. Paris is home to some of the world's most famous landmarks, such as the Eiffel Tow
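The cell above submits the whole batch in a single `async_generate` call. In an async application such as a custom server, requests often arrive one at a time instead. A sketch of issuing one request per prompt concurrently with `asyncio.gather`, which returns results in input order regardless of which request finishes first. `fake_async_generate` is a hypothetical stand-in that assumes a single-prompt `llm.async_generate(prompt, sampling_params)` resolves to one output dict with a `"text"` key.

```python
import asyncio
import random


async def fake_async_generate(prompt, sampling_params):
    """Hypothetical stand-in for llm.async_generate(prompt, sampling_params)."""
    await asyncio.sleep(random.uniform(0.0, 0.01))  # simulate variable latency
    return {"text": f" <completion of {prompt!r}>"}


async def generate_concurrently(generate_fn, prompts, sampling_params):
    """Start one request per prompt and return the generated texts in prompt order."""
    coros = [generate_fn(p, sampling_params) for p in prompts]
    # gather runs the requests concurrently but preserves the order of `coros` in its result.
    outputs = await asyncio.gather(*coros)
    return [out["text"] for out in outputs]
```

Under that assumption, the real call would be `asyncio.run(generate_concurrently(llm.async_generate, prompts, sampling_params))`.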

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # With stream=True, async_generate must be awaited to obtain an async generator of chunks.
        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Brandon Hill, and I am a member of the Shoalwater Bay Country Music Society. I am excited to be a part of this organization and contribute to the preservation of traditional country music.
I have been playing the banjo and singing for over 40 years, and I have had the privilege of playing with various bands and artists over the years. My musical influences include bluegrass, country, and folk music, and I have always been a fan of the classic country and bluegrass artists.
I am particularly drawn to the Shoalwater Bay Country Music Society because of its focus on preserving traditional country music and its commitment to supporting young musicians

Prompt: The capital of France is
Generated text:  a city of great beauty and historical significance. With its stunning architecture, world-class museums, and romantic atmosphere, Paris is a must-visit destination for anyone interested in culture, history, or just wanting to experience the magic of the City of Light. In this article, we will explore the top 10 things to do in Paris, from iconic landmarks to hidden gems.
The Eiffel Tower is one of the most recognizable landmarks in the world, and a must-visit attraction in Paris. Built for the 1889 World's Fair, the tower stands at 324 meters tall and offers breathtaking views of the city from its observation decks

Prompt: The future of AI is
Generated text:  not just about machines, but about how humans interact with machines and with each other. The rapid development of AI and machine learning technologies has sparked a global discussion about the ethics and implications of these technologies. On this panel, we will explore the intersection of AI and human values, examining how AI is being used to enhance human capabilities, and how it can be designed and deployed in a way that promotes positive social and economic outcomes. We will also examine the challenges and limitations of AI, and consider the role of human values in shaping the future of AI development. The panel will bring together experts from academia, industry, and civil society to explore these



In [6]:
llm.shutdown()
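`shutdown()` releases the engine's resources, but a cell that raises an exception before reaching it leaves the engine running. A minimal sketch that guarantees the call with `contextlib.contextmanager` and `try`/`finally`. `managed` and `StubEngine` are hypothetical names; `StubEngine` only records whether `shutdown()` ran, and with the real engine you would write `with managed(sgl.Engine(model_path=...)) as llm:`.

```python
from contextlib import contextmanager


@contextmanager
def managed(engine):
    """Yield the engine and call engine.shutdown() on exit, even if the body raises."""
    try:
        yield engine
    finally:
        engine.shutdown()


class StubEngine:
    """Hypothetical stand-in for sgl.Engine that records whether shutdown() ran."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompt, sampling_params):
        return {"text": f" <completion of {prompt!r}>"}

    def shutdown(self):
        self.is_shut_down = True
```

The exception from the `with` body still propagates after `shutdown()` runs, so errors are not silently swallowed.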