# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This suits use cases where a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
from sglang.utils import print_highlight
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")


### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print_highlight("===============================")
    print_highlight(f"Prompt: {prompt}\nGenerated text: {output['text']}")
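
When the prompt list is long, it can be convenient to submit it in fixed-size slices, for example to report progress or write results out incrementally. The engine schedules requests internally, so slicing is not needed for throughput. The sketch below is one possible approach, not part of SGLang: `generate_in_batches` and `batch_size` are hypothetical names, and `generate_fn` is assumed to behave like `llm.generate` above (a list of prompts in, a list of dicts with a `text` key out). A stand-in function replaces the engine so the snippet runs without a model.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def generate_in_batches(
    generate_fn: Callable[[List[str], dict], List[Dict[str, str]]],
    prompts: List[str],
    sampling_params: dict,
    batch_size: int = 2,
) -> Iterator[Tuple[str, str]]:
    """Yield (prompt, generated_text) pairs, submitting batch_size prompts at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start : start + batch_size]
        outputs = generate_fn(batch, sampling_params)
        # Outputs are assumed to come back in the same order as the prompts.
        for prompt, output in zip(batch, outputs):
            yield prompt, output["text"]


# Stand-in for llm.generate (an assumption, so the sketch runs without a model).
def fake_generate(batch, sampling_params):
    return [{"text": f" <completion of {p!r}>"} for p in batch]


pairs = list(generate_in_batches(fake_generate, ["a", "b", "c"], {"temperature": 0.8}))
```

With a real engine, the call would presumably look like `generate_in_batches(llm.generate, prompts, sampling_params)`.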

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print_highlight(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()

Generated text: Robbie, and I have been a professional photographer for over 10 years. I specialize in wedding, portrait and commercial photography.
I have had the pleasure of working with many clients over the years, and I always strive to create unique and memorable images that capture the essence of each client and their special occasion.
My approach is relaxed and unobtrusive, allowing my clients to feel at ease in front of the camera. I want my clients to feel like they are working with a friend, not just a photographer.
I am based in San Francisco, but I travel frequently to capture the beauty of the world around us. I am always looking

Generated text: the city of Paris. Paris is a city known for its rich history, art, fashion, and culture. It has been an important center for many centuries and is often referred to as the "City of Light." Paris is famous for its landmarks, such as the Eiffel Tower, the Louvre Museum, and Notre Dame Cathedral, as well as its fashion industry and romantic atmosphere.
Paris is home to many famous museums, including the Louvre, which is one of the world's largest and most visited museums. The Louvre contains a vast collection of art and artifacts from around the world, including the Mona Lisa. The city also

Generated text: changing, and not necessarily in a good way.
In recent years, artificial intelligence (AI) has been touted as a revolutionary technology that can solve complex problems, improve efficiency, and even save lives. But a growing number of experts are sounding the alarm about the potential risks and consequences of unchecked AI development.
One of the main concerns is the creation of AI systems that are increasingly sophisticated and autonomous, but also potentially uncontrollable. As AI becomes more advanced, it may become harder to predict and manage its behavior, making it a potential threat to human safety and well-being.
Another concern is the lack of transparency and accountability in AI decision-making.
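
If the complete text is needed after streaming finishes, the chunks can be joined as they arrive. The following is a minimal sketch, assuming each chunk's `text` field holds only the newly generated piece, as the output above suggests. `collect_stream` is a hypothetical helper, and `fake_stream` is a stand-in for `llm.generate(prompt, sampling_params, stream=True)` so that the snippet runs without a model.

```python
from typing import Dict, Iterable


def collect_stream(chunks: Iterable[Dict[str, str]]) -> str:
    """Concatenate the "text" field of every streamed chunk into one string."""
    return "".join(chunk["text"] for chunk in chunks)


# Stand-in for the streaming generator returned by the engine (an assumption).
def fake_stream():
    for piece in [" the", " city", " of", " Paris", "."]:
        yield {"text": piece}


sync_text = collect_stream(fake_stream())
```

If a given version of the engine were to return cumulative text in each chunk instead, keeping only the last chunk would be the equivalent operation.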




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print_highlight(f"\nPrompt: {prompt}")
        print_highlight(f"Generated text: {output['text']}")


asyncio.run(main())
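
When requests are independent (for instance, each with its own sampling parameters), they can also be awaited concurrently rather than as one list. The sketch below shows one way to cap the number of in-flight requests with `asyncio.Semaphore`. It assumes that the async call accepts a single prompt and resolves to a dict with a `text` key; `generate_concurrently` and `max_in_flight` are hypothetical names, and `fake_async_generate` stands in for `llm.async_generate` so the snippet runs without a model.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def generate_concurrently(
    async_generate_fn: Callable[[str, dict], Awaitable[Dict[str, str]]],
    prompts: List[str],
    sampling_params: dict,
    max_in_flight: int = 2,
) -> List[str]:
    """Run one request per prompt, at most max_in_flight at a time, keeping input order."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def one(prompt: str) -> str:
        async with semaphore:
            output = await async_generate_fn(prompt, sampling_params)
            return output["text"]

    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(one(p) for p in prompts))


# Stand-in for llm.async_generate (an assumption, so the sketch runs without a model).
async def fake_async_generate(prompt, sampling_params):
    await asyncio.sleep(0)
    return {"text": prompt.upper()}


texts = asyncio.run(generate_concurrently(fake_async_generate, ["a", "b", "c"], {}))
```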

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print_highlight(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())

Generated text: Connie, and I'm a busy mom of 3. I'm passionate about living a healthy lifestyle, trying out new recipes, and sharing my favorite tips and tricks with fellow parents. Welcome to my blog, where I'll be sharing my journey to balancing motherhood and wellness.
This past year has been a whirlwind for me and my family. We've been adjusting to a new routine, and it's been a challenge to find time for myself amidst all the chaos. But, I've learned that even small moments can make a big impact on my overall well-being.
One of my favorite things to do is cook. I love experimenting

Generated text: a city that never sleeps and has a lot to offer. There are world-class museums, restaurants, shopping, and a plethora of historical landmarks to explore. Whether you are looking for a romantic getaway or a family vacation, Paris has something for everyone. Here are the top 10 things to see and do in Paris:
1. The Eiffel Tower: This iconic iron lattice tower is a must-visit attraction in Paris. It offers breathtaking views of the city from its observation decks and is a great place to watch the sunset.
2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums

Generated text: not just about machines, but also about humans. Human-AI collaboration is the future, and it’s happening now. In this article, we will explore how humans and AI can work together to achieve great things. ## Step 1: Understanding the Basics of Human-AI Collaboration Human-AI collaboration involves the integration of human capabilities and AI systems to solve complex problems, enhance productivity, and improve decision-making. This collaboration can take many forms, including human-AI teams working together on a project, AI systems assisting humans in specific tasks, and AI-driven tools providing insights to humans. ## Step 2: Identifying the Benefits of
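
The asynchronous stream can be accumulated in the same way with `async for`. This is a minimal sketch, assuming each chunk's `text` field carries only the new piece of text, as the output above suggests. `collect_async_stream` is a hypothetical helper, and `fake_async_stream` stands in for the generator obtained from `llm.async_generate(prompt, sampling_params, stream=True)`.

```python
import asyncio
from typing import AsyncIterator, Dict


async def collect_async_stream(chunks: AsyncIterator[Dict[str, str]]) -> str:
    """Consume an async stream of chunks and return the concatenated text."""
    parts = []
    async for chunk in chunks:
        parts.append(chunk["text"])
    return "".join(parts)


# Stand-in for the engine's async streaming generator (an assumption).
async def fake_async_stream():
    for piece in [" a", " city", "."]:
        await asyncio.sleep(0)  # yield control, as a real stream would between chunks
        yield {"text": piece}


async_text = asyncio.run(collect_async_stream(fake_async_stream()))
```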




In [6]:
llm.shutdown()
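
In a script, it may be worth making sure `shutdown()` runs even if generation raises an exception. The sketch below wraps the engine in a small context manager. `engine_session` is a hypothetical helper, not an SGLang API, and `FakeEngine` is a stand-in that only records whether `shutdown()` was called.

```python
from contextlib import contextmanager


@contextmanager
def engine_session(engine):
    """Yield the engine and always call its shutdown() method on exit."""
    try:
        yield engine
    finally:
        engine.shutdown()


# Stand-in for sgl.Engine (an assumption, so the sketch runs without a model).
class FakeEngine:
    def __init__(self):
        self.closed = False

    def shutdown(self):
        self.closed = True


fake = FakeEngine()
try:
    with engine_session(fake):
        raise RuntimeError("simulated failure during generation")
except RuntimeError:
    pass  # the engine has still been shut down at this point
```

With a real engine, the body of the `with` block would contain the `generate` calls shown earlier.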