# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
from sglang.utils import print_highlight
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

INFO 11-15 15:02:05 weight_utils.py:243] Using model weights format ['*.safetensors']


Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.50it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print_highlight("===============================")
    print_highlight(f"Prompt: {prompt}\nGenerated text: {output['text']}")
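The batch call above returns one result per prompt, in prompt order, and each result exposes its text under the `"text"` key. As a minimal sketch of how those results might be collected for later processing, the snippet below pairs prompts with completions. `FakeEngine` and `generate_records` are hypothetical names introduced only so the example runs without a GPU; `FakeEngine` mimics just the call shape used above and is not part of SGLang.

```python
from typing import Any, Dict, List


class FakeEngine:
    """Hypothetical stand-in for the engine; it only mimics the call shape used above."""

    def generate(
        self, prompts: List[str], sampling_params: Dict[str, Any]
    ) -> List[Dict[str, Any]]:
        # One dict with a "text" key per prompt, in the same order as the prompts.
        return [{"text": f" <completion for {p!r}>"} for p in prompts]


def generate_records(
    engine: Any, prompts: List[str], sampling_params: Dict[str, Any]
) -> List[Dict[str, str]]:
    """Run one batch call and pair each prompt with its generated text."""
    outputs = engine.generate(prompts, sampling_params)
    if len(outputs) != len(prompts):
        raise ValueError("expected exactly one output per prompt")
    return [
        {"prompt": prompt, "completion": output["text"]}
        for prompt, output in zip(prompts, outputs)
    ]


records = generate_records(
    FakeEngine(),
    ["Hello, my name is", "The capital of France is"],
    {"temperature": 0.8, "top_p": 0.95},
)
for record in records:
    print(record["prompt"], "->", record["completion"])
```

With a real engine, passing `llm` in place of `FakeEngine()` should work as long as the return shape matches the one shown in this section.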

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print_highlight(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()

Generated text: Christian and I am a third year psychology student at the University of Stirling. I am originally from a small town in the Scottish Highlands and have a strong passion for the outdoors. I am excited to be taking on the role of Student Ambassador for the Psychology Department and look forward to meeting and supporting my fellow students.

I am eager to represent the Department of Psychology and provide support to students, whether it be academic or personal. I am confident that I will be able to offer valuable advice and insights based on my own experiences as a student in the Department.

Throughout my time at the University of Stirling, I have been actively involved in various




Generated text: a city like no other. With a rich history, stunning architecture, and a vibrant cultural scene, Paris is a must-visit destination for anyone interested in art, fashion, food, and more. Here are some of the top attractions and experiences to add to your Paris itinerary:

The Eiffel Tower: This iconic landmark is a symbol of Paris and one of the most recognizable structures in the world. Take the elevator to the top for breathtaking views of the city.

The Louvre Museum: Home to some of the world's most famous artworks, including the Mona Lisa and Venus de Milo. The museum's stunning glass pyramid entrance is also




Generated text: human. It's not about replacing people, but about augmenting them to achieve more. As the boundaries between humans and machines blur, we must rethink the way we work, live, and interact.

The intersection of AI and human capabilities is where innovation and progress will be made. AI can help us solve complex problems, amplify human creativity, and make decisions more informed.

As AI continues to evolve, it's essential to consider the social and economic implications of its impact. We must focus on creating a future where AI benefits everyone, not just a select few.

The future of work will require a mix of human skills and AI capabilities. We
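The synchronous streaming loop in this section prints each chunk as it arrives and then discards it. If the full text is also needed afterwards, the chunks can be accumulated while they are echoed. The sketch below assumes each chunk carries only the newly generated text under `"text"`, as the output above suggests; `fake_stream` and `collect_stream` are hypothetical helpers, and `fake_stream` merely stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
from typing import Any, Dict, Iterable, Iterator, List


def fake_stream(prompt: str) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for a streaming generate call; ignores the prompt."""
    for piece in [" Paris", ",", " a", " city", "."]:
        yield {"text": piece}


def collect_stream(chunks: Iterable[Dict[str, Any]], echo: bool = True) -> str:
    """Echo chunks as they arrive and return the concatenated text."""
    parts: List[str] = []
    for chunk in chunks:
        if echo:
            print(chunk["text"], end="", flush=True)
        parts.append(chunk["text"])
    if echo:
        print()  # finish the echoed line
    return "".join(parts)


full_text = collect_stream(fake_stream("The capital of France is"))
```

Joining a list once at the end avoids repeated string concatenation inside the loop, which matters little for short outputs but keeps the helper cheap for long generations.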




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print_highlight(f"\nPrompt: {prompt}")
        print_highlight(f"Generated text: {output['text']}")


asyncio.run(main())
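Because `async_generate` is awaited, several independent requests can be in flight at once. The following is a minimal sketch of that idea using `asyncio.gather`, which returns results in the order the awaitables were passed. `FakeAsyncEngine` and `run_groups` are hypothetical names; the fake engine only imitates the awaitable call shape shown above, so whether concurrent calls improve throughput with the real engine is not something this sketch demonstrates.

```python
import asyncio
from typing import Any, Dict, List


class FakeAsyncEngine:
    """Hypothetical stand-in exposing an awaitable async_generate like the one above."""

    async def async_generate(
        self, prompts: List[str], sampling_params: Dict[str, Any]
    ) -> List[Dict[str, Any]]:
        await asyncio.sleep(0.01)  # simulate inference latency
        return [{"text": f" <completion for {p!r}>"} for p in prompts]


async def run_groups(
    engine: Any, groups: List[List[str]], sampling_params: Dict[str, Any]
) -> List[List[Dict[str, Any]]]:
    """Issue one async batch request per prompt group and wait for all of them."""
    pending = [engine.async_generate(group, sampling_params) for group in groups]
    # gather preserves the order of `pending`, so results line up with `groups`.
    return await asyncio.gather(*pending)


groups = [
    ["Hello, my name is"],
    ["The capital of France is", "The future of AI is"],
]
results = asyncio.run(run_groups(FakeAsyncEngine(), groups, {"temperature": 0.8}))
for group, outputs in zip(groups, results):
    for prompt, output in zip(group, outputs):
        print(prompt, "->", output["text"])
```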

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print_highlight(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())

Generated text: Sarah. I am a wife to my amazing husband, mother to three beautiful children, and a foster mom to two adorable babies. We live in the beautiful state of Idaho and love the outdoors. When I am not busy being a mom or enjoying the great outdoors, you can find me crafting, baking, and trying out new recipes in the kitchen.

I am so glad you stopped by my blog. I am excited to share my passion for life, family, and all things crafty with you. My goal is to inspire and encourage others to live life to the fullest, make memories with their loved ones, and find joy in the simple




Generated text: a world-famous city with a rich history, stunning architecture, and a vibrant cultural scene. But what makes Paris so special? Here are ten reasons why Paris is a must-visit destination:

1. Iconic Landmarks

Paris is home to some of the world's most famous landmarks, including the Eiffel Tower, the Arc de Triomphe, and Notre-Dame Cathedral. These iconic structures are not only breathtakingly beautiful but also steeped in history and cultural significance.

2. Art and Museums

Paris is a city that is deeply connected to the world of art. The city is home to some of the world




Generated text: being shaped by a variety of factors, including advances in computing power, the availability of large datasets, and improvements in machine learning algorithms.

Here are some key trends that are expected to shape the future of AI:

1. **Increased adoption in industries**: AI is expected to be adopted in more industries, including healthcare, finance, transportation, and education.

2. **Advances in natural language processing (NLP)**: NLP is a key area of research in AI, and advancements in this area are expected to lead to better chatbots, voice assistants, and language translation systems.

3. **Rise of explainable AI (X
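The asynchronous streaming cell in this section first awaits `async_generate(..., stream=True)` to obtain an async generator and then iterates it with `async for`. A minimal sketch of the same two-step pattern, collecting the chunks into one string instead of printing them, might look like the following. `FakeStreamingEngine` and `collect_async_stream` are hypothetical; the fake engine reproduces only that two-step call shape and assumes incremental `"text"` chunks.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


class FakeStreamingEngine:
    """Hypothetical stand-in: awaiting async_generate yields an async generator."""

    async def async_generate(
        self, prompt: str, sampling_params: Dict[str, Any], stream: bool = False
    ) -> AsyncIterator[Dict[str, Any]]:
        async def chunks() -> AsyncIterator[Dict[str, Any]]:
            for piece in [" being", " shaped", " by", " data", "."]:
                await asyncio.sleep(0)  # hand control back to the event loop
                yield {"text": piece}

        return chunks()


async def collect_async_stream(
    engine: Any, prompt: str, sampling_params: Dict[str, Any]
) -> str:
    """Await the generator, then consume it chunk by chunk."""
    generator = await engine.async_generate(prompt, sampling_params, stream=True)
    parts: List[str] = []
    async for chunk in generator:
        parts.append(chunk["text"])
    return "".join(parts)


text = asyncio.run(
    collect_async_stream(FakeStreamingEngine(), "The future of AI is", {"temperature": 0.8})
)
print(text)
```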




In [6]:
llm.shutdown()
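In a standalone script, an exception raised during generation would skip a trailing `shutdown()` call. One way to guard against that is a `try`/`finally` block, sketched below. `FakeEngine` and `run_batch` are hypothetical names; the fake engine only records whether `shutdown()` was called and raises on an empty prompt list purely to simulate a failure.

```python
from typing import Any, Dict, List


class FakeEngine:
    """Hypothetical stand-in that records whether shutdown() was called."""

    def __init__(self) -> None:
        self.closed = False

    def generate(
        self, prompts: List[str], sampling_params: Dict[str, Any]
    ) -> List[Dict[str, Any]]:
        if not prompts:
            raise ValueError("no prompts given")  # simulated failure
        return [{"text": f" <completion for {p!r}>"} for p in prompts]

    def shutdown(self) -> None:
        self.closed = True


def run_batch(
    engine: Any, prompts: List[str], sampling_params: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Generate once and always shut the engine down, even if generation fails."""
    try:
        return engine.generate(prompts, sampling_params)
    finally:
        engine.shutdown()


engine = FakeEngine()
outputs = run_batch(engine, ["Hello, my name is"], {"temperature": 0.8})

failed_engine = FakeEngine()
try:
    run_batch(failed_engine, [], {})
except ValueError:
    pass  # the engine was still shut down by the finally block

print(engine.closed, failed_engine.closed)
```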