# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
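
To make the custom-server idea concrete, here is a minimal sketch of the request-handling core such a server might contain. It is not the linked `custom_server.py`. `StubEngine` and `handle_generate` are hypothetical names introduced for illustration, and the stub only mimics the output shape used later in this document (a dict with a `"text"` key) so that the sketch runs without a model.

```python
import json


class StubEngine:
    """Hypothetical stand-in for the engine so the sketch runs without a GPU."""

    def generate(self, prompt, sampling_params=None):
        # Mirrors the output shape used in this document: a dict with a "text" key.
        return {"text": f" <completion for {prompt!r}>"}


def handle_generate(engine, body: bytes) -> bytes:
    """Decode a JSON request, run one generation, and encode a JSON response."""
    request = json.loads(body)
    output = engine.generate(request["prompt"], request.get("sampling_params"))
    return json.dumps({"text": output["text"]}).encode("utf-8")


response = handle_generate(StubEngine(), b'{"prompt": "The capital of France is"}')
print(response.decode("utf-8"))
```

Keeping the handler independent of any web framework means the same function could sit behind whichever HTTP or RPC layer the custom server uses.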

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# Launch the offline engine in-process (no HTTP server is started).

import asyncio

import sglang as sgl
from sglang.utils import print_highlight

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")


### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print_highlight("===============================")
    print_highlight(f"Prompt: {prompt}\nGenerated text: {output['text']}")
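
Batch results are often written to disk rather than printed. The sketch below pairs each prompt with its output and serializes the pairs as JSON Lines. It relies on the same assumption as the `zip` loop above: one output per prompt, in prompt order, each a dict with a `"text"` key. `fake_generate` and `to_jsonl` are hypothetical helpers, and the stub replaces `llm.generate` so the example runs without a model.

```python
import json


def fake_generate(prompts, sampling_params):
    """Hypothetical stand-in for llm.generate: one dict with a "text" key per prompt."""
    return [{"text": f" [completion {i}]"} for i in range(len(prompts))]


def to_jsonl(prompts, outputs):
    """Pair each prompt with its output and serialize one JSON object per line."""
    if len(prompts) != len(outputs):
        raise ValueError("expected exactly one output per prompt")
    return "\n".join(
        json.dumps({"prompt": p, "completion": o["text"]})
        for p, o in zip(prompts, outputs)
    )


prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
print(to_jsonl(prompts, fake_generate(prompts, sampling_params)))
```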

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print_highlight(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()

Generated text: Chaylee, and I am a 17 year old girl who is currently taking my gap year and exploring the world of technology and coding.

I have recently started to learn more about blockchain technology and cryptocurrencies, and I am fascinated by its potential to bring about positive change in the world.

I am excited to join this community and learn more about blockchain and its applications, and I am looking forward to sharing my own experiences and insights as I continue on my journey of learning.

What draws you to blockchain technology, and what are your thoughts on its potential to bring about positive change in the world?

---

Hello Chaylee!

I

Generated text: a city that has seen its fair share of history and culture. With its rich past, stunning architecture, and world-class museums, Paris is a destination that has captivated travelers for centuries. From the iconic Eiffel Tower to the Louvre Museum, there's no shortage of iconic landmarks to explore. Visitors can stroll along the Seine River, visit the Palace of Versailles, and indulge in the city's famous cuisine.

Paris is a city that's steeped in art, fashion, and history. Visitors can explore the city's many museums, including the Musée d'Orsay, which is home to an impressive collection of

Generated text: a topic of great interest and debate. As AI continues to evolve and improve, it's likely to have a significant impact on various aspects of our lives. Here are some potential future developments in AI that could shape the world:

1. Increased Adoption in Industries:

AI is expected to become more widespread in industries like healthcare, finance, transportation, and education. This could lead to more efficient and effective processes, improved decision-making, and enhanced customer experiences.

2. Advancements in Natural Language Processing (NLP):

NLP is a crucial aspect of AI, enabling machines to understand and generate human language. Future advancements in NLP could lead to
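
When the complete text is needed after streaming finishes, the chunks can be accumulated while they are printed. The sketch below assumes each chunk carries only the newly generated text, which is what the output above suggests. If an engine version reports cumulative text in each chunk, plain concatenation would duplicate content. `fake_stream` and `collect_stream` are hypothetical names, and the stub replaces `llm.generate(..., stream=True)`.

```python
def fake_stream(prompt):
    """Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True)."""
    for piece in [" Paris", ",", " a", " city"]:
        yield {"text": piece}


def collect_stream(chunks):
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk["text"], end="", flush=True)
        parts.append(chunk["text"])
    print()
    return "".join(parts)


full_text = collect_stream(fake_stream("The capital of France is"))
```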




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print_highlight(f"\nPrompt: {prompt}")
        print_highlight(f"Generated text: {output['text']}")


asyncio.run(main())
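
The asynchronous API matters most when generation shares an event loop with other work. As a sketch of that pattern, the example below issues one request per prompt and awaits them together with `asyncio.gather`. `fake_async_generate` is a hypothetical stand-in for `llm.async_generate`. This document does not establish whether the real engine gains anything from per-prompt requests over the single batched call shown above.

```python
import asyncio


async def fake_async_generate(prompt, sampling_params):
    """Hypothetical stand-in for `await llm.async_generate(prompt, sampling_params)`."""
    await asyncio.sleep(0.01)  # simulate generation latency
    return {"text": f" [completion for {prompt!r}]"}


async def generate_all(prompts, sampling_params):
    """Issue one request per prompt and wait for all of them together."""
    tasks = [fake_async_generate(p, sampling_params) for p in prompts]
    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*tasks)


results = asyncio.run(
    generate_all(["Hello, my name is", "The future of AI is"], {"temperature": 0.8})
)
for result in results:
    print(result["text"])
```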

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print_highlight(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())

Generated text: Emma, I am a student from Birmingham University, currently on exchange in Paris. I have chosen to stay with a host family, the Duponts, who live in a lovely apartment in the Latin Quarter. I have been here for a few weeks now and have been enjoying every moment of it. From the food, the language, the culture, and the people, everything is just so beautiful and captivating. I have made friends with my host siblings, Pierre and Sophie, who are both very nice and helpful. They have introduced me to their friends and I have met some really interesting people.

The apartment is quite small, but it is

Generated text: the perfect destination for a city break in Europe. Paris is a city steeped in history, romance and beauty. With its stunning architecture, world-class museums, and vibrant cultural scene, Paris has something to offer for every kind of traveler.

Here are the top things to do in Paris:

Visit the Eiffel Tower: The iconic iron lady of Paris is a must-visit attraction. You can take the stairs or elevator to the top for breathtaking views of the city.

Explore the Louvre Museum: The world’s largest art museum is home to some of the most famous paintings in the world, including the Mona Lisa.

Walk along the

Generated text: about collaboration, not replacement

Artificial intelligence is a powerful tool that can augment human capabilities and help us solve complex problems, but it will not replace the unique qualities and strengths of human workers.

At a recent summit, I heard a senior executive from a large technology firm say that AI was replacing workers in certain industries. He was worried that this would lead to significant job losses and social disruption. I couldn't help but wonder if he was viewing AI through the lens of a 19th-century Industrial Revolution-style mechanization.

Let's be clear: AI is not a replacement for workers; it's an enabler. It's a tool
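
The loop above streams one prompt at a time. As a sketch of an alternative, several asynchronous streams can be drained concurrently, with each prompt's text collected separately. `fake_async_stream` is a hypothetical stand-in for the async generator obtained with `stream=True`, and incremental (not cumulative) chunk text is assumed, as in the output shown above.

```python
import asyncio


async def fake_async_stream(prompt):
    """Hypothetical stand-in for the async generator returned with stream=True."""
    for piece in [" one", " two", " three"]:
        await asyncio.sleep(0)  # yield control, as a real stream would while waiting
        yield {"text": piece}


async def collect(prompt):
    """Drain one stream and return the prompt with its concatenated text."""
    parts = []
    async for chunk in fake_async_stream(prompt):
        parts.append(chunk["text"])
    return prompt, "".join(parts)


async def main():
    prompts = ["Hello, my name is", "The capital of France is"]
    # Each collect() runs as its own task, so the streams interleave.
    return await asyncio.gather(*(collect(p) for p in prompts))


for prompt, text in asyncio.run(main()):
    print(f"{prompt} ->{text}")
```

Collecting per-prompt text instead of printing chunks directly avoids interleaved output on the terminal when several streams are active at once.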




In [6]:
llm.shutdown()
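
In a standalone script, an exception raised during generation would skip the final `shutdown()` call. One way to guard against that is a small context manager, sketched below. `managed` and `StubEngine` are hypothetical names introduced here. The stub exposes only the `shutdown()` method used above so that the example runs without a model.

```python
from contextlib import contextmanager


class StubEngine:
    """Hypothetical stand-in exposing only the shutdown() call used above."""

    def __init__(self):
        self.is_shut_down = False

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def managed(engine):
    """Yield the engine and call shutdown() on exit, even after an exception."""
    try:
        yield engine
    finally:
        engine.shutdown()


engine = StubEngine()
with managed(engine) as llm:
    pass  # run generation here
print(engine.is_shut_down)
```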