# Offline Engine API

SGLang provides a direct inference engine that does not require an HTTP server. This is useful when an HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A complete, working example as a Python script is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.
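A batched `generate` call returns one result per prompt, and the examples below pair prompts with results using `zip`. Because `zip` silently stops at the shorter input, a small check before pairing can catch a mismatch early. This is a minimal sketch that assumes, as those examples do, that outputs come back in prompt order as dicts with a `"text"` key. `pair_batch_outputs` is a hypothetical helper, not part of SGLang, and the stand-in outputs let it run without loading a model.

```python
from typing import Any, Dict, List, Sequence


def pair_batch_outputs(
    prompts: Sequence[str], outputs: Sequence[Dict[str, Any]]
) -> List[Dict[str, str]]:
    """Pair each prompt with its generated text (hypothetical helper).

    Assumes one output dict per prompt, in prompt order, with the
    completion stored under the "text" key.
    """
    if len(prompts) != len(outputs):
        # zip() would silently truncate here; fail loudly instead.
        raise ValueError(f"expected {len(prompts)} outputs, got {len(outputs)}")
    return [{"prompt": p, "text": o["text"]} for p, o in zip(prompts, outputs)]


# Stand-in outputs shaped like the engine's result dicts (no model needed).
fake_outputs = [{"text": " Alice."}, {"text": " Paris."}]
pairs = pair_batch_outputs(
    ["Hello, my name is", "The capital of France is"], fake_outputs
)
for pair in pairs:
    print(f"{pair['prompt']!r} -> {pair['text']!r}")
```

With the real engine you would pass `prompts` and the list returned by `llm.generate(prompts, sampling_params)` instead of the stand-ins.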

```python
# Launch the offline engine
import asyncio

import sglang as sgl
from sglang.utils import print_highlight

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
```

### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

# One batched call; returns one output dict per prompt
outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print_highlight("===============================")
    print_highlight(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```
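The `sampling_params` above set `temperature` to 0.8 and `top_p` to 0.95. The sketch below illustrates the textbook meaning of these two knobs on a made-up four-token distribution. Logits are divided by the temperature before the softmax, and top-p (nucleus) filtering keeps the smallest set of most likely tokens whose cumulative probability reaches `top_p`, then renormalizes. This is a pure-Python illustration, not SGLang's sampling kernel. The toy logits and function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Tuple


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    # Temperature must be > 0: lower values sharpen the distribution, higher values flatten it.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    max_val = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - max_val) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    # Keep the most likely tokens until their cumulative mass reaches top_p,
    # then renormalize so the kept probabilities sum to 1.
    ranked: List[Tuple[str, float]] = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: List[Tuple[str, float]] = []
    cumulative = 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    mass = sum(p for _, p in kept)
    return {tok: p / mass for tok, p in kept}


def sample(
    logits: Dict[str, float],
    temperature: float = 0.8,
    top_p: float = 0.95,
    rng: Optional[random.Random] = None,
) -> str:
    rng = rng or random.Random()
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]


toy_logits = {" Paris": 5.0, " a": 2.0, " the": 1.5, " Lyon": -1.0}
# At temperature 0.8, " Paris" alone already holds more than 95% of the mass.
print(top_p_filter(softmax_with_temperature(toy_logits, 0.8), 0.95))
# At temperature 2.0 the distribution is flatter, so three tokens survive.
print(top_p_filter(softmax_with_temperature(toy_logits, 2.0), 0.95))
print(sample(toy_logits, rng=random.Random(0)))
```

Raising the temperature widens the nucleus, so more candidate tokens stay eligible. This is one reason the same prompt can yield different completions across runs.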

### Streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print_highlight(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    # With stream=True, generate() yields chunks as they are produced
    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()
```

```text
Generated text: Holly and I am a Certified Dental Assistant (CDA). I joined the team at Smiles by Design in 2009. I have been in the dental field since 1997 and have worked as an assistant in several dental offices.

My main focus is on making our patients feel comfortable and at ease during their appointments. I enjoy assisting the doctors and other team members to ensure our patients receive the best care possible. I take pride in my work and strive to provide excellent customer service to each and every patient.

When I'm not working, I enjoy spending time with my family and friends, traveling, and trying out new restaurants. I

Generated text: a city of romance, fashion, and art. From the iconic Eiffel Tower to the world-class museums, Paris has something for everyone. Whether you're a foodie, a history buff, or a shopaholic, this guide will help you plan your trip to the City of Light.

Things to Do in Paris

1. Visit the Eiffel Tower: The iconic iron lady offers breathtaking views of the city from its observation decks.

2. Explore the Louvre Museum: One of the world's largest and most famous museums, the Louvre is home to the Mona Lisa and many other artistic treasures.

3. Stroll

Generated text: a global one, with both the opportunities and challenges that come with it. The European Union has been at the forefront of AI development, with the European Commission launching a number of initiatives to promote AI research and development. The Commission has also set out ambitious goals for the development of AI in Europe, including the creation of a European AI Strategy.

The European AI Strategy aims to promote the development and deployment of AI in Europe, while also ensuring that the benefits of AI are shared by all. The strategy includes a number of key objectives, including:

1. To promote the development and deployment of AI in Europe, while also ensuring that the benefits of
```
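In the output above, each streamed chunk carries only the newly generated piece of text, so the full completion is the concatenation of all chunks. The sketch below collects a stream into a single string while still echoing it live. It assumes that incremental-chunk behavior. `fake_stream` is a hypothetical stand-in for `llm.generate(prompt, sampling_params, stream=True)` so the snippet runs without a model, and `collect_stream` is likewise illustrative.

```python
from typing import Dict, Iterable, Iterator, List


def fake_stream(pieces: Iterable[str]) -> Iterator[Dict[str, str]]:
    # Hypothetical stand-in for llm.generate(..., stream=True): yields dicts
    # whose "text" holds only the newly generated piece.
    for piece in pieces:
        yield {"text": piece}


def collect_stream(chunks: Iterable[Dict[str, str]], echo: bool = True) -> str:
    parts: List[str] = []
    for chunk in chunks:
        if echo:
            print(chunk["text"], end="", flush=True)  # live display, as in the cell above
        parts.append(chunk["text"])
    if echo:
        print()
    return "".join(parts)  # join once at the end instead of repeated string +=


full_text = collect_stream(fake_stream([" a", " city", " of", " romance", "."]))
```

Keeping the joined string lets you post-process or store the completion after streaming finishes, without issuing a second non-streaming request.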




### Non-streaming Asynchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous batch generation ===")


async def main():
    # async_generate accepts a list of prompts and returns one output per prompt
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print_highlight(f"\nPrompt: {prompt}")
        print_highlight(f"Generated text: {output['text']}")


asyncio.run(main())
```
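Because `async_generate` is awaitable, independent requests (for example, prompts that arrive at different times in your own application) can also be issued as separate coroutines and awaited together with `asyncio.gather`. `asyncio.gather` returns results in the order the awaitables were passed. This is a minimal sketch: `fake_async_generate` is a hypothetical stand-in (a short `asyncio.sleep` plus a canned reply) so it runs without a model. Substituting `llm.async_generate` is an assumption on our part, and whether it performs better than a single batched call depends on the engine's scheduler, which this sketch does not measure.

```python
import asyncio
from typing import Any, Dict, List


async def fake_async_generate(prompt: str, sampling_params: Dict[str, Any]) -> Dict[str, str]:
    # Hypothetical stand-in for llm.async_generate(prompt, sampling_params).
    await asyncio.sleep(0.01)  # simulate waiting on the engine
    return {"text": f" <completion for {prompt!r}>"}


async def generate_all(prompts: List[str], sampling_params: Dict[str, Any]) -> List[str]:
    # One coroutine per prompt, awaited together; gather preserves input order.
    outputs = await asyncio.gather(
        *(fake_async_generate(p, sampling_params) for p in prompts)
    )
    return [o["text"] for o in outputs]


texts = asyncio.run(
    generate_all(
        ["Hello, my name is", "The capital of France is"],
        {"temperature": 0.8, "top_p": 0.95},
    )
)
for text in texts:
    print(text)
```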

### Streaming Asynchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print_highlight(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Await async_generate to obtain an async generator of streamed chunks
        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())
```

```text
Generated text: Ellie, and I have a little something to share with you all today. It's a collection of some of my favorite watercolor and mixed media art pieces I've created in the past year. I've been experimenting a lot with new techniques and mediums, and I'm really excited to share some of the results with you.

I've included a variety of styles and themes in this collection, from abstract landscapes to whimsical creatures. I've also experimented with different paper textures, inks, and paints to add some extra depth and interest to my work.

I hope you enjoy this little peek into my creative world! Let me know if you

Generated text: known for its rich history, stunning architecture and fine dining. Here are the top things to do in Paris:

1. The Eiffel Tower: The iconic iron lattice tower is a must-visit attraction in Paris. You can take the stairs or elevator to the top for breathtaking views of the city.

2. The Louvre Museum: This world-renowned museum is home to an impressive collection of art and artifacts, including the Mona Lisa. The museum's glass pyramid entrance is a iconic landmark in itself.

3. Notre Dame Cathedral: This beautiful Gothic cathedral is one of the most famous landmarks in Paris. Take a guided tour to learn

Generated text: at stake in California

Rosa Tauriello and Joseph Cox

The debate over artificial intelligence regulation is heating up in California, with state lawmakers weighing in on whether to impose stricter rules on AI development.

One bill, AB 1436, aims to require companies to disclose when their AI systems are being used to make decisions that affect people's lives, such as hiring and lending. Another, SB 827, would ban the use of facial recognition technology in public housing and other government facilities.

Critics argue that these regulations could stifle innovation and limit the development of AI, while proponents say they are necessary to ensure that the technology
```
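The cell above streams the prompts one after another. To let several streams progress at the same time, you can run them as concurrent tasks. Printing each chunk directly would then interleave text from different prompts, so one option is to buffer each stream separately and print once all are done. This is a sketch under stated assumptions: `fake_async_stream` is a hypothetical stand-in for the async generator returned by `await llm.async_generate(prompt, sampling_params, stream=True)` (it simply echoes the prompt's words in upper case). How much real streams overlap depends on the engine, which this sketch does not model.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def fake_async_stream(prompt: str) -> AsyncIterator[Dict[str, str]]:
    # Hypothetical stand-in for the engine's async stream: one chunk per word.
    for word in prompt.split():
        await asyncio.sleep(0)  # yield control so other streams can progress
        yield {"text": " " + word.upper()}


async def consume(prompt: str) -> str:
    # Buffer this prompt's chunks instead of printing them immediately.
    parts: List[str] = []
    async for chunk in fake_async_stream(prompt):
        parts.append(chunk["text"])
    return "".join(parts)


async def consume_all(prompts: List[str]) -> List[str]:
    # Streams advance concurrently; gather returns results in prompt order.
    return list(await asyncio.gather(*(consume(p) for p in prompts)))


prompts = ["Hello, my name is", "The future of AI is"]
results = asyncio.run(consume_all(prompts))
for prompt, text in zip(prompts, results):
    print(f"{prompt}{text}")
```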




When you are finished, shut down the engine to release its resources:

```python
llm.shutdown()
```
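In a standalone script, an exception in the middle of a batch can skip the final `shutdown()` call. One way to guarantee cleanup is a small context manager that calls `shutdown()` in a `finally` block. The sketch below assumes only that the engine object exposes a no-argument `shutdown()` method, as used above. `engine_session` and `DummyEngine` are hypothetical names, and the wrapper does not rely on whether `sgl.Engine` itself supports the `with` statement.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol, TypeVar


class SupportsShutdown(Protocol):
    def shutdown(self) -> None: ...


E = TypeVar("E", bound=SupportsShutdown)


@contextmanager
def engine_session(engine: E) -> Iterator[E]:
    """Yield `engine` and call its shutdown() on exit, even if the body raises."""
    try:
        yield engine
    finally:
        engine.shutdown()


class DummyEngine:
    # Hypothetical stand-in exposing the same shutdown() method as the engine above.
    def __init__(self) -> None:
        self.closed = False

    def shutdown(self) -> None:
        self.closed = True


dummy = DummyEngine()
try:
    with engine_session(dummy) as _engine:
        raise RuntimeError("simulated failure mid-batch")
except RuntimeError:
    pass
print(dummy.closed)
```

With the real engine, the intended usage would be `with engine_session(sgl.Engine(model_path=...)) as llm:`, followed by the generation calls inside the block.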