# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling: pass a list of prompts to a single call and the engine schedules them together.

In [1]:
# launch the offline engine

import sglang as sgl
from sglang.utils import print_highlight
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

INFO 11-08 06:55:05 weight_utils.py:243] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.16it/s]

### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print_highlight("===============================")
    print_highlight(f"Prompt: {prompt}\nGenerated text: {output['text']}")
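When post-processing a batch, it can help to pair each prompt with its result explicitly. The sketch below assumes, as in the cell above, that `llm.generate` returns one dict per prompt, in input order, each with a `"text"` key. `pair_results` is a hypothetical helper, not part of SGLang, and the hard-coded `outputs` list is a stand-in so the snippet runs without loading a model.

```python
from typing import Dict, List


def pair_results(prompts: List[str], outputs: List[dict]) -> List[Dict[str, str]]:
    """Pair each prompt with the generated text of the matching output."""
    if len(prompts) != len(outputs):
        raise ValueError("expected exactly one output per prompt")
    return [
        {"prompt": prompt, "completion": output["text"]}
        for prompt, output in zip(prompts, outputs)
    ]


# Stand-in data shaped like the return value of llm.generate(prompts, sampling_params).
prompts = ["The capital of France is", "Hello, my name is"]
outputs = [{"text": " Paris."}, {"text": " Ruby."}]

records = pair_results(prompts, outputs)
for record in records:
    print(record["prompt"] + record["completion"])
```

With a real engine, `outputs` would simply be replaced by the list returned from `llm.generate(prompts, sampling_params)`.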

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print_highlight(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()

Generated text: Ruby and I am a senior. I have been with the firm for two years and I am an expert in all things related to the law.
Hello, my name is Ruby and I am a senior at this firm. I have been working here for two years and I have extensive experience in all areas of law. I am confident that I can provide you with the best possible service and help you navigate any legal issues you may be facing.
I have worked on a wide range of cases, from personal injury to family law, and I have a deep understanding of the legal system. I am also very knowledgeable about the various laws and regulations that

Generated text: Paris. I am very excited to visit Paris because of the famous Eiffel Tower.
I have heard that the Eiffel Tower is one of the most famous landmarks in the world and it is made of iron. I have seen pictures of it and it looks amazing. I hope to see it in person one day.
I am also looking forward to trying some French food while I am there. I have heard that French cuisine is very delicious and I am excited to try some of the local specialties like escargot and croissants.
I am also interested in visiting the Louvre Museum to see the famous painting the Mona Lisa.

Generated text: in the cloud, and it's getting closer

Artificial intelligence (AI) has the potential to revolutionize various industries, from healthcare to finance, and more. With the rise of cloud computing, the future of AI is looking brighter than ever. Here are some key developments and insights that highlight the growing importance of cloud-based AI:
1. Cloud AI adoption is on the rise: According to a report by MarketsandMarkets, the cloud AI market is expected to grow from $4.4 billion in 2020 to $30.6 billion by 2025, at a Compound Annual Growth Rate (CAGR) of

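If the full completion is needed in addition to the live printout, the streamed chunks can be joined as they arrive. This is a minimal sketch that assumes each chunk's `"text"` field holds only the newly generated piece, as the output above suggests; if your installed version yields cumulative text instead, this helper would duplicate content. `collect_stream` and `fake_stream` are hypothetical names, and `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
from typing import Dict, Iterable, Iterator


def collect_stream(chunks: Iterable[Dict[str, str]]) -> str:
    """Concatenate the "text" field of every chunk into one string."""
    return "".join(chunk["text"] for chunk in chunks)


def fake_stream() -> Iterator[Dict[str, str]]:
    """Stand-in for llm.generate(prompt, sampling_params, stream=True)."""
    for piece in [" Paris", ".", " I", " am", " very", " excited"]:
        yield {"text": piece}


full_text = collect_stream(fake_stream())
print("The capital of France is" + full_text)
```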
### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print_highlight(f"\nPrompt: {prompt}")
        print_highlight(f"Generated text: {output['text']}")


asyncio.run(main())
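The asynchronous API is also convenient when requests originate from independent coroutines, for example inside a custom server. The sketch below shows the general `asyncio.gather` pattern only. `fake_async_generate` is a hypothetical stand-in for `await llm.async_generate(prompt, sampling_params)` so that the snippet runs without a model; it is an assumption that each call resolves to a dict with a `"text"` key, mirroring the cell above.

```python
import asyncio
from typing import Dict, List


async def fake_async_generate(prompt: str, sampling_params: dict) -> Dict[str, str]:
    """Hypothetical stand-in for `await llm.async_generate(prompt, sampling_params)`."""
    await asyncio.sleep(0.01)  # simulate inference latency
    return {"text": f" <completion for {prompt!r}>"}


async def generate_all(prompts: List[str], sampling_params: dict) -> List[Dict[str, str]]:
    """Issue one request per prompt and wait for all of them together."""
    tasks = [fake_async_generate(p, sampling_params) for p in prompts]
    # asyncio.gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*tasks)


prompts = ["Hello, my name is", "The capital of France is"]
results = asyncio.run(generate_all(prompts, {"temperature": 0.8, "top_p": 0.95}))
for prompt, result in zip(prompts, results):
    print(prompt, "->", result["text"])
```

Because `asyncio.gather` preserves input order, the results can be zipped back onto the prompts exactly as in the batch call.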

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print_highlight(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())

Generated text: Randy and I'm a 40 something male, a bit rough around the edges, but a kind soul. I've been living in the city for over 20 years and I've got a deep love for the place. There's something about the energy and the people that just draws me in.
I've been working as a handyman for a while now, and I've got a great reputation among my clients. I love the physical work and the satisfaction of fixing something or making something new. It's a great feeling to know that I'm helping people out and making a difference in their lives.
But, to be honest, I

Generated text: located in the center of the country and has a population of approximately 12 million people in the metropolitan area. Its history dates back to the 3rd century BC and has been a significant cultural and intellectual hub throughout the centuries.
Paris is famous for its stunning architecture, with iconic landmarks such as the Eiffel Tower, the Notre-Dame Cathedral, the Louvre Museum, and many others. Visitors can explore the city's historic neighborhoods, such as Montmartre and Le Marais, and enjoy the city's vibrant street life, cafes, and restaurants.
The city is also known for its fashion industry, with Paris Fashion Week being

Generated text: here!
Get the latest AI-powered tools and technologies for your business

Schedule a free consultation to discover how AI can transform your organization

Get ready to revolutionize your business with AI!
Discover how our AI solutions can drive efficiency, improve accuracy, and boost productivity

AI-Powered Solutions for Business

Discover how our AI-powered solutions can transform your organization

AI Chatbots

Automate customer support and improve customer experience with our AI-powered chatbots

AI-Powered CRM

Discover how our AI-powered CRM can help you manage customer relationships more effectively

AI-Powered Analytics

Unlock actionable insights with our AI-powered analytics solutions

AI-Powered

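The same accumulation idea carries over to the asynchronous stream: iterate with `async for` and join the pieces. This is a sketch under the assumption that each chunk carries only newly generated text in its `"text"` field. `fake_async_stream` is a hypothetical stand-in for the generator obtained from `llm.async_generate(prompt, sampling_params, stream=True)`, and `collect_async_stream` is an illustrative helper rather than an SGLang function.

```python
import asyncio
from typing import AsyncIterator, Dict


async def fake_async_stream(prompt: str) -> AsyncIterator[Dict[str, str]]:
    """Stand-in async generator that yields chunks shaped like the engine's."""
    for piece in [" located", " in", " the", " center"]:
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield {"text": piece}


async def collect_async_stream(chunks: AsyncIterator[Dict[str, str]]) -> str:
    """Consume an async stream of chunks and return the concatenated text."""
    parts = []
    async for chunk in chunks:
        parts.append(chunk["text"])
    return "".join(parts)


text = asyncio.run(collect_async_stream(fake_async_stream("The capital of France is")))
print(text)
```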
In [6]:
llm.shutdown()