# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when an additional HTTP server would only add complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Samantha (Sam) and I’m a mental health nurse. I have a keen interest in mental health, psychology, and personal development. As a nurse, I have worked with a range of clients who have experienced trauma, anxiety, depression, and other mental health issues. My passion is to help people overcome their struggles and live a more fulfilling life.
My blog is a space where I share my thoughts, experiences, and knowledge on mental health and wellness. I aim to provide helpful information, advice, and insights that can support you in your own journey towards better mental health and self-care.
I believe that mental health is just as important as
Prompt: The president of the United States is
Generated text:  often called the leader of the free world, but President Obama's appearance on Saturday Night Live this weekend suggests a different reality.
President Obama's appearance on Saturday Night Live was a first for a sitting president in 15 years, and it
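
Offline batch jobs usually need to persist their results. The following is a minimal sketch of one way to do that, assuming only what the cell above shows: `llm.generate` returns one dict per prompt with a `"text"` key. The helper `to_jsonl`, the record field names, and the stand-in outputs are hypothetical and not part of SGLang.

```python
import json


def to_jsonl(prompts, outputs):
    """Serialize prompt/output pairs as JSON Lines, one record per prompt."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    lines = []
    for prompt, output in zip(prompts, outputs):
        # "prompt" and "completion" are illustrative field names.
        record = {"prompt": prompt, "completion": output["text"]}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# Stand-in for the list returned by llm.generate(prompts, sampling_params);
# only the "text" key used in this document is assumed.
example_prompts = ["The capital of France is", "The future of AI is"]
example_outputs = [{"text": " Paris."}, {"text": " bright."}]

jsonl = to_jsonl(example_prompts, example_outputs)
print(jsonl)
```

With a real engine, `example_outputs` would be replaced by the `outputs` list from the cell above.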

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Abeer, I am a 28 year old Software Engineer working in London. I am an avid reader, a big fan of sci-fi and fantasy novels. I am also a sports enthusiast, I play football and tennis in my free time. I have a passion for music and I play the guitar, I love listening to various genres of music from rock to classical. I am also a foodie and love trying out new cuisines and cooking techniques. I am an outdoorsy person, I love camping and hiking in my free time.
I am a friendly and approachable person, I love meeting new people and making friends. I am

Prompt: The capital of France is
Generated text:  not Paris

The capital of France is not Paris

You are likely to know that Paris is the capital of France, but this is not true. Paris is the largest city in France and the most famous tourist destination in the country, but it is not the capital.
The capital of France is actually a city called Reims, which is located in the northeastern part of the country. Reims has been the capital of France since 987, when Hugh Capet was crowned king. However, in 987, Paris was a small town on the banks of the Seine River, and it was not considered a major city at that time

Prompt: The future of AI is
Generated text:  bright, but it's also raising many concerns about the impact on society, particularly when it comes to jobs. Robotics and automation are transforming the workforce and driving significant changes in the way businesses operate.
Will robots replace human workers? Or will AI augment human capabilities, freeing us up to focus on more complex tasks? To better understand the future of work, we'll explore the role of AI in shaping the workforce and what this means for employees, businesses, and the economy as a whole.

**Job Displacement vs. Job Augmentation**

The fear of job displacement is understandable, given the pace of technological advancements. While AI may automate certain tasks


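Streaming is convenient for display, but callers often need the complete text as well. The sketch below echoes each chunk as it arrives and returns the joined result. It assumes that every chunk's `"text"` holds only the newly generated piece, as the output above suggests; if your SGLang version yields cumulative text instead, keep only the last chunk. `consume_stream` and `fake_stream` are hypothetical names, and `fake_stream` merely stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
import sys


def consume_stream(chunks, out=sys.stdout):
    """Echo each streamed chunk as it arrives and return the full text."""
    pieces = []
    for chunk in chunks:
        piece = chunk["text"]
        out.write(piece)
        out.flush()
        pieces.append(piece)
    out.write("\n")
    return "".join(pieces)


def fake_stream():
    """Hypothetical stand-in for a streaming generate call."""
    for piece in [" Paris", ",", " the", " City", " of", " Light", "."]:
        yield {"text": piece}


full_text = consume_stream(fake_stream())
```

Because the helper accepts any iterable of chunk dicts, the same function should work with the real generator without changes.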


### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Shirley. I am a 30 year old woman, from the UK. I love animals, especially cats and dogs. I am a bit of a movie buff, and enjoy watching a good film on my nights off. I am also a big fan of music, and have been to a lot of concerts in my lifetime. I am looking to meet new people, and make some new friends.
I am easy going, and enjoy trying new things. I am also a bit of a foodie, and love trying out new restaurants and cuisines. I am also quite fit, and enjoy going to the gym and taking part in sports.


Prompt: The capital of France is
Generated text:  a city steeped in history and culture, and it is one of the most popular tourist destinations in the world. Paris is famous for its stunning architecture, art, fashion, and cuisine, and its picturesque streets and landmarks are a photographer's dream.
The Eiffel Tower is an iconic symbol of Paris and one of the most recognizable landmarks in the world. It was built for the 1889 World's Fair an
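
The main benefit of the asynchronous API is that several independent requests can be in flight at once. The sketch below illustrates the pattern with `asyncio.gather`. `StubEngine` is a hypothetical stand-in so the example runs without a model. It assumes that calling `async_generate` with a single prompt string returns a single dict with a `"text"` key, which you should verify against your SGLang version.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in exposing only the async_generate call used above."""

    async def async_generate(self, prompt, sampling_params):
        await asyncio.sleep(0.01)  # simulate inference latency
        return {"text": f" <completion for {prompt!r}>"}


async def generate_many(engine, prompts, sampling_params):
    """Issue one request per prompt and wait for all of them together."""
    tasks = [engine.async_generate(p, sampling_params) for p in prompts]
    # gather preserves input order, so results line up with prompts.
    return await asyncio.gather(*tasks)


demo_prompts = ["Hello, my name is", "The capital of France is"]
results = asyncio.run(
    generate_many(StubEngine(), demo_prompts, {"temperature": 0.8})
)
for prompt, result in zip(demo_prompts, results):
    print(prompt, "->", result["text"])
```

Passing the whole list in one call, as in the cell above, is simpler when all prompts are known up front. Per-request tasks are mostly useful when requests arrive at different times, for example inside a custom server.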

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Tom. I'm a mathematician, but not in the classical sense. I'm a mathematician of probability. I have spent years studying the unpredictable nature of chance. But, I have always been fascinated by the concept of probability itself. I have a hypothesis that probability is not as random as we think it is. I believe that there are hidden patterns and structures that govern the way probability works. And I'm determined to uncover them.

I've spent years studying the subject, pouring over books and papers, talking to other mathematicians and statisticians. But, I've made little progress. The more I learn, the more I realize

Prompt: The capital of France is
Generated text:  a must-visit destination for any traveler. Paris, the City of Light, is famous for its stunning architecture, art museums, fashion, and romantic atmosphere. Here are some top things to do and see in Paris:
1. Visit the Eiffel Tower: The iconic Eiffel Tower is a must-visit attraction in Paris. You can take the elevator to the top for breathtaking views of the city.
2. Explore the Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts from ancient civilizations to the 21st century.
3. Wander along

Prompt: The future of AI is
Generated text:  in our hands

The development of artificial intelligence (AI) has been a subject of fascination for many decades. The concept of creating machines that can think and act like humans has been explored in various fields, including computer science, philosophy, and science fiction. While we have made significant progress in AI, its future is still uncertain and depends on how we choose to shape it.
On one hand, AI has the potential to bring about immense benefits to society, such as:
1. Improving healthcare outcomes: AI can help diagnose diseases more accurately, develop personalized treatment plans, and even assist in surgical procedures.
2. Enhancing education: AI


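The streaming call in the cell above is awaited first and then iterated with `async for`. The sketch below isolates that two-step pattern and collects the chunks into one string. `StubEngine` is a hypothetical stand-in that mirrors only the call shape used in this document, and the chunks are assumed to carry incremental text.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in mirroring the awaited streaming call shown above."""

    async def async_generate(self, prompt, sampling_params, stream=False):
        async def chunks():
            for piece in [" in", " our", " hands", "."]:
                await asyncio.sleep(0)  # yield control, as a real stream would
                yield {"text": piece}

        return chunks()


async def collect_stream(engine, prompt, sampling_params):
    """Await the generator, then join every streamed chunk into one string."""
    generator = await engine.async_generate(prompt, sampling_params, stream=True)
    pieces = []
    async for chunk in generator:
        pieces.append(chunk["text"])
    return "".join(pieces)


text = asyncio.run(
    collect_stream(StubEngine(), "The future of AI is", {"temperature": 0.8})
)
print(text)
```

Forgetting the `await` before `async for` is a common mistake with this call shape, since the coroutine itself is not iterable.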


In [6]:
llm.shutdown()
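
If generation raises an exception, a trailing `llm.shutdown()` call is never reached. One way to guard against that is a small context manager, sketched below. `managed_engine` and `StubEngine` are hypothetical helpers for illustration; the only engine methods assumed are `generate` and `shutdown`, both of which appear in this document.

```python
from contextlib import contextmanager


@contextmanager
def managed_engine(factory):
    """Create an engine via factory() and always call shutdown() on exit."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


class StubEngine:
    """Hypothetical stand-in with the two methods this sketch relies on."""

    def __init__(self):
        self.closed = False

    def generate(self, prompt, sampling_params):
        return {"text": " ..."}

    def shutdown(self):
        self.closed = True


with managed_engine(StubEngine) as engine:
    output = engine.generate("Hello, my name is", {"temperature": 0.8})

print(engine.closed)
```

With a real engine, the factory could be a lambda that wraps the `sgl.Engine(...)` constructor call from the first cell.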