# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
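
To give a rough idea of what the core of such a server might look like, here is a minimal, framework-agnostic sketch. It is not taken from the linked script: `FakeEngine` and `handle_generate` are hypothetical names, and `FakeEngine` only imitates the dictionary with a `"text"` key that `generate` returns for a single prompt, so the snippet runs without a GPU or an SGLang installation.

```python
import json


class FakeEngine:
    """Hypothetical stand-in that mimics only the return shape of generate()."""

    def generate(self, prompt, sampling_params=None):
        return {"text": f" (echo of {len(prompt)} prompt characters)"}


def handle_generate(engine, request_body):
    """Turn a JSON request body into an (HTTP status, JSON response body) pair."""
    try:
        payload = json.loads(request_body)
        prompt = payload["prompt"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return 400, json.dumps({"error": "body must be JSON with a 'prompt' field"})

    # Sampling parameters are optional in this sketch.
    sampling_params = payload.get("sampling_params", {})
    output = engine.generate(prompt, sampling_params)
    return 200, json.dumps({"text": output["text"]})


status, body = handle_generate(FakeEngine(), '{"prompt": "Hello"}')
print(status, body)
```

Keeping the handler a plain function means it could be wired into whichever web framework you prefer, with the real engine passed in place of `FakeEngine`.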

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")


### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Tony Siao, and I am a mathematics teacher and a part-time children's book author. I am delighted to share with you my latest book, "The Adventures of a Pencil Lead".
This is a story about a group of pencil leads who are living their lives in the graphite factory. They dream of making a difference in the world, but they feel that their roles are limited to just being used for drawing. One day, they decided to escape from the factory and set out to fulfill their dreams.
Their journey takes them through various environments, from a messy desk to a colorful art studio. Along the way, they meet different characters
Prompt: The president of the United States is
Generated text:  not in charge of the country, at least not in the way that people commonly assume. While the president has many powers, the extent of their authority is limited. The president is in many ways more like a governor of a state than a supreme ruler.
The president's powers are out
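
In an offline batch job the results are usually stored instead of printed. The following sketch pairs each prompt with the `"text"` field of its output and serializes the pairs as JSON Lines. The helper names `to_records` and `to_jsonl` are hypothetical, and `example_outputs` is a hand-written stand-in for the list returned by `llm.generate(prompts, sampling_params)`, so the snippet runs without the engine.

```python
import json


def to_records(prompts, outputs):
    """Pair each prompt with the "text" field of its output."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    return [{"prompt": p, "completion": o["text"]} for p, o in zip(prompts, outputs)]


def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Stand-in data with the same shape as the engine output shown above.
example_prompts = ["The capital of France is", "The future of AI is"]
example_outputs = [{"text": " Paris."}, {"text": " bright."}]

records = to_records(example_prompts, example_outputs)
print(to_jsonl(records))
```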

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Sophia and I am 12 years old. I love animals and I love to draw and paint. I like to spend my free time playing with my friends and riding my bike.
I love animals because they are cute and fun to be around. My favorite animal is a dog because they are loyal and friendly. I have a dog named Max and he is the best dog in the world! He is a golden retriever and he loves to play fetch with me.
I also love to draw and paint because it helps me to express my feelings and be creative. I like to draw pictures of animals, especially dogs and cats. My favorite thing

Prompt: The capital of France is
Generated text:  also known as the City of Light (La Ville Lumière) and is one of the most famous and beautiful cities in the world. Paris has been the center of politics, economy, culture, and art for centuries and has a rich history. The city is known for its stunning architecture, world-class museums, fashion, cuisine, and romantic atmosphere. It is a popular destination for tourists and a great place to live and work.
Paris has a total area of 105.4 square kilometers and a population of approximately 2.1 million people. The city is divided into 20 arrondissements (districts), each with its

Prompt: The future of AI is
Generated text:  in conversation, not code

While the current state of artificial intelligence is all about code, the future of AI is in conversation, not code. This is because the next generation of AI will rely on natural language processing, machine learning, and other technologies that enable computers to understand and engage with humans in a more intuitive way.
In the future, AI will be about building systems that can learn from humans, adapt to new situations, and interact with us in a more conversational way. This will be achieved through the use of advanced technologies such as natural language processing, machine learning, and computer vision.
The current state of AI is largely focused
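
If you need the complete text once streaming finishes, the chunks can be accumulated while they arrive. The sketch below assumes that each chunk's `"text"` field holds only the newly generated piece, as the printed output above suggests; if your SGLang version yields cumulative text instead, the joining logic would need to change. `collect_stream` and `fake_stream` are hypothetical names, and `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
def collect_stream(chunks):
    """Join the "text" field of every streamed chunk into one string."""
    parts = []
    for chunk in chunks:
        parts.append(chunk["text"])
    return "".join(parts)


def fake_stream():
    # Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True).
    for piece in [" Sophia", " and", " I", " am", " ", "12", " years", " old", "."]:
        yield {"text": piece}


full_text = collect_stream(fake_stream())
print(repr(full_text))
```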




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Máiréad. I am an Irish woman who loves to cook and bake. In fact, cooking and baking are more than just hobbies for me - they are passions! I love experimenting with new recipes and ingredients, and I'm always on the lookout for new ideas to add to my repertoire.
I come from a family of passionate cooks, and I grew up watching my mother and grandmother prepare meals for our family. They taught me the value of using fresh, seasonal ingredients and the importance of taking the time to prepare a meal with love.
I'm particularly fond of baking, and I've spent years perfecting my skills in the

Prompt: The capital of France is
Generated text:  in the midst of a revolution, but it's not the one you might be thinking of. Instead of fighting for political freedom, the revolution in Paris is all about food.
It's been dubbed the "Food Revolution," and it's all about the power of food to bring people together and create positive change. At its core is a
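
Because `async_generate` is awaited like any other coroutine, it can be combined with standard `asyncio` tools. As an illustration only, the sketch below uses `asyncio.gather` to issue several independent requests concurrently and collect the results in input order. `fake_async_generate` is a hypothetical stand-in for `llm.async_generate` called with a single prompt, and its return shape (a dictionary with a `"text"` key) is an assumption modeled on the batch output above.

```python
import asyncio


async def fake_async_generate(prompt, sampling_params):
    # Hypothetical stand-in for `await llm.async_generate(prompt, sampling_params)`.
    await asyncio.sleep(0.01)  # simulate inference latency
    return {"text": f" <completion for {prompt!r}>"}


async def generate_all(prompts, sampling_params):
    """Run one request per prompt concurrently; results keep the input order."""
    tasks = [fake_async_generate(p, sampling_params) for p in prompts]
    return await asyncio.gather(*tasks)


prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = asyncio.run(generate_all(prompts, sampling_params))
for prompt, output in zip(prompts, outputs):
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```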

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Claudio, and I'm a public health specialist. I'm excited to share my experiences and insights about various public health issues, as well as my passion for global health and social justice.
As a public health specialist, I have worked on a range of issues, including infectious diseases, maternal and child health, and health systems strengthening. My experience has taken me to various parts of the world, from Latin America to Africa, and from rural villages to urban cities.
I am passionate about promoting health equity and addressing the social determinants of health. I believe that everyone deserves access to quality healthcare, regardless of their background or socioeconomic status. I

Prompt: The capital of France is
Generated text:  in a state of emergency as protests and riots erupt in response to the government's proposed pension reforms. The clashes between protesters and police have been intense, with tear gas and water cannons being used to disperse the crowds.
The proposed reforms aim to raise the retirement age from 62 to 64, which has sparked widespread opposition from trade unions and pensioners. The government has argued that the changes are necessary to ensure the long-term sustainability of the pension system, but the protesters see it as a betrayal of their rights and a threat to their livelihoods.
As the situation continues to escalate, the government has deployed hundreds of riot police to maintain

Prompt: The future of AI is
Generated text:  bright, but it also comes with a lot of challenges. We will have to face the consequences of our own success and think about the unintended effects of creating super intelligent machines. What do you think the future of AI will bring?
The future of AI is promising, with significant advancements expected in areas such as natural language processing, computer vision, and reinforcement learning. However, it also raises concerns about job displacement, bias in decision-making, and the potential for AI to be used for malicious purposes. To address these challenges, it's essential to develop and implement robust regulations, invest in education and re-skilling programs, and ensure that AI systems
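
The loop above streams one prompt at a time. If the streams should progress concurrently, each one can be consumed in its own coroutine and the results gathered at the end. This is a sketch under stated assumptions, not SGLang's recommended pattern: `fake_async_stream` is a hypothetical stand-in for the async generator obtained from `await llm.async_generate(prompt, sampling_params, stream=True)`, and each chunk is assumed to carry only newly generated text.

```python
import asyncio


async def fake_async_stream(prompt):
    # Hypothetical stand-in for the engine's async chunk generator.
    for piece in [" one", " two", " three"]:
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield {"text": piece}


async def consume(prompt):
    """Collect every chunk of one stream and return prompt plus completion."""
    parts = []
    async for chunk in fake_async_stream(prompt):
        parts.append(chunk["text"])
    return prompt + "".join(parts)


async def main():
    prompts = ["A:", "B:"]
    # One consumer coroutine per prompt; gather preserves the input order.
    return await asyncio.gather(*(consume(p) for p in prompts))


results = asyncio.run(main())
print(results)
```

Buffering each stream separately avoids interleaving tokens from different prompts in the printed output.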




In [6]:
llm.shutdown()
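
In a script, an exception raised during inference would skip a trailing `shutdown()` call. One way to guard against that is a small context manager that always shuts the engine down. The sketch below is illustrative: `managed_engine` is a hypothetical helper, and `FakeEngine` is a stand-in exposing only the `shutdown()` method used above, so the snippet runs without SGLang installed.

```python
from contextlib import contextmanager


class FakeEngine:
    """Hypothetical stand-in exposing only the shutdown() method used above."""

    def __init__(self):
        self.is_shut_down = False

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def managed_engine(factory):
    """Create an engine and guarantee shutdown() runs, even if an error occurs."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


engine_ref = None
try:
    with managed_engine(FakeEngine) as engine:
        engine_ref = engine
        raise RuntimeError("simulated failure during inference")
except RuntimeError:
    pass

print(engine_ref.is_shut_down)
```

With the real engine, the factory could be something like `lambda: sgl.Engine(model_path=...)`, using the same constructor call as in the first cell.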