# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This suits use cases where an additional HTTP server would add unnecessary complexity or overhead. Two common use cases are:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
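
As a rough illustration of the custom-server idea, the sketch below shows only the request-handling core: parse a JSON request, call the engine, return a JSON response. `FakeEngine` and `handle_generate` are hypothetical names introduced here, and the stand-in engine only mimics the list-in, list-of-dicts-out shape that `llm.generate` shows in this document; it is not the linked `custom_server.py` example.

```python
import json


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a GPU."""

    def generate(self, prompts, sampling_params=None):
        # Mimics the shape used in this document: one dict with a "text" key per prompt.
        return [{"text": f" <completion for {p!r}>"} for p in prompts]


def handle_generate(engine, request_body: str) -> str:
    """Turn a JSON request string into a JSON response string."""
    payload = json.loads(request_body)
    prompt = payload["prompt"]
    sampling_params = payload.get("sampling_params", {})
    # Send a one-element batch and unwrap the single result.
    output = engine.generate([prompt], sampling_params)[0]
    return json.dumps({"prompt": prompt, "text": output["text"]})


response = handle_generate(
    FakeEngine(),
    json.dumps(
        {"prompt": "The capital of France is", "sampling_params": {"temperature": 0.8}}
    ),
)
print(response)
```

In a real server, a function like this would sit behind whatever web framework you choose, with the stand-in replaced by an actual engine instance.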

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")


### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Sarah and I am a writer and a personal development enthusiast. I love to share inspiring stories and helpful tips to help others live their best lives.
I'm passionate about writing about spirituality, personal growth, mindfulness, and self-care, and I'm excited to be a part of this community.
I believe that everyone has a unique story to tell and that we all have the power to create the life we want. I'm here to support and inspire you on your own personal journey.
I look forward to connecting with you and sharing my thoughts and experiences with you.
Thanks for visiting my profile. I'm excited to get to know you and start
Prompt: The president of the United States is
Generated text:  not only the head of state, but also the head of the executive branch. The president is elected to a four-year term and serves as the commander-in-chief of the armed forces. The president also has the power to propose legislation, which is then sent to Congress f
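
For offline batch jobs, the results usually need to be stored rather than printed. The following is a minimal sketch of pairing each prompt with its completion and serializing the pairs as JSON Lines. The `outputs` list is a hand-written stand-in for what `llm.generate(prompts, sampling_params)` returns, and `to_jsonl` is a hypothetical helper, not part of SGLang.

```python
import json

prompts = ["Hello, my name is", "The capital of France is"]

# Stand-in for the list returned by llm.generate(prompts, sampling_params);
# each element is assumed to be a dict with a "text" key, as in the cell above.
outputs = [{"text": " Sarah."}, {"text": " Paris."}]


def to_jsonl(prompts, outputs):
    """Serialize prompt/completion pairs as one JSON object per line."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    return "\n".join(
        json.dumps({"prompt": p, "completion": o["text"]})
        for p, o in zip(prompts, outputs)
    )


jsonl = to_jsonl(prompts, outputs)
print(jsonl)
```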

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Grace and I am a 17 year old senior at Mission College Preparatory High School. I am a proud member of the Class of 2024 and a proud Mustang! I am passionate about my education and my community. I am involved in the school choir, student council, and volunteer work through my church and school. I am excited to be a part of the San Luis Obispo Tribune's 2024 Scholar-Athlete and Student of the Year award. I am honored to be recognized for my hard work and dedication to my studies and my community.
Hello, my name is Jason and I am a senior at R

Prompt: The capital of France is
Generated text:  a beautiful and romantic city that attracts millions of tourists every year. From the iconic Eiffel Tower to the art museums and charming streets, Paris has something to offer for everyone.
But, did you know that Paris has a lot more to offer than just its popular tourist attractions? Here are some of the lesser-known facts and tips about Paris that you might find interesting:
1. Paris has a secret underground tunnel system

The Catacombs of Paris, a network of underground tunnels and caverns, stretches for over 150 miles. It was created in the 13th century and contains the remains of millions of Parisians. Visitors can

Prompt: The future of AI is
Generated text:  uncertain and likely to be shaped by a complex interplay of technological, social, economic, and political factors.
Artificial intelligence (AI) has the potential to revolutionize numerous industries, but its development and deployment also raise a multitude of ethical, security, and societal concerns.
Regulation and governance of AI will be crucial to address these challenges and ensure that the benefits of AI are equitably distributed and its negative consequences are mitigated.
The future of AI is uncertain and likely to be shaped by a complex interplay of technological, social, economic, and political factors. Here are some potential trends and challenges that may shape the future of AI

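When streaming, you often want the full text as well as the live output. The sketch below accumulates streamed chunks into one string. It assumes each chunk's `"text"` holds only the newly generated piece; if your SGLang version returns cumulative text in each chunk, the joining logic would need to change. `fake_stream` is a hypothetical stand-in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
def fake_stream(tokens):
    """Hypothetical stand-in for llm.generate(..., stream=True)."""
    for tok in tokens:
        yield {"text": tok}


def collect_stream(chunks):
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk["text"], end="", flush=True)
        parts.append(chunk["text"])
    print()
    return "".join(parts)


full_text = collect_stream(fake_stream([" Paris", " is", " lovely", "."]))
```
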
### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Stella and I am a golden retriever. I love chasing balls and sticks, going for long walks, and snuggling with my humans. I am a bit of a goofball and love making my humans laugh with my silly antics. I also love making new doggy friends and going on adventures with my pack.
I am a bit of a foodie and love trying new treats and snacks. My favorite thing in the world is a good belly rub, and I will do just about anything for one. I am a bit of a scaredy-cat when it comes to loud noises, but with some treats and reassurance, I am back

Prompt: The capital of France is
Generated text:  one of the most visited cities in the world, known for its breathtaking beauty, rich history, and world-class art museums. Paris is a treasure trove of iconic landmarks, delectable cuisine, and endless entertainment options. With its stunning architecture, fashion, and romance, Paris is a must-visit destination for travelers from around the globe.
Paris, the City o
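
An alternative to passing the whole list in one call is to issue one request per prompt and bound how many are in flight. The sketch below does this with `asyncio.gather` and `asyncio.Semaphore`. `FakeAsyncEngine` is a hypothetical stand-in, and it assumes that calling `async_generate` with a single prompt returns a single dict; check this against your SGLang version before relying on it.

```python
import asyncio


class FakeAsyncEngine:
    """Hypothetical stand-in for sgl.Engine; assumes one prompt in, one dict out."""

    async def async_generate(self, prompt, sampling_params=None):
        await asyncio.sleep(0.01)  # simulate inference latency
        return {"text": f" <completion for {prompt!r}>"}


async def generate_all(engine, prompts, sampling_params, max_concurrency=2):
    """Run one request per prompt, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def generate_one(prompt):
        async with semaphore:
            return await engine.async_generate(prompt, sampling_params)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(generate_one(p) for p in prompts))


prompts = ["Hello, my name is", "The capital of France is", "The future of AI is"]
results = asyncio.run(generate_all(FakeAsyncEngine(), prompts, {"temperature": 0.8}))
for prompt, output in zip(prompts, results):
    print(f"{prompt} -> {output['text']}")
```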

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Mackenzie!
I’m a 19-year-old senior at Lee University, studying to become a teacher! I love learning and sharing my knowledge with others. When I’m not in the classroom, you can find me playing my guitar, singing in my college choir, or trying out new recipes in the kitchen. I’m so excited to be here and get to know all of you! Let’s chat about anything you’d like – school, music, food, or just life in general. I’m all ears and ready for conversation!
Hello, my name is Mackenzie!
I’m a 19-year-old senior at Lee University, studying

Prompt: The capital of France is
Generated text:  a city of immense history, beauty, and culture. With a rich past and a vibrant present, Paris has captivated travelers and artists for centuries. From the iconic Eiffel Tower to the world-class museums like the Louvre and Orsay, there's no shortage of must-see attractions in Paris. Take a stroll through the charming streets of Montmartre, visit the Palace of Versailles, or simply enjoy the city's famous food and wine. With so much to explore, a visit to Paris is sure to be an unforgettable experience.
Top Experiences in Paris

Explore the City's Iconic Landmarks

Visit the E

Prompt: The future of AI is
Generated text:  AI

By 2023, AI will have the ability to reason, to “understand” and to improve its own performance. AI will be able to think, learn and apply knowledge to real-world problems. The future of AI is AI.
AI will become a self-improving system, a meta-AI that will learn from data and experience, and improve its own performance over time. This will lead to a new era of innovation and progress, where AI will become a tool for solving complex problems that were previously unsolvable.
The future of AI is AI. It will become a self-improving system that will continue to

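The cell above streams one prompt at a time. As a sketch of how several streams might be consumed concurrently, the example below collects each stream into a string and runs the collectors with `asyncio.gather`. `FakeAsyncEngine` is a hypothetical stand-in that copies the calling pattern shown above (`await` the call, then `async for` over the result); whether concurrent streams behave this way on a real engine is an assumption.

```python
import asyncio


class FakeAsyncEngine:
    """Hypothetical stand-in that mirrors the await-then-iterate pattern above."""

    async def async_generate(self, prompt, sampling_params=None, stream=False):
        async def chunks():
            for tok in [" one", " two", " three"]:
                await asyncio.sleep(0)  # yield control, as a real stream would
                yield {"text": tok}

        return chunks()


async def stream_to_string(engine, prompt, sampling_params):
    """Consume one stream and return the concatenated text."""
    generator = await engine.async_generate(prompt, sampling_params, stream=True)
    parts = []
    async for chunk in generator:
        parts.append(chunk["text"])
    return "".join(parts)


async def stream_all(engine, prompts, sampling_params):
    texts = await asyncio.gather(
        *(stream_to_string(engine, p, sampling_params) for p in prompts)
    )
    return dict(zip(prompts, texts))


prompts = ["Hello, my name is", "The capital of France is"]
streamed = asyncio.run(stream_all(FakeAsyncEngine(), prompts, {"temperature": 0.8}))
print(streamed)
```
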
In [6]:
llm.shutdown()
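
In a script, as opposed to a notebook, it can help to make sure `shutdown()` runs even when generation raises. The sketch below uses `try`/`finally` for this. `FakeEngine` and `run_batch` are hypothetical names used only for illustration; the stand-in records whether `shutdown()` was called.

```python
class FakeEngine:
    """Hypothetical stand-in for sgl.Engine that records shutdown calls."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts, sampling_params=None):
        if not prompts:
            raise ValueError("no prompts given")
        return [{"text": " ..."} for _ in prompts]

    def shutdown(self):
        self.is_shut_down = True


def run_batch(engine, prompts, sampling_params):
    """Generate once, then always shut the engine down."""
    try:
        return engine.generate(prompts, sampling_params)
    finally:
        # Runs on success and on error, so the engine is always shut down.
        engine.shutdown()


engine = FakeEngine()
try:
    run_batch(engine, [], {"temperature": 0.8})  # triggers the ValueError path
except ValueError as exc:
    print(f"generation failed: {exc}")
print(f"engine shut down: {engine.is_shut_down}")
```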