# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This suits use cases where an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

INFO 11-24 16:29:10 weight_utils.py:243] Using model weights format ['*.safetensors']


Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.16it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Jagdeep Dhillon, and I am a Richmond upon Thames Councillor.
I am a proud resident of Twickenham and I have been serving as a councillor since 2010. I am a member of the Liberal Democrats and have always been passionate about working for my community.
I have a strong track record of working to improve the local environment, transport links and education provision. I believe that as a councillor, it is my duty to listen to the views of local residents and to use my skills and experience to make a positive difference to our community.
In 2014, I was appointed as the Deputy Leader of the Liberal
Prompt: The president of the United States is
Generated text:  the head of state and head of government of the United States, and is the commander-in-chief of the United States Armed Forces. The president is elected through the Electoral College system by the people of the United States, and serves a four-year term. The president has the power to veto leg
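
For offline batch jobs it is often more convenient to persist results than to print them. The sketch below pairs each prompt with its generated text and serializes the pairs as JSON Lines. It relies only on what the cell above shows, namely that each output is a dict with a `text` key. The helper name `to_jsonl` and the sample data are hypothetical, and no model is loaded.

```python
import json


def to_jsonl(prompts, outputs):
    """Pair prompts with generated text and return one JSON object per line.

    Assumes each output is a dict with a "text" key, as in the cell above.
    """
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    lines = [
        json.dumps({"prompt": p, "text": o["text"]}, ensure_ascii=False)
        for p, o in zip(prompts, outputs)
    ]
    return "\n".join(lines)


# Hypothetical stand-in data shaped like the result of llm.generate(...).
sample_prompts = ["The capital of France is"]
sample_outputs = [{"text": " Paris."}]
jsonl = to_jsonl(sample_prompts, sample_outputs)
```

With a real engine, the same helper could be called as `to_jsonl(prompts, outputs)` right after `llm.generate(prompts, sampling_params)`.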

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Hannah and I’m a 20-year-old university student, living in London. I’m an avid bookworm, love trying out new coffee spots, and enjoy long walks in the countryside. When I’m not studying, you can find me trying out new recipes in the kitchen, practicing yoga, or planning my next adventure. I’m excited to share my passion for travel, food, and culture with you, and hope you enjoy reading about my experiences!

Latest posts by Hannah

How to Pack for a Gap Year: Tips and Tricks

5 Delicious and Easy-to-Make Breakfast Recipes for a Healthy Start

7 Cultural Differences to Understand When

Prompt: The capital of France is
Generated text:  Paris. Paris is a city located in the north-central part of France. It is the country's largest city and is the capital of the country. The population of Paris is about 2.2 million people. Paris is known for its many famous landmarks and tourist attractions, such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. The city is also known for its fashion, cuisine, and art.
The city of Paris is divided into 20 arrondissements or districts. Each district has its own unique character and charm. The arrondissements are numbered from 1 to 20,

Prompt: The future of AI is
Generated text:  being shaped by a rapidly evolving landscape of technological advancements, societal needs, and economic pressures. As AI continues to advance, we can expect to see significant transformations in various industries and aspects of our lives. Here are some potential future developments that are likely to impact the world of AI:
1. **Increased adoption in healthcare**: AI is expected to play a crucial role in personalized medicine, disease diagnosis, and treatment planning. Advances in medical imaging, genomics, and machine learning will enable AI to make more accurate predictions and recommendations, leading to better patient outcomes.
2. **Autonomous vehicles and transportation systems**: Self-driving cars, drones, and


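Printing chunks as they arrive works for interactive use, but a script usually also needs the complete text once the stream ends. The following is a minimal sketch of that accumulation step. It assumes, as the output above suggests, that each chunk's `text` field carries only the newly generated piece. If your SGLang version instead reports the full text so far in every chunk, keep only the final chunk's text. `fake_stream` is a hypothetical stand-in for `llm.generate(prompt, sampling_params, stream=True)` so the example runs without a model.

```python
def collect_stream(chunks):
    """Concatenate the "text" field of streamed chunks into one string.

    Assumes every chunk carries only newly generated text (incremental chunks).
    """
    return "".join(chunk["text"] for chunk in chunks)


def fake_stream():
    # Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True).
    for piece in [" Paris", ".", " Paris", " is", " a", " city"]:
        yield {"text": piece}


full_text = collect_stream(fake_stream())
```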

### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Aaron. I am a 22-year-old musician and music producer from the United States. I have been playing music for over 10 years and have recently started to focus more on producing electronic music. I am excited to join this community and share my music with you all!
Hey, nice to meet you! Welcome to the community! I'm a music producer and DJ, and I'm always happy to meet other like-minded people. What kind of music do you produce, and what's your favorite genre to work with? Do you have a favorite software or hardware that you like to use?

Also, what's your musical background like?

Prompt: The capital of France is
Generated text:  the most popular tourist destination in the world, attracting over 23 million visitors each year. With its rich history, art, fashion, and cuisine, Paris is a city that offers something for everyone. Here are some of the top things to do in Paris:
1. Visit the Eiffel Tower: The iconic Eiffel Tower is a must-visit attra
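
One reason to reach for the asynchronous API is that several requests can be awaited together. The sketch below shows that pattern with `asyncio.gather`. The coroutine `fake_async_generate` is a hypothetical stand-in with the same call shape as `llm.async_generate` in the cell above, so the example runs without a model. How the engine schedules concurrent calls internally is not covered here.

```python
import asyncio


async def fake_async_generate(prompts, sampling_params):
    # Hypothetical stand-in for llm.async_generate: returns one dict per prompt.
    await asyncio.sleep(0)  # yield control, as a real awaitable would
    return [{"text": f" <completion for: {p}>"} for p in prompts]


async def run_batches(batches, sampling_params):
    """Submit several prompt batches concurrently; results keep input order."""
    tasks = [fake_async_generate(batch, sampling_params) for batch in batches]
    return await asyncio.gather(*tasks)


batches = [
    ["Hello, my name is"],
    ["The capital of France is", "The future of AI is"],
]
results = asyncio.run(run_batches(batches, {"temperature": 0.8, "top_p": 0.95}))
```

`asyncio.gather` returns results in the order the awaitables were passed, so `results[i]` corresponds to `batches[i]` regardless of which batch finishes first.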

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Arielle. I'm a 25 year old artist who loves to paint, draw, and write. I'm originally from New Jersey, but I've recently moved to Florida where I'm loving the sunshine and warm weather. I'm a bit of a goofball and love to make people laugh, but I'm also a bit of a introvert and enjoy spending time by myself, whether that's reading a book, taking a nap, or just sitting in silence.
I'm also a bit of a hopeless romantic and love anything that's related to love, relationships, or emotions. I'm a bit of a dreamer and love to

Prompt: The capital of France is
Generated text:  a city of beauty and romance. The Eiffel Tower, a symbol of French culture, stands tall and proud. Visitors can climb to the top of the tower for a panoramic view of the city. The Louvre Museum is another popular destination, featuring an impressive collection of art and artifacts from around the world.
The Seine River runs through the heart of the city, offering a picturesque view of the city's architecture. The Notre Dame Cathedral is a beautiful example of Gothic architecture, and the Sainte-Chapelle is famous for its stunning stained-glass windows. The Champs-Élysées is a famous shopping street lined with

Prompt: The future of AI is
Generated text:  not a machine that can think and act like a human, but a system that can learn and adapt to its environment.
In the future, AI will be designed to be more transparent, explainable and accountable, with a focus on human-centered design that prioritizes the well-being and safety of all individuals involved.
Artificial Intelligence will continue to play a critical role in addressing some of the world's most pressing challenges, such as climate change, inequality and access to healthcare.
The future of AI is not just about technology, but about the values and principles that guide its development and deployment.
As AI becomes increasingly integrated into our lives, it's



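The asynchronous streaming loop can be wrapped in a small helper when the caller wants the final text as well as the incremental chunks. The sketch below consumes an async generator with `async for` and joins the pieces. `fake_async_stream` is a hypothetical stand-in for the generator obtained from `await llm.async_generate(prompt, sampling_params, stream=True)`, and the chunks are assumed to be incremental, as in the output above.

```python
import asyncio


async def fake_async_stream():
    # Hypothetical stand-in for the async generator returned by
    # `await llm.async_generate(prompt, sampling_params, stream=True)`.
    for piece in [" a", " city", " of", " beauty"]:
        await asyncio.sleep(0)  # simulate waiting for the next chunk
        yield {"text": piece}


async def collect_async_stream(stream):
    """Consume an async stream of chunks and return the joined text."""
    parts = []
    async for chunk in stream:
        parts.append(chunk["text"])
    return "".join(parts)


async_text = asyncio.run(collect_async_stream(fake_async_stream()))
```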

In [6]:
llm.shutdown()
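
In a standalone script, as opposed to a notebook, it can help to guarantee that `shutdown()` runs even when generation raises an exception. A minimal sketch using `contextlib.contextmanager` follows. The wrapper name `shutting_down` and the `FakeEngine` class are hypothetical. The only assumption about the real engine is the `shutdown()` method shown in the cell above.

```python
from contextlib import contextmanager


@contextmanager
def shutting_down(engine):
    """Yield the engine and call its shutdown() on exit, even after an error."""
    try:
        yield engine
    finally:
        engine.shutdown()


class FakeEngine:
    # Hypothetical stand-in exposing only the shutdown() method used above.
    def __init__(self):
        self.closed = False

    def shutdown(self):
        self.closed = True


engine = FakeEngine()
try:
    with shutting_down(engine):
        raise RuntimeError("simulated failure during generation")
except RuntimeError:
    pass  # the engine has still been shut down at this point
```

With the real engine, the equivalent usage would be `with shutting_down(sgl.Engine(model_path=...)) as llm:` followed by the generation calls.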