# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, for use cases where a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
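
As a rough illustration of the custom-server idea, the sketch below shows a framework-agnostic request handler that forwards a JSON request to an engine-like object. It is not the linked `custom_server.py` example. `EchoEngine`, `handle_generate`, and the request fields `prompt` and `sampling_params` are hypothetical names chosen for this sketch, and it assumes that a single-prompt `generate` call returns one dict with a `"text"` key.

```python
import json
from typing import Dict, Optional


class EchoEngine:
    """Hypothetical stand-in for an engine; it only mimics the assumed output shape."""

    def generate(self, prompt: str, sampling_params: Optional[Dict] = None) -> Dict:
        return {"text": f" <completion for: {prompt}>"}


def handle_generate(engine, raw_body: bytes) -> bytes:
    """Decode a JSON request, call the engine, and encode a JSON response."""
    request = json.loads(raw_body)
    output = engine.generate(request["prompt"], request.get("sampling_params", {}))
    return json.dumps({"text": output["text"]}).encode("utf-8")


response = handle_generate(EchoEngine(), b'{"prompt": "The capital of France is"}')
print(response.decode("utf-8"))
```

A handler of this shape can be wired into whichever web framework you prefer, since it only deals with request and response bytes.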

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

INFO 11-29 07:20:37 weight_utils.py:243] Using model weights format ['*.safetensors']


Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.50it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Desere and I'm excited to be joining the team at the Marion County Humane Society. I've always been an animal lover and I'm looking forward to helping the animals in our community. In my free time, you can find me volunteering at a local animal shelter or spending time with my own pets, a cat named Loki and a dog named Duke.
As an adoption counselor, I'll be helping match the right pet with the right owner. Whether you're looking for a cuddly cat or a playful dog, I'll be here to answer all your questions and help you find your new best friend.
Please don't hesitate to reach
Prompt: The president of the United States is
Generated text:  not only the head of state, but also the head of government. He is the commander-in-chief of the armed forces and is responsible for making sure that the government functions properly. He is also responsible for making sure that the laws are enforced.
The president is also the chief diplomat of the country, res
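
For long prompt lists it can be convenient to process the batch in slices, for example to save intermediate results. The sketch below shows one way to do that while keeping each prompt paired with its generated text. `StandInEngine` and `generate_in_chunks` are hypothetical names used only here. The stand-in merely mimics the output shape seen above (one dict with a `"text"` key per prompt), so replace it with the real `llm` object in practice. Slicing is purely an organizational choice in this sketch and is not required by the engine.

```python
from typing import Dict, List


class StandInEngine:
    """Hypothetical stand-in that mimics the output shape of `llm.generate`."""

    def generate(self, prompts: List[str], sampling_params: Dict) -> List[Dict]:
        # One dict per prompt, in input order, each with a "text" field.
        return [{"text": f" <completion for: {p}>"} for p in prompts]


def generate_in_chunks(engine, prompts, sampling_params, chunk_size=2):
    """Run batch generation slice by slice and pair each prompt with its text."""
    records = []
    for start in range(0, len(prompts), chunk_size):
        chunk = prompts[start : start + chunk_size]
        outputs = engine.generate(chunk, sampling_params)
        for prompt, output in zip(chunk, outputs):
            records.append({"prompt": prompt, "text": output["text"]})
    return records


records = generate_in_chunks(
    StandInEngine(),
    ["Hello, my name is", "The capital of France is", "The future of AI is"],
    {"temperature": 0.8, "top_p": 0.95},
)
for record in records:
    print(record["prompt"], "->", record["text"])
```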

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Ruby! I am a 4-year-old spayed female calico cat with bright green eyes. I am looking for a forever home where I will be loved and pampered. I enjoy playing with toys, snuggling, and getting treats. I can be a bit shy at first, but once I get to know you, I become very affectionate. I weigh about 10 pounds and have a medium-length coat. I am up-to-date on my vaccinations and microchipped for my safety. If you are looking for a loyal and loving companion, I may be the perfect cat for you!
I will get along great with a

Prompt: The capital of France is
Generated text:  under attack. Paris is a city that never sleeps, but on the night of November 13th, the city is gripped by fear and chaos. The terrorist attacks on Paris have left a trail of death and destruction in their wake.
The news is spreading like wildfire, and the world is watching in horror as the images of the attacks are broadcast live on television. The Eiffel Tower, a symbol of French culture and engineering, stands tall and proud, but its usual beauty is overshadowed by the darkness of the events unfolding below.
The attacks are a complex web of shootings and bombings that target multiple locations, including the Bata

Prompt: The future of AI is
Generated text:  not what you think

by Bethany Mayer, CEO of Aryaka Networks

Artificial Intelligence (AI) is a buzzword that has captured the imagination of the world. While many experts believe that AI will augment our lives with its unparalleled abilities, I believe there are misconceptions about what the future of AI holds. Here are a few of them.
Misconception 1: AI will replace human workers

This is the most common misconception about AI. Many believe that AI will replace human workers, especially those in jobs that involve repetitive tasks. While AI is capable of automating many tasks, it will not replace human workers. AI

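The streaming loop above prints each chunk and then discards it. If you also need the complete text afterwards, a small helper can accumulate the chunks while echoing them. This is a minimal sketch under the assumption that each chunk carries only the newly generated piece in `chunk["text"]`, as the output above suggests. `fake_stream` and `collect_stream` are hypothetical names, and `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
from typing import Dict, Iterable, Iterator


def fake_stream(text: str) -> Iterator[Dict]:
    """Hypothetical stand-in for a streaming call: yields incremental chunks."""
    for word in text.split(" "):
        yield {"text": " " + word}


def collect_stream(chunks: Iterable[Dict], echo: bool = True) -> str:
    """Optionally print chunks as they arrive and return the concatenated text."""
    pieces = []
    for chunk in chunks:
        if echo:
            print(chunk["text"], end="", flush=True)
        pieces.append(chunk["text"])
    if echo:
        print()
    return "".join(pieces)


full_text = collect_stream(fake_stream("Paris is a city"))
```

If your installed version yields cumulative text in each chunk instead, this helper would duplicate content and would need to track an offset, so check the chunk contents before relying on it.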

### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Micaiah and I'm a junior at Cherry Hill High School East. This is my third year on the East Side Online. I'm currently the Co-Editor-in-Chief along with Rachel Sacks.
In addition to working on the ESO, I'm also a member of the East Side Online's podcast team. I'm passionate about storytelling and enjoy hearing the diverse perspectives of my fellow students and staff members.
Outside of school, I enjoy listening to music, watching movies, and spending time with my friends and family. My favorite TV shows include "The Office," "Parks and Recreation," and "Stranger Things." My

Prompt: The capital of France is
Generated text:  Paris, and I've never been there. I've seen pictures, read books, and watched movies about it. But I've never actually set foot in the City of Light.
I'm curious, have you ever been to Paris? If so, what was your favorite experience or memory from the trip?
Here are some questions to help spark your response:
* What were s
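
One reason to use the asynchronous interface is to issue several requests from the same event loop without blocking it. The sketch below uses `asyncio.gather` to await multiple single-prompt calls at once. `StandInAsyncEngine` and `generate_concurrently` are hypothetical names. The stand-in only mimics an `async_generate` coroutine that returns a dict with a `"text"` key, and whether concurrent calls suit your workload with the real engine is an assumption to verify.

```python
import asyncio
from typing import Dict, List


class StandInAsyncEngine:
    """Hypothetical stand-in exposing an `async_generate` coroutine."""

    async def async_generate(self, prompt: str, sampling_params: Dict) -> Dict:
        await asyncio.sleep(0.01)  # simulate waiting on the engine
        return {"text": f" <completion for: {prompt}>"}


async def generate_concurrently(engine, prompts: List[str], sampling_params: Dict) -> List[Dict]:
    """Start one request per prompt and wait for all of them."""
    tasks = [engine.async_generate(p, sampling_params) for p in prompts]
    # asyncio.gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*tasks)


results = asyncio.run(
    generate_concurrently(
        StandInAsyncEngine(),
        ["Hello, my name is", "The capital of France is"],
        {"temperature": 0.8, "top_p": 0.95},
    )
)
for result in results:
    print(result["text"])
```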

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Paula. I am a proud member of the First Nations community and a committed advocate for Indigenous rights and reconciliation. I have had the privilege of working in various roles that have allowed me to engage with Indigenous peoples, share their stories and raise awareness about the challenges they face.
My passion for Indigenous issues stems from my upbringing in the North, where I was surrounded by the rich culture and traditions of my community. As I grew older, I realized that there was much work to be done to address the historical injustices and systemic barriers that have contributed to the ongoing disparities faced by Indigenous peoples in Canada.
As a social entrepreneur, I have had the

Prompt: The capital of France is
Generated text:  known for its incredible art museums and stunning architecture, but there is more to Paris than just art and architecture. The city is a food lover's paradise, and one of the most iconic French dishes is the croque-monsieur.
The croque-monsieur is a grilled ham and cheese sandwich that is typically made with ham, cheese, and béchamel sauce, which is a white sauce made from butter, flour, and milk. It is often served as a snack or light meal and is a popular item on menus in Parisian cafes and bistros.
To make a traditional croque-monsieur, you will need the

Prompt: The future of AI is
Generated text:  not just about developing intelligent machines, but also about designing the society that will live alongside them. We need to start thinking about what kind of world we want to create, and what kind of relationship we want to have with AI.
In 2014, Nick Bostrom and Eliezer Yudkowsky co-authored a book called “Cognitive Biases Potentially Affecting Judgment of Global Risks”, which laid out a framework for understanding the cognitive biases that affect our judgments of global risks.
One of the most important concepts in the book is the idea of a “cognitive bias”. A cognitive bias is a systematic

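The loop above streams one prompt at a time. When several streams run concurrently, printing chunks directly would interleave their text, so one option is to collect each stream into its own string. The sketch below assumes the calling pattern shown in the cell (awaiting `async_generate(..., stream=True)` returns an async generator of incremental chunks). `StandInStreamingEngine`, `collect`, and `stream_all` are hypothetical names used only for this illustration.

```python
import asyncio
from typing import Dict, List


class StandInStreamingEngine:
    """Hypothetical stand-in: awaiting `async_generate` returns an async generator."""

    async def async_generate(self, prompt: str, sampling_params: Dict, stream: bool = True):
        async def chunks():
            for word in ("alpha", "beta", "gamma"):
                await asyncio.sleep(0)  # yield control, as a real stream would
                yield {"text": f" {word}"}

        return chunks()


async def collect(engine, prompt: str, sampling_params: Dict) -> str:
    """Consume one stream and return its concatenated text."""
    generator = await engine.async_generate(prompt, sampling_params, stream=True)
    pieces = []
    async for chunk in generator:
        pieces.append(chunk["text"])
    return "".join(pieces)


async def stream_all(engine, prompts: List[str], sampling_params: Dict) -> Dict[str, str]:
    """Run all streams concurrently and map each prompt to its full text."""
    texts = await asyncio.gather(*(collect(engine, p, sampling_params) for p in prompts))
    return dict(zip(prompts, texts))


texts = asyncio.run(
    stream_all(StandInStreamingEngine(), ["Hello, my name is", "The future of AI is"], {})
)
print(texts)
```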

In [6]:
llm.shutdown()
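
In a script, as opposed to a notebook, an exception during generation would skip a trailing `shutdown()` call. A `try`/`finally` block is one way to make sure the engine is always shut down. This is a minimal sketch with a hypothetical `StandInEngine` and `run_job` helper. Only the `generate` and `shutdown` method names come from the cells above.

```python
from typing import Dict, List


class StandInEngine:
    """Hypothetical stand-in that records whether `shutdown` was called."""

    def __init__(self) -> None:
        self.is_shut_down = False

    def generate(self, prompts: List[str], sampling_params: Dict) -> List[Dict]:
        if not prompts:
            raise ValueError("no prompts given")
        return [{"text": f" <completion for: {p}>"} for p in prompts]

    def shutdown(self) -> None:
        self.is_shut_down = True


def run_job(engine, prompts: List[str], sampling_params: Dict) -> List[str]:
    """Generate text for all prompts and always shut the engine down afterwards."""
    try:
        return [output["text"] for output in engine.generate(prompts, sampling_params)]
    finally:
        engine.shutdown()  # runs on success and on error


engine = StandInEngine()
job_texts = run_job(engine, ["Hello, my name is"], {"temperature": 0.8})
print(job_texts, engine.is_shut_down)
```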