# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.30it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Aiden. I'm a 3 year old Golden Retriever. My mom says I'm a good boy. I love to play fetch and go on walks. Sometimes I get a little too excited and jump up on people, but my mom says it's okay as long as I'm not being too rough. I'm still learning.
My favorite thing to do is play tug of war with my favorite toy, a squeaky chicken. My mom says I have to share with my sister, but I don't want to share. I like to play with my toys by myself.
Sometimes my mom takes me to the park and I
Prompt: The president of the United States is
Generated text:  invited to speak at an African-American university on the anniversary of Martin Luther King Jr.'s birthday. His speech should be an historical context of the life of Dr. King, discuss some of his major contributions, and reveal some of his personal thoughts and emotions.
The following is a speech given by President Barack Obama at Morehouse College in Atlanta, Georgia, on January 16, 2011, on the occasi
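
For prompt lists much larger than the four shown above, one option is to submit them in fixed-size chunks and keep each prompt next to its generated text. The sketch below is only an illustration: `chunked`, `generate_in_chunks`, and `chunk_size` are hypothetical names, and `fake_generate` is a stand-in for `llm.generate` so the code runs without a GPU. It assumes the call returns one dictionary with a `"text"` key per prompt, as in the cell above. Because the engine already schedules batches internally, chunking like this is optional.

```python
from typing import Callable, Dict, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start : start + size]


def generate_in_chunks(
    generate: Callable[[List[str], dict], List[Dict[str, str]]],
    prompts: List[str],
    sampling_params: dict,
    chunk_size: int = 2,
) -> List[Dict[str, str]]:
    """Call `generate` once per chunk and pair each prompt with its text."""
    records = []
    for batch in chunked(prompts, chunk_size):
        outputs = generate(batch, sampling_params)
        for prompt, output in zip(batch, outputs):
            records.append({"prompt": prompt, "text": output["text"]})
    return records


def fake_generate(batch, sampling_params):
    # Hypothetical stand-in for llm.generate so the sketch runs without a GPU.
    return [{"text": f" <completion for {p!r}>"} for p in batch]


records = generate_in_chunks(fake_generate, ["a", "b", "c"], {"temperature": 0.8})
for record in records:
    print(record["prompt"], "->", record["text"])
```

With a real engine, `llm.generate` would be passed in place of `fake_generate`.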

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Amy, and I’m an avid traveler and foodie from Australia. I’m a bit of a curious soul, always looking for the next great adventure, trying new foods, and soaking up the local culture wherever I go. I love sharing my travel stories and recipes with others, so they can experience the beauty and flavors of the world through my eyes.
My travel style is all about immersing myself in the local way of life. I love staying in boutique hotels, trying street food, exploring hidden gems, and meeting the people who make a place truly special. Whether it’s trekking through the Himalayas, island-hopping in

Prompt: The capital of France is
Generated text:  known for its stunning architecture, famous museums, and rich history. Paris, the City of Light, has been a hub of art, literature, and science for centuries. It is a city of romance, with the Eiffel Tower, the Louvre, and the Champs-Élysées attracting millions of visitors each year.
The city is also famous for its cuisine, fashion, and wine. From the classic Croque Monsieur to the rich flavors of Bouillabaisse, Parisian food is a culinary experience like no other. And of course, the fashion capital of the world is home to some of the most iconic

Prompt: The future of AI is
Generated text:  here, and it's happening now

The year 2023 is shaping up to be a pivotal year for artificial intelligence (AI). As AI continues to transform industries and revolutionize the way we live, we are witnessing the convergence of multiple technological advancements that are bringing AI into the mainstream. Here are just a few areas where AI is making a significant impact:
1. Chatbots and Virtual Assistants: Chatbots and virtual assistants are becoming increasingly sophisticated, enabling businesses to provide personalized customer service, automate support tasks, and enhance the overall customer experience.
2. Augmented Reality (AR) and Virtual Reality (VR): AR and

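When the full completion is needed after streaming finishes, the chunks can be joined as they arrive. This is a minimal sketch, assuming each chunk carries only the newly generated text in its `"text"` field, as the streamed output above suggests; if a given version returns cumulative text instead, joining would duplicate content. `collect_stream` and `fake_stream` are hypothetical names, and `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
from typing import Dict, Iterable


def collect_stream(chunks: Iterable[Dict[str, str]]) -> str:
    """Concatenate the "text" field of each streamed chunk."""
    return "".join(chunk["text"] for chunk in chunks)


def fake_stream():
    # Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True).
    for piece in [" Paris", ",", " the", " City", " of", " Light"]:
        yield {"text": piece}


full_text = collect_stream(fake_stream())
print(repr(full_text))
```
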
### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Chris and I am an avid gamer and tech enthusiast. I have been playing games for over 20 years and have a deep appreciation for the industry. I have written for various gaming websites and publications, and I am excited to bring my knowledge and passion to this blog.
When I am not gaming, I enjoy watching TV shows and movies, playing sports, and spending time with my family and friends. I am a bit of a geek at heart, and I love staying up to date on the latest technology and trends.
My goal with this blog is to provide honest and informative reviews of the latest games, hardware, and technology. I want

Prompt: The capital of France is
Generated text:  Paris, which is home to the famous Louvre Museum, the Eiffel Tower, and the Notre Dame Cathedral. However, there is another city in France that is equally beautiful and worth visiting – Lyon.
Lyon is the third-largest city in France and is known for its rich history, cultural heritage, and delic
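
The asynchronous interface also makes it possible to issue one request per prompt and await them together, for example when prompts arrive from different sources. The sketch below is hedged accordingly: `generate_concurrently` and `max_in_flight` are hypothetical names, and `fake_async_generate` stands in for `llm.async_generate`. It assumes that a call with a single prompt string returns a single dictionary with a `"text"` key. A semaphore caps the number of pending requests.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def generate_concurrently(
    async_generate: Callable[[str, dict], Awaitable[Dict[str, str]]],
    prompts: List[str],
    sampling_params: dict,
    max_in_flight: int = 2,
) -> List[str]:
    """Issue one request per prompt, with at most `max_in_flight` pending."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def one(prompt: str) -> str:
        async with semaphore:
            output = await async_generate(prompt, sampling_params)
            return output["text"]

    # gather returns results in the same order as the prompts.
    return await asyncio.gather(*(one(p) for p in prompts))


async def fake_async_generate(prompt, sampling_params):
    # Hypothetical stand-in for llm.async_generate with a single prompt.
    await asyncio.sleep(0)
    return {"text": f" <completion for {prompt!r}>"}


texts = asyncio.run(
    generate_concurrently(fake_async_generate, ["x", "y", "z"], {"top_p": 0.95})
)
print(texts)
```

Passing the whole prompt list in one call, as in the cell above, is simpler when all prompts are known up front.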

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  E.L. and I'm a 14-year-old anime fan. I've been watching anime for about 6 years now, and I've seen over 200 different series. My favorite anime is One Punch Man. I love the humor and the action in it. I also enjoy the superhero aspect of it. I think it's really cool how Saitama can defeat any opponent with just one punch. My friends and I like to make fun of the other characters and their overpowered abilities. It's a lot of fun to watch and laugh with friends. Do you like anime too? Have you seen One Punch Man?
Hey E

Prompt: The capital of France is
Generated text:  Paris, and there are five overseas departments: French Guiana, Guadeloupe, Martinique, Réunion, and Mayotte. The largest city is Paris, with a population of 2.1 million people. The French language is the official language, but many other languages are also spoken, including English.
France is a popular tourist destination, known for its stunning architecture, art museums, fashion, and cuisine. The most popular tourist destinations in France are the Eiffel Tower, the Louvre Museum, Notre Dame Cathedral, the Palace of Versailles, and the French Riviera.
France has a diverse economy, with major

Prompt: The future of AI is
Generated text:  bright, but it also comes with risks and challenges.
The rapid advancement of Artificial Intelligence (AI) has the potential to bring about numerous benefits, including increased efficiency, improved decision-making, and enhanced productivity. However, it also raises concerns about job displacement, bias, and the potential for AI to be used in malicious ways.
To mitigate these risks, experts are working to develop more transparent and explainable AI systems, as well as to establish guidelines for the responsible development and use of AI. This includes developing AI that is aligned with human values, and ensuring that AI is designed and used in ways that prioritize human well-being and safety.
One


In [6]:
llm.shutdown()
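
In a script, as opposed to a notebook, it can be convenient to guarantee that `shutdown()` runs even if generation raises an exception. The following is a sketch of one way to do that with a context manager. `engine_session` is a hypothetical helper, not part of SGLang, and `FakeEngine` is a stand-in that only records whether `shutdown()` was called.

```python
from contextlib import contextmanager


@contextmanager
def engine_session(make_engine):
    """Create an engine and call its shutdown() on exit, even on error."""
    engine = make_engine()
    try:
        yield engine
    finally:
        engine.shutdown()


class FakeEngine:
    # Hypothetical stand-in for sgl.Engine; it only records shutdown calls.
    def __init__(self):
        self.is_shut_down = False

    def shutdown(self):
        self.is_shut_down = True


with engine_session(FakeEngine) as engine:
    pass  # generation calls would go here

print(engine.is_shut_down)
```

With the real engine, the factory could be something like `lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")`, using the same constructor call as the first cell.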