# Offline Engine API

SGLang provides a direct inference engine that does not require an HTTP server, which suits use cases where a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")


### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Maxwell “Max” Hall and I am a Graphic Designer & Illustrator living in beautiful Lake Tahoe. My passion is creating unique and captivating art that pushes the boundaries of creativity. I have a background in graphic design, illustration, and fine art, which gives me a diverse skill set that I love to utilize in my work. When I'm not working, you can find me snowboarding, hiking, or just enjoying the beauty of the Sierra Nevada mountains.
Whether you're looking for a logo, branding, or graphic design, I would love to collaborate with you to bring your vision to life. My goal is to provide creative and effective solutions that
Prompt: The president of the United States is
Generated text:  the head of the executive branch of the federal government and the commander-in-chief of the armed forces. The president is directly elected by the people through the Electoral College system. The president serves a four-year term and is limited to two terms.
T
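Since each element of `outputs` is a dictionary with a `text` field (as the loop above relies on), a batch result can be paired with its prompts and serialized for later analysis. The following is a minimal sketch rather than part of the SGLang API: `to_jsonl` is a hypothetical helper, and `sample_outputs` is stand-in data that only mimics the shape of the engine's return value so that the snippet runs without a GPU.

```python
import json


def to_jsonl(prompts, outputs):
    """Pair each prompt with its generated text, one JSON object per line.

    `outputs` is assumed to be a list of dicts with a "text" key, in the
    same order as `prompts` (the shape used by the loop above).
    """
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    lines = [
        json.dumps({"prompt": p, "completion": o["text"]}, ensure_ascii=False)
        for p, o in zip(prompts, outputs)
    ]
    return "\n".join(lines)


# Stand-in data; in real use this would be the result of llm.generate(...).
sample_prompts = ["The capital of France is", "The future of AI is"]
sample_outputs = [{"text": " Paris."}, {"text": " bright."}]

jsonl = to_jsonl(sample_prompts, sample_outputs)
print(jsonl)
```

The length check guards against silently dropping results, because `zip` would otherwise stop at the shorter of the two lists.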

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Nicky and I'm a 45-year-old wife, mother of 2, and a little bit of a geek. I love sci-fi, fantasy, and horror movies and TV shows. I'm a bit of a bookworm and I enjoy reading all types of books but I have a special fondness for science fiction and fantasy novels. I'm a bit of a gamer too and enjoy playing video games on my Xbox One and playing tabletop games with my friends.
I live in a small town in the middle of nowhere (I like to call it "the sticks") and I love it here. The peace and quiet is a nice



Prompt: The capital of France is
Generated text:  Paris. The city is located in the north-central part of the country and is situated along the Seine River. Paris is known for its stunning architecture, rich history, and vibrant cultural scene. The city is home to many famous landmarks, including the Eiffel Tower, the Louvre Museum, and Notre Dame Cathedral.
Paris is a popular tourist destination, attracting millions of visitors each year. The city offers a wide range of activities and attractions, including museums, art galleries, historic sites, and entertainment venues. Visitors can explore the city's charming neighborhoods, such as Montmartre and Le Marais, and enjoy the city's famous



Prompt: The future of AI is
Generated text:  vast and rapidly evolving. This book offers a comprehensive overview of the AI landscape, including the current state of the field, emerging trends and technologies, and the societal implications of AI.
The book explores various AI applications across different sectors, including healthcare, finance, education, transportation, and more. It also delves into the development of foundational AI technologies, such as deep learning, natural language processing, and computer vision.
The authors examine the opportunities and challenges presented by AI, including job displacement, bias and fairness, and the need for AI literacy. They also discuss the importance of responsible AI development and deployment, as well as the need for regulatory
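In the output above, each chunk appears to carry only a small piece of new text, so the full completion can be rebuilt by concatenating chunks as they arrive. The sketch below works under that assumption: `collect_stream` is a hypothetical helper, and `fake_stream` is a stand-in generator that mimics the shape of `llm.generate(prompt, sampling_params, stream=True)` without loading a model. Chunk semantics may differ between SGLang versions, so it is worth confirming that chunks are incremental before relying on this.

```python
def collect_stream(chunks, on_chunk=None):
    """Concatenate the "text" field of each streamed chunk.

    Assumes every chunk holds only newly generated text (incremental
    chunks), which is what the printed output above suggests.
    """
    pieces = []
    for chunk in chunks:
        text = chunk["text"]
        if on_chunk is not None:
            on_chunk(text)  # e.g. display the piece as soon as it arrives
        pieces.append(text)
    return "".join(pieces)


def fake_stream():
    """Stand-in for llm.generate(prompt, sampling_params, stream=True)."""
    for piece in [" Paris", ".", " The", " city"]:
        yield {"text": piece}


full_text = collect_stream(
    fake_stream(), on_chunk=lambda t: print(t, end="", flush=True)
)
print()
```

Keeping the display callback separate from the accumulation lets the same helper serve both interactive printing and silent collection.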




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Richard and I am a 44-year-old software developer living in the beautiful state of Utah. I grew up in Southern Utah and love the desert landscapes. In my free time, I enjoy hiking and exploring the outdoors, especially in the nearby national parks and monuments.
I am married to a wonderful woman, and we have two beautiful children. My wife is a talented artist, and we have been married for over 20 years. I have two wonderful kids, a boy, and a girl. My son is a bit of a tech whiz, just like his dad, and my daughter is a free spirit who loves art and music.


Prompt: The capital of France is
Generated text:  a must-visit destination for any traveler. Here are some reasons why you should visit Paris:
1. The Eiffel Tower: The Eiffel Tower is one of the most iconic landmarks in the world. It was built for the 1889 World's Fair and has become a symbol of Paris and France. You can take the elevator to the top for breathtaking views of the city.
2. 
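Because `async_generate` is awaited, the surrounding coroutine can apply ordinary `asyncio` tools to it, such as a timeout. The sketch below illustrates this with `asyncio.wait_for`. Here `fake_async_generate` is a hypothetical stand-in that returns data in the same shape as the call above, `generate_with_timeout` is an illustrative helper, and the timeout values are arbitrary. Whether abandoning the wait also stops work inside a real engine is not covered here.

```python
import asyncio


async def fake_async_generate(prompts, sampling_params):
    """Stand-in for llm.async_generate: one {"text": ...} dict per prompt."""
    await asyncio.sleep(0.01)  # simulate inference latency
    return [{"text": f" <completion for {p!r}>"} for p in prompts]


async def generate_with_timeout(prompts, sampling_params, timeout_s):
    """Await a batch generation; return None if it exceeds timeout_s seconds."""
    try:
        return await asyncio.wait_for(
            fake_async_generate(prompts, sampling_params), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        return None


demo_prompts = ["Hello, my name is", "The capital of France is"]
demo_params = {"temperature": 0.8, "top_p": 0.95}

results = asyncio.run(
    generate_with_timeout(demo_prompts, demo_params, timeout_s=5.0)
)
for prompt, output in zip(demo_prompts, results):
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```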

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Amber and I'm a 28-year-old Australian living in New York City. I'm a travel blogger and photographer, which means I get to explore new destinations and experience different cultures while working.
In my free time, I love trying out new restaurants, visiting local art galleries, and taking long walks around the city to capture the perfect shot for my Instagram feed.
As a travel blogger, I've had the opportunity to visit some incredible destinations, from the stunning beaches of Bali to the vibrant streets of Tokyo. But even with all the amazing places I've been to, there's one place that will always hold a special spot in my heart



Prompt: The capital of France is
Generated text:  called Paris. It is situated in the north of France and is home to the Eiffel Tower, one of the most famous landmarks in the world. Paris has a population of around 2.1 million people, making it the largest city in France.
Paris has a rich history dating back to the 3rd century AD when it was a small settlement. Over the centuries, the city has grown and developed, becoming a major center of culture, art, fashion, and cuisine.
Today, Paris is a global hub for fashion, art, and culture, and is home to some of the world's most famous museums, galleries,



Prompt: The future of AI is
Generated text:  bright, but it's also fraught with challenges. Here's a rundown of the good, the bad, and the ugly of AI.
Artificial intelligence (AI) is all the rage these days, and for good reason. From virtual assistants to personalized product recommendations, AI has already started to revolutionize the way we live and work. But as AI continues to advance, it's also raising a host of concerns and challenges that we can't ignore.
On the good side, AI has the potential to:
**Solve some of humanity's biggest problems**: AI can help us tackle complex issues like climate change, disease, and poverty by analyzing
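The loop above consumes one stream at a time. If the engine accepts overlapping requests, several streams could instead be consumed concurrently with `asyncio.gather`, collecting each prompt's text separately. The sketch below shows only the `asyncio` pattern: `fake_async_generate` is a hypothetical stand-in that, like the call above, returns an async generator when awaited, and `collect` and `stream_all` are illustrative helpers rather than SGLang functions.

```python
import asyncio


async def fake_async_generate(prompt, sampling_params, stream=True):
    """Stand-in for `await llm.async_generate(..., stream=True)`.

    Like the call above, awaiting it returns an async generator of chunks.
    """

    async def chunks():
        for piece in [" one", " two", " three"]:
            await asyncio.sleep(0)  # yield control between chunks
            yield {"text": piece}

    return chunks()


async def collect(prompt, sampling_params):
    """Consume one stream and return (prompt, full_text)."""
    generator = await fake_async_generate(prompt, sampling_params, stream=True)
    pieces = []
    async for chunk in generator:
        pieces.append(chunk["text"])
    return prompt, "".join(pieces)


async def stream_all(prompts, sampling_params):
    """Run one streaming request per prompt; results keep prompt order."""
    return await asyncio.gather(*(collect(p, sampling_params) for p in prompts))


stream_results = asyncio.run(
    stream_all(["Hello, my name is", "The future of AI is"], {"temperature": 0.8})
)
for prompt, text in stream_results:
    print(f"Prompt: {prompt}\nGenerated text:{text}")
```

`asyncio.gather` returns results in the order the coroutines were passed in, so each completion stays aligned with its prompt even when the streams finish at different times.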




In [6]:
llm.shutdown()
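An explicit `shutdown()` call is skipped if an earlier cell or statement raises. One way to make cleanup unconditional in a script is a small context manager, sketched below. `engine_session` is a hypothetical helper and `FakeEngine` is a stand-in class that only provides the `shutdown()` method used above; neither is part of SGLang.

```python
from contextlib import contextmanager


@contextmanager
def engine_session(make_engine):
    """Create an engine, yield it, and always call shutdown() afterwards."""
    engine = make_engine()
    try:
        yield engine
    finally:
        engine.shutdown()


class FakeEngine:
    """Stand-in exposing the shutdown() method used above; not an SGLang class."""

    def __init__(self):
        self.is_shut_down = False

    def shutdown(self):
        self.is_shut_down = True


try:
    with engine_session(FakeEngine) as engine:
        raise RuntimeError("simulated failure during generation")
except RuntimeError:
    pass

print(engine.is_shut_down)  # True: shutdown() ran despite the exception
```

With a real engine, the factory argument would be something like `lambda: sgl.Engine(model_path=...)`, so the engine is only constructed once the session begins.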