# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")


### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Ameet. I am a Ph.D. student in the Department of Computer Science and Engineering at the Indian Institute of Technology (IIT) Hyderabad. My research area is Machine Learning and Deep Learning. I am currently working on the development of Deep Learning-based models for image and speech processing tasks. I am also interested in developing novel architectures and techniques for improving the efficiency and interpretability of Deep Learning models.
In my free time, I enjoy hiking, reading, and trying out new recipes in the kitchen. I am also a fan of learning new programming languages and exploring new tools and technologies.

Here are some of the technologies I am familiar with
Prompt: The president of the United States is
Generated text:  the most powerful person in the world, and there are many reasons why. Here are some of the key powers and responsibilities that contribute to the president's immense power:
1. Commander-in-Chief: The president
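
In an offline batch job the generated texts are usually stored rather than printed. The following is a minimal sketch of that step, assuming only what the example above shows: `llm.generate` returns one dict per prompt with a `"text"` key. The helper names `to_records` and `to_jsonl` and the sample data are hypothetical and are not part of SGLang; the stand-in outputs let the snippet run without a GPU.

```python
import json
from typing import Dict, List


def to_records(prompts: List[str], outputs: List[Dict]) -> List[Dict[str, str]]:
    """Pair each prompt with the generated text of its output dict."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    return [{"prompt": p, "text": o["text"]} for p, o in zip(prompts, outputs)]


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Hypothetical stand-in for: outputs = llm.generate(prompts, sampling_params)
example_prompts = ["The capital of France is", "The future of AI is"]
example_outputs = [{"text": " Paris."}, {"text": " <generated text>"}]

records = to_records(example_prompts, example_outputs)
jsonl_text = to_jsonl(records)
print(jsonl_text, end="")
```

With a real engine, `example_outputs` would be replaced by the return value of `llm.generate(prompts, sampling_params)`, and `jsonl_text` could be written to a file.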

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Oliver. I am a 8-year-old boy, and I am a 3rd-grade student. My favorite subjects are art, music, and P.E. I like to draw and paint and I also like to play soccer and ride my bike. My favorite animal is the tiger, and my favorite food is pizza. I like to play video games and watch TV on the weekends. I also like to go to the park and play with my friends. I like to help my mom and dad around the house and take care of my little sister.
Hi, my name is Emma. I am a 7-year-old girl, and

Prompt: The capital of France is
Generated text:  famous for its history, art, fashion, cuisine and wine. These are just a few of the many reasons why Paris is one of the most visited cities in the world. But for many travelers, Paris is more than just a city – it’s an experience. Here are some of the top attractions and experiences to have in the City of Light.
1. Visit the Eiffel Tower

The Eiffel Tower is a must-see attraction in Paris. This iconic iron lattice tower was built for the 1889 World’s Fair and was initially intended to be a temporary structure. Today, it stands as the tallest building in Paris

Prompt: The future of AI is
Generated text:  in the cloud

As AI continues to transform various industries and aspects of our lives, the cloud will play an increasingly important role in its development and deployment. The cloud has enabled the widespread adoption of AI by providing scalable, on-demand access to computing resources and data storage. This has reduced the barriers to entry for companies looking to implement AI solutions, making it more accessible and affordable.
The cloud also provides a number of benefits that are particularly well-suited to AI, such as:
Scalability: Cloud providers can quickly scale up or down to meet the changing needs of AI workloads, which can be computationally intensive and require large amounts



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Cindy, and I am the mom of a 7-year-old boy who loves dinosaurs. His name is Jaxon, and he has a serious obsession with all things prehistoric. His room is a dinosaur wonderland, and I'm pretty sure he's going to grow up to be a paleontologist one day.
As a mom, I've tried to encourage his passion and provide him with as many educational and fun experiences as possible. Recently, we went on a dinosaur-themed vacation to the American Museum of Natural History in New York City. Jaxon was in awe of the massive dinosaur skeletons on display, and I was impressed by the

Prompt: The capital of France is
Generated text:  Paris. The French language is the official language of France. The country is famous for its art, history, and fashion. It has a rich cultural heritage and has been a major influence on Western civilization.
France is located in Western Europe, bordered by several countries including Belgium, Luxembourg, Germany, Switzerland, Italy,
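
The asynchronous API is mainly useful when generation has to overlap with other coroutines, for example when requests arrive independently instead of as one batch. The following sketch shows the general `asyncio.gather` pattern for that case. It does not call SGLang: `fake_async_generate` is a hypothetical stand-in for `await llm.async_generate(prompt, sampling_params)`, and whether concurrent calls are scheduled together by the engine is an assumption to verify for your SGLang version.

```python
import asyncio
from typing import Dict, List


async def fake_async_generate(prompt: str) -> Dict[str, str]:
    """Hypothetical stand-in for: await llm.async_generate(prompt, sampling_params)."""
    await asyncio.sleep(0.01)  # simulate inference latency
    return {"text": f" <completion for {prompt!r}>"}


async def generate_all(prompts: List[str]) -> List[Dict[str, str]]:
    """Issue one request per prompt and wait for all of them together."""
    pending = [fake_async_generate(p) for p in prompts]
    # asyncio.gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*pending)


results = asyncio.run(
    generate_all(["Hello, my name is", "The capital of France is"])
)
for result in results:
    print(result["text"])
```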

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Shadia and I am a 3rd year student at the University of Technology, Jamaica. I am studying Computer Science and I am very interested in learning about different programming languages and developing software applications.
I am particularly interested in the use of technology to improve the lives of people in my community. I have always been fascinated by how technology can be used to solve real-world problems and make a positive impact on society.
In my free time, I enjoy reading about technology and learning new programming languages. I also like to participate in coding challenges and hackathons to test my skills and learn from others.

I am excited to be part of this

Prompt: The capital of France is
Generated text:  full of history, culture, and of course, delicious food! Whether you’re looking to explore the iconic landmarks, experience the city’s vibrant nightlife, or indulge in some French cuisine, Paris is a must-visit destination.
Here are some of the top things to do in Paris:
Visit the Eiffel Tower: This iconic iron lattice tower is a must-visit attraction in Paris. You can take the stairs or elevator to the top for breathtaking views of the city.
Explore the Louvre Museum: The world’s largest art museum is home to some of the most famous paintings in history, including the Mona Lisa. Be sure to book

Prompt: The future of AI is
Generated text:  not just about intelligent machines, but about how humans will interact with and be transformed by these technologies. The goal of this book is to explore the social implications of artificial intelligence, and to provide a framework for thinking about the future of human-AI interaction.
The book is organized around five key themes:
1. **Human-Centered AI**: How can we design AI systems that are centered on human values and needs?
2. **AI and Society**: What are the implications of AI for social structures, institutions, and relationships?
3. **Human-AI Interaction**: How will humans interact with AI systems, and how will these interactions shape our
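
The loop above streams one prompt at a time. If several streams should progress at once, each can be consumed in its own coroutine and buffered separately, so the chunks of different prompts do not interleave on screen. This is a sketch of that pattern only: `fake_async_stream` is a hypothetical stand-in for the async generator obtained from `await llm.async_generate(prompt, sampling_params, stream=True)`, and `consume` and `stream_all` are illustrative names.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def fake_async_stream(prompt: str) -> AsyncIterator[Dict[str, str]]:
    """Hypothetical stand-in for the async generator returned by
    `await llm.async_generate(prompt, sampling_params, stream=True)`."""
    for piece in [" one", " two", " three"]:
        await asyncio.sleep(0)  # yield control, as a real stream would between chunks
        yield {"text": piece}


async def consume(prompt: str) -> str:
    """Buffer all chunks of one stream and return the joined text."""
    parts: List[str] = []
    async for chunk in fake_async_stream(prompt):
        parts.append(chunk["text"])
    return "".join(parts)


async def stream_all(prompts: List[str]) -> List[str]:
    """Consume one stream per prompt concurrently; results keep prompt order."""
    return await asyncio.gather(*(consume(p) for p in prompts))


texts = asyncio.run(stream_all(["Hello, my name is", "The future of AI is"]))
print(texts)
```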




In [6]:
llm.shutdown()