# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where an additional HTTP server would add unnecessary complexity or overhead. Two common use cases are:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
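
As a rough illustration of what "a custom server on top of the engine" can mean, the sketch below shows a framework-agnostic request handler that could sit behind any HTTP library. It is not taken from the linked `custom_server.py` script. The names `handle_generate` and `EchoEngine`, and the JSON request shape, are assumptions made for this example. The only engine behavior it relies on is the `generate(prompts, sampling_params)` call used later in this document, which returns one dict with a `"text"` key per prompt.

```python
import json


def handle_generate(engine, body):
    """Map a JSON request body (bytes) to (status_code, response_bytes)."""
    try:
        payload = json.loads(body)
        prompt = payload["prompt"]
        if not isinstance(prompt, str):
            raise TypeError("prompt must be a string")
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON object with a string 'prompt' field"}
        return 400, json.dumps(error).encode("utf-8")

    # Optional per-request sampling parameters, e.g. {"temperature": 0.8}.
    sampling_params = payload.get("sampling_params") or {}

    # Assumed call shape: a list of prompts in, one {"text": ...} dict per prompt out.
    outputs = engine.generate([prompt], sampling_params)
    return 200, json.dumps({"text": outputs[0]["text"]}).encode("utf-8")


class EchoEngine:
    """Hypothetical stand-in so the sketch runs without a GPU or a model."""

    def generate(self, prompts, sampling_params):
        return [{"text": f" (completion for: {p})"} for p in prompts]


status, response = handle_generate(EchoEngine(), b'{"prompt": "Hello"}')
bad_status, _ = handle_generate(EchoEngine(), b"not json")
```

With a real engine, the same handler would receive the `sgl.Engine` instance in place of `EchoEngine()`. Keeping the handler free of any web framework makes it easy to test in isolation.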

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.47it/s]

100%|██████████| 23/23 [00:07<00:00,  3.17it/s]


### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Sarah and I'm a gardener, botanist, and naturalist with a passion for the intersection of plants and culture.
I have a degree in Botany and Horticulture, and I have spent many years working in greenhouses, gardens, and conservation organizations. I've also traveled extensively throughout the world, studying and learning about different plant species and their uses in various cultures.
I'm particularly interested in the ways that plants have been used by people throughout history, and how they continue to be used today. I'm fascinated by the diversity of plant species, and how they can be used in medicine, food, crafts, and other
Prompt: The president of the United States is
Generated text:  the head of state and head of government of the United States, and is the highest-ranking official in the federal government. The president is directly elected by the people through the Electoral College, and serves a four-year term. The president is respon
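
In an offline batch job, the generated text is usually stored next to its prompt rather than printed. The following is a minimal sketch of that step, assuming only what the cell above shows: `llm.generate` returns one dict per prompt, in prompt order, with a `"text"` key. The helpers `pair_outputs` and `write_jsonl` are hypothetical names for this example, and the `outputs` list is hard-coded so that the snippet runs without a model.

```python
import io
import json


def pair_outputs(prompts, outputs):
    """Pair each prompt with the generated text of its output dict."""
    if len(prompts) != len(outputs):
        raise ValueError("expected exactly one output per prompt")
    return [{"prompt": p, "text": o["text"]} for p, o in zip(prompts, outputs)]


def write_jsonl(records, fp):
    """Write one JSON object per line to an open text file object."""
    for record in records:
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")


# Stand-in for: outputs = llm.generate(prompts, sampling_params)
prompts = ["The capital of France is", "The future of AI is"]
outputs = [{"text": " Paris."}, {"text": " uncertain."}]

records = pair_outputs(prompts, outputs)

# An in-memory buffer keeps the sketch self-contained; a real job would
# pass a file opened with open("results.jsonl", "w", encoding="utf-8").
buffer = io.StringIO()
write_jsonl(records, buffer)
```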

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Amandine Lefebvre, and I am a French artist living in London. I was born in 1976 in the beautiful town of Rouen in Normandy, France. My artistic background is self-taught, as I have always been driven by a strong desire to express myself through art. I have worked as a visual artist since 2001, and my style has evolved over the years, influenced by a range of different techniques, including painting, drawing, and mixed media.
My work often revolves around themes of nature, architecture, and the human experience. I am fascinated by the intricate details of the natural world

Prompt: The capital of France is
Generated text:  a city of fashion, romance, and history. With its beautiful landmarks, vibrant culture, and world-class cuisine, Paris is a must-visit destination for anyone looking for a truly unforgettable experience. In this article, we will explore some of the top things to do and see in Paris, as well as some of the city's hidden gems that you might not have heard about before.
The City of Light: Top Attractions
Paris is home to some of the world's most famous landmarks, including the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum. Here are some of the top attractions to visit:
1

Prompt: The future of AI is
Generated text:  uncertain, but one thing is clear: artificial intelligence will be a driving force behind the creation of new industries, jobs, and business models. The possibilities for innovation and disruption are endless, and the stakes are high.
To prepare for the future, it’s essential to understand the current landscape of AI and its applications. From natural language processing to computer vision, machine learning to robotics, AI is transforming industries and revolutionizing the way we live and work.
In this article, we’ll explore the current state of AI, its key applications, and the future trends that will shape the industry. ## Key Applications of AI
Artificial intelligence has a
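
When the full text is needed after streaming finishes (for logging or post-processing, say), the chunks can be accumulated while they are displayed. The sketch below assumes that each chunk's `"text"` field holds only the newly generated piece, which is what the output above suggests. If an SGLang version returns cumulative text in each chunk instead, the last chunk alone would be the full text. `collect_stream` and `fake_stream` are hypothetical names, and the stub generator replaces `llm.generate(prompt, sampling_params, stream=True)` so the snippet runs without a model.

```python
def collect_stream(chunks, on_piece=None):
    """Consume an iterable of {"text": ...} chunks and return the joined text.

    `on_piece`, if given, is called with each piece as it arrives
    (for example, to print it incrementally).
    """
    parts = []
    for chunk in chunks:
        piece = chunk["text"]
        if on_piece is not None:
            on_piece(piece)
        parts.append(piece)
    return "".join(parts)


def fake_stream():
    """Stand-in for llm.generate(prompt, sampling_params, stream=True)."""
    for piece in [" Paris", ",", " France", "."]:
        yield {"text": piece}


seen = []
full_text = collect_stream(fake_stream(), on_piece=seen.append)
```

To mirror the cell above, `on_piece` could be `lambda piece: print(piece, end="", flush=True)`.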




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Mindy. I have a small garden where I grow my own herbs and vegetables. I love nothing more than spending time outdoors, surrounded by nature, and tending to my plants. It's a great way for me to unwind and connect with the natural world. I've learned a lot over the years about gardening, and I'd be happy to share my knowledge with others. Whether you're a seasoned gardener or just starting out, I'm here to help.
Frosts, Snows, and Crazies: How to Prepare for Winter in Your Garden
As the temperatures drop and the days grow shorter, it's time

Prompt: The capital of France is
Generated text:  a city with so much to offer, from iconic landmarks to world-class museums and a rich cultural scene. Here are some of the top attractions to visit in Paris:
1. The Eiffel Tower: The most iconic landmark in Paris, the Eiffel Tower is a must-visit attraction. Take the elevator to the top for breathtaking views of the city.
2. The Louvre Museum: One of the w
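
One reason to prefer the asynchronous API is that several independent requests can be awaited together, for example batches that use different sampling parameters. The following sketch illustrates that pattern with `asyncio.gather`. It assumes that `async_generate(prompts, sampling_params)` returns a list of dicts with a `"text"` key, as in the cell above. Whether concurrent calls improve throughput on a real engine is not something this document states. `FakeAsyncEngine` and `generate_jobs` are hypothetical names used so the code runs without a model.

```python
import asyncio


class FakeAsyncEngine:
    """Hypothetical stand-in that mimics the async_generate call shape."""

    async def async_generate(self, prompts, sampling_params):
        await asyncio.sleep(0)  # yield control, as a real engine call would
        return [{"text": f" <completion of {p!r}>"} for p in prompts]


async def generate_jobs(engine, jobs):
    """Run several (prompts, sampling_params) jobs concurrently.

    asyncio.gather returns results in the same order as `jobs`.
    """
    tasks = [engine.async_generate(prompts, params) for prompts, params in jobs]
    return await asyncio.gather(*tasks)


jobs = [
    (["Hello, my name is"], {"temperature": 0.8, "top_p": 0.95}),
    (["The capital of France is", "The future of AI is"], {"temperature": 0.0}),
]
results = asyncio.run(generate_jobs(FakeAsyncEngine(), jobs))
```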

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  David and I'm a fan of your website. I have been visiting your site for years and have always been impressed with the quality of your content and the professionalism of your team. Your website has become a valuable resource for me, and I often find myself coming back to it for information and inspiration.
However, I wanted to reach out to you today because I have a question that I couldn't find an answer to on your site. I was wondering if you could provide some guidance on [specific topic or issue]. I've been struggling to find reliable information on this topic, and I was hoping that your team might be able to offer some

Prompt: The capital of France is
Generated text:  Paris, which is the largest city in the country and the most popular tourist destination in the world. The city is known for its romantic atmosphere, world-class museums, and stunning architecture, including the iconic Eiffel Tower.
The history of France dates back to the Middle Ages, with the country playing a significant role in the development of Western civilization. France has been a major power in Europe for centuries, and its language, culture, and cuisine have had a profound influence on the world.
Today, France is a modern, secular democracy with a strong economy and a high standard of living. The country is a member of the European Union and

Prompt: The future of AI is
Generated text:  bright, but it also raises fundamental questions about the role of humans in society. For decades, the pace of technological progress has been driven by our ability to harness the power of computation, which has enabled us to process information faster and more efficiently. As AI continues to advance, we are entering a new era of intelligence, where machines are not only processing information but also learning from data, making decisions, and even creating new knowledge.
However, as AI becomes more autonomous and self-aware, we are faced with the challenge of ensuring that it is aligned with human values and interests. This requires a fundamental shift in how we design and develop AI systems
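
The cell above streams the prompts one after another. Since each stream is an asynchronous generator, several streams can also be consumed at the same time and their text collected per prompt. The sketch below shows that idea under two assumptions: `await async_generate(prompt, sampling_params, stream=True)` returns an async generator, as used in the cell above, and each chunk's `"text"` is an incremental piece. `FakeStreamingEngine`, `stream_to_text`, and `stream_many` are hypothetical names, and the stub engine emits fixed pieces instead of model output.

```python
import asyncio


class FakeStreamingEngine:
    """Hypothetical stand-in for an engine with streaming async_generate."""

    async def async_generate(self, prompt, sampling_params, stream=False):
        async def chunks():
            for piece in [" bright", ",", " but", " uncertain", "."]:
                await asyncio.sleep(0)  # let other streams make progress
                yield {"text": piece}

        return chunks()


async def stream_to_text(engine, prompt, sampling_params):
    """Consume one stream and return the concatenated text."""
    generator = await engine.async_generate(prompt, sampling_params, stream=True)
    parts = []
    async for chunk in generator:
        parts.append(chunk["text"])
    return "".join(parts)


async def stream_many(engine, prompts, sampling_params):
    """Consume one stream per prompt concurrently, preserving prompt order."""
    coroutines = [stream_to_text(engine, p, sampling_params) for p in prompts]
    return await asyncio.gather(*coroutines)


prompts = ["Hello, my name is", "The future of AI is"]
texts = asyncio.run(stream_many(FakeStreamingEngine(), prompts, {"temperature": 0.8}))
```

Printing pieces from several concurrent streams would interleave them on the terminal, which is why this sketch collects the text instead of printing it.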




In [6]:
llm.shutdown()
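
In a script, as opposed to a notebook, it helps to make sure `shutdown()` runs even when generation raises an exception. One way to sketch this is a small context manager built on `try`/`finally`. `managed_engine` and `FakeEngine` are hypothetical names for this example. The only engine methods assumed are `generate` and `shutdown`, both of which appear in this document.

```python
from contextlib import contextmanager


@contextmanager
def managed_engine(factory):
    """Create an engine with `factory` and always call shutdown() on exit."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


class FakeEngine:
    """Hypothetical stand-in so the sketch runs without a GPU or a model."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts, sampling_params):
        return [{"text": " ..."} for _ in prompts]

    def shutdown(self):
        self.is_shut_down = True


with managed_engine(FakeEngine) as engine:
    outputs = engine.generate(["Hello, my name is"], {"temperature": 0.8})

# With a real engine, the factory could be, for example:
#   lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
```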