# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This suits use cases where an additional HTTP server would add unnecessary complexity or overhead. Two common use cases are:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
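
As a rough illustration of what the core of such a server might do, the sketch below maps a JSON request body to a single generate call. It is not taken from the linked `custom_server.py`. The function name `handle_generate_request` and the `prompt` and `sampling_params` field names are hypothetical, and the code assumes that calling the engine with one prompt returns one dict with a `"text"` key.

```python
import json
from typing import Callable


def handle_generate_request(
    body: bytes, generate: Callable[[str, dict], dict]
) -> tuple[int, bytes]:
    """Turn a JSON request body into a generate call.

    Returns an (HTTP status code, JSON response body) pair. The field names
    "prompt" and "sampling_params" are assumptions made for this sketch.
    """
    try:
        payload = json.loads(body)
        prompt = payload["prompt"]
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON object with a 'prompt' field"}
        return 400, json.dumps(error).encode()

    if not isinstance(prompt, str):
        return 400, json.dumps({"error": "'prompt' must be a string"}).encode()

    # Optional per-request sampling parameters; default to an empty dict.
    sampling_params = payload.get("sampling_params", {})
    output = generate(prompt, sampling_params)
    return 200, json.dumps({"text": output["text"]}).encode()
```

Any web framework can wrap a function like this. Passing `llm.generate` as the `generate` argument would connect it to the engine, provided the assumptions above hold for your version.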

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.
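
The engine schedules requests itself, so splitting the input is optional. With very large prompt lists, though, it can be convenient to submit prompts in fixed-size chunks, for example to save partial results between chunks. The helper below is a minimal sketch of that idea. `iter_batches`, `generate_in_batches`, and the default `batch_size` are hypothetical names and values, and `generate` stands for any callable with the same shape as the engine's `generate(prompts, sampling_params)` call used in this document.

```python
from typing import Callable, Iterator, Sequence


def iter_batches(prompts: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `prompts` with at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(prompts), batch_size):
        yield list(prompts[start : start + batch_size])


def generate_in_batches(
    generate: Callable[[list[str], dict], list[dict]],
    prompts: Sequence[str],
    sampling_params: dict,
    batch_size: int = 256,
) -> list[dict]:
    """Call `generate` once per batch and return outputs in prompt order."""
    outputs: list[dict] = []
    for batch in iter_batches(prompts, batch_size):
        # Each call receives a list of prompts and returns one output per prompt.
        outputs.extend(generate(batch, sampling_params))
    return outputs
```

Once the engine is launched, this could be called as `generate_in_batches(llm.generate, prompts, sampling_params)`.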

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Holly and I am a newly married woman. I have been married for just over a year and I have to say that it's been an amazing journey so far. My husband and I met through mutual friends and we hit it off immediately. We have a beautiful home, a lovely garden and a wonderful little dog who is part of our family.
I work from home as a freelance writer and I love it. I get to be my own boss and work on projects that I am passionate about. My husband works in an office and he commutes to work every day, but he loves his job and is very dedicated to his work.
We
Prompt: The president of the United States is
Generated text:  a president of great expectations, at least among his supporters. They see him as a transformational figure who will bring about a new era of peace and prosperity. But the president has a different view of himself.
In a private conversation with a close friend, he described himself as a "failed leader" and a "disaster." He spoke of
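
Batch outputs come back in the same order as the prompts, which is why the loop above can pair them with `zip`. For offline jobs it is often useful to persist those pairs. The sketch below turns them into records and serializes the records as JSON Lines. `to_records`, `to_jsonl`, and the `prompt` and `completion` field names are hypothetical; the only thing assumed about each output is the `"text"` key used above.

```python
import json
from typing import Iterable, Sequence


def to_records(prompts: Sequence[str], outputs: Sequence[dict]) -> list[dict]:
    """Pair each prompt with the generated text of the output at the same index."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    return [
        {"prompt": prompt, "completion": output["text"]}
        for prompt, output in zip(prompts, outputs)
    ]


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
```

For example, `to_jsonl(to_records(prompts, outputs))` produces a string that can be written to a `.jsonl` file.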

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Jessica and I'm the owner of PB&J Pottery. I've always been a crafty person, and I've been working with clay for over 20 years. My passion for ceramics began when I was a kid, watching my grandmother work in her studio. She was an incredible potter and painter, and she taught me everything she knew.
As I grew older, my interest in ceramics only deepened. I continued to experiment with different techniques and styles, and eventually, I started selling my pieces at local craft fairs and markets. It wasn't long before I decided to turn my hobby into a full-time business.
Today

Prompt: The capital of France is
Generated text:  located on the Seine River and is known for its beauty, history, and culture. The City of Light, as it is often called, is famous for its iconic landmarks such as the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum. Paris is also home to many world-class restaurants, cafes, and fashion boutiques. In addition to its beauty and cultural attractions, Paris is also a hub for international business and finance, with many multinational companies having headquarters or offices there. Whether you're interested in art, history, food, fashion, or business, Paris has something for everyone.
Dufour-Bennet

Prompt: The future of AI is
Generated text:  not what you think it is

The way most people think about AI is shaped by science fiction and media. Hollywood and the tech industry have created a notion that AI will be the doomsday machine, or the utopian panacea. However, the reality is far more nuanced.
In reality, AI is a set of tools and techniques that will help solve the world’s most pressing challenges, like climate change, sustainable resource management, and disease prevention. But it requires a multidisciplinary approach and a focus on human-centered design, not just a bunch of smart machines doing the thinking for us.
We need to redefine our expectations around AI
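
If you need the complete text as well as the live printout, you can accumulate the chunks as they arrive. The sketch below assumes each chunk's `"text"` holds only the newly generated fragment, as the output above suggests. Whether that holds for your installed version is worth checking, so the hypothetical `cumulative` flag covers the alternative where each chunk carries all the text so far. `collect_stream` is an illustrative name and is not part of SGLang.

```python
from typing import Iterable


def collect_stream(chunks: Iterable[dict], cumulative: bool = False) -> str:
    """Return the full generated text from an iterable of streamed chunks.

    cumulative=False: each chunk["text"] is a new fragment, so concatenate.
    cumulative=True: each chunk["text"] is the text so far, so keep the last.
    """
    text = ""
    for chunk in chunks:
        if cumulative:
            text = chunk["text"]
        else:
            text += chunk["text"]
    return text
```

With the engine from this document, the call would look like `collect_stream(llm.generate(prompt, sampling_params, stream=True))`.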




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Ellie and I am a podcast enthusiast! I love listening to all kinds of podcasts, from true crime to comedy to educational podcasts. I recently started a podcast of my own, called "My Life Unscripted." It's a podcast where I interview people from all walks of life and we have deep and meaningful conversations about their experiences, struggles, and successes. I'm really passionate about creating a space where people feel comfortable sharing their stories and where listeners can learn and grow from their experiences.

In my free time, I enjoy hiking, reading, and trying out new restaurants. I'm also a bit of a coffee snob and can often

Prompt: The capital of France is
Generated text:  a city of grandeur and beauty, with stunning architecture, world-class museums, and a vibrant cultural scene. Here are some of the top things to do in Paris:
1. Visit the Eiffel Tower: The iconic Eiffel Tower is a must-visit attraction in Paris, offering breathtak
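
The asynchronous API also makes it possible to issue requests independently, for example when prompts arrive at different times, while limiting how many are in flight. The sketch below shows that pattern with `asyncio.gather` and a semaphore. `generate_concurrently` and `max_in_flight` are hypothetical names, and `async_generate` stands for any coroutine function that takes one prompt plus sampling parameters and returns one output dict.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def generate_concurrently(
    async_generate: Callable[[str, dict], Awaitable[dict]],
    prompts: Sequence[str],
    sampling_params: dict,
    max_in_flight: int = 8,
) -> list[dict]:
    """Run one request per prompt, with at most `max_in_flight` at a time."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def run_one(prompt: str) -> dict:
        # The semaphore bounds how many requests are awaited concurrently.
        async with semaphore:
            return await async_generate(prompt, sampling_params)

    # gather preserves the order of its arguments, so results match `prompts`.
    return list(await asyncio.gather(*(run_one(p) for p in prompts)))
```

Inside a coroutine this might be used as `outputs = await generate_concurrently(llm.async_generate, prompts, sampling_params)`, assuming a single-prompt call returns a single dict.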

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # With stream=True, awaiting async_generate returns an async generator of chunks.
        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Shelly. I am a happy and energetic dog, with a heart of gold. I love people and I love to play! I'm a little bit big for my britches, but that's okay, I know I'm lovable.
I'm not too picky about what kind of play I get, as long as it's fun! I love chasing balls and sticks, but I'm also happy to just cuddle up next to my favorite human and snooze the day away.
I do have one thing that I get a little anxious about, and that's being left alone for too long. I know it's not ideal

Prompt: The capital of France is
Generated text:  in the news again, and this time, it’s for a good reason! Paris has been named the number one destination in the world for tourists, according to Mastercard’s Global Destination Cities Index (GDCI) for 2022. The report states that Paris attracted over 23.9 million visitors last year, making it the […]
The post Paris Named the Number One Destination in the World for Tourists appeared first on World Tourism .
The capital of France is in the news again, and this time, it’s for a good reason! Paris has been named the number one destination in the world for tourists, according to Mastercard

Prompt: The future of AI is
Generated text:  bright, and it is already transforming various industries. However, it also raises concerns about job displacement, bias, and accountability. As AI continues to evolve, it's crucial to understand its potential impact on society and to develop strategies to mitigate its negative effects. In this article, we'll explore the future of AI, its potential benefits and challenges, and the steps we can take to ensure that AI is developed and used responsibly.
The Future of AI: Trends and Predictions

The future of AI is exciting and rapidly evolving. Here are some trends and predictions that will shape the industry:
1. **Increased Adoption**: AI will become ubiquitous,




In [6]:
llm.shutdown()
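
In a script, as opposed to a notebook, an exception raised before the final cell would skip `llm.shutdown()`. One way to guard against that is a small context manager that always calls `shutdown()` on exit. The sketch below assumes only that the engine object exposes a `shutdown()` method, as used above. The name `shutting_down` is hypothetical.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def shutting_down(engine: Any) -> Iterator[Any]:
    """Yield `engine` and call `engine.shutdown()` on exit, even after an error."""
    try:
        yield engine
    finally:
        engine.shutdown()
```

Usage would look like `with shutting_down(sgl.Engine(model_path=...)) as llm:`, with the generation calls placed inside the `with` block.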