# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This suits use cases where an extra HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
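
As a rough illustration of what sits between a server framework and the engine, the sketch below shows a framework-agnostic request handler. It is not taken from the linked `custom_server` script: the function name `handle_generate`, the JSON field names `prompt` and `sampling_params`, and the `FakeEngine` stand-in are all hypothetical. The only engine behaviour assumed is the one shown in this document, namely that `generate` takes a list of prompts plus a sampling-parameter dict and returns a list of dicts with a `"text"` key.

```python
import json
from typing import Any, Tuple


def handle_generate(engine: Any, body: bytes) -> Tuple[int, bytes]:
    """Turn a raw JSON request body into (status_code, JSON response body).

    Hypothetical helper: the request/response schema is an assumption made
    for this sketch, not an SGLang convention.
    """
    try:
        payload = json.loads(body)
        prompt = payload["prompt"]
    except (ValueError, KeyError, TypeError):
        # Covers malformed JSON, a missing field, and non-object payloads.
        return 400, json.dumps({"error": "expected JSON with a 'prompt' field"}).encode()
    if not isinstance(prompt, str):
        return 400, json.dumps({"error": "'prompt' must be a string"}).encode()

    sampling_params = payload.get("sampling_params", {})
    # List in, list out, matching the batch call used in this document.
    outputs = engine.generate([prompt], sampling_params)
    return 200, json.dumps({"text": outputs[0]["text"]}).encode()


class FakeEngine:
    """Stand-in for sgl.Engine so the sketch runs without a model or GPU."""

    def generate(self, prompts, sampling_params):
        return [{"text": f" (completion for: {p})"} for p in prompts]
```

Any web framework could call such a function from its route handler and pass a real engine instead of `FakeEngine`; keeping the parsing logic separate from the framework also makes it easy to test.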

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")


### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Chris, and I'm a relatively new beekeeper in the Pacific Northwest. I'm excited to be part of this community and learn from all of you. I've been keeping bees for about a year and a half now, and I've had a bit of a mixed experience. I've lost two hives over the winters, but I've also had a few successes, and I'm hoping to learn from my mistakes and do better this year.
I'm keeping bees in a relatively small yard in a suburban area, and I'm trying to do everything I can to minimize the impact of pesticides and other chemicals on my bees. I've
Prompt: The president of the United States is
Generated text:  constitutionally empowered to convene Congress into emergency session, bypassing the usual formalities of Congress’s calendar. The president, as commander-in-chief, can also order military action without congressional approval. However, the president’s authority to act without Congress in times of crisis or war is limited by the War Powers Res
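
When the prompt list is very large, it can be convenient to submit it in slices, for example to write results to disk between slices. The sketch below is one way to do that under stated assumptions: `chunked`, `generate_in_batches`, the default `batch_size`, and the `EchoEngine` stand-in are hypothetical names introduced here, and the only engine behaviour relied on is the `generate(prompts, sampling_params)` call shown above, which returns one dict with a `"text"` key per prompt.

```python
from typing import Any, Dict, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start : start + size]


def generate_in_batches(
    engine: Any,
    prompts: Sequence[str],
    sampling_params: Dict[str, Any],
    batch_size: int = 64,  # arbitrary default chosen for this sketch
) -> List[str]:
    """Call engine.generate once per slice and return texts in prompt order."""
    texts: List[str] = []
    for batch in chunked(prompts, batch_size):
        outputs = engine.generate(list(batch), sampling_params)
        texts.extend(output["text"] for output in outputs)
    return texts


class EchoEngine:
    """Stand-in for sgl.Engine so the sketch runs without a model or GPU."""

    def __init__(self) -> None:
        self.calls = 0

    def generate(self, prompts, sampling_params):
        self.calls += 1
        return [{"text": f" <completion of {p!r}>"} for p in prompts]
```

With a real engine the call would look like `generate_in_batches(llm, prompts, sampling_params)`. Since the engine already schedules requests internally, slicing is mainly a bookkeeping aid rather than a performance requirement.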

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Nicole and I'm a therapist. I have been working with children and adults in a variety of settings for over 10 years. I have a Master's degree in Clinical Psychology and I am licensed to practice in the state of California.
I specialize in working with individuals who are experiencing anxiety, depression, trauma, and relationship issues. I also have experience working with individuals who have experienced a loss or are grieving.
I am trained in several therapeutic modalities, including CBT (Cognitive Behavioral Therapy), DBT (Dialectical Behavior Therapy), and EMDR (Eye Movement Desensitization and Reprocessing). I believe in tail

Prompt: The capital of France is
Generated text:  a treasure trove of history, art, fashion, and cuisine. There's so much to see and do in Paris that you may need a week to fully experience the city. But if you only have 2 days, here's a suggested itinerary to help you make the most of your time:
Day 1: Morning in Montmartre and Afternoon in the Latin Quarter

Start your day in the charming neighborhood of Montmartre, which is famous for its bohemian vibe, street artists, and stunning views of the city. Visit the Sacré-Cœur Basilica, a beautiful white church perched on a hill.

Prompt: The future of AI is
Generated text:  bright, and it’s already here

AI is transforming industries and revolutionizing the way we live and work, from healthcare and finance to education and entertainment.
Artificial intelligence (AI) has come a long way since its inception. From its humble beginnings in the 1950s to the current AI revolution, the technology has evolved significantly. Today, AI is transforming industries and revolutionizing the way we live and work, from healthcare and finance to education and entertainment.
The future of AI is bright, and it’s already here. AI-powered chatbots are assisting customers, AI-driven robots are manufacturing products, and AI-based systems are diagnosing
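
If you want the full completion as a single string rather than printing chunks as they arrive, the streamed chunks can be joined. The sketch below assumes what the output above suggests, that each chunk's `"text"` holds only the newly generated piece. Some engine versions may instead return the cumulative text in every chunk, so the helper takes a `cumulative` flag. Both `iter_new_text` and `collect_stream` are hypothetical names introduced for this example.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_new_text(
    chunks: Iterable[Dict[str, Any]], cumulative: bool = False
) -> Iterator[str]:
    """Yield only the text that is new in each streamed chunk.

    cumulative=False: each chunk["text"] is already an increment.
    cumulative=True:  each chunk["text"] repeats everything so far, so only
                      the part beyond the previously seen length is yielded.
    """
    seen = 0
    for chunk in chunks:
        text = chunk["text"]
        if cumulative:
            yield text[seen:]
            seen = len(text)
        else:
            yield text


def collect_stream(chunks: Iterable[Dict[str, Any]], cumulative: bool = False) -> str:
    """Join a stream of chunks into the complete generated text."""
    return "".join(iter_new_text(chunks, cumulative))
```

With a real engine this could be used as `collect_stream(llm.generate(prompt, sampling_params, stream=True))`. Check which chunk format your installed version produces before choosing the flag.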




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Danny, and I'm a career coach. I work with professionals, managers, and business owners to help them find a new career path or improve their current one. I'm here to help you identify your strengths, passions, and values to find a career that's right for you.
Here are some topics we can discuss in our coaching sessions:
* Career exploration and identity
* Resume and job search strategies
* Networking and professional development
* Interview preparation and practice
* Salary negotiation and career advancement
* Building a personal brand and online presence
* Managing work-life balance and stress
* Transitioning to a new industry or career


Prompt: The capital of France is
Generated text:  Paris, and it is known as the "City of Light" for its beauty, art, and culture. Paris is famous for the Eiffel Tower, the Louvre Museum, and Notre Dame Cathedral. Many famous artists, writers, and thinkers have lived and worked in Paris, including Claude Mon
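
The main benefit of the asynchronous API is that several requests can be awaited at once from the same event loop. The sketch below shows one way to issue requests concurrently while capping how many are in flight. It is an illustration under assumptions: `generate_concurrently`, `max_in_flight`, and `FakeAsyncEngine` are hypothetical names, and the engine is only assumed to behave as in the cell above, where `async_generate` takes a list of prompts and returns a list of dicts with a `"text"` key.

```python
import asyncio
from typing import Any, Dict, List, Sequence


async def generate_concurrently(
    engine: Any,
    prompts: Sequence[str],
    sampling_params: Dict[str, Any],
    max_in_flight: int = 8,  # arbitrary cap chosen for this sketch
) -> List[str]:
    """Send one request per prompt, at most `max_in_flight` at a time."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def one(prompt: str) -> str:
        async with semaphore:
            # One-element list in, one-element list out.
            outputs = await engine.async_generate([prompt], sampling_params)
            return outputs[0]["text"]

    # gather preserves the order of its arguments, so results match prompts.
    return list(await asyncio.gather(*(one(p) for p in prompts)))


class FakeAsyncEngine:
    """Stand-in for sgl.Engine so the sketch runs without a model or GPU."""

    async def async_generate(self, prompts, sampling_params):
        await asyncio.sleep(0)  # give other tasks a chance to run
        return [{"text": p.upper()} for p in prompts]
```

Passing the whole list in a single `async_generate` call, as the cell above does, is simpler when all prompts are known up front. Per-request tasks are more useful when prompts arrive over time, as in a custom server.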

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Kerstin. I was born and raised in Southern California and have been a hairstylist for over 15 years. I specialize in precision cutting and coloring. I have worked in top salons in the city of Los Angeles and have been featured in several fashion magazines. I love working with my clients to create a look that is uniquely their own. I am passionate about staying up to date on the latest trends and techniques in the industry. I am excited to be working here at Tricho Salon and look forward to creating beautiful hairstyles for all of my clients. Schedule a consultation today and let's get started on making you look and feel your

Prompt: The capital of France is
Generated text:  Paris, but the largest city is Lyon. Lyon is known for its gastronomy, and is often referred to as the gastronomic capital of France. The city is also known for its rich history and cultural heritage. The Roman ruins of Fourvière hill, the Basilica of Notre-Dame de Fourvière and the ancient Roman theater are some of the popular tourist attractions in Lyon.
Lyon is also home to a number of festivals and events throughout the year, including the Fête des Lumières (Festival of Lights), which is a five-day celebration of light and color that takes place in December. The city also hosts

Prompt: The future of AI is
Generated text:  now: 5 key takeaways from AI Summit 2023

The AI Summit 2023, held in New York City, brought together some of the world's leading experts in AI, highlighting the rapid evolution of the technology and its far-reaching implications.
Here are 5 key takeaways from the summit:
1. **AI is already transforming industries**: The summit showcased numerous examples of AI being applied in various sectors, such as healthcare, finance, education, and manufacturing. Companies like Google, Microsoft, and IBM presented their AI-powered solutions, demonstrating the technology's potential to drive business innovation and improve lives.
2. **Explain
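
The cell above consumes one stream at a time. The sketch below shows how several streams might be consumed concurrently and collected per prompt. It rests on assumptions: it follows the call pattern used above (`await engine.async_generate(prompt, sampling_params, stream=True)` returning an async generator of chunks with incremental `"text"`), it assumes the engine accepts several open streams at once, and `stream_to_string`, `stream_all`, and `FakeStreamingEngine` are hypothetical names.

```python
import asyncio
from typing import Any, Dict, List, Sequence


async def stream_to_string(engine: Any, prompt: str, sampling_params: Dict[str, Any]) -> str:
    """Consume one stream and return the concatenated chunk texts."""
    generator = await engine.async_generate(prompt, sampling_params, stream=True)
    parts: List[str] = []
    async for chunk in generator:
        parts.append(chunk["text"])
    return "".join(parts)


async def stream_all(
    engine: Any, prompts: Sequence[str], sampling_params: Dict[str, Any]
) -> Dict[str, str]:
    """Consume one stream per prompt concurrently; map prompt -> full text."""
    texts = await asyncio.gather(
        *(stream_to_string(engine, p, sampling_params) for p in prompts)
    )
    return dict(zip(prompts, texts))


class FakeStreamingEngine:
    """Stand-in for sgl.Engine so the sketch runs without a model or GPU."""

    async def async_generate(self, prompt, sampling_params, stream=False):
        async def chunks():
            for word in (" one", " two", " three"):
                await asyncio.sleep(0)  # interleave with other streams
                yield {"text": word}

        return chunks()
```

In a real application the `parts.append(...)` line is where each chunk would be forwarded to a client instead of being buffered.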




In [6]:
llm.shutdown()
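
In a script, an exception raised between engine creation and `llm.shutdown()` would skip the shutdown call. One way to guard against that is a small context manager, sketched below. `managed_engine` and `FakeEngine` are hypothetical names introduced here; the only engine method relied on is `shutdown()`, as used above.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def managed_engine(factory: Callable[[], Any]) -> Iterator[Any]:
    """Create an engine via `factory` and always call shutdown() on exit."""
    engine = factory()
    try:
        yield engine
    finally:
        # Runs on normal exit and when the with-body raises.
        engine.shutdown()


class FakeEngine:
    """Stand-in for sgl.Engine so the sketch runs without a model or GPU."""

    def __init__(self) -> None:
        self.closed = False

    def shutdown(self) -> None:
        self.closed = True
```

With a real engine this could be written as `with managed_engine(lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")) as llm:` followed by the generation calls in the body.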