# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
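
To make the custom-server idea concrete, here is a minimal sketch that uses only the Python standard library. It is not the linked `custom_server.py` example. `EchoEngine`, `handle_generate`, `make_handler`, and the `prompt` / `sampling_params` request fields are hypothetical names chosen for illustration; `EchoEngine` stands in for `sgl.Engine` so the sketch runs without a GPU, and it only assumes the `{"text": ...}` result shape used later in this document.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a GPU."""

    def generate(self, prompt, sampling_params=None):
        # Mirrors the {"text": ...} result shape used in this document.
        return {"text": f" (echo) {prompt}"}


def handle_generate(engine, body):
    """Turn a raw JSON request body into (HTTP status, JSON-serializable reply)."""
    try:
        payload = json.loads(body)
        prompt = payload["prompt"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'prompt' field"}
    output = engine.generate(prompt, payload.get("sampling_params"))
    return 200, {"text": output["text"]}


def make_handler(engine):
    """Build a request handler class bound to one engine instance."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status, reply = handle_generate(engine, self.rfile.read(length))
            data = json.dumps(reply).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return Handler


# To serve requests (blocks until interrupted), one could run:
# HTTPServer(("127.0.0.1", 8000), make_handler(EchoEngine())).serve_forever()
```

Keeping the request logic in a plain function (`handle_generate`) separates it from the transport, so the same function could sit behind a different web framework.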

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")


### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Brian and I am a 27 year old photographer from the UK. I specialise in capturing the best of the world's most amazing landscapes, from the stunning coastlines of Cornwall to the breathtaking scenery of New Zealand. My work has taken me to many different countries and has allowed me to experience a wide range of cultures and environments. I love nothing more than getting out into the great outdoors and capturing the beauty of the natural world on camera. I am always on the lookout for new and exciting locations to photograph and am eager to take on new challenges and adventures. My photographs are not just mere snapshots, but rather they are an attempt to
Prompt: The president of the United States is
Generated text:  like a king and a dictator mixed together. He has absolute authority and control over the government and the people. The Constitution does not provide any checks on the president's power, so he can do whatever he wants. The preside
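
In batch jobs the generated text is usually saved rather than printed. The following is a minimal sketch of one way to do that, assuming only that each output is a dict with a `"text"` key, as in the loop above. `write_results_jsonl` is a hypothetical helper, and `demo_outputs` stands in for the list returned by `llm.generate(prompts, sampling_params)`.

```python
import io
import json


def write_results_jsonl(prompts, outputs, fp):
    """Write one JSON object per prompt/output pair to the open text file `fp`."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    for prompt, output in zip(prompts, outputs):
        record = {"prompt": prompt, "text": output["text"]}
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")


# Stand-in data; with a real engine these would be `prompts` and `outputs`.
demo_prompts = ["The capital of France is"]
demo_outputs = [{"text": " Paris."}]

buffer = io.StringIO()  # an open file from open("results.jsonl", "w") works the same way
write_results_jsonl(demo_prompts, demo_outputs, buffer)
print(buffer.getvalue(), end="")
```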

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Rachael, and I am a police officer in the United Kingdom. I have been in this role for over 10 years, and I am proud to be part of an elite team of dedicated and brave men and women who are committed to keeping our communities safe.
My passion for policing began when I was a teenager. I grew up in a close-knit community where everyone knew each other, and we all looked out for one another. However, I saw firsthand the devastating impact of crime on my friends and neighbors, and I knew that I wanted to make a difference.
As a police officer, I have been fortunate enough to have had

Prompt: The capital of France is
Generated text:  Paris, and it is located in the northern part of the country. It is the largest city in France and is known for its beauty and cultural significance.
What are the attractions of Paris?
Some of the most famous attractions in Paris include:
1. The Eiffel Tower: This iconic tower is a symbol of Paris and one of the most recognizable landmarks in the world.
2. The Louvre Museum: This museum is home to some of the world's most famous paintings, including the Mona Lisa.
3. Notre Dame Cathedral: This beautiful cathedral is one of the most famous in the world and has been the site of many important events

Prompt: The future of AI is
Generated text:  in its ability to understand and act on human emotions. The integration of AI with human emotions is a rapidly evolving field, and this integration has the potential to significantly enhance human well-being.
Emotion-aware AI is an emerging field that focuses on developing AI systems that can recognize, understand, and respond to human emotions. These systems can be integrated into various applications, such as healthcare, education, customer service, and mental health support.
Some of the key areas where AI is being integrated with human emotions include:
1. Emotion recognition: AI systems can recognize and analyze human emotions through facial expressions, speech patterns, and physiological signals.
2.
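
When the complete text is needed after streaming finishes, the chunks can be accumulated as they arrive. This is a minimal sketch, assuming each chunk's `"text"` field carries only the newly generated piece, as the output above suggests. If your SGLang version returns cumulative text in each chunk, keep only the last chunk instead. `collect_stream` and `fake_stream` are hypothetical names; `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
def collect_stream(chunks):
    """Join the "text" field of each streamed chunk into one string."""
    return "".join(chunk["text"] for chunk in chunks)


def fake_stream():
    # Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True).
    for piece in [" Paris", ",", " and", " it"]:
        yield {"text": piece}


full_text = collect_stream(fake_stream())
print(repr(full_text))
```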




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Alex. I'm a 34-year-old single male. I've been using computers and technology for over 20 years and have a solid foundation in basic computer skills. I've recently started looking into website design and development, and I'd like to learn more about it.
I've been looking online for tutorials, courses, and resources to help me learn the basics of website design and development. I've come across some conflicting information, and I'm not sure where to start.
What are some good resources for learning website design and development?
What are the key skills and technologies that I should focus on learning?
What are some common challenges that

Prompt: The capital of France is
Generated text:  a city like no other. From its iconic landmarks and world-class museums to its charming neighborhoods and romantic atmosphere, Paris is a destination that has captivated visitors for centuries.
Must-see attractions in Paris include the Eiffel Tower, the Louvre
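
The cell above passes the whole prompt list in one call. When prompts come from independent coroutines, an alternative is to start one request per prompt and await them together. The sketch below shows that pattern with `asyncio.gather`. `StubAsyncEngine` and `generate_all` are hypothetical names, and the sketch assumes (without confirmation from this document) that the engine accepts several `async_generate` calls in flight at once.

```python
import asyncio


class StubAsyncEngine:
    """Hypothetical stand-in exposing an async_generate coroutine."""

    async def async_generate(self, prompt, sampling_params=None):
        await asyncio.sleep(0)  # yield control, as a real engine call would
        return {"text": f" <completion for {prompt!r}>"}


async def generate_all(engine, prompts, sampling_params=None):
    """Start one request per prompt and wait for all of them together."""
    pending = [engine.async_generate(p, sampling_params) for p in prompts]
    # gather preserves input order, so results line up with prompts.
    return await asyncio.gather(*pending)


results = asyncio.run(generate_all(StubAsyncEngine(), ["a", "b"]))
for result in results:
    print(result["text"])
```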

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Leah! I am a 19 year old college student, and I have been interested in makeup for as long as I can remember. I have been watching YouTube tutorials and practicing makeup on myself for years, and I recently decided to start my own YouTube channel to share my passion with the world!
I have a wide range of makeup interests, from natural everyday looks to more dramatic evening glows. I love experimenting with new products and techniques, and I am always excited to share my latest finds and favorites with my viewers.
On my channel, you can expect to see a variety of makeup tutorials, product reviews, and hauls. I will

Prompt: The capital of France is
Generated text:  a city like no other, a place of grand history and architectural landmarks, world-class museums, and a vibrant cultural scene. Paris is a destination that offers something for everyone, from romantic strolls along the Seine River to cutting-edge fashion and cuisine. It is the perfect city for travelers looking to experience the quintessential French lifestyle.
This travel guide covers the best places to visit in Paris, the top museums and galleries, the best restaurants and cafes, and the best shopping destinations. It also includes helpful tips and advice on how to navigate the city and make the most of your visit.
Book your trip to Paris and discover why it is

Prompt: The future of AI is
Generated text:  in collaboration, not competition

From AI assistants like Alexa and Google Assistant, to algorithms that help us navigate through traffic, AI has become an integral part of our daily lives. However, the development and deployment of AI has been largely centered around competition, where companies and individuals are racing to build the most advanced and sophisticated AI systems.
But is this competition-driven approach the right way to go? I argue that the future of AI is in collaboration, not competition.
The benefits of collaboration

Collaboration in AI development can lead to several benefits, including:
1. Accelerating innovation: When researchers and developers from different backgrounds and institutions work together




In [6]:
llm.shutdown()
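
If generation raises an exception, the final `llm.shutdown()` cell is never reached. One way to guard against that in a script is a small context manager that always calls `shutdown()` on exit. This is a minimal sketch: `engine_session` and `StubEngine` are hypothetical names, and the only assumption about the engine is the `shutdown()` method used above.

```python
from contextlib import contextmanager


@contextmanager
def engine_session(engine):
    """Yield `engine` and call its shutdown() on exit, even after an error."""
    try:
        yield engine
    finally:
        engine.shutdown()


class StubEngine:
    """Hypothetical stand-in that only records whether shutdown() ran."""

    def __init__(self):
        self.closed = False

    def shutdown(self):
        self.closed = True


stub = StubEngine()
try:
    with engine_session(stub):
        raise RuntimeError("simulated failure during generation")
except RuntimeError:
    pass
print(stub.closed)
```

With a real engine, the usage would look like `with engine_session(sgl.Engine(model_path=...)) as llm:` followed by the generation calls inside the block.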