# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

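As a rough illustration of what a custom server does at its core, the sketch below maps a JSON request body to a JSON response body. It is not taken from the linked script, which may be organized differently: `make_generate_handler` and `fake_generate` are hypothetical names, the `prompt` and `sampling_params` request fields are an assumed schema, and `fake_generate` stands in for a call such as `llm.generate` so that the snippet runs without a model.

```python
import json
from typing import Callable, Dict


def make_generate_handler(generate_fn: Callable[[str, dict], Dict[str, str]]):
    """Return a function that maps a JSON request body to a JSON response body."""

    def handle(body: bytes) -> bytes:
        try:
            payload = json.loads(body)
            prompt = payload["prompt"]
        except (ValueError, KeyError, TypeError):
            # Malformed JSON or a missing "prompt" field.
            return json.dumps({"error": "expected JSON with a 'prompt' field"}).encode()
        sampling_params = payload.get("sampling_params", {})
        output = generate_fn(prompt, sampling_params)
        return json.dumps({"text": output["text"]}).encode()

    return handle


def fake_generate(prompt: str, sampling_params: dict) -> Dict[str, str]:
    """Hypothetical stand-in for the engine's generate call."""
    return {"text": f" <completion of {prompt!r}>"}


handle = make_generate_handler(fake_generate)
response = handle(b'{"prompt": "Hello, my name is"}')
print(response.decode())
```

Keeping the handler independent of any web framework means the same function could be wired into whichever HTTP library the server uses.
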
## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

INFO 11-26 00:41:56 weight_utils.py:243] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:00<00:02,  1.19it/s]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:01<00:01,  1.11it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:02<00:00,  1.10it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.47it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.32it/s]

### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Sue Cooper. I am a registered nurse and a hospice volunteer at the Community Hospice of Williamsburg. I have been working at the hospice for five years, and I am passionate about caring for patients and their families during their end-of-life journey.
My volunteer role involves visiting patients in their homes, hospitals, and nursing homes. I spend time with them, listening to their stories, and offering comfort and support to them and their loved ones. I also help with activities such as bathing, feeding, and providing light housekeeping assistance to make their lives a little easier.
One of the most rewarding aspects of my job is seeing the
Prompt: The president of the United States is
Generated text:  the head of the executive branch, which includes the White House staff, the Cabinet, and executive departments. The president is also the commander-in-chief of the armed forces and is responsible for signing or vetoing legislation passed by Co

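For prompt lists that are too large to submit in one call, one option is to split them into fixed-size batches and pair each output with its prompt. The sketch below is illustrative only: `generate_in_batches` and `fake_generate` are hypothetical names, and `fake_generate` stands in for `llm.generate` so that the snippet runs without a GPU. It assumes, as in the cell above, that the generate call returns one dict with a `"text"` key per prompt, in input order.

```python
from typing import Callable, Dict, List


def generate_in_batches(
    generate_fn: Callable[[List[str], dict], List[Dict[str, str]]],
    prompts: List[str],
    sampling_params: dict,
    batch_size: int = 2,
) -> List[Dict[str, str]]:
    """Call generate_fn on fixed-size slices and pair prompts with outputs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    records = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start : start + batch_size]
        outputs = generate_fn(batch, sampling_params)
        for prompt, output in zip(batch, outputs):
            records.append({"prompt": prompt, "text": output["text"]})
    return records


def fake_generate(prompts: List[str], sampling_params: dict) -> List[Dict[str, str]]:
    """Hypothetical stand-in for llm.generate: echoes each prompt."""
    return [{"text": f" <completion of {p!r}>"} for p in prompts]


records = generate_in_batches(
    fake_generate,
    ["Hello, my name is", "The capital of France is", "The future of AI is"],
    {"temperature": 0.8, "top_p": 0.95},
)
for record in records:
    print(record["prompt"], "->", record["text"])
```

Since the engine already schedules a batch internally, chunking like this is mostly useful for saving partial results or bounding how much output is held in memory at once.
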
### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Maxine and I have been helping people with their relationships for over 20 years.
I believe that relationships are the foundation of our happiness, health and wellbeing. They bring us joy, comfort, support, and a sense of belonging. But they also bring challenges, conflicts, and difficulties. As a professional couples therapist, I help individuals and couples navigate these challenges and deepen their connection with each other.
My approach to therapy is warm, empathetic, and non-judgmental. I believe that every individual and couple is unique, with their own experiences, values, and goals. I work collaboratively with my clients to understand their needs

Prompt: The capital of France is
Generated text:  also known for its art, fashion, and romance, but did you know it’s also a hub for science and technology?
Here are some of the top tech companies in Paris:
  1. Dassault Systèmes: A leading software company that provides 3D design and simulation solutions for industries such as aerospace, automotive, and consumer goods.
  2. Atos: A global IT services company that provides a range of services including cloud computing, cybersecurity, and data analytics.
  3. Orange: A telecommunications company that provides a range of services including mobile, fixed, and internet connectivity.
  4.

Prompt: The future of AI is
Generated text:  in our hands

By: Lise Fuhr, Director General of ETNO

The impact of Artificial Intelligence (AI) on the digital economy is undeniable. AI is transforming the way we live, work and interact with one another. But the future of AI is not just about technological advancements – it is also about ensuring that the benefits of AI are distributed fairly and that its negative consequences are mitigated.
As a representative of the European telecoms industry, I am acutely aware of the role that our sector will play in the development and deployment of AI. Our networks, infrastructure, and services will be essential for the widespread adoption of AI

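When the full completion is needed after streaming, the pieces can be collected as they arrive. The sketch below assumes that each chunk's `"text"` field holds only the newly generated piece, which is what the output above suggests; if a given SGLang version returns cumulative text instead, keeping only the last chunk would be the equivalent. `collect_stream` and `fake_stream` are hypothetical names, and `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
from typing import Dict, Iterable, Iterator


def collect_stream(chunks: Iterable[Dict[str, str]]) -> str:
    """Join incremental chunk texts into the full completion."""
    parts = []
    for chunk in chunks:
        parts.append(chunk["text"])
    return "".join(parts)


def fake_stream() -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for a streaming generate call."""
    for piece in [" Max", "ine", " and", " I"]:
        yield {"text": piece}


full_text = collect_stream(fake_stream())
print(repr(full_text))
```
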
### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Spencer and I'm a software developer. I have a passion for building things and solving problems, and I've been doing so professionally for over 10 years.
I've worked on a wide range of projects, from small startup apps to large enterprise systems, and I'm confident in my ability to deliver high-quality software that meets your needs.
I'm proficient in a variety of programming languages and technologies, including Java, Python, C++, and JavaScript. I'm also familiar with a range of development frameworks and tools, including Spring, Django, and Angular.
In addition to my technical skills, I'm a strong communicator and team player. I

Prompt: The capital of France is
Generated text:  situated on the Seine River. It is a city of grandeur and beauty with historical landmarks, art museums, and charming neighborhoods. A popular destination for tourists and locals alike, Paris has a unique charm that attracts people from all over the world.
Tourist 

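The cell above awaits a single batched call. When requests are issued independently, for example one per incoming user, `asyncio.gather` is one way to run them concurrently and still get results back in input order. This is a sketch under stated assumptions: `fake_async_generate` is a hypothetical stand-in for `await llm.async_generate(prompt, sampling_params)`, and `generate_concurrently` is an illustrative helper, not part of SGLang.

```python
import asyncio
from typing import Dict, List


async def fake_async_generate(prompt: str, sampling_params: dict) -> Dict[str, str]:
    """Hypothetical stand-in for an awaited async_generate call."""
    await asyncio.sleep(0.01)  # simulate generation latency
    return {"text": f" <completion of {prompt!r}>"}


async def generate_concurrently(
    prompts: List[str], sampling_params: dict
) -> List[Dict[str, str]]:
    """Start one request per prompt and wait for all of them."""
    requests = [fake_async_generate(p, sampling_params) for p in prompts]
    # gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*requests)


prompts = ["Hello, my name is", "The capital of France is"]
outputs = asyncio.run(generate_concurrently(prompts, {"temperature": 0.8}))
for prompt, output in zip(prompts, outputs):
    print(prompt, "->", output["text"])
```
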
### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Ava. I'm a single mom of two beautiful kids, ages 6 and 8. They keep me on my toes, and I'm loving every minute of it!
I'm a blogger and a writer, and I love sharing my thoughts and experiences on my blog, which focuses on parenting, self-care, and personal growth. I'm passionate about helping other parents navigate the challenges of motherhood and find their own path to happiness and fulfillment.
In my free time, I enjoy reading, hiking, and practicing yoga. I'm also a bit of a foodie and love trying out new recipes in the kitchen. My kids are my

Prompt: The capital of France is
Generated text:  Paris and it is considered the country's largest city. It has a population of over 2 million people. The city is located on the Seine River and it is known for its beautiful architecture, art museums, fashion, and cuisine. Paris has been the capital of France since 987 AD and it is home to many famous landmarks such as the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum. It is also the center of French culture and politics. Paris is a popular tourist destination and it attracts millions of visitors every year.
Some of the popular tourist attractions in Paris include:
The Eiffel Tower: This

Prompt: The future of AI is
Generated text:  not in replacing humans but in empowering them

By Eitan Wertheimer

Eitan Wertheimer

Artificial intelligence (AI) has come a long way since its inception in the 1950s. The field has made tremendous progress, transforming industries such as healthcare, finance, and transportation. AI has become an integral part of our daily lives, making tasks more efficient and accurate. However, as we continue to develop and deploy AI systems, there's a growing concern about the impact on human employment and the potential for AI to displace jobs.
While AI has the potential to automate many tasks, it's essential to understand that

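The loop above consumes one stream at a time. Several streams can also be consumed concurrently, with each stream's pieces collected separately. The following is a minimal sketch, assuming incremental chunks as in the output above: `fake_async_stream` is a hypothetical stand-in for the async generator obtained from `llm.async_generate(prompt, sampling_params, stream=True)`, and it simply echoes the words of the prompt.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def fake_async_stream(prompt: str) -> AsyncIterator[Dict[str, str]]:
    """Hypothetical stand-in for an async streaming generator."""
    for word in prompt.split():
        await asyncio.sleep(0)  # hand control back to the event loop between pieces
        yield {"text": " " + word}


async def collect_async_stream(stream: AsyncIterator[Dict[str, str]]) -> str:
    """Join the pieces of one stream into a single string."""
    parts = []
    async for chunk in stream:
        parts.append(chunk["text"])
    return "".join(parts)


async def main() -> List[str]:
    streams = [fake_async_stream(p) for p in ["Hello, my name is", "The future of AI is"]]
    # Each stream is drained by its own coroutine; gather preserves input order.
    return await asyncio.gather(*(collect_async_stream(s) for s in streams))


texts = asyncio.run(main())
print(texts)
```
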
In [6]:
llm.shutdown()
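
If an exception interrupts the code before `llm.shutdown()` is reached, the engine may be left running. One way to guard against that is a small context manager that always calls `shutdown()`. This is a sketch, not an SGLang feature: `managed_engine` is a hypothetical helper and `FakeEngine` is a stand-in for `sgl.Engine` that only records whether it was shut down.

```python
from contextlib import contextmanager


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine; only tracks shutdown."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompt, sampling_params=None):
        return {"text": " <completion>"}

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def managed_engine(factory):
    """Create an engine and always call shutdown(), even on error."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


with managed_engine(FakeEngine) as engine:
    output = engine.generate("Hello, my name is")

print(output["text"], engine.is_shut_down)
```

With the real engine, the factory could be a function that returns `sgl.Engine(model_path=...)` with the desired model path.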