# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This is useful when a separate HTTP server would add unnecessary complexity or overhead. Two general use cases are:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example written as a Python script is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

```python
# Launch the offline engine.
import asyncio

import sglang as sgl

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
```

```text
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.39it/s]
```

### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

# One call generates completions for the whole batch; outputs follow prompt order.
outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

```text
Prompt: Hello, my name is
Generated text:  Shirley and I'm a 35 year old Divorcee, married to my wonderful husband for 10 years. I have two beautiful children, a 6 year old boy and a 3 year old girl. I work as a nurse in a busy hospital and I love my job. My children are my world and I'm so grateful to have them in my life. My husband and I are currently separated and going through a difficult time, but we're trying to stay strong for our kids and work through our issues. I'm a bit of a worrier, but I'm trying to focus on the positive and not let my anxiety
Prompt: The president of the United States is
Generated text:  not the head of the church of the United States. The head of the church of the United States is Jesus Christ. The president has no authority to make laws that abridge the freedom to worship God, to evangelize, or to engage in any other form of Christian activity. This is a fundamental principle of our nation, enshrined in the First Amendment.
I must admit that, when I he
```
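Every example on this page passes `{"temperature": 0.8, "top_p": 0.95}`. As a rough illustration of what these two settings mean, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) sampling over a toy list of logits. The function names are hypothetical, and SGLang's own sampler is implemented differently (batched on the GPU), so read this as the general technique rather than SGLang's code.

```python
import math
import random
from typing import List, Optional


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs: List[float], top_p: float) -> List[int]:
    """Smallest set of most-probable token ids whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


def sample_top_p(
    logits: List[float],
    temperature: float = 0.8,
    top_p: float = 0.95,
    rng: Optional[random.Random] = None,
) -> int:
    """Draw one token id after temperature scaling and top-p filtering."""
    rng = rng if rng is not None else random.Random()
    probs = softmax_with_temperature(logits, temperature)
    kept = nucleus(probs, top_p)
    # random.choices does not require weights to sum to 1, so this
    # implicitly renormalizes over the kept tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


logits = [2.0, 1.0, 0.0, -1.0]
print(nucleus(softmax_with_temperature(logits, 1.0), 0.95))  # [0, 1, 2]
print(sample_top_p(logits, rng=random.Random(0)))
```

In this toy example the lowest-probability token falls outside the 0.95 nucleus and can never be drawn, while a temperature below 1 shifts even more mass onto the most likely token.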

### Streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    # With stream=True, generate() yields chunks; print each one as it arrives.
    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()
```


```text
=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Chris and I’m a Principal Consultant at Heriot-Watt University. I have over 10 years of experience in academic leadership and management in higher education.
As a Principal Consultant at Heriot-Watt University, I work with a wide range of clients across different sectors, providing strategic guidance, support and solutions to help them achieve their goals. I have a strong track record of success in developing and implementing high-impact projects that drive change and improve performance.
My expertise spans across a range of areas, including strategic planning, leadership development, project management, organizational change, and staff development. I have a deep understanding of the complexities of higher education

Prompt: The capital of France is
Generated text:  known for its rich history, art, fashion, and cuisine. From the iconic Eiffel Tower to the world-renowned Louvre Museum, Paris is a city that will leave you in awe. Here are some of the top attractions to visit when in Paris:
1. The Eiffel Tower: The Eiffel Tower is an iron lattice tower built for the 1889 World's Fair. It stands at 324 meters tall and is one of the most iconic landmarks in the world.
2. The Louvre Museum: The Louvre Museum is one of the world's largest and most famous museums. It houses an impressive collection

Prompt: The future of AI is
Generated text:  all about the “human-in-the-loop”
By: Dave Copps, RPA Solutions

It’s no secret that AI has the potential to revolutionize industries and organizations around the world. But as AI technology continues to evolve, it’s becoming increasingly clear that the real power lies not in the machines themselves, but in the collaboration between humans and machines.
The concept of AI as a standalone entity, making decisions without human oversight, is giving way to a more nuanced approach – one where humans are actively engaged in the decision-making process alongside machines. This is often referred to as the “human-in-the-loop” approach.
In this model,
```
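The streaming loop above only prints each chunk. If you also need the complete text afterwards, you can accumulate the pieces while printing them. This sketch assumes, as the output above suggests, that each chunk's `"text"` field carries only the newly generated piece. `fake_stream` and `collect_stream` are hypothetical helpers; `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)` so the code runs without a model.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional


def fake_stream(pieces: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True)."""
    for piece in pieces:
        yield {"text": piece}


def collect_stream(
    chunks: Iterable[Dict[str, str]],
    on_chunk: Optional[Callable[[str], None]] = None,
) -> str:
    """Consume chunk dicts, optionally echoing each piece, and return the full text."""
    parts = []
    for chunk in chunks:
        piece = chunk["text"]
        if on_chunk is not None:
            on_chunk(piece)  # e.g. print the piece as it arrives
        parts.append(piece)
    return "".join(parts)


full_text = collect_stream(
    fake_stream([" Paris", ",", " the", " city", " of", " light", "."]),
    on_chunk=lambda s: print(s, end="", flush=True),
)
print()
print(repr(full_text))  # ' Paris, the city of light.'
```

With the real engine, the call would look like `collect_stream(llm.generate(prompt, sampling_params, stream=True), on_chunk=...)`.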




### Non-streaming Asynchronous Generation

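```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    # A single awaited call generates completions for the whole batch.
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


# If an event loop is already running (as in some notebook setups),
# asyncio.run() raises RuntimeError; use `await main()` there instead.
asyncio.run(main())
```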


```text
=== Testing asynchronous batch generation ===

Prompt: Hello, my name is
Generated text:  Bryan and I am the founder and Chief Experience Officer (CXO) of Firefly. I started Firefly in 2003 as a creative agency, with a focus on providing high-quality visual effects, motion graphics and design services to a variety of clients, from film and television to corporate and non-profit organizations.
Over the years, our team has grown and evolved, but our core mission has remained the same: to bring innovative and engaging storytelling to our clients. We've had the privilege of working with some amazing brands and talent, and I'm proud of the work we've done.
When I'm not leading the Firefly team,

Prompt: The capital of France is
Generated text:  a city of art, fashion, and culture. The city of Paris is located in the north-central part of France, on the Seine River. Paris is a major tourist destination, known for its stunning architecture, museums, fashion, and cuisine. The city is home to many world-renowned landmarks, including the Ei
```
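The cell above sends the whole list of prompts in one `async_generate` call. When requests instead arrive one at a time (for example, from independent tasks), a common asyncio pattern is to start them together with `asyncio.gather`, which returns results in the order the awaitables were passed. This sketch uses a hypothetical `fake_async_generate` in place of `llm.async_generate` so it runs without a GPU; whether the engine schedules such concurrent single-prompt calls as efficiently as one batched call is an assumption this page does not cover.

```python
import asyncio
from typing import Dict, List


async def fake_async_generate(prompt: str, sampling_params: Dict) -> Dict[str, str]:
    """Hypothetical stand-in for llm.async_generate on a single prompt."""
    await asyncio.sleep(0.01)  # simulate generation latency
    return {"text": f" <completion for {prompt!r}>"}


async def generate_concurrently(prompts: List[str], sampling_params: Dict) -> List[Dict[str, str]]:
    # gather runs all requests concurrently and returns results in input order.
    return await asyncio.gather(*(fake_async_generate(p, sampling_params) for p in prompts))


prompts = ["Hello, my name is", "The capital of France is"]
outputs = asyncio.run(generate_concurrently(prompts, {"temperature": 0.8, "top_p": 0.95}))
for prompt, output in zip(prompts, outputs):
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```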

### Streaming Asynchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Awaiting async_generate with stream=True returns an async generator of chunks.
        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


# If an event loop is already running (as in some notebook setups),
# asyncio.run() raises RuntimeError; use `await main()` there instead.
asyncio.run(main())
```


```text
=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Jimmy and I am the new editor of this blog. I am excited to take on this role and continue the legacy of this site. I am a bit of a tech enthusiast and I love to learn about new gadgets and technologies.
As the editor, I will be responsible for posting new articles and updates to the site. I will also be responsible for keeping the site up to date with the latest news and trends in the tech world.
I am looking forward to bringing a fresh perspective to the site and sharing my knowledge with the community. I hope to engage with you all and hear your feedback and suggestions on how to improve the site.
In

Prompt: The capital of France is
Generated text:  known for its stunning beauty and rich history, but it’s also home to a vibrant and thriving food scene. From the famous patisseries to the bustling markets, there’s no shortage of delicious options to try. Here are some of the best French foods to try in Paris:
1. Croissants: Flaky, buttery, and fresh, croissants are a staple in French bakeries. Try a classic plain or chocolate-filled croissant at a patisserie like Ladurée or Pierre Hermé.
2. Baguette: The quintessential French bread, baguettes are a must-try. Enjoy them

Prompt: The future of AI is
Generated text:  bright, but what does it mean for human workers?
Artificial intelligence (AI) is transforming the way we work, and its impact will only grow in the years to come. Some experts predict that AI will displace certain jobs, while others believe it will create new opportunities and augment existing ones.
According to a report by the McKinsey Global Institute, up to 800 million jobs could be lost worldwide due to automation by 2030. However, the same report also suggests that AI will create new job opportunities, such as in fields like AI development, deployment, and maintenance.
To prepare for the changing job market, we need to
```
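The asynchronous streaming cell consumes one prompt's stream at a time. To consume several streams concurrently, you can wrap each `async for` loop in its own coroutine and gather them. In this sketch, `fake_async_generate` is a hypothetical stand-in that, like `await llm.async_generate(prompt, sampling_params, stream=True)` above, returns an async generator of `{"text": ...}` chunks; `stream_one` and `stream_all` are illustrative names as well.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def fake_async_generate(prompt: str, sampling_params: Dict, stream: bool = True):
    """Hypothetical stand-in for `await llm.async_generate(prompt, sampling_params, stream=True)`."""

    async def chunks() -> AsyncIterator[Dict[str, str]]:
        for word in [" one", " two", " three"]:
            await asyncio.sleep(0)  # yield control so the streams interleave
            yield {"text": word}

    return chunks()


async def stream_one(prompt: str, sampling_params: Dict) -> str:
    """Consume one stream to completion and return its full text."""
    generator = await fake_async_generate(prompt, sampling_params, stream=True)
    parts = []
    async for chunk in generator:
        parts.append(chunk["text"])
    return "".join(parts)


async def stream_all(prompts: List[str], sampling_params: Dict) -> Dict[str, str]:
    """Run one stream_one coroutine per prompt concurrently."""
    texts = await asyncio.gather(*(stream_one(p, sampling_params) for p in prompts))
    return dict(zip(prompts, texts))


results = asyncio.run(stream_all(["Hello, my name is", "The future of AI is"], {"temperature": 0.8}))
for prompt, text in results.items():
    print(f"{prompt} ->{text}")
```

Collecting each prompt's text separately avoids the interleaved output you would get by printing chunks from several concurrent streams directly.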




When you are done, shut down the engine to release its resources:

```python
llm.shutdown()
```
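In a standalone script, one way to make sure `shutdown()` runs even if generation raises an exception is a small context manager. The helper `engine_session` and the `FakeEngine` class are hypothetical, used only so the sketch runs without a GPU; with the real engine you would pass a factory such as `lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")`.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def engine_session(factory: Callable[[], T]) -> Iterator[T]:
    """Create an engine with `factory` and always call its shutdown(), even on errors."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a model."""

    def __init__(self) -> None:
        self.closed = False

    def generate(self, prompts, sampling_params):
        return [{"text": " ..."} for _ in prompts]

    def shutdown(self) -> None:
        self.closed = True


with engine_session(FakeEngine) as engine:
    outputs = engine.generate(["Hello, my name is"], {"temperature": 0.8})

print(engine.closed)  # True
```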