# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
from sglang.utils import print_highlight
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

INFO 11-10 14:19:23 weight_utils.py:243] Using model weights format ['*.safetensors']


Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.43it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print_highlight("===============================")
    print_highlight(f"Prompt: {prompt}\nGenerated text: {output['text']}")
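For offline batch jobs it is often convenient to keep each prompt together with its completion, for example to write the results to a JSONL file. The sketch below relies only on what the cell above shows, namely that `generate` returns one dict per prompt with a `text` key. `to_records` and `StubEngine` are hypothetical names introduced for illustration; the stub merely stands in for `sgl.Engine` so the snippet runs without a GPU.

```python
import json


def to_records(engine, prompts, sampling_params):
    """Run one batched generate call and pair each prompt with its output text."""
    outputs = engine.generate(prompts, sampling_params)
    return [
        {"prompt": prompt, "text": output["text"]}
        for prompt, output in zip(prompts, outputs)
    ]


class StubEngine:
    """Hypothetical stand-in for sgl.Engine; it is not part of SGLang."""

    def generate(self, prompts, sampling_params):
        # Return one dict per prompt, mirroring the shape used in the cell above.
        return [{"text": f" <completion for: {p}>"} for p in prompts]


records = to_records(
    StubEngine(),
    ["Hello, my name is", "The capital of France is"],
    {"temperature": 0.8, "top_p": 0.95},
)

# One JSON object per line, ready to be written to a .jsonl file.
jsonl = "\n".join(json.dumps(record) for record in records)
print(jsonl)
```

With a real engine, `to_records(llm, prompts, sampling_params)` should work the same way, assuming the output dicts keep the `text` key.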

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print_highlight(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()

Generated text:  Kellie and I have been working as a freelance writer since 2014. My portfolio includes articles and blogs for various publications and companies, covering a wide range of topics. I specialize in creating engaging and informative content that is tailored to specific audiences.
I have experience writing for online magazines, blogs, and websites, as well as creating content for companies looking to establish a strong online presence. My writing style is conversational, yet informative, making complex topics accessible to a wide range of readers.
I have a passion for storytelling and creating content that resonates with readers. I believe that good writing should be informative, engaging, and entertaining

Generated text:  Paris.
France is the most visited country in the world.
The official language of France is French.
The most popular French dish is Coq au Vin. It is a chicken dish cooked in red wine and mushrooms.
The French Revolution began in 1789.
The Eiffel Tower is a famous landmark in Paris.
France has a high standard of living and is a popular destination for tourists.
The famous French artist Claude Monet was born in 1840.
The French painter Paul Cézanne was born in 1839.
The Louvre Museum in Paris is one of the world's largest and most famous museums.
France is

Generated text:  in its ability to learn from experience and adapt to new situations. However, current AI systems are not yet capable of true learning and adaptation. They are limited by their programming and data, and are often unable to generalize to new situations. The goal of the research presented here is to develop a new type of AI system that can learn and adapt in a more human-like way. This new AI system is called "Adaptive Knowledge Retrieval" (AKR).
The AKR system is a hybrid of symbolic and connectionist AI, combining the strengths of both approaches. The system uses a symbolic knowledge representation to store and reason with knowledge, while
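If you need the complete text after streaming has finished, you can accumulate the chunks as they arrive. The sketch below assumes that each chunk carries only the newly generated piece in `chunk["text"]`, which is what the output above suggests; if your SGLang version returns cumulative text in each chunk, keep the last chunk instead of joining. `collect_stream` and `stub_stream` are hypothetical names, and the stub replaces `llm.generate(prompt, sampling_params, stream=True)` so the snippet runs on its own.

```python
def collect_stream(chunks):
    """Concatenate the "text" field of each streamed chunk into one string."""
    return "".join(chunk["text"] for chunk in chunks)


def stub_stream():
    """Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True)."""
    for piece in [" Paris", ".", "\n", "France", " is"]:
        yield {"text": piece}


full_text = collect_stream(stub_stream())
print(repr(full_text))
```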




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print_highlight(f"\nPrompt: {prompt}")
        print_highlight(f"Generated text: {output['text']}")


asyncio.run(main())
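The asynchronous API also makes it possible to issue several independent requests and await them together, for instance when each prompt needs its own sampling parameters. The following is a minimal sketch using `asyncio.gather`. It assumes that `async_generate` called with a single prompt returns a single output dict and that the engine accepts concurrent calls; neither is confirmed by this document. `generate_many` and `StubAsyncEngine` are hypothetical names, and the stub stands in for `sgl.Engine`.

```python
import asyncio


class StubAsyncEngine:
    """Hypothetical stand-in for sgl.Engine; it is not part of SGLang."""

    async def async_generate(self, prompt, sampling_params):
        await asyncio.sleep(0)  # hand control back to the event loop once
        return {"text": f" <completion at T={sampling_params['temperature']}>"}


async def generate_many(engine, requests):
    """Issue one async_generate call per (prompt, params) pair and await them together."""
    tasks = [engine.async_generate(prompt, params) for prompt, params in requests]
    # asyncio.gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*tasks)


requests = [
    ("Hello, my name is", {"temperature": 0.2, "top_p": 0.95}),
    ("The future of AI is", {"temperature": 0.8, "top_p": 0.95}),
]
results = asyncio.run(generate_many(StubAsyncEngine(), requests))

for (prompt, _), output in zip(requests, results):
    print(prompt, "->", output["text"])
```

When all prompts share the same sampling parameters, passing the whole list to a single `async_generate` call, as in the cell above, is the simpler choice.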

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print_highlight(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())

Generated text:  Paula and I am a therapist at the Maudsley Hospital in South London, UK.
The Maudsley Hospital is a leading specialist mental health trust, and I work in the adolescent department, which provides specialist care for young people with mental health difficulties. My work is primarily focused on the assessment, diagnosis and treatment of young people with eating disorders, particularly anorexia nervosa.
In this blog, I aim to share my experiences and expertise as a therapist, and to provide information and support to anyone affected by eating disorders. My goal is to educate and raise awareness about this complex and serious condition, and to provide hope and

Generated text:  Paris. The official language of France is French, and the population is approximately 67 million people. France is the 5th largest country in the world, covering an area of 643,801 square kilometers. The country is bordered by several other countries including Belgium, Luxembourg, Germany, Switzerland, Italy, Spain, Andorra, and Monaco.
France is a global leader in many areas such as culture, science, technology, fashion, and cuisine. The country has made significant contributions to the world, including the French Revolution, which led to the creation of the modern concept of human rights.
France is home to several UNESCO World Heritage sites

Generated text:  bright, but a lot of work remains to be done before AI is truly integrated into our daily lives. The key to unlocking AI’s full potential lies in the development of more sophisticated algorithms and neural networks. This requires collaboration between academia, industry, and government, as well as the development of new technologies that enable seamless communication and data exchange between humans, machines, and systems.
The growing need for data scientists and AI engineers is driving a significant increase in demand for AI-related education and training programs. To stay ahead in the rapidly evolving field of AI, professionals need to continuously update their skills and knowledge to keep pace with the latest advancements. Online
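The same accumulation pattern works for the asynchronous stream. The sketch below follows the calling convention of the cell above: awaiting `async_generate(..., stream=True)` yields an async generator, which is then consumed with `async for`. It assumes each chunk holds only the new piece of text in `chunk["text"]`. `collect_async_stream` and `StubStreamingEngine` are hypothetical names, and the stub stands in for `sgl.Engine`.

```python
import asyncio


class StubStreamingEngine:
    """Hypothetical stand-in for sgl.Engine; it is not part of SGLang."""

    async def async_generate(self, prompt, sampling_params, stream=False):
        async def chunks():
            for piece in [" Paris", ".", " The", " official", " language"]:
                yield {"text": piece}

        # Awaiting this coroutine hands back an async generator, as in the cell above.
        return chunks()


async def collect_async_stream(engine, prompt, sampling_params):
    """Consume the async stream and return the concatenated text."""
    generator = await engine.async_generate(prompt, sampling_params, stream=True)
    pieces = []
    async for chunk in generator:
        pieces.append(chunk["text"])
    return "".join(pieces)


streamed_text = asyncio.run(
    collect_async_stream(
        StubStreamingEngine(),
        "The capital of France is",
        {"temperature": 0.8, "top_p": 0.95},
    )
)
print(repr(streamed_text))
```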




In [6]:
llm.shutdown()
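
In a script, as opposed to a notebook, it can be safer to guarantee that `shutdown()` runs even when generation raises an exception. One possible approach is a small context manager, sketched below. `engine_session` and `StubEngine` are hypothetical names introduced here; only the `shutdown()` call itself comes from the cell above.

```python
from contextlib import contextmanager


@contextmanager
def engine_session(make_engine):
    """Create an engine, hand it to the caller, and always call shutdown() afterwards."""
    engine = make_engine()
    try:
        yield engine
    finally:
        engine.shutdown()


class StubEngine:
    """Hypothetical stand-in for sgl.Engine; it is not part of SGLang."""

    def __init__(self):
        self.closed = False

    def generate(self, prompts, sampling_params):
        return [{"text": " ..."} for _ in prompts]

    def shutdown(self):
        self.closed = True


with engine_session(StubEngine) as llm_stub:
    outputs = llm_stub.generate(["Hello, my name is"], {"temperature": 0.8})

print(llm_stub.closed)
```

With the real engine, the factory could be `lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")`, reusing the arguments from the first cell.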