# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This is useful when a separate HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
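To make the custom-server idea concrete, here is a minimal sketch of a request handler that turns a JSON request body into a call on an engine-like object. It is not the linked `custom_server.py`. `StandInEngine` and `handle_generate` are hypothetical names, and the stand-in's behavior (one prompt in, one `{"text": ...}` dict out) is an assumption made so the snippet runs without a model or GPU.

```python
import asyncio
import json
from typing import Any, Dict


class StandInEngine:
    """Hypothetical stand-in for an engine object; NOT the real sgl.Engine."""

    async def async_generate(
        self, prompt: str, sampling_params: Dict[str, Any]
    ) -> Dict[str, str]:
        # Assumption: a single prompt yields a single dict with a "text" key.
        await asyncio.sleep(0)  # yield control, as a real engine call would
        return {"text": f" [completion of {prompt!r}]"}


async def handle_generate(engine: Any, raw_body: bytes) -> bytes:
    """Decode a JSON request, call the engine, and encode a JSON response."""
    request = json.loads(raw_body)
    output = await engine.async_generate(
        request["prompt"], request.get("sampling_params", {})
    )
    return json.dumps({"text": output["text"]}).encode("utf-8")


body = json.dumps({"prompt": "The capital of France is"}).encode("utf-8")
response = asyncio.run(handle_generate(StandInEngine(), body))
print(response.decode("utf-8"))
```

A web framework of your choice could call a handler like this from its route function. Keeping the handler independent of the framework makes it easy to test with a stand-in engine.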

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
from sglang.utils import print_highlight
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

INFO 11-14 08:01:53 weight_utils.py:243] Using model weights format ['*.safetensors']


Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.37it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print_highlight("===============================")
    print_highlight(f"Prompt: {prompt}\nGenerated text: {output['text']}")

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print_highlight(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()

Generated text: David Johnson and I am a professional paranormal investigator. I have been investigating the paranormal for over 30 years, and have worked with some of the most well-known and respected ghost hunting teams in the world.
I have investigated over 1,000 cases and have captured some of the most compelling evidence of paranormal activity in the world. My work has been featured on numerous TV shows and documentaries, including "Ghost Hunters," "Paranormal Witness," and "Ghost Adventures."
My team and I have developed a unique and scientifically-based approach to investigating the paranormal, which combines traditional ghost hunting techniques with cutting-edge technology and scientific methods. We use advanced

Generated text: home to some of the world's most iconic landmarks, including the Eiffel Tower and the Louvre. Paris is also known for its fashion, cuisine, and romantic atmosphere. Visitors can stroll along the Seine River, explore the Latin Quarter, and visit the famous Notre-Dame Cathedral. The city is also home to numerous museums, galleries, and performance venues, offering a wide range of cultural experiences.
Paris is a popular destination for tourists, with over 23 million visitors per year. The city has a diverse range of accommodations, from budget-friendly hostels to luxury hotels. Visitors can also enjoy the city's many parks and gardens

Generated text: uncertain, but it's likely to change the world in profound ways

The future of AI is uncertain, but it's likely to change the world in profound ways

The future of AI is uncertain, but it's likely to change the world in profound ways

The future of AI is uncertain, but it's likely to change the world in profound ways

Artificial intelligence (AI) is a rapidly evolving field that is likely to have a significant impact on our lives in the coming years. While it's difficult to predict exactly how AI will develop and be used, there are several factors that suggest it will change the world in profound ways.
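The loop above prints each chunk as it arrives. When the full completion is also needed afterwards, the chunks can be collected and joined. The following is a minimal sketch that assumes, as in the output shown here, that each chunk's `"text"` field holds only the newly generated piece. `fake_stream` and `collect_stream` are hypothetical helpers; `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)` so the snippet runs without a model.

```python
from typing import Dict, Iterable, Iterator


def fake_stream(pieces: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for a streaming generate call."""
    for piece in pieces:
        # Assumption: each chunk carries only the new piece of text.
        yield {"text": piece}


def collect_stream(chunks: Iterable[Dict[str, str]]) -> str:
    """Print chunks as they arrive and return the concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk["text"], end="", flush=True)
        parts.append(chunk["text"])
    print()
    return "".join(parts)


full_text = collect_stream(fake_stream([" Paris", " and", " its", " currency"]))
```

If your SGLang version instead returns the cumulative text in every chunk, joining the chunks would duplicate content, so inspect one or two chunks before relying on this pattern.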





### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print_highlight(f"\nPrompt: {prompt}")
        print_highlight(f"Generated text: {output['text']}")


asyncio.run(main())
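One reason to prefer the asynchronous call is that other coroutines can make progress while generation is pending. The sketch below illustrates this with `asyncio.gather`. `StandInEngine` and `load_more_prompts` are hypothetical and exist only so the example runs without a model; the stand-in mirrors the call shape used above (a list of prompts in, a list of `{"text": ...}` dicts out).

```python
import asyncio
from typing import Any, Dict, List


class StandInEngine:
    """Hypothetical stand-in for the engine; NOT the real sgl.Engine."""

    async def async_generate(
        self, prompts: List[str], sampling_params: Dict[str, Any]
    ) -> List[Dict[str, str]]:
        await asyncio.sleep(0.01)  # simulate generation latency
        return [{"text": f" [completion of {p!r}]"} for p in prompts]


async def load_more_prompts() -> List[str]:
    """Hypothetical I/O task that runs while generation is in flight."""
    await asyncio.sleep(0.01)
    return ["The future of AI is"]


async def main():
    engine = StandInEngine()
    sampling_params = {"temperature": 0.8, "top_p": 0.95}
    # Both awaitables are scheduled together; neither blocks the other.
    return await asyncio.gather(
        engine.async_generate(["Hello, my name is"], sampling_params),
        load_more_prompts(),
    )


outputs, next_prompts = asyncio.run(main())
print(outputs, next_prompts)
```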

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print_highlight(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())

Generated text: Pam Donahoo and I am a teacher at North Bend Elementary School. I have been teaching for 20 years, with 17 of those years at NBES. I have taught a variety of grades, but my favorite is 4th grade. I love watching students grow and learn during this time and seeing the excitement and curiosity that comes with it.
I am a graduate of The Ohio State University, where I earned a Bachelor's degree in Elementary Education. I also have a Master's degree in Educational Leadership from Malone University.
I am married to my wonderful husband, Mike, and we have two grown children, Rachel and Alex. I

Generated text: Paris and its currency is the Euro. France is located in the western part of Europe. The official language is French.
France is a country with a rich history, known for its culture, art, fashion, cuisine, and architecture. France has a diverse geography, with mountains, rivers, and beaches. The country is home to some of the most famous landmarks in the world, such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral.
France is a popular tourist destination, attracting millions of visitors each year. The country has a strong economy, with a high standard of living. France is also known for

Generated text: about more than just smart machines. It's about creating a future where technology enhances human capabilities, improves people's lives, and solves some of the world's most pressing problems.
In this book, Kai-Fu Lee, a renowned AI expert and entrepreneur, provides a comprehensive and accessible guide to the future of AI. Lee explains how AI will affect various aspects of our lives, including education, employment, healthcare, transportation, and politics.
He also provides insights into the future of work, including the impact of AI on jobs and the importance of lifelong learning. Lee emphasizes the need for a human-centered approach to AI development, one that prioritizes
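The cell above consumes one stream at a time. Because each stream is an asynchronous iterator, several can also be consumed concurrently. The following is a minimal sketch of that pattern using only `asyncio`. `fake_async_stream` and `collect_async_stream` are hypothetical helpers; the former stands in for the generator returned by `llm.async_generate(prompt, sampling_params, stream=True)` and assumes each chunk holds only the newly generated text.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def fake_async_stream(pieces: List[str]) -> AsyncIterator[Dict[str, str]]:
    """Hypothetical stand-in for an asynchronous streaming generator."""
    for piece in pieces:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield {"text": piece}


async def collect_async_stream(stream: AsyncIterator[Dict[str, str]]) -> str:
    """Consume an async stream and return the concatenated text."""
    parts = []
    async for chunk in stream:
        parts.append(chunk["text"])
    return "".join(parts)


async def demo() -> List[str]:
    # gather() interleaves the two consumers and keeps results in call order.
    return await asyncio.gather(
        collect_async_stream(fake_async_stream([" Paris", " and"])),
        collect_async_stream(fake_async_stream([" about", " more"])),
    )


results = asyncio.run(demo())
print(results)
```

Collecting each stream into its own string avoids interleaved console output, which is what would happen if two consumers printed chunks directly.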




In [6]:
llm.shutdown()