# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This helps in use cases where an extra HTTP server would only add complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A complete example, written as a standalone Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
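The linked script is the reference implementation. As a rough illustration of the idea only, the sketch below wraps an engine in Python's standard-library `http.server` and exposes a single `POST /generate` route. The route name, the JSON request shape (`prompt` plus optional `sampling_params`), the helper names, and the `EchoEngine` stand-in (which lets the sketch run without a GPU) are all assumptions; only the `generate(prompt, sampling_params)` call and the `"text"` field of its result come from this page.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a GPU."""

    def generate(self, prompt, sampling_params=None):
        return {"text": f" <completion of {prompt!r}>"}


def handle_generate(engine, payload):
    """Turn one decoded JSON request into one engine call and a JSON-ready reply."""
    prompt = payload["prompt"]
    sampling_params = payload.get("sampling_params", {})
    output = engine.generate(prompt, sampling_params)
    return {"prompt": prompt, "text": output["text"]}


def make_handler(engine):
    """Build a request-handler class bound to one engine instance."""

    class GenerateHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/generate":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            body = json.dumps(handle_generate(engine, payload)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return GenerateHandler


def serve(engine, host="127.0.0.1", port=8000):
    """Block forever, answering POST /generate with the given engine."""
    HTTPServer((host, port), make_handler(engine)).serve_forever()
```

To try it with a real model, pass `sgl.Engine(model_path=...)` instead of `EchoEngine()` to `serve`. The single-threaded `HTTPServer` answers one request at a time, which keeps the sketch simple but gives up the request concurrency a production server would want.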

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.
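When a prompt collection is too large to build in memory or to submit in one call, one option is to feed the engine in fixed-size chunks, passing each chunk as a list so the engine still receives a batch it can schedule internally. The helper `generate_in_chunks`, its `chunk_size` default, and the `EchoEngine` stand-in are hypothetical; only the pattern of calling `generate` on a list of prompts and reading `"text"` from each result comes from the examples on this page.

```python
from itertools import islice


class EchoEngine:
    """Hypothetical stand-in for sgl.Engine: one output dict per input prompt."""

    def generate(self, prompts, sampling_params=None):
        return [{"text": f" <reply to {p}>"} for p in prompts]


def generate_in_chunks(engine, prompts, sampling_params, chunk_size=256):
    """Yield outputs for an arbitrarily long prompt iterable, in input order.

    Prompts are pulled lazily, chunk_size at a time, and each chunk is passed
    to engine.generate as a single list.
    """
    prompt_iter = iter(prompts)
    while True:
        chunk = list(islice(prompt_iter, chunk_size))
        if not chunk:
            return
        yield from engine.generate(chunk, sampling_params)
```

A larger `chunk_size` gives the engine more requests to schedule at once at the cost of holding more outputs in memory per step; the right value depends on your hardware and prompt lengths.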

```python
# Launch the offline engine.
import asyncio

import sglang as sgl

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
```

```text
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:00<00:02,  1.16it/s]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:01<00:01,  1.06it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:02<00:00,  1.06it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.43it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.28it/s]
```



### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

# Passing a list of prompts returns one output dict per prompt, in order.
outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

```text
Prompt: Hello, my name is
Generated text:  Jake and I am a huge fan of video games and pop culture. I have been playing games since I was a kid and have been a fan of anime, manga, and comics for as long as I can remember. I love talking about all things gaming and pop culture, and I'm always down to discuss the latest news and releases.
When I'm not writing for Gaming Buddy, you can find me streaming on Twitch or playing the latest games on my console. I'm a bit of a completionist, so you can bet I'll be trying to 100% every game I play.
I'm always looking to learn more about the
Prompt: The president of the United States is
Generated text:  not immune to the law, and can be charged with crimes, a federal appeals court has ruled.
The decision, made by the 2nd US Circuit Court of Appeals, was made in the case of Donald Trump, who is accused of violating campaign finance laws in 2016.
The court's decision states that the president is not above the law and can be held accountable for the
```
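The cell above relies on `llm.generate` returning one output per prompt, in prompt order, which is what makes `zip(prompts, outputs)` correct. For offline jobs it is often convenient to persist those pairs; the sketch below serializes them as JSON Lines. The `run_batch_to_jsonl` helper and the `EchoEngine` stand-in are hypothetical, and the length check is a defensive assumption rather than documented engine behavior.

```python
import json


class EchoEngine:
    """Hypothetical stand-in for sgl.Engine.generate called on a list of prompts."""

    def generate(self, prompts, sampling_params=None):
        return [{"text": f" <reply to {p}>"} for p in prompts]


def run_batch_to_jsonl(engine, prompts, sampling_params):
    """Run one non-streaming batch and return JSON Lines, one record per prompt."""
    outputs = engine.generate(prompts, sampling_params)
    # Defensive check: the pairing below assumes exactly one output per prompt.
    if len(outputs) != len(prompts):
        raise RuntimeError(f"expected {len(prompts)} outputs, got {len(outputs)}")
    records = (
        json.dumps({"prompt": p, "text": o["text"]}) for p, o in zip(prompts, outputs)
    )
    return "\n".join(records)
```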

### Streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    # With stream=True, generate() yields chunks as they are produced.
    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()
```

```text
=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Mike. I'm a 42 year old male, and I've been suffering from anxiety and depression for a number of years. I'm currently taking medication, which seems to help, but I'm finding it increasingly difficult to deal with anxiety in certain situations. I'm starting to feel like I'm losing control over my anxiety, and it's impacting my relationships and daily life. I was hoping to find some support and guidance from someone who understands what I'm going through.
I'm not sure where to start or what to expect, but I'm hoping to find a way to manage my anxiety and live a more fulfilling life. I'm

Prompt: The capital of France is
Generated text:  getting a new feature: a real-time air quality monitoring system. Paris is one of the most polluted cities in the world, with a significant number of citizens suffering from respiratory problems due to the high levels of air pollution. The new system aims to monitor the air quality in real-time, helping citizens to make informed decisions about their health and daily activities.
The system, developed by the City of Paris and the University of Paris-Saclay, uses a network of sensors to monitor air quality in real-time. The sensors are placed throughout the city, providing detailed information on the levels of pollutants such as particulate matter, nitrogen dioxide, and ozone

Prompt: The future of AI is
Generated text:  being shaped by the relentless pursuit of innovation and the emergence of new technologies, which will undoubtedly transform the industry in the coming years. Here are some of the key trends and advancements that will shape the future of AI:

1. **Increased Adoption of Edge AI**: As IoT devices and machines become increasingly prevalent, Edge AI will play a critical role in processing and analyzing data in real-time, reducing latency and improving efficiency.

2. **Advances in Explainable AI (XAI)**: The need for transparent and explainable AI models will continue to grow, driving innovation in XAI. This will enable organizations to trust and understand AI
```
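In the output above, each streamed chunk's `"text"` carries only the newly generated piece, so the full completion is the concatenation of all chunks. The sketch below prints pieces as they arrive and also returns the joined text. `stream_and_collect`, `print_piece`, and the `EchoEngine` stand-in are hypothetical, and the incremental-chunk behavior is inferred from the printed output above, so it is worth confirming against your SGLang version.

```python
class EchoEngine:
    """Hypothetical stand-in for sgl.Engine.generate(..., stream=True).

    Only the streaming path is modeled: it yields dicts whose "text" is just the
    newly generated piece, like the chunks printed above.
    """

    def generate(self, prompt, sampling_params=None, stream=True):
        for piece in [" Paris", ".", " It", " is"]:
            yield {"text": piece}


def print_piece(piece):
    """Print one piece without a newline, as the notebook cell does."""
    print(piece, end="", flush=True)


def stream_and_collect(engine, prompt, sampling_params, on_chunk=print_piece):
    """Pass each streamed piece to on_chunk and return the full completion."""
    pieces = []
    for chunk in engine.generate(prompt, sampling_params, stream=True):
        on_chunk(chunk["text"])
        pieces.append(chunk["text"])
    return "".join(pieces)
```

Collecting pieces in a list and joining once at the end avoids repeated string concatenation on long completions.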
### Non-streaming Asynchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    # Await the whole batch; outputs come back as a list aligned with prompts.
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
```

```text
=== Testing asynchronous batch generation ===

Prompt: Hello, my name is
Generated text:  Jeanette Ammerlahn. I'm a senior at the University of Florida. I'm a member of the American Marketing Association (AMA) and I'm excited to be interning at a marketing agency this summer. I'm a business major with a minor in Psychology.
I'm originally from New Jersey, but I moved to Florida when I was in high school. I love the sunshine state and all the opportunities it has to offer. I'm a bit of a beach bum and love spending time by the water.
In my free time, I love to travel, try new foods, and attend concerts and festivals. I'm a bit

Prompt: The capital of France is
Generated text:  a city of romance and history. It is home to the Eiffel Tower, a symbol of the city and one of the most iconic landmarks in the world. The city also boasts a rich cultural heritage, with numerous museums, galleries, and historic buildings.
Paris has been a major city for centuries, and it has played a significant role in the history of Europe. It has been the
```
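The cell above submits the whole list in one `async_generate` call. When prompts arrive one at a time (for example, inside a custom server), a standard asyncio pattern is to start one call per prompt and await them together with `asyncio.gather`, which returns results in argument order. This is a sketch: `generate_concurrently` and the `EchoEngine` stand-in are hypothetical, and whether a real engine batches such concurrent calls as effectively as a single list call is not covered on this page.

```python
import asyncio


class EchoEngine:
    """Hypothetical stand-in for sgl.Engine.async_generate (non-streaming)."""

    async def async_generate(self, prompt, sampling_params=None):
        await asyncio.sleep(0.01)  # pretend to wait on the model
        return {"text": f" <reply to {prompt}>"}


async def generate_concurrently(engine, prompts, sampling_params):
    """Issue one async_generate call per prompt and await them together.

    asyncio.gather returns results in the order of its arguments, so the
    outputs still line up with the prompts.
    """
    calls = [engine.async_generate(p, sampling_params) for p in prompts]
    return await asyncio.gather(*calls)
```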

### Streaming Asynchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # With stream=True, the awaited call returns an async iterator of chunks.
        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())
```

```text
=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Frédéric and I'm a 35 year old Belgian guy, living in the beautiful city of Liège (Belgium). I'm a creative and curious person, always looking for new challenges and experiences.
In my free time, I enjoy playing guitar, hiking, cooking, practicing photography, and learning new things (usually related to science, technology or philosophy).
I have a passion for languages and cultures, and I'm fluent in 3 languages (Dutch, French, and English) and have a good understanding of Spanish.
After working for a few years in the IT sector, I decided to take a break and pursue my passion

Prompt: The capital of France is
Generated text:  a city like no other, a city that is steeped in history, culture, and romance. From the Eiffel Tower to the Louvre, there are so many amazing things to see and do in Paris. Here are some of the top attractions and experiences you won’t want to miss when visiting this beautiful city.
The Eiffel Tower is an iconic symbol of Paris and one of the most recognizable landmarks in the world. The tower was built for the 1889 World’s Fair and stands 324 meters tall. You can take the elevator to the top for breathtaking views of the city.
The Louvre is one of the

Prompt: The future of AI is
Generated text:  vast and complex. Some experts predict that AI will reach a point where it becomes difficult to distinguish between human and artificial intelligence. While others believe that AI will become increasingly integrated into our daily lives, making it easier to interact with technology. Whatever the future may hold, one thing is certain: AI will have a profound impact on various industries and aspects of society.
Artificial intelligence is a rapidly evolving field that has the potential to revolutionize the way we live and work. Here are some potential benefits of AI:
Improved Efficiency: AI can automate repetitive tasks, freeing up human resources for more strategic and creative work. This can lead to increased productivity
```
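The cell above drains the streams one prompt after another. To interleave them, each stream can be consumed in its own coroutine and the coroutines gathered. The helpers `collect_stream` and `stream_many` and the `EchoEngine` stand-in are hypothetical; as in the cell above, the stub's `async_generate(..., stream=True)` must be awaited to obtain an async iterator of chunks whose `"text"` is the newly generated piece.

```python
import asyncio


class EchoEngine:
    """Hypothetical stand-in for sgl.Engine.async_generate(..., stream=True)."""

    async def async_generate(self, prompt, sampling_params=None, stream=False):
        async def chunks():
            for word in prompt.split():
                await asyncio.sleep(0)  # yield control, as a real stream would
                yield {"text": " " + word.upper()}

        return chunks()


async def collect_stream(engine, prompt, sampling_params):
    """Drain one async stream and return the concatenated completion."""
    generator = await engine.async_generate(prompt, sampling_params, stream=True)
    pieces = []
    async for chunk in generator:
        pieces.append(chunk["text"])
    return "".join(pieces)


async def stream_many(engine, prompts, sampling_params):
    """Consume one stream per prompt concurrently; results follow prompt order."""
    return await asyncio.gather(
        *(collect_stream(engine, p, sampling_params) for p in prompts)
    )
```

Because the streams now interleave, printing chunks directly would mix prompts on screen; collecting per prompt and printing afterwards keeps the output readable.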
```python
# Shut the engine down once all generation is finished.
llm.shutdown()
```
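Because `llm.shutdown()` shuts the engine down, it is worth guaranteeing that it runs even if generation raises partway through. One way is a small context manager built with `contextlib.contextmanager`. `engine_session` and the `FakeEngine` stand-in (which only records whether `shutdown()` was called) are hypothetical.

```python
from contextlib import contextmanager


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine that records whether shutdown() ran."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts, sampling_params=None):
        if self.is_shut_down:
            raise RuntimeError("engine already shut down")
        return [{"text": " ok"} for _ in prompts]

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def engine_session(make_engine):
    """Create an engine, yield it, and call shutdown() even if the body raises."""
    engine = make_engine()
    try:
        yield engine
    finally:
        engine.shutdown()
```

With the real engine, `make_engine` could be `lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")`, and the generation calls from the cells above would go inside the `with` block.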