# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example in a Python script can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:00<00:02,  1.15it/s]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:01<00:01,  1.23it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:01<00:00,  1.80it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.45it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.43it/s]

### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Tim and I am a writer. I love writing about a wide range of topics, from the paranormal and supernatural to science fiction and fantasy. I am also passionate about history, folklore, and mythology, and I enjoy writing about these subjects as well.
I am a firm believer that the line between reality and fantasy is often blurred, and that the things we consider to be myth or legend can often have a basis in truth. I believe that the world is full of mysteries and wonders, and that there is always more to discover and learn.
In my writing, I try to tap into this sense of wonder and curiosity, and to explore the
Prompt: The president of the United States is
Generated text:  going to be traveling to a foreign country on a trip. They are planning on being gone for an extended period of time, which would leave the United States with a leader vacuum. In that scenario, the vice president would likely be called upon to serve as acting president until the
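
For offline batch jobs it is often convenient to keep each prompt together with its completion, for example to write one JSON line per request. The sketch below assumes only what the cell above shows: that `llm.generate` returns one dict per prompt, in prompt order, with a `"text"` key. The helper name `to_jsonl` and the stand-in data are hypothetical and not part of SGLang.

```python
import json


def to_jsonl(prompts, outputs):
    """Pair each prompt with its generated text and serialize as JSON Lines.

    Assumes `outputs` has the shape shown above: one dict per prompt,
    in the same order, with a "text" key.
    """
    if len(prompts) != len(outputs):
        raise ValueError("expected exactly one output per prompt")
    lines = [
        json.dumps({"prompt": p, "completion": o["text"]}, ensure_ascii=False)
        for p, o in zip(prompts, outputs)
    ]
    return "\n".join(lines)


# Stand-in data in the same shape as the engine output (not real model output).
example_prompts = ["The capital of France is"]
example_outputs = [{"text": " Paris."}]
print(to_jsonl(example_prompts, example_outputs))
```

With the real engine, you would pass `prompts` and the `outputs` returned by `llm.generate(prompts, sampling_params)` instead of the stand-in lists.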

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Mrs. C. and I am a professional teacher with over 8 years of experience. I have a degree in elementary education and a certification in special education. I have taught students in grades K-6 and have experience working with students who have a variety of learning styles and abilities.
I am passionate about helping students develop a love of learning and reach their full potential. I believe that every student learns differently and that it is my job as a teacher to find ways to engage and motivate each student.
In my classroom, I strive to create a positive and supportive learning environment where students feel safe to take risks and ask questions. I use a

Prompt: The capital of France is
Generated text:  a city that has been the epicenter of art, culture, fashion, and culinary delights for centuries. And yet, every time you visit Paris, you discover something new, something that makes you fall in love with this city all over again. It’s not just the iconic landmarks like the Eiffel Tower or Notre Dame Cathedral that make this city special, but the charming neighborhoods, the charming people, the delicious food, and the rich history that surrounds you at every turn. Here are some of the top things to do and see in Paris:
The Louvre Museum is one of the world’s largest and most famous museums, with a

Prompt: The future of AI is
Generated text:  in the context of human relationships

I believe that the future of AI will be shaped by its ability to navigate complex human relationships and emotions. While AI has made tremendous progress in recent years, it still struggles to understand and replicate the nuances of human interaction. However, as AI becomes increasingly integrated into our daily lives, it will need to learn how to navigate the complexities of human relationships if it is to be truly useful and effective.
One of the key challenges facing AI is understanding the context of human communication. We often use idioms, metaphors, and other forms of figurative language that can be difficult for AI to interpret. Furthermore,
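
If you need the full completion as well as the live display, you can accumulate the chunks while printing them. This sketch assumes, as the output above suggests, that each chunk's `"text"` holds only the newly generated piece; if your SGLang version yields cumulative text instead, keep only the last chunk. `collect_stream` and `fake_stream` are hypothetical helpers, with `fake_stream` standing in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
def collect_stream(chunks, echo=False):
    """Join incremental chunk texts into the full completion."""
    pieces = []
    for chunk in chunks:
        if echo:
            # Show each piece as soon as it arrives.
            print(chunk["text"], end="", flush=True)
        pieces.append(chunk["text"])
    if echo:
        print()
    return "".join(pieces)


def fake_stream():
    # Stand-in for the engine's streaming generator (not real model output).
    for piece in [" Paris", ",", " of", " course", "."]:
        yield {"text": piece}


full_text = collect_stream(fake_stream(), echo=True)
```

Building the result from a list and joining once at the end avoids repeated string concatenation on long completions.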




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Nik Kershaw and I'm a musician and songwriter. I'm best known for my hit song, 'Wouldn't It Be Good' which reached the number 4 in the UK Singles Chart in 1984.
I have always been interested in music and started playing the guitar at the age of 10. I formed a band, 'The Kershaw Brothers', and played gigs in my local area. We eventually disbanded and I went on to play in various bands and form 'The Kershaws' in 1976.
I released my debut single, 'Funk Me Slowly', in 1977,

Prompt: The capital of France is
Generated text:  a must-visit destination for any traveler. Paris is the City of Light, full of history, art, fashion, and romance. From the iconic Eiffel Tower to the artistic masterpieces at the Louvre, and from the charming streets of Montmartre to the fashion boutiques of the Champs-Élysées, there's something for everyone in Paris.
The city is a popular destination for honeymooners, couples celebrating special occasions, and families alike
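
One reason to prefer the asynchronous call is that other coroutines can run while generation is in flight. The sketch below illustrates this with `asyncio.gather`. `StubEngine` is a hypothetical stand-in that only mimics the call shape used above (`await async_generate(prompts, sampling_params)` returning one dict per prompt), so the example runs without a GPU; `load_more_prompts` is likewise invented for illustration. Whether overlapping helps in practice depends on your workload.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in that mimics the call shape of llm.async_generate."""

    async def async_generate(self, prompts, sampling_params):
        await asyncio.sleep(0.01)  # pretend the model is busy
        return [{"text": f" <completion for {p!r}>"} for p in prompts]


async def load_more_prompts():
    # Any other awaitable work, e.g. reading the next batch from disk.
    await asyncio.sleep(0.01)
    return ["The future of AI is"]


async def run_batch(engine, prompts, sampling_params):
    # Both coroutines make progress concurrently on the same event loop.
    outputs, next_prompts = await asyncio.gather(
        engine.async_generate(prompts, sampling_params),
        load_more_prompts(),
    )
    return outputs, next_prompts


outputs, next_prompts = asyncio.run(
    run_batch(StubEngine(), ["Hello, my name is"], {"temperature": 0.8})
)
print(outputs[0]["text"], next_prompts)
```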

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Chris and I am an IT professional. I am excited to join this community and start sharing my knowledge and experiences. I have been working in the IT field for over 10 years, and have a wide range of skills and expertise.
I am looking forward to contributing to this community and learning from others as well. I am particularly interested in discussing topics such as cybersecurity, cloud computing, and software development. I am also interested in hearing about other people's experiences and challenges in the IT field.
In my free time, I enjoy reading about new technologies and staying up-to-date with the latest industry trends. I am also an avid gamer and

Prompt: The capital of France is
Generated text:  not Paris (although many people think it is). The capital of France is actually the city of Paris, but that is not the same thing. The capital of a country is the city where the country's government is located. The capital of France is Paris, but the city of Paris is not the capital of France; it is the city of the capital of France. Okay, that's a bit confusing, isn't it?
In France, the capital is a city that is separate from the city of Paris. The city where the French government is located is actually called Paris, but it is located in the Île-de-France region

Prompt: The future of AI is
Generated text:  in hybrid approaches that blend rule-based systems with machine learning. These hybrid approaches are more flexible and scalable than traditional rule-based systems, and more interpretable and explainable than traditional machine learning systems. Here are some key aspects of hybrid approaches to AI:

1.  **Rule-based systems**: These are traditional AI systems that use pre-defined rules to make decisions. They are often used in applications where the rules are well-defined and the problem is well-understood.

2.  **Machine learning**: These are AI systems that learn from data and improve their performance over time. They are often used in applications where the rules are not well-defined or
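
The loop above streams one prompt at a time. If you do not need to display tokens in order, several streams can be consumed concurrently, each collected into its own string. This is a sketch under assumptions: `stub_stream` is a hypothetical stand-in for the async generator obtained from `await llm.async_generate(prompt, sampling_params, stream=True)`, and chunks are assumed to be incremental, as in the output above.

```python
import asyncio


async def stub_stream(prompt):
    # Stand-in async generator yielding incremental chunks (not real model output).
    for piece in [" chunk-1", " chunk-2"]:
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield {"text": piece}


async def collect(prompt, make_stream):
    """Consume one stream and return (prompt, full_text)."""
    pieces = []
    async for chunk in make_stream(prompt):
        pieces.append(chunk["text"])
    return prompt, "".join(pieces)


async def collect_all(prompts, make_stream):
    # asyncio.gather returns results in the same order as its arguments.
    pairs = await asyncio.gather(*(collect(p, make_stream) for p in prompts))
    return dict(pairs)


results = asyncio.run(collect_all(["A", "B"], stub_stream))
print(results)
```

With the real engine, the stream would first have to be obtained by awaiting `llm.async_generate(...)`, as in the cell above, before iterating over it.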




In [6]:
llm.shutdown()
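
In a standalone script, an exception raised between engine creation and `llm.shutdown()` would skip the cleanup. A small context manager can make the shutdown unconditional. This is a sketch: `engine_session` is a hypothetical helper, and `StubEngine` stands in for `sgl.Engine` so the example runs without loading a model. The only engine method it relies on is the `shutdown()` call shown above.

```python
from contextlib import contextmanager


@contextmanager
def engine_session(engine):
    """Yield the engine and always call its shutdown() afterwards."""
    try:
        yield engine
    finally:
        engine.shutdown()


class StubEngine:
    """Hypothetical stand-in exposing only shutdown()."""

    def __init__(self):
        self.closed = False

    def shutdown(self):
        self.closed = True


stub = StubEngine()
try:
    with engine_session(stub) as engine:
        raise RuntimeError("simulated failure during generation")
except RuntimeError:
    pass
print(stub.closed)
```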