# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
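
As a rough illustration of the custom-server use case, the sketch below shows only the core of such a server: a function that takes a JSON request body, calls the engine directly, and returns a JSON response. The names `handle_generate` and `EchoEngine`, and the request fields `prompts` and `sampling_params`, are hypothetical and introduced here so the example runs without a GPU. The linked script remains the reference example; a real server would pass an `sgl.Engine` instance and wire this function into a web framework of its choice.

```python
import json


class EchoEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a GPU."""

    def generate(self, prompts, sampling_params=None):
        # Mirrors the output shape used in this document:
        # one dict with a "text" key per prompt.
        return [{"text": f" <completion for {p!r}>"} for p in prompts]


def handle_generate(engine, body: bytes) -> bytes:
    """Turn a JSON request body into a JSON response by calling the engine directly."""
    request = json.loads(body)
    prompts = request["prompts"]
    sampling_params = request.get("sampling_params", {})
    outputs = engine.generate(prompts, sampling_params)
    response = [
        {"prompt": prompt, "text": output["text"]}
        for prompt, output in zip(prompts, outputs)
    ]
    return json.dumps(response).encode("utf-8")


if __name__ == "__main__":
    body = json.dumps({"prompts": ["The capital of France is"]}).encode("utf-8")
    print(handle_generate(EchoEngine(), body).decode("utf-8"))
```

Keeping the request handling in a plain function like this makes it easy to test without starting a server or loading a model.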

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.33it/s]

100%|██████████| 23/23 [00:05<00:00,  3.94it/s]


### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  David, and I'm a blogger, a traveler, and a foodie. I'm currently based in Berlin, Germany, but I've lived in many different cities around the world, including Bangkok, Singapore, and London.
I started this blog as a way to share my experiences and adventures with the world, and to inspire others to explore new places and try new things. On this blog, you'll find posts about my travels, my favorite foods, and my thoughts on culture and society.
I'm a big fan of street food, and I love trying new and exotic dishes. I've eaten my way through Bangkok's street food scene
Prompt: The president of the United States is
Generated text:  the head of state and head of government for the United States. The president is elected by the citizens of the United States through the Electoral College. The president serves a four-year term and is limited to two terms.
The president's role includes appointing federal judges, including Supreme Court justices, and o
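
In offline batch jobs the generated text is usually saved rather than printed. The following is a minimal sketch of one way to do that, assuming the outputs have the shape shown above (a list of dicts with a `"text"` key, in the same order as the prompts). The helper name `write_results_jsonl` and the output file name are illustrative choices, not part of SGLang.

```python
import json
import os
import tempfile


def write_results_jsonl(prompts, outputs, path):
    """Write one JSON object per line, pairing each prompt with its generated text."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, output in zip(prompts, outputs):
            record = {"prompt": prompt, "text": output["text"]}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Placeholder data in the same shape as the result of
    # `llm.generate(prompts, sampling_params)` in the cell above.
    prompts = ["The capital of France is"]
    outputs = [{"text": " Paris"}]
    path = os.path.join(tempfile.gettempdir(), "sglang_outputs.jsonl")
    write_results_jsonl(prompts, outputs, path)
    with open(path, encoding="utf-8") as f:
        print(f.read())
```

With a running engine, the call would be `write_results_jsonl(prompts, llm.generate(prompts, sampling_params), path)`.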

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Byron and I'm a proud owner of a 2001 Toyota 4Runner. My vehicle has been my reliable companion for many years, and I've been trying to maintain it in the best condition possible. However, I've encountered a problem that has been nagging me for some time now, and I'm hoping that someone here might be able to offer some advice.

Recently, I've been noticing that my 4Runner's engine is making a strange noise, which I've described as a grinding or screeching sound. It's not a constant noise, but rather it seems to occur when I'm accelerating from a standstill

Prompt: The capital of France is
Generated text:  Paris, which is located in the northern part of the country. The city is famous for its beautiful architecture, art museums, and cultural landmarks such as the Eiffel Tower and Notre-Dame Cathedral. Paris is also known for its fashion, cuisine, and romantic atmosphere.
France is a country located in Western Europe, bordered by the countries of Belgium, Luxembourg, Germany, Switzerland, Italy, Spain, and Andorra. It is a popular tourist destination known for its rich history, culture, and natural beauty. France is home to many famous cities, including Paris, Lyon, Marseille, and Bordeaux, as well as the French Riviera

Prompt: The future of AI is
Generated text:  changing the way we live, work, and interact with each other. Artificial intelligence (AI) is being used in a wide range of industries, from healthcare and finance to transportation and education. In this article, we will explore the impact of AI on the future of work and how it will shape the labor market.
Impact of AI on the Future of Work

AI has the potential to automate many tasks and processes, which could lead to significant changes in the labor market. Some of the key impacts of AI on the future of work include:
1. Job Displacement: AI has the potential to automate many jobs, particularly those that involve repetitive



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Veronika and I am a traveler, photographer and blogger. I'm passionate about exploring new places, meeting new people and experiencing different cultures.
I love to share my stories and photographs with others, so they can get a glimpse of the amazing world we live in. My aim is to inspire people to travel, to try new things and to appreciate the beauty in everyday life.
I'm originally from Austria, but I've been living in various countries, including Australia, the United States, and now in New Zealand. I'm a curious and adventurous person, always looking for new opportunities to explore and learn.
This blog is where I share

Prompt: The capital of France is
Generated text:  a city of great beauty and history, and there is no shortage of things to do and see. Paris is famous for its stunning architecture, art museums, fashion, and cuisine, and there is always something new to discover. From the iconic Eiffel Tower to the Louvre Museum, Notre
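
One reason to use the asynchronous interface is to keep several requests in flight at once. The sketch below submits multiple prompt batches concurrently with `asyncio.gather`, which returns results in the order the batches were submitted. It assumes the engine accepts overlapping `async_generate` calls; `FakeAsyncEngine` and `generate_batches` are hypothetical names, and the fake engine only mimics the call shape used above so the example runs without a model.

```python
import asyncio


class FakeAsyncEngine:
    """Hypothetical stand-in for sgl.Engine; only mimics the shape of async_generate."""

    async def async_generate(self, prompts, sampling_params=None):
        # Yield control to the event loop, as a real engine call would while waiting.
        await asyncio.sleep(0)
        return [{"text": f" <completion for {p!r}>"} for p in prompts]


async def generate_batches(engine, batches, sampling_params):
    """Submit several prompt batches concurrently and return results in batch order."""
    tasks = [engine.async_generate(batch, sampling_params) for batch in batches]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    batches = [
        ["Hello, my name is"],
        ["The capital of France is", "The future of AI is"],
    ]
    results = asyncio.run(
        generate_batches(FakeAsyncEngine(), batches, {"temperature": 0.8})
    )
    for batch, outputs in zip(batches, results):
        for prompt, output in zip(batch, outputs):
            print(prompt, "->", output["text"])
```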

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Kirsty and I’m a hairdresser at Wigs N Lashes Salon in North West England. I have over 5 years experience in hairdressing and specialize in wig styling, hair extensions, and makeup.
I love the way a wig can transform a person’s look and boost their confidence, which is why I am passionate about my work. I take pride in providing high-quality services that meet my clients' needs and expectations. My friendly and approachable nature means that I make sure my clients feel at ease and comfortable throughout their experience.
I am trained in various wig styling techniques, including hand-tied and lace front wigs

Prompt: The capital of France is
Generated text:  home to some of the world’s most famous landmarks, museums, and art collections, and is a popular destination for tourists and business travelers alike. Visitors can explore the iconic Eiffel Tower, visit the Louvre Museum to see the Mona Lisa, and stroll along the Seine River. The city is also known for its fashion, cuisine, and romantic atmosphere, making it a must-visit destination for anyone interested in history, culture, and luxury.
Paris is also a hub for international business and finance, with many multinational corporations and financial institutions having a presence in the city. The city is home to the European Union’s headquarters and is

Prompt: The future of AI is
Generated text:  being shaped by cutting-edge research in various fields such as computer vision, natural language processing, robotics, and deep learning. These technologies have the potential to revolutionize industries such as healthcare, finance, and transportation, and transform the way we live and work. However, AI also raises important ethical concerns, such as bias, transparency, and accountability. Researchers and developers must carefully consider these issues as they create new AI systems.
The future of AI is expected to be shaped by several key trends, including:
1. Edge AI: With the proliferation of IoT devices and the need for real-time processing, edge AI is becoming increasingly important. Edge AI
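
The loop above streams one prompt at a time. As a sketch of an alternative, the code below drains one stream per prompt concurrently and returns the completed strings. It follows the calling convention shown above (`await` the call, then iterate with `async for`) and assumes each chunk carries only new text. `FakeStreamingEngine`, `stream_to_string`, and `stream_all` are hypothetical names; the fake engine exists only so the example runs without a model.

```python
import asyncio


class FakeStreamingEngine:
    """Hypothetical stand-in for sgl.Engine with the same calling convention as above."""

    async def async_generate(self, prompt, sampling_params=None, stream=False):
        async def chunks():
            for word in (" one", " two", " three"):
                await asyncio.sleep(0)  # let other streams make progress
                yield {"text": word}

        return chunks()


async def stream_to_string(engine, prompt, sampling_params):
    """Drain one async stream and return the concatenated text."""
    generator = await engine.async_generate(prompt, sampling_params, stream=True)
    parts = []
    async for chunk in generator:
        parts.append(chunk["text"])
    return "".join(parts)


async def stream_all(engine, prompts, sampling_params):
    """Consume one stream per prompt concurrently instead of one after another."""
    return await asyncio.gather(
        *(stream_to_string(engine, prompt, sampling_params) for prompt in prompts)
    )


if __name__ == "__main__":
    texts = asyncio.run(stream_all(FakeStreamingEngine(), ["a", "b"], {}))
    for text in texts:
        print(text)
```

Printing chunks from several streams as they arrive would interleave the text, so this sketch buffers each stream and prints only the finished strings.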




In [6]:
llm.shutdown()
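
In a standalone script, an exception during generation would skip the final `shutdown()` call. One way to guard against that is a small context manager, sketched below. `engine_session` is a hypothetical helper and `FakeEngine` is a stand-in that only records whether `shutdown()` was called; with SGLang installed, the factory would be something like `lambda: sgl.Engine(model_path=...)`.

```python
from contextlib import contextmanager


@contextmanager
def engine_session(make_engine):
    """Create an engine and guarantee shutdown() runs, even if generation raises."""
    engine = make_engine()
    try:
        yield engine
    finally:
        engine.shutdown()


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine that records whether shutdown() was called."""

    def __init__(self):
        self.is_shut_down = False

    def shutdown(self):
        self.is_shut_down = True


if __name__ == "__main__":
    with engine_session(FakeEngine) as llm:
        print("inside the block, shut down:", llm.is_shut_down)
    print("after the block, shut down:", llm.is_shut_down)
```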