# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where an extra HTTP server adds unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a standalone Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Heather and I'm a book reviewer. My goal is to read and review a wide variety of books, and share my thoughts with the world.
I'll be reading and reviewing fiction and non-fiction, from romance and mystery to science fiction and non-fiction. I'll be featuring books from a variety of genres, authors, and publishers.
My reviews will be honest and fair, and I'll always strive to provide a balanced view of the book. I'll be sharing my thoughts on the story, characters, writing style, and any other aspects that stood out to me.
I'm excited to share my love of reading with you, and I
Prompt: The president of the United States is
Generated text:  an inherently absurd figure. He is both the most powerful and most powerless person in the world at the same time. He holds the reins of the most mighty military force in the history of the world and is responsible for some of the most influential decisions that will be made in the coming years. And yet, he 
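When the prompt list is large, it can be convenient to submit it in fixed-size groups, for example to save intermediate results between groups. The following is a minimal sketch under stated assumptions: `generate_in_chunks` and `EchoEngine` are hypothetical names introduced here and are not part of SGLang. `EchoEngine` only mimics the return shape used above (one dict with a `"text"` key per prompt) so that the snippet runs without a GPU; with a real engine you would pass `llm` instead.

```python
from typing import Any, Dict, List, Tuple


class EchoEngine:
    """Hypothetical stand-in for sgl.Engine; NOT part of SGLang."""

    def generate(
        self, prompts: List[str], sampling_params: Dict[str, Any]
    ) -> List[Dict[str, str]]:
        # Mimic the return shape used above: one dict with a "text" key per prompt.
        return [{"text": f" <completion of {p!r}>"} for p in prompts]


def generate_in_chunks(
    engine, prompts: List[str], sampling_params: Dict[str, Any], chunk_size: int = 2
) -> List[Tuple[str, str]]:
    """Call engine.generate on successive slices of prompts, preserving order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    results: List[Tuple[str, str]] = []
    for start in range(0, len(prompts), chunk_size):
        batch = prompts[start : start + chunk_size]
        outputs = engine.generate(batch, sampling_params)
        # Pair each prompt with the text generated for it.
        results.extend(zip(batch, (output["text"] for output in outputs)))
    return results


demo_prompts = ["Hello, my name is", "The capital of France is", "The future of AI is"]
pairs = generate_in_chunks(
    EchoEngine(), demo_prompts, {"temperature": 0.8, "top_p": 0.95}
)
for prompt, text in pairs:
    print(f"Prompt: {prompt}\nGenerated text: {text}")
```

Because the engine already schedules a batch efficiently, chunking like this is optional; it mainly helps when you want a checkpoint between groups.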

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Thomas Mews and I am a student at Stony Brook University. I am majoring in Environmental Science with a minor in Biology. I have a strong passion for the environment and conservation and I am excited to be a part of the research team here at the Southampton Marine Station. My research interests include understanding the impact of ocean acidification on marine ecosystems, studying the effects of climate change on coastal ecosystems, and exploring the relationships between marine species and their habitats. I am excited to learn more about the research being conducted here and to contribute to the knowledge and understanding of the marine environment.
I am a junior at Stony Brook University, where

Prompt: The capital of France is
Generated text:  a place that has captured the hearts of people around the world, and for good reason. The City of Light is a place where fashion, art, history, and culture all blend together in a stunning display of elegance and sophistication.
One of the most famous landmarks in Paris is the Eiffel Tower, a 324-meter-tall iron lattice tower built for the 1889 World’s Fair. You can take the stairs or elevator to the top for breathtaking views of the city.
The Louvre Museum is another iconic Parisian landmark, housing some of the world’s most famous artworks, including the Mona Lisa. The museum is located in

Prompt: The future of AI is
Generated text:  bright, but also complex and multifaceted, with many areas of development that can be confusing or overwhelming to consider. In this article, we will discuss the various types of AI, their capabilities, and applications, and highlight some of the key trends and challenges in the field.

## Types of AI

There are several types of AI, each with its own strengths and weaknesses:

1.  **Narrow or Weak AI**: This type of AI is designed to perform a specific task, such as playing chess, recognizing faces, or translating languages. Narrow AI is trained on a specific dataset and is not capable of general reasoning or learning.



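If you need the full completion as a single string instead of printing pieces as they arrive, you can join the chunks. The sketch below assumes each chunk's `"text"` field holds only the newly generated piece, which is how the output above reads; if your SGLang version returns the cumulative text in every chunk, keep only the last chunk instead. `fake_stream` and `collect_stream` are hypothetical helpers introduced for illustration, and `fake_stream` merely stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
from typing import Dict, Iterable, Iterator


def fake_stream(pieces: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for llm.generate(..., stream=True); NOT part of SGLang."""
    for piece in pieces:
        yield {"text": piece}


def collect_stream(chunks: Iterable[Dict[str, str]]) -> str:
    """Join incremental chunk texts into the full completion.

    Assumes every chunk carries only new text (not the cumulative text).
    """
    return "".join(chunk["text"] for chunk in chunks)


full_text = collect_stream(
    fake_stream([" Paris", ",", " the", " City", " of", " Light"])
)
print(repr(full_text))
```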


### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Riley and I am a 3-year-old female Border Collie. I live with my humans who give me endless belly rubs and treats, but they also keep me on a tight leash – literally!
I love to herd, chase, and play fetch with my favorite ball. I’m also very good at agility and can run through tunnels and jump over obstacles with ease. My favorite thing in the world is to please my humans, so I try my best to learn new tricks and behaviors.
When I’m not busy being a good girl, I love to snuggle up in my favorite blanket and take long naps. I also enjoy

Prompt: The capital of France is
Generated text:  getting a new eco-friendly innovation. Paris, a city known for its high-end fashion, art, and culture, is now implementing a large-scale urban solar park that will power one of its neighborhoods. The park, which is expected to be completed by 2023, will be built on a 4.8-hectare rooftop in the 13th arrondissement and will have the capacity to generate 1 MW of e
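The asynchronous interface is also useful when independent requests need different sampling parameters and should be awaited together. The following sketch uses `asyncio.gather` for that. It rests on two assumptions that are not confirmed by this document: that a single-prompt call returns a single dict with a `"text"` key, and that `async_generate` may be awaited concurrently. `FakeAsyncEngine` and `generate_concurrently` are hypothetical names; the fake engine exists only so the snippet runs without a GPU.

```python
import asyncio
from typing import Any, Dict, List, Tuple


class FakeAsyncEngine:
    """Hypothetical stand-in for sgl.Engine; NOT part of SGLang."""

    async def async_generate(
        self, prompt: str, sampling_params: Dict[str, Any]
    ) -> Dict[str, str]:
        # Yield to the event loop once, as a real request would while waiting.
        await asyncio.sleep(0)
        return {"text": f" <completion at T={sampling_params['temperature']}>"}


async def generate_concurrently(
    engine, requests: List[Tuple[str, Dict[str, Any]]]
) -> List[str]:
    """Create one coroutine per (prompt, sampling_params) pair and await them together."""
    coroutines = [engine.async_generate(prompt, params) for prompt, params in requests]
    # asyncio.gather returns results in the same order as its arguments.
    outputs = await asyncio.gather(*coroutines)
    return [output["text"] for output in outputs]


requests = [
    ("Hello, my name is", {"temperature": 0.8, "top_p": 0.95}),
    ("The capital of France is", {"temperature": 0.0, "top_p": 1.0}),
]
texts = asyncio.run(generate_concurrently(FakeAsyncEngine(), requests))
print(texts)
```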

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Laura and I'm the founder of Sustainable Earth Initiative (SEI). I'm so excited to introduce you to our mission and goals. SEI is a non-profit organization dedicated to promoting sustainable living and community development through education, outreach, and project implementation. Our mission is to empower individuals, communities, and organizations to live more sustainably, reduce their environmental footprint, and foster a culture of environmental stewardship.

Our goals are ambitious, but achievable, and we believe that by working together, we can create a better future for ourselves and for generations to come. Here are some of the key objectives we're working towards:

*   **Education

Prompt: The capital of France is
Generated text:  one of the most visited cities in the world. Paris has so much to offer, from famous landmarks like the Eiffel Tower and Notre Dame, to world-class museums like the Louvre and Orsay, to beautiful parks and gardens, and of course, the fashion and cuisine.
On a recent visit to Paris, I had the opportunity to explore the city with a local guide and try some of its most famous foods. Our first stop was the famous street food market, Le Comptoir du Relais, where we sampled a delicious croque-monsieur (a grilled ham and cheese sandwich) and a plate of French fries.

Prompt: The future of AI is
Generated text:  now, and it is being driven by a new generation of innovators. Meet the team behind the AI that’s transforming industries and changing lives.
There are many exciting developments in AI technology, but few are as transformative as the work being done by the team at our company. With a focus on innovation and customer success, we are creating AI solutions that are helping businesses and individuals achieve their goals and improve their lives.
Our team is comprised of experts from a variety of backgrounds, each bringing their unique perspective and skillset to the table. We have a deep understanding of AI and its many applications, and we are committed to staying at the forefront


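The cell above consumes one stream at a time. If several streams should make progress concurrently while each completion is collected separately, a pattern like the following may help. It is a sketch under assumptions: each chunk is taken to carry only newly generated text, as the output above suggests, and `fake_async_stream` is a hypothetical stand-in for the async generator obtained from `await llm.async_generate(prompt, sampling_params, stream=True)`. `collect_async_stream` and `demo` are likewise illustrative names, not SGLang APIs.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def fake_async_stream(pieces: List[str]) -> AsyncIterator[Dict[str, str]]:
    """Hypothetical stand-in for an SGLang async stream; NOT part of SGLang."""
    for piece in pieces:
        await asyncio.sleep(0)  # let other streams advance between chunks
        yield {"text": piece}


async def collect_async_stream(stream: AsyncIterator[Dict[str, str]]) -> str:
    """Consume one async stream and return the joined text."""
    parts: List[str] = []
    async for chunk in stream:
        parts.append(chunk["text"])
    return "".join(parts)


async def demo() -> List[str]:
    streams = [
        fake_async_stream([" Laura", " and", " I"]),
        fake_async_stream([" one", " of", " the"]),
    ]
    # Each stream is drained by its own coroutine; gather keeps the input order.
    return list(await asyncio.gather(*(collect_async_stream(s) for s in streams)))


collected = asyncio.run(demo())
print(collected)
```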


In [6]:
llm.shutdown()
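In a script, as opposed to a notebook, you may want `shutdown()` to run even if generation raises an exception. One way to arrange that is a small context manager, sketched below. `running_engine` and `FakeEngine` are hypothetical names introduced here; `FakeEngine` stands in for `sgl.Engine` and only records whether `shutdown()` was called, so the snippet runs without a GPU.

```python
from contextlib import contextmanager


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine; NOT part of SGLang."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts, sampling_params):
        return [{"text": " ..."} for _ in prompts]

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def running_engine(engine):
    """Yield the engine and always call its shutdown() afterwards."""
    try:
        yield engine
    finally:
        engine.shutdown()


engine = FakeEngine()
try:
    with running_engine(engine) as llm_like:
        llm_like.generate(["Hello, my name is"], {"temperature": 0.8})
        raise RuntimeError("simulated failure mid-run")
except RuntimeError:
    pass

# shutdown() ran even though the body raised.
print(engine.is_shut_down)
```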