# Offline Engine API

SGLang provides a direct inference engine that does not require an HTTP server. This is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A complete example written as a Python script is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
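The linked script is the reference implementation. As a rough illustration only, the core of such a server is a function that turns a JSON request into an engine call and turns the engine's output into a JSON response. The sketch below uses only the Python standard library. It makes the following assumptions:

- `EchoEngine` is a hypothetical stand-in for `sgl.Engine`, so the example runs without a GPU.
- The request fields `prompt` and `sampling_params` are assumptions, not the format used by the linked script.
- Calling `generate` with a single prompt returns a single dict with a `"text"` key.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a GPU."""

    def generate(self, prompt, sampling_params=None):
        return {"text": f" (echo) {prompt}"}


def handle_generate(engine, body: bytes) -> bytes:
    """Turn a JSON request body into an engine call and a JSON response body."""
    request = json.loads(body)
    # The "prompt" / "sampling_params" field names are assumptions for this sketch.
    output = engine.generate(request["prompt"], request.get("sampling_params", {}))
    return json.dumps({"text": output["text"]}).encode("utf-8")


def make_handler(engine):
    """Build a request-handler class bound to one engine instance."""

    class GenerateHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            reply = handle_generate(engine, self.rfile.read(length))
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(reply)))
            self.end_headers()
            self.wfile.write(reply)

    return GenerateHandler


def serve(engine, host="127.0.0.1", port=8000):
    """Serve POST requests forever; pass sgl.Engine(...) instead of EchoEngine() in practice."""
    HTTPServer((host, port), make_handler(engine)).serve_forever()
```

`HTTPServer` handles one request at a time. A server expecting concurrent clients would more likely use an async framework and `llm.async_generate`.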

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

```python
# Launch the offline engine.
import asyncio

import sglang as sgl

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
```



### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

# A single call generates completions for the whole batch.
outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

```text
===============================
Prompt: Hello, my name is
Generated text:  Emily and I'm a makeup artist and beauty blogger. I've been in the industry for a few years now and have worked with a variety of clients, from brides to fashion models. I love staying up to date on the latest trends and techniques, and I'm always looking for new products to try out and share with my followers.
I'm really passionate about helping people feel confident and beautiful in their own skin. Makeup can be a great way to enhance your natural features and boost your self-esteem, but it's also about having fun and expressing yourself through art. I believe that everyone deserves to look and feel their best, regardless of their
===============================
Prompt: The president of the United States is
Generated text:  not only the head of government but also the head of state. He is both the chief executive of the federal government and the representative of the United States in international relations. In this role, the president is responsible for setting the overa
```
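A real offline batch job usually persists its results rather than printing them. The sketch below writes one JSON line per prompt. It assumes that `generate` returns one dict with a `"text"` key per prompt, in prompt order, as the `zip` above relies on. `StubEngine` is a hypothetical stand-in for `llm` so the snippet runs anywhere, and the `prompt`/`completion` field names are arbitrary choices.

```python
import io
import json


class StubEngine:
    """Hypothetical stand-in for sgl.Engine: returns one dict per prompt."""

    def generate(self, prompts, sampling_params=None):
        return [{"text": f" <completion of {p!r}>"} for p in prompts]


def run_batch(engine, prompts, sampling_params, out_file: io.TextIOBase) -> int:
    """Generate completions for all prompts in one call and write JSONL rows."""
    outputs = engine.generate(prompts, sampling_params)
    # Outputs are paired with prompts by position, as in the example above.
    for prompt, output in zip(prompts, outputs):
        row = {"prompt": prompt, "completion": output["text"]}
        out_file.write(json.dumps(row) + "\n")
    return len(outputs)
```

With the real engine, this would be called as `run_batch(llm, prompts, sampling_params, open("results.jsonl", "w"))`. Passing the whole list in one call lets the engine schedule the batch itself.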

### Streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    # With stream=True, generate returns an iterator of output chunks.
    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()
```


```text
=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Rachel and I am a proud mother of two beautiful children. I've been working as a freelance writer for several years now, and I have a passion for creating engaging content that resonates with my readers.
As a parent, I know how difficult it can be to find time for myself, let alone pursue a career. However, I've found that writing has been a wonderful outlet for me, allowing me to express myself and connect with others in a meaningful way.
My specialty is writing about parenting, self-care, and personal development. I believe that taking care of oneself is essential for being a good parent, and I love sharing tips and

Prompt: The capital of France is
Generated text:  Paris, the largest city in France and a global hub of art, fashion, cuisine, and history. The city is divided into 20 arrondissements (districts), each with its own unique character and attractions. Paris is known for its iconic landmarks like the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum, as well as its romantic atmosphere, charming neighborhoods, and world-class cuisine. Visitors can explore the city's many museums, galleries, and historical sites, enjoy the beautiful parks and gardens, or take a scenic river cruise along the Seine. The city also offers a wide range of entertainment options

Prompt: The future of AI is
Generated text:  likely to be shaped by the way we approach innovation, education, and societal implications of this rapidly evolving technology. As AI continues to transform industries and lives, it's essential to consider the potential risks and benefits and foster a culture of responsible AI development.
Join us for a conversation with experts from industry, academia, and government as we discuss the future of AI, its societal implications, and the importance of responsible innovation. Our panel will explore topics such as:
The role of AI in shaping the future of work and education

The importance of ethics and accountability in AI development

The potential risks and benefits of AI, including job displacement and bias
```
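In the output above, each chunk carries only the newly generated text, so the full completion is the concatenation of the chunks. The sketch below prints chunks as they arrive and also returns the assembled text. It assumes incremental chunks, as the output above suggests. `StubStreamingEngine` is a hypothetical stand-in that yields chunks shaped like the ones above.

```python
import sys
from typing import TextIO


class StubStreamingEngine:
    """Hypothetical stand-in for sgl.Engine that streams incremental chunks."""

    def generate(self, prompt, sampling_params=None, stream=False):
        pieces = [" Paris", ",", " the", " capital", "."]
        return ({"text": piece} for piece in pieces)


def stream_and_collect(engine, prompt, sampling_params, out: TextIO = sys.stdout) -> str:
    """Print chunks as they arrive and return the assembled completion."""
    parts = []
    for chunk in engine.generate(prompt, sampling_params, stream=True):
        out.write(chunk["text"])
        out.flush()  # show each chunk immediately, like print(..., flush=True)
        parts.append(chunk["text"])
    out.write("\n")
    return "".join(parts)
```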







### Non-streaming Asynchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    # Await the completions for the whole batch.
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
```


```text
=== Testing asynchronous batch generation ===

Prompt: Hello, my name is
Generated text:  Olivia and I am a senior in high school. I am a very outgoing and confident individual who is passionate about helping others. I have been involved in many volunteer programs, such as the Food Bank, Habitat for Humanity, and the American Red Cross. I have also been involved in my school's debate team, and have helped to organize various community service projects.
I have a strong desire to attend college, but I am not sure which one to choose. I have been considering several universities, including UCLA, USC, and UC Berkeley. I would like to major in a field that will allow me to help others and make a positive impact on

Prompt: The capital of France is
Generated text:  famous for its stunning architecture, beautiful gardens, and of course, the Eiffel Tower. It's a city that's steeped in history and culture, and there's always something to see or do. Whether you're interested in art, fashion, food, or history, Paris has something for everyon
```
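`async_generate` accepts a whole list of prompts, as shown above. When requests arrive independently, for example from different coroutines, they can also be awaited together with `asyncio.gather`. The sketch below makes two assumptions:

- Calling `async_generate` with a single prompt returns one dict with a `"text"` key.
- `StubAsyncEngine` is a hypothetical stand-in for `llm` that simulates latency.

```python
import asyncio


class StubAsyncEngine:
    """Hypothetical stand-in for sgl.Engine.async_generate on a single prompt."""

    async def async_generate(self, prompt, sampling_params=None):
        await asyncio.sleep(0.01)  # simulate generation latency
        return {"text": f" <completion of {prompt!r}>"}


async def generate_concurrently(engine, prompts, sampling_params):
    """Submit one request per prompt and wait for all of them together."""
    coros = [engine.async_generate(p, sampling_params) for p in prompts]
    # gather returns results in the order of its arguments, not completion order.
    outputs = await asyncio.gather(*coros)
    return [o["text"] for o in outputs]
```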

### Streaming Asynchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # With stream=True, awaiting async_generate yields an async iterator of chunks.
        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())
```


```text
=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Cecelia, and I have just joined the Indie Author Support Group on Facebook. I have been writing for years, but this is my first time to try to publish a book. I am a bit overwhelmed by the process. I have written a romance novel, which I think is a great story, but I'm not sure if it's good enough or if it's even marketable.
I would love to get feedback on my work and learn more about the publishing process. I'm a bit of a perfectionist, so I'm worried about my book not being good enough. I've read a lot of romance novels and I think I

Prompt: The capital of France is
Generated text:  home to the Eiffel Tower, the Louvre Museum, and some of the world's most fashionable shopping districts. Paris is a beautiful and romantic city that offers endless options for travelers, from world-class museums to historic landmarks and charming cafes. Whether you're interested in history, culture, food, or entertainment, Paris has something for everyone.
The Eiffel Tower is an iconic symbol of Paris and one of the most recognizable landmarks in the world. Built for the 1889 World's Fair, the tower stands at 324 meters (1,063 feet) tall and offers stunning views of the city from its observation decks.
The

Prompt: The future of AI is
Generated text:  uncertain, and it will undoubtedly be shaped by a complex interplay of technological, economic, social, and political factors. This report provides a comprehensive overview of the key trends, challenges, and opportunities in the field of AI, highlighting the need for a more nuanced and informed discussion about the future of AI.
The report is based on a detailed review of the literature, as well as insights from a range of experts and stakeholders in the field of AI. It identifies the following key trends and challenges:
1. **Rapid advances in AI technology**: The report highlights the rapid pace of progress in AI research and development, particularly in areas such as
```
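The example above streams one prompt after another. To stream several prompts at once, each prompt can get its own coroutine that consumes its stream, and those coroutines can be awaited with `asyncio.gather`. Because concurrent streams interleave, this sketch collects each prompt's text instead of printing chunks directly.

It assumes the pattern shown above: awaiting `async_generate(..., stream=True)` yields an async iterator of dicts with incremental `"text"`. `StubAsyncStreamingEngine` is a hypothetical stand-in for `llm`.

```python
import asyncio


class StubAsyncStreamingEngine:
    """Hypothetical stand-in mimicking `await llm.async_generate(..., stream=True)`."""

    async def async_generate(self, prompt, sampling_params=None, stream=False):
        async def chunks():
            for word in prompt.split():
                await asyncio.sleep(0)  # hand control back to the event loop
                yield {"text": " " + word.upper()}

        return chunks()


async def collect_stream(engine, prompt, sampling_params):
    """Consume one prompt's stream and return its full text."""
    generator = await engine.async_generate(prompt, sampling_params, stream=True)
    parts = []
    async for chunk in generator:
        parts.append(chunk["text"])
    return "".join(parts)


async def stream_all(engine, prompts, sampling_params):
    """Stream every prompt concurrently instead of one after another."""
    results = await asyncio.gather(
        *(collect_stream(engine, p, sampling_params) for p in prompts)
    )
    return dict(zip(prompts, results))
```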




Shut down the engine when you are finished:

```python
llm.shutdown()
```
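In a standalone batch script, wrapping the work in `try`/`finally` ensures that `shutdown()` is still called if generation raises an error. The sketch below shows that pattern. `StubEngine` and `run_offline_job` are hypothetical names used for illustration. In practice, `make_engine` would be something like `lambda: sgl.Engine(model_path=...)`.

```python
class StubEngine:
    """Hypothetical stand-in for sgl.Engine that records whether it was shut down."""

    def __init__(self):
        self.closed = False

    def generate(self, prompts, sampling_params=None):
        if self.closed:
            raise RuntimeError("engine already shut down")
        return [{"text": " ok"} for _ in prompts]

    def shutdown(self):
        self.closed = True


def run_offline_job(make_engine, prompts, sampling_params):
    """Create an engine, run one batch, and always shut the engine down afterwards."""
    engine = make_engine()
    try:
        return [o["text"] for o in engine.generate(prompts, sampling_params)]
    finally:
        # Runs even if generate() raises, so the engine's resources are released.
        engine.shutdown()
```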