# Offline Engine API

SGLang provides a direct inference engine that does not require an HTTP server. This is useful when an additional HTTP server would add unnecessary complexity or overhead. Two common use cases are:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example as a Python script is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
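A custom server mostly adds request parsing and response serialization around the engine. The sketch below shows this as a framework-agnostic request handler. It parses a JSON body, calls a `generate` function and serializes the result.

Several parts are hypothetical:

- the request schema (`prompt`, `sampling_params`);
- the names `handle_generate_request` and `fake_generate`;
- the assumption that `llm.generate` returns one dict with a `"text"` key for a single prompt and a list of such dicts for a list of prompts.

In a real server you would pass `llm.generate` in place of the stand-in.

```python
import json
from typing import Any, Callable, Dict, List, Union

# Hypothetical alias for anything shaped like llm.generate(prompt, sampling_params).
GenerateFn = Callable[[Union[str, List[str]], Dict[str, Any]], Any]


def handle_generate_request(body: bytes, generate: GenerateFn) -> bytes:
    """Parse a JSON request body, run generation, and return a JSON response body.

    The request schema {"prompt": ..., "sampling_params": {...}} is a
    hypothetical choice for this sketch, not a format defined by SGLang.
    """
    request = json.loads(body)
    prompt = request["prompt"]
    sampling_params = request.get("sampling_params", {})
    output = generate(prompt, sampling_params)
    # Assumed shapes: one dict for a single prompt, a list of dicts for a batch.
    if isinstance(output, list):
        payload = {"text": [item["text"] for item in output]}
    else:
        payload = {"text": output["text"]}
    return json.dumps(payload).encode("utf-8")


def fake_generate(prompt, sampling_params):
    """Hypothetical stand-in for llm.generate so the sketch runs without a GPU."""
    if isinstance(prompt, list):
        return [{"text": f" echo: {p}"} for p in prompt]
    return {"text": f" echo: {prompt}"}


if __name__ == "__main__":
    request_body = json.dumps(
        {"prompt": "Hello, my name is", "sampling_params": {"temperature": 0.8}}
    ).encode("utf-8")
    print(handle_generate_request(request_body, fake_generate).decode("utf-8"))
```

Because the handler does not depend on a web framework, you can test it without a model, as the stand-in shows.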

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

```python
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
```



### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# The same sampling parameters are applied to every prompt in the batch.
sampling_params = {"temperature": 0.8, "top_p": 0.95}

# Passing a list of prompts returns one output dict per prompt.
outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

```text
Prompt: Hello, my name is
Generated text:  Tony and I am a 43 year old male. I was diagnosed with a rare genetic disorder called Ehlers-Danlos syndrome (EDS) in 2014. EDS is a condition that affects the body's connective tissue, which provides support to the skin, bones, joints, and other organs. People with EDS are prone to joint instability, skin hyperextensibility, and tissue fragility.
After being diagnosed, I started experiencing worsening joint pain and swelling in my hands, elbows, hips, and knees. I also experienced skin problems, such as bruising and easy bleeding, as well as fatigue and digestive
Prompt: The president of the United States is
Generated text:  the head of state and head of government of the United States. The president serves a four-year term and is responsible for appointing federal judges, including Supreme Court justices, and for conducting foreign policy. The president also has the power to veto legislation passed by Congress, although Congress can override
```
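The `sampling_params` above set `temperature` and `top_p`. A rough mental model:

- Temperature divides the logits before the softmax, so values below 1 sharpen the distribution.
- Top-p (nucleus) sampling keeps only the smallest set of most likely tokens whose cumulative probability reaches `top_p`.

The pure-Python sketch below shows this standard technique on a toy list of logits. It is not SGLang's GPU sampler, whose order of operations and extra options may differ. The function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional


def apply_temperature(logits: List[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature. Requires temperature > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: List[float], top_p: float) -> Dict[int, float]:
    """Keep the most likely tokens until their cumulative mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: Dict[int, float] = {}
    cumulative = 0.0
    for i in order:
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {i: p / total for i, p in kept.items()}


def sample(logits: List[float], temperature: float = 0.8, top_p: float = 0.95,
           rng: Optional[random.Random] = None) -> int:
    """Draw one token index using temperature scaling followed by top-p filtering."""
    rng = rng or random.Random()
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    tokens = list(probs.keys())
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]
```

For example, with the logits `[2.0, 1.0, 0.0, -5.0]` and `temperature=0.8`, the first two tokens cover about 94% of the probability mass and the first three cover more than 95%. `sample(...)` can therefore only return token 0, 1 or 2, and the unlikely token 3 is cut.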

### Streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    # With stream=True, generate() yields chunks as they are produced.
    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()
```


```text
=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Amanda and I'm a 25 year old teacher. I've always been fascinated by people and their stories. I love learning about different cultures and experiences, and I'm always eager to share my own. I'm a bit of a hopeless romantic, and I love anything that brings people together. I'm excited to start this blog and share my thoughts and experiences with you!
Amanda: Hi, I'm Amanda. I'm a 25 year old teacher and I'm so excited to start this blog and share my thoughts and experiences with you. I'm a bit of a hopeless romantic and I love learning about different cultures and experiences.

Prompt: The capital of France is
Generated text:  located on the Seine River and is a city of grandeur and beauty. Paris is often called the City of Light, and its artistic, cultural, and historical significance is undeniable. The city is famous for its iconic landmarks like the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum. The French capital is a popular destination for tourists, artists, and intellectuals, and its influence on global culture is immense. Paris is a city that embodies the idea of elegance, sophistication, and refinement.
The city is also a hub for fashion, cuisine, and entertainment, with world-class restaurants, museums, and theaters. The

Prompt: The future of AI is
Generated text:  already here, and it’s changing the way we live and work. From virtual assistants like Siri and Alexa to self-driving cars and personalized medicine, AI is increasingly becoming an integral part of our daily lives. But what exactly is AI, and how does it work?
Artificial intelligence (AI) refers to the development of computer systems that can perform tasks that would typically require human intelligence, such as:
Understanding natural language

Recognizing images and patterns

Making decisions and predictions

Learning from data and experience

AI systems use a range of techniques, including machine learning, deep learning, and neural networks, to process and analyze vast amounts of data
```

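In the output above, each streamed chunk carries only the newly generated piece of text, which is why the loop prints with `end=""`. If you also need the complete completion, you can accumulate the pieces while printing them.

The helper below is a minimal sketch that assumes this incremental-chunk behavior. The names `collect_stream` and `fake_stream` are hypothetical. `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)` so the code runs without a model.

If your SGLang version returns the cumulative text in each chunk instead, keep only the last chunk's text rather than joining them.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional


def collect_stream(chunks: Iterable[Dict[str, str]],
                   on_delta: Optional[Callable[[str], None]] = None) -> str:
    """Concatenate incremental chunk["text"] pieces into the full completion."""
    pieces = []
    for chunk in chunks:
        delta = chunk["text"]  # assumed to be only the newly generated text
        pieces.append(delta)
        if on_delta is not None:
            on_delta(delta)  # e.g. print the piece as it arrives
    return "".join(pieces)


def fake_stream() -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True)."""
    for piece in [" located", " on", " the", " Se", "ine", " River"]:
        yield {"text": piece}
```

With the engine, usage would look like `full_text = collect_stream(llm.generate(prompt, sampling_params, stream=True), on_delta=lambda d: print(d, end="", flush=True))`.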
### Non-streaming Asynchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    # async_generate must be awaited; with a list of prompts it returns one output per prompt.
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
```


```text
=== Testing asynchronous batch generation ===

Prompt: Hello, my name is
Generated text:  Taylor and I'm an apprentice in the AV department at the HUB. I'll be posting updates on what I'm working on and my experiences as an apprentice. I'm excited to be a part of this team and I'm looking forward to learning and growing.
Currently, I'm working on organizing and testing the new shipment of equipment. This involves making sure everything is in working order, labeling it correctly, and making sure it's stored in the correct place. It's a lot of work, but it's worth it to make sure everything runs smoothly in the AV department.
I'm also learning about the different types of equipment and how

Prompt: The capital of France is
Generated text:  a city that is full of history, culture, and beauty. From the iconic Eiffel Tower to the charming streets of Montmartre, there's something for everyone in this vibrant city. Here are some of the top attractions to visit in Paris:
The Eiffel Tower: This iron lattice tower is one of the most recogni
```
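Passing the whole list of prompts to a single `async_generate` call, as above, is the simplest option for batch work. When requests arrive independently, for example from several handlers in a custom server, each one can await its own call while the others are in flight. The sketch below shows that pattern with `asyncio.gather`.

It assumes that `async_generate` accepts a single prompt string and returns one dict with a `"text"` key. The names `generate_concurrently` and `fake_async_generate` are hypothetical; the latter is a stand-in so the code runs without a GPU.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Hypothetical alias mirroring llm.async_generate(prompt, sampling_params) for a single prompt.
AsyncGenerate = Callable[[str, Dict[str, Any]], Awaitable[Dict[str, Any]]]


async def generate_concurrently(async_generate: AsyncGenerate, prompts: List[str],
                                sampling_params: Dict[str, Any]) -> List[str]:
    """Issue one request per prompt and await them together; results keep prompt order."""
    coroutines = [async_generate(p, sampling_params) for p in prompts]
    outputs = await asyncio.gather(*coroutines)
    return [out["text"] for out in outputs]


async def fake_async_generate(prompt: str, sampling_params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for llm.async_generate."""
    await asyncio.sleep(0.01)  # simulate engine latency
    return {"text": prompt.upper()}
```

Inside an async function, usage with the engine would be `texts = await generate_concurrently(llm.async_generate, prompts, sampling_params)`.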

### Streaming Asynchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # With stream=True, awaiting async_generate returns an async generator of chunks.
        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())
```


```text
=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Angel and I am a natural born leader, with a huge heart and a generous spirit. I am passionate about helping others, and I believe that everyone deserves to live a happy and fulfilling life.
I was born in the Dominican Republic and moved to the United States when I was a child. Growing up, I experienced many challenges, but my parents and community helped me to overcome them. Their love, support, and guidance instilled in me a sense of resilience, determination, and compassion.
As I grew older, I realized that I wanted to make a difference in the world. I started volunteering at local non-profit organizations, working with youth

Prompt: The capital of France is
Generated text:  the largest city in the country, with a population of over 2.1 million people. Paris is known for its stunning architecture, art museums, and romantic atmosphere. The city is home to many famous landmarks, including the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum.
Paris is a global center for fashion, cuisine, and culture. The city is a popular destination for tourists, attracting over 23 million visitors per year. The city is also a major hub for international business and finance, with many multinational corporations having their headquarters in Paris.
Paris has a rich history dating back to the Roman era, and

Prompt: The future of AI is
Generated text:  bright, but there are challenges ahead

From the first computers to the widespread use of the internet, the development of technology has brought about numerous benefits to our society. However, with the rapid growth of Artificial Intelligence (AI), some experts are raising concerns about the potential risks and challenges associated with its increasing influence.
The development of AI has seen significant advancements in recent years, with breakthroughs in areas such as machine learning, natural language processing, and computer vision. These advancements have led to the creation of sophisticated AI systems that can learn from data, make decisions, and interact with humans in a more natural way.
However, as AI becomes more
```

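The loop above consumes the streams one prompt at a time. Each prompt is submitted only after the previous stream has finished. To consume all streams concurrently, you can give each prompt its own task.

The sketch below assumes, as in the example above, that awaiting the streaming call returns an async iterator of chunks. It also assumes each chunk's `"text"` field holds only the newly generated piece. The names `stream_all` and `fake_start_stream` are hypothetical; the latter is a stand-in so the code runs without a GPU.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Dict, List

# Hypothetical alias mirroring `await llm.async_generate(prompt, params, stream=True)`.
StreamFactory = Callable[[str, Dict[str, Any]], Awaitable[AsyncIterator[Dict[str, Any]]]]


async def stream_all(start_stream: StreamFactory, prompts: List[str],
                     sampling_params: Dict[str, Any]) -> List[str]:
    """Consume one stream per prompt concurrently; return full texts in prompt order."""

    async def consume(prompt: str) -> str:
        stream = await start_stream(prompt, sampling_params)
        parts = []
        async for chunk in stream:
            parts.append(chunk["text"])  # assumed incremental text
        return "".join(parts)

    return list(await asyncio.gather(*(consume(p) for p in prompts)))


async def fake_start_stream(prompt: str, sampling_params: Dict[str, Any]) -> AsyncIterator[Dict[str, Any]]:
    """Hypothetical stand-in that streams the prompt back word by word."""

    async def gen():
        for word in prompt.split():
            await asyncio.sleep(0)  # yield control so the streams interleave
            yield {"text": " " + word}

    return gen()
```

With the engine, usage would be `texts = await stream_all(lambda p, sp: llm.async_generate(p, sp, stream=True), prompts, sampling_params)`.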
```python
llm.shutdown()
```