# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, for use cases where a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A complete example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
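
As a rough illustration of the custom-server idea, the sketch below shows only the request-handling core: it parses a JSON body, calls the engine, and serializes the result. It is not taken from the linked `custom_server.py` script. `StubEngine`, `handle_generate_request`, and the request shape are all hypothetical names chosen for this example; the stub stands in for `sgl.Engine` so the code runs without a model.

```python
import json


class StubEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a GPU."""

    def generate(self, prompts, sampling_params=None):
        # Mimics the list-of-dicts shape with a "text" key used in this document.
        return [{"text": f" <completion for {p!r}>"} for p in prompts]


def handle_generate_request(engine, raw_body: bytes) -> bytes:
    """Turn a JSON request body into a JSON response body.

    Assumed request shape: {"prompt": str, "sampling_params": dict}.
    """
    request = json.loads(raw_body)
    prompt = request["prompt"]
    sampling_params = request.get("sampling_params", {})
    # Pass a one-element batch and unwrap the single result.
    output = engine.generate([prompt], sampling_params)[0]
    return json.dumps({"prompt": prompt, "text": output["text"]}).encode("utf-8")


response = handle_generate_request(
    StubEngine(),
    json.dumps({"prompt": "The capital of France is"}).encode("utf-8"),
)
print(response.decode("utf-8"))
```

Keeping the handler independent of any web framework means the same function could be called from whichever HTTP layer you prefer, with the real engine passed in place of the stub.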

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Tiffany and I'm a wife, mother, teacher and blogger. Welcome to my blog, where I share my thoughts on parenting, teaching, marriage and life in general.
I started this blog as a way to share my experiences and connect with others who may be going through similar challenges and joys. As a teacher and mother, I've learned a thing or two about navigating the ups and downs of life, and I hope to share some of that wisdom with you.
On this blog, you can expect to find posts about parenting, teaching, marriage, and self-care. I'll also be sharing recipes, book reviews, and other fun stuff
Prompt: The president of the United States is
Generated text:  tasked with the duty of making sure the government functions properly. To do this, the president must work closely with Congress and the federal courts to pass laws and enforce the laws that are already on the books. The president is also responsible for commanding the armed forces and for negotiating t
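
For larger offline jobs it can be convenient to submit prompts in slices and keep each prompt together with its completion. The following is a minimal sketch of that pattern, not part of SGLang itself: `generate_records`, `batch_size`, and `StubEngine` are hypothetical names, and the stub only imitates the `output["text"]` shape shown above so the example runs without a model.

```python
import json


class StubEngine:
    """Hypothetical stand-in for sgl.Engine; returns dicts with a "text" key."""

    def generate(self, prompts, sampling_params=None):
        return [{"text": f" ...completion {i}"} for i, _ in enumerate(prompts)]


def generate_records(engine, prompts, sampling_params, batch_size=2):
    """Yield {"prompt", "text"} records, submitting batch_size prompts at a time."""
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start : start + batch_size]
        outputs = engine.generate(batch, sampling_params)
        for prompt, output in zip(batch, outputs):
            yield {"prompt": prompt, "text": output["text"]}


prompts = ["Hello, my name is", "The capital of France is", "The future of AI is"]
records = list(generate_records(StubEngine(), prompts, {"temperature": 0.8}))

# One JSON object per line is a common format for saving batch results.
jsonl = "\n".join(json.dumps(record) for record in records)
print(jsonl)
```

Since the engine already schedules a batch efficiently, slicing is mainly useful for reporting progress or writing partial results; passing the whole list in one call, as in the cell above, is the simpler default.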

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Ishaan and I am a 4th grade student at Springdale Elementary. I love learning about science, math, and reading. I also love playing video games and drawing. I am excited to be a part of the Robotics club and learn about robotics and coding. I am a bit nervous, but I am sure I will learn a lot and have fun. I am looking forward to working with my teammates and creating something amazing. Can't wait to get started!  - Ishaan

Hello, my name is Ishaan and I am a 4th grade student at Springdale Elementary. I love learning about science

Prompt: The capital of France is
Generated text:  a city that is steeped in history, art, fashion, and culture. A city that has been the backdrop for some of the most pivotal moments in world history. From the rise and fall of empires to the birth of the modern era, Paris has seen it all. As the most visited city in the world, Paris is a must-visit destination for anyone who is interested in history, art, fashion, or simply experiencing the unique joie de vivre that the city has to offer.
The city is home to some of the world's most famous landmarks, including the Eiffel Tower, the Louvre Museum, Notre

Prompt: The future of AI is
Generated text:  in the hands of young people

A recent survey by Oracle and Wakefield Research found that 74% of 1,500 US teenagers aged 13 to 18 believe they have the skills to work with AI.
The survey also found that 64% of teens consider AI to be an exciting and empowering technology, while 56% believe they will be able to use AI to change the world.
These findings suggest that young people are enthusiastic and optimistic about the potential of AI, and that they believe they have the skills to harness its power.
This is a positive development, as AI is likely to play an increasingly important role in shaping
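
The streaming loop above prints `chunk["text"]` directly, which is correct when each chunk carries only newly generated text. Assuming (this document does not confirm it) that an SGLang version instead returns the cumulative text in every chunk, an offset can be tracked so that only the unseen suffix is printed. `iter_new_text` and `fake_cumulative_stream` are hypothetical names; the fake stream stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
def iter_new_text(chunks):
    """Yield only the unseen suffix of each chunk, assuming cumulative "text"."""
    offset = 0
    for chunk in chunks:
        text = chunk["text"]
        yield text[offset:]
        offset = len(text)


def fake_cumulative_stream():
    # Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True).
    for text in [" Paris", " Paris is", " Paris is lovely"]:
        yield {"text": text}


pieces = list(iter_new_text(fake_cumulative_stream()))
print("".join(pieces))
```

If the chunks are already incremental, this helper should not be used, because slicing by the running offset would drop text.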




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Laddie. I am a 2 year old, black lab mix. I was surrendered to the shelter and I am now looking for a new forever home. I love people and enjoy being around them. I am a bit shy at first, but once I get to know you, I become very friendly. I love to play and go on walks, but I'm still a puppy and need to work on my leash skills. I am very smart and eager to learn. I would do best in a home with a patient owner who will give me the time and love I need to become the best dog I can be.
If you

Prompt: The capital of France is
Generated text:  Paris and it is famous for its romantic atmosphere, world-class museums, and fashion. However, it is not the most affordable place to live. The cost of living in Paris is relatively high, especially when it comes to housing, food, and transportation.
According to Numbeo, a cost of living index, the cost of living in Paris is 114.2, which is higher than the global average of 100. This means that Paris is ab
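
When requests arrive one at a time rather than as a ready-made list, a common asyncio pattern is to issue one call per prompt and cap how many are in flight. The sketch below illustrates that pattern under stated assumptions: `StubAsyncEngine`, `generate_all`, and `max_in_flight` are hypothetical, and the stub assumes that a single-prompt `async_generate` call resolves to one dict with a `"text"` key.

```python
import asyncio


class StubAsyncEngine:
    """Hypothetical stand-in for sgl.Engine with an awaitable async_generate."""

    async def async_generate(self, prompt, sampling_params=None):
        await asyncio.sleep(0.01)  # simulate generation latency
        return {"text": f" <completion for {prompt!r}>"}


async def generate_all(engine, prompts, sampling_params, max_in_flight=2):
    """Run one request per prompt, with at most max_in_flight running at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def one(prompt):
        async with semaphore:
            return await engine.async_generate(prompt, sampling_params)

    # gather preserves input order, so results line up with prompts.
    return await asyncio.gather(*(one(p) for p in prompts))


prompts = ["Hello, my name is", "The capital of France is", "The future of AI is"]
outputs = asyncio.run(generate_all(StubAsyncEngine(), prompts, {"temperature": 0.8}))
for prompt, output in zip(prompts, outputs):
    print(prompt, "->", output["text"])
```

For a fixed list of prompts, the single batched `async_generate(prompts, ...)` call shown above remains simpler; the semaphore variant is aimed at workloads where prompts are produced independently.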

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Debra and I am a 42 year old woman who has been living with diabetes for over 20 years. I have had my share of ups and downs with this disease, but I have learned to manage it and live a healthy, balanced lifestyle. I am here to share my story and offer advice and support to anyone who is living with diabetes.
I was diagnosed with diabetes when I was 22 years old. I was in college at the time and was working part-time at a restaurant. I was always busy and didn't pay much attention to my eating habits, and as a result, I developed a weight problem. My doctor

Prompt: The capital of France is
Generated text:  a city that needs no introduction. From the iconic Eiffel Tower to the art museums and galleries, the fashion, food, and romance that pervades every corner of the city, Paris has something for everyone. Here’s a brief overview of Paris and the many attractions it has to offer.
Paris, the capital of France, is a global hub for art, fashion, cuisine, and culture. The city is divided into 20 arrondissements (neighborhoods), each with its own unique character and attractions. From the historic heart of the city, which includes the Louvre Museum, Notre-Dame Cathedral, and the Ch

Prompt: The future of AI is
Generated text:  to mimic nature

The most effective AI systems will be the ones that understand and mirror the natural world

Artificial intelligence (AI) is on a trajectory to revolutionize various sectors, from healthcare and finance to transportation and education. As AI continues to evolve, researchers are exploring ways to create more sophisticated and efficient systems that can learn, adapt, and interact with humans in a more natural way. One approach gaining traction is to design AI systems that mimic the principles of nature.
This concept is often referred to as "biomimicry" or "bio-inspired AI." By studying the intricate patterns, processes, and behaviors of living organisms
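
The cell above streams one prompt after another. If the full text of several streams is needed and interleaved printing is not, the streams can be consumed concurrently and collected into strings. This is a sketch under assumptions: `StubAsyncEngine` and `stream_to_string` are hypothetical, the stub copies the call shape used above (awaiting `async_generate(..., stream=True)` returns an async generator), and chunks are assumed to carry only new text.

```python
import asyncio


class StubAsyncEngine:
    """Hypothetical stand-in: awaiting async_generate returns an async generator."""

    async def async_generate(self, prompt, sampling_params=None, stream=False):
        async def chunks():
            for piece in [" one", " two", " three"]:
                await asyncio.sleep(0)  # yield control, as a real stream would
                yield {"text": piece}

        return chunks()


async def stream_to_string(engine, prompt, sampling_params):
    """Consume one stream and return the concatenated text."""
    generator = await engine.async_generate(prompt, sampling_params, stream=True)
    parts = []
    async for chunk in generator:
        parts.append(chunk["text"])
    return "".join(parts)


async def main():
    prompts = ["Hello, my name is", "The capital of France is"]
    engine = StubAsyncEngine()
    # Start every stream at once instead of one prompt after another.
    return await asyncio.gather(
        *(stream_to_string(engine, p, {"temperature": 0.8}) for p in prompts)
    )


texts = asyncio.run(main())
print(texts)
```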




In [6]:
llm.shutdown()
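
In a script, an exception raised before `llm.shutdown()` would skip the cleanup call. One way to guard against that is a small context manager that always calls `shutdown()` on exit. This is a sketch rather than an SGLang feature: `running_engine` and `StubEngine` are hypothetical names, and the stub only records whether `shutdown()` was called.

```python
from contextlib import contextmanager


class StubEngine:
    """Hypothetical stand-in for sgl.Engine that records whether it was shut down."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts, sampling_params=None):
        return [{"text": " ..."} for _ in prompts]

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def running_engine(engine):
    """Yield the engine and always call shutdown() afterwards."""
    try:
        yield engine
    finally:
        engine.shutdown()


engine = StubEngine()
try:
    with running_engine(engine) as llm:
        llm.generate(["Hello, my name is"])
        raise RuntimeError("simulated failure mid-run")
except RuntimeError:
    pass

print(engine.is_shut_down)
```

A plain `try`/`finally` around the generation code achieves the same effect; the context manager just makes the pattern reusable.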