# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when a server would only add complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A complete example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Kevin Mills and I am the pastor of Cornerstone Community Church in Jacksonville, Florida. I am excited to join the Orange Park community and look forward to meeting you and your family.
Cornerstone Community Church is a non-denominational church that is committed to teaching the truth of God's Word and applying it to everyday life. Our desire is to provide a welcoming and loving environment where people can come to know God and grow in their faith.
We believe that the church is not just a place to attend, but a community of believers who come together to support one another, to serve one another, and to worship together. We believe that every
Prompt: The president of the United States is
Generated text:  the head of the federal government. The president is directly elected by the people through the Electoral College. The president serves a four-year term and can be re-elected for a second term. The president is responsible for executing the la
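
The batch call above returns one result per prompt, in order, and each result exposes its completion under the `text` key. As a minimal sketch of what can be done with such a batch, the helper below pairs prompts with completions and serializes them as JSON Lines. The function name `to_jsonl`, the record field names, and the stand-in data are assumptions made for illustration; they are not part of SGLang.

```python
import json


def to_jsonl(prompts, outputs):
    """Pair each prompt with its generated text and serialize as JSON Lines."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    lines = []
    for prompt, output in zip(prompts, outputs):
        # Each output is assumed to be a dict with a "text" key, as in the cell above.
        record = {"prompt": prompt, "completion": output["text"]}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# Stand-in data shaped like the result of llm.generate(prompts, sampling_params).
example_prompts = ["The capital of France is"]
example_outputs = [{"text": " Paris."}]
print(to_jsonl(example_prompts, example_outputs))
```

With a real engine, `example_outputs` would be replaced by the list returned from `llm.generate(prompts, sampling_params)`.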

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Elara and I am 25 years old. I am a college student, studying psychology and sociology. I love to learn about the human mind and behavior, and I am fascinated by the complexities of human relationships.
I have always been an empathetic person, and I enjoy helping others and listening to their stories. I believe that everyone has a unique perspective and experience, and I am always eager to learn from others.
In my free time, I love to read, write, and practice yoga. I find that these activities help me to relax and center myself, and they also give me a sense of clarity and purpose.
I am looking

Prompt: The capital of France is
Generated text:  known for its history, fashion, art, and cuisine. But it also has a less-touristy side, which is worth exploring. Here are some hidden gems to discover in Paris.
The Marais neighborhood, with its narrow streets and colorful buildings, is a charming area to explore. You can find unique boutiques, art galleries, and cafes in this historic neighborhood.
For a taste of 19th-century Paris, head to the Père Lachaise Cemetery. This famous cemetery is the final resting place of many famous artists, writers, and musicians, including Oscar Wilde and Jim Morrison.
If you’re looking for a unique shopping

Prompt: The future of AI is
Generated text:  bright, but there are also challenges and concerns that need to be addressed
As artificial intelligence (AI) continues to evolve and become more pervasive, it is likely to have a significant impact on various aspects of our lives. The future of AI is bright, with potential applications in areas such as healthcare, education, and transportation. However, there are also challenges and concerns that need to be addressed in order to ensure that AI is developed and deployed responsibly.
Some of the challenges and concerns surrounding AI include:
1. Bias and fairness: AI systems can perpetuate existing biases and discrimination if they are trained on biased data. This can lead to unfair
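
The streaming loop above prints each chunk as it arrives. When the full completion is also needed afterwards, the chunks can be accumulated while iterating. The sketch below assumes that each chunk's `text` field carries only the newly generated piece, as the printed output suggests; if chunks were cumulative instead, the last chunk alone would already be the full text. `collect_stream` and `fake_stream` are hypothetical names used for illustration, and `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
def collect_stream(chunks):
    """Concatenate the "text" field of each streamed chunk into one string."""
    pieces = []
    for chunk in chunks:
        pieces.append(chunk["text"])
    return "".join(pieces)


def fake_stream():
    # Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True).
    for piece in [" Paris", " is", " the", " capital", "."]:
        yield {"text": piece}


full_text = collect_stream(fake_stream())
print(full_text)
```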




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Katharine Conover and I'm a sixth grader at Interlake Middle School in Bellevue, Washington. I'm excited to be working with my classmates on a project to bring more solar panels to our school, and I'm looking forward to learning more about renewable energy and sustainability.
We're using the "Appropedia" website to help us plan and research our project, and we're excited to share our progress and learn from others on the site. We're aiming to install solar panels on our school's roof to help reduce our energy consumption and carbon footprint. We believe that this project will not only help our school save money on

Prompt: The capital of France is
Generated text:  the city of Paris, which is situated in the northern part of the country. The city is the largest in France in terms of both population and economic output. Paris is home to several world-renowned landmarks, including the Eiffel Tower, the Louvre Museum, and Notre Dame Cathedral.
Th
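
The cell above awaits a single batched call. Another pattern, sketched here under stated assumptions, issues one request per prompt and awaits them together with `asyncio.gather`, which is convenient when each prompt needs its own sampling parameters. `FakeEngine` is a hypothetical stand-in: it only mimics an `async_generate` coroutine that returns a dict with a `text` key for a single prompt, and that return shape is an assumption rather than documented SGLang behavior.

```python
import asyncio


class FakeEngine:
    """Hypothetical stand-in exposing an async_generate coroutine."""

    async def async_generate(self, prompt, sampling_params):
        await asyncio.sleep(0)  # yield control, as a real request would
        return {"text": f" <completion for {prompt!r}>"}


async def generate_each(engine, requests):
    """Run one async_generate call per (prompt, params) pair concurrently."""
    tasks = [engine.async_generate(prompt, params) for prompt, params in requests]
    # asyncio.gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*tasks)


requests = [
    ("Hello, my name is", {"temperature": 0.8, "top_p": 0.95}),
    ("The capital of France is", {"temperature": 0.0}),
]
results = asyncio.run(generate_each(FakeEngine(), requests))
for (prompt, _), result in zip(requests, results):
    print(prompt, "->", result["text"])
```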

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Cristian, I am a Microsoft Certified Professional with over 10 years of experience in the IT industry. My expertise includes: Microsoft Windows Server, Microsoft Exchange, Active Directory, Microsoft SQL Server, Cisco Network Devices, Cisco Firewalls, and more. I also have a strong background in cybersecurity, threat analysis and incident response.
I have worked as a senior network engineer and system administrator for a variety of companies, including start-ups, small businesses, and large corporations. My role has included designing, implementing, and maintaining network infrastructure, server environments, and security systems.
I am passionate about staying up-to-date with the latest technology trends and best

Prompt: The capital of France is
Generated text:  a city of love, art, fashion, and cuisine. But did you know that Paris also has a rich history of street art? Here are some of the most famous street artists from Paris, and their contributions to the city's vibrant street art scene.
1. Blek le Rat
Blek le Rat is a French street artist known for his stencil works that often feature rats. He is considered one of the pioneers of street art in Paris and has been active since the 1980s. His works can be found throughout the city, from the Latin Quarter to the Marais neighborhood.
2. Invader
Invader is a

Prompt: The future of AI is
Generated text:  now.
Artificial intelligence is changing the world. From virtual assistants to life-changing medical advancements, AI is redefining the way we live, work and interact with each other.
AI has the potential to solve some of the world’s most pressing challenges, such as climate change, poverty and access to education. It can also help us make better decisions, streamline processes, and enhance productivity.
As AI continues to evolve, it’s essential to stay up-to-date on the latest advancements and trends. Here are some key areas to watch in the future of AI:
1. Edge AI: With the increasing number of IoT devices, edge AI is
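
The loop above consumes one stream at a time. As a sketch of an alternative, several async streams can be consumed concurrently, with each one collected into its own string so the outputs do not interleave. Everything here is illustrative: `fake_async_stream` is a hypothetical stand-in for the async generator obtained from `await llm.async_generate(prompt, sampling_params, stream=True)`, and the chunks are assumed to carry incremental text.

```python
import asyncio


async def fake_async_stream(pieces):
    # Hypothetical stand-in for the async generator returned by the engine.
    for piece in pieces:
        await asyncio.sleep(0)  # let other streams make progress
        yield {"text": piece}


async def collect(stream):
    """Gather one stream's chunks into a single string."""
    parts = []
    async for chunk in stream:
        parts.append(chunk["text"])
    return "".join(parts)


async def collect_all(streams):
    """Consume several streams concurrently; results keep the input order."""
    return await asyncio.gather(*(collect(stream) for stream in streams))


streams = [
    fake_async_stream([" Paris", "."]),
    fake_async_stream([" now", "."]),
]
texts = asyncio.run(collect_all(streams))
print(texts)
```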




In [6]:
llm.shutdown()
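
Calling `shutdown()` in a final cell works in a notebook, but in a script an exception raised earlier would skip it. One option, sketched here, is to wrap the engine in a context manager so that `shutdown()` always runs. `managed_engine` and `FakeEngine` are hypothetical names introduced for this example; `FakeEngine` only mimics the `generate` and `shutdown` methods used in this document.

```python
from contextlib import contextmanager


class FakeEngine:
    """Hypothetical stand-in with the same shutdown() method as the engine."""

    def __init__(self):
        self.closed = False

    def generate(self, prompt, sampling_params=None):
        return {"text": " ..."}

    def shutdown(self):
        self.closed = True


@contextmanager
def managed_engine(engine):
    """Yield the engine and always call shutdown() afterwards."""
    try:
        yield engine
    finally:
        engine.shutdown()


engine = FakeEngine()
with managed_engine(engine) as llm_like:
    llm_like.generate("Hello, my name is")
print(engine.closed)
```

With a real engine, the object passed to `managed_engine` would be the one created by `sgl.Engine(...)` at the top of this document.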