# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This suits use cases where an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
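As a rough illustration of the second use case, the sketch below wraps an engine-like object in a small JSON endpoint using only the Python standard library. It is not the linked `custom_server.py` example. The `/generate` route, the request fields `prompt` and `sampling_params`, and the helper names `handle_generate`, `make_handler`, and `serve` are all assumptions made for illustration. The only engine behavior it relies on is the one shown later in this document: `generate` takes a list of prompts and returns a list of dicts with a `"text"` key.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Same sampling parameters as the examples in this document.
DEFAULT_SAMPLING_PARAMS = {"temperature": 0.8, "top_p": 0.95}


def handle_generate(engine, body: bytes) -> bytes:
    """Turn a JSON request body into a JSON response body."""
    request = json.loads(body)
    prompt = request["prompt"]  # hypothetical request field
    sampling_params = request.get("sampling_params", DEFAULT_SAMPLING_PARAMS)
    # Batch form: a list of prompts in, a list of dicts out.
    output = engine.generate([prompt], sampling_params)[0]
    return json.dumps({"text": output["text"]}).encode("utf-8")


def make_handler(engine):
    """Build a request handler class bound to one engine instance."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/generate":  # hypothetical route
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            payload = handle_generate(engine, self.rfile.read(length))
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return Handler


def serve(engine, host="127.0.0.1", port=8000):
    """Serve requests until interrupted, then shut the engine down."""
    server = HTTPServer((host, port), make_handler(engine))
    try:
        server.serve_forever()
    finally:
        server.server_close()
        engine.shutdown()
```

With a real engine this would be started as `serve(sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct"))`. Keeping `handle_generate` separate from the HTTP plumbing makes the request logic easy to exercise with a stand-in engine.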

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.44it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Lyle, I have been working on my home security system and I have made a few mistakes. I have taken the hardwired system and converted it to a wireless one. However, I have some issues that I am hoping you can help me resolve.
First issue:
I have a wired keypad at the door, and I have connected it to a wireless keypad that I have placed inside the house. I was hoping to use the wireless keypad to control the system, but I have found that it is not working properly. The wireless keypad is not reporting the codes correctly to the wireless receiver that is connected to the hardwired system.
Prompt: The president of the United States is
Generated text:  not above the law. That's the key takeaway from a scathing report by the House Oversight Committee, which investigated allegations of President Trump's misconduct related to Ukraine.
The report, which was released on Tuesday, found that Trump had, in fact, engaged in behavior that constitutes an impe
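For larger prompt sets, it can be convenient to feed the engine in fixed-size groups and keep each prompt next to its output. The sketch below assumes only what the cell above shows: `generate` takes a list of prompts and returns a list of dicts with a `"text"` key, in the same order. The helper name `generate_in_chunks` and its `chunk_size` parameter are hypothetical, and the chunking is a bookkeeping convenience rather than something the engine requires.

```python
from typing import Callable, Dict, List


def generate_in_chunks(
    generate: Callable[[List[str], dict], List[dict]],
    prompts: List[str],
    sampling_params: dict,
    chunk_size: int = 2,
) -> List[Dict[str, str]]:
    """Call `generate` on successive slices of `prompts` and collect records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    records = []
    for start in range(0, len(prompts), chunk_size):
        chunk = prompts[start : start + chunk_size]
        outputs = generate(chunk, sampling_params)
        # Outputs are assumed to come back in the same order as the prompts.
        for prompt, output in zip(chunk, outputs):
            records.append({"prompt": prompt, "text": output["text"]})
    return records
```

With the engine from this document, a call would look like `records = generate_in_chunks(llm.generate, prompts, sampling_params)`.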

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Anthony, and I am the CEO of a large corporation. I am also a total introvert. As an introvert, I have always found it challenging to navigate the complexities of corporate life. However, I have learned a few strategies that have helped me to excel in my role and maintain my sanity in the process.

First and foremost, I prioritize my time. I make sure to schedule my day in a way that allows me to focus on the most important tasks without feeling overwhelmed. This means setting realistic goals and boundaries for myself, and learning to say no to non-essential commitments.

Another strategy I use is to carve out dedicated time

Prompt: The capital of France is
Generated text:  a city like no other. It's a city that has something to offer for everyone, from art lovers and history buffs to foodies and fashionistas. Whether you're looking to explore the city's famous landmarks, indulge in its culinary delights, or simply soak up the vibrant atmosphere, Paris is a must-visit destination.
One of the most iconic landmarks in Paris is the Eiffel Tower, a stunning iron lattice structure that was built for the 1889 World's Fair. Visitors can take a lift to the top for breathtaking views of the city. Another must-visit attraction is the Louvre Museum, which houses an impressive collection

Prompt: The future of AI is
Generated text:  here and it's coming from the natural world

We often think of artificial intelligence (AI) as something created by humans, using complex algorithms and programming languages. But what if we were to look to nature for inspiration for the next generation of AI? Recent discoveries in fields like biology and ecology are providing a wealth of insights into the potential for natural systems to inform the development of AI.
In this episode of The Conversation, we explore how nature can inform AI, with a focus on the work of Dr. Katharine Haynes, an ecologist who is using the principles of animal navigation to inform the development of new AI systems.
Dr.
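When the full text is needed after streaming, the chunks can be accumulated as they arrive. The sketch below is a hypothetical helper, `collect_stream`, that works on any iterable of dicts with a `"text"` key. By default it assumes each chunk carries only the newly generated piece, which is what the output above suggests. If your SGLang version instead yields the full text so far in every chunk, pass `cumulative=True` (this flag is an assumption of the sketch, not an SGLang option).

```python
from typing import Iterable


def collect_stream(chunks: Iterable[dict], cumulative: bool = False) -> str:
    """Assemble the full generated text from streamed chunks."""
    text = ""
    for chunk in chunks:
        if cumulative:
            # Each chunk repeats everything generated so far.
            text = chunk["text"]
        else:
            # Each chunk carries only the newly generated piece.
            text += chunk["text"]
    return text
```

A call with the engine from this document would look like `full_text = collect_stream(llm.generate(prompt, sampling_params, stream=True))`.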




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Joanna and I'm a artist. I love painting and drawing. I try to do as much as I can but I'm not a full-time artist. I have a 9-to-5 job and a family to take care of. But that doesn't stop me from pursuing my passion for art. I try to spend at least a few hours on the weekends drawing and painting. I'm not great at it yet, but I'm trying to improve every day. I love sharing my art with people and getting their feedback. It's great to know that someone out there sees the value in my work.
I'm not sure what kind

Prompt: The capital of France is
Generated text:  Paris, and the country is known for its rich history, art, fashion, and cuisine. France is home to numerous famous landmarks such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral.
What is the capital of France?
The capital of France is Paris, known for its rich history, art, fashion, and cuisine. Paris is famous for iconic landmarks like the Eiffel Tower, the Louvre Museum
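One reason to use the asynchronous API is to keep several requests in flight from the same event loop. The sketch below uses `asyncio.gather` to await one call per group of prompts. It assumes, as in the cell above, that `async_generate` takes a list of prompts and resolves to a list of dicts with a `"text"` key. The helper name `generate_groups` is hypothetical, and whether concurrent calls improve throughput over one large batch depends on the engine's scheduler, which this sketch does not measure.

```python
import asyncio
from typing import Awaitable, Callable, List


async def generate_groups(
    async_generate: Callable[[List[str], dict], Awaitable[List[dict]]],
    groups: List[List[str]],
    sampling_params: dict,
) -> List[List[str]]:
    """Await one `async_generate` call per group, all in flight together."""
    # gather returns results in the order the awaitables were passed in.
    results = await asyncio.gather(
        *(async_generate(group, sampling_params) for group in groups)
    )
    return [[output["text"] for output in outputs] for outputs in results]
```

Inside a coroutine, this would be called as `texts = await generate_groups(llm.async_generate, [prompts[:2], prompts[2:]], sampling_params)`.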

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Andreea and I'm a student in the MSc in Data Science at the University of Oxford. I'm thrilled to be part of the European Data Science community and I'm looking forward to engaging with fellow data enthusiasts and professionals.
My background is in Mathematics and Computer Science, and I have a strong interest in machine learning and AI. Throughout my studies, I have worked on various projects that involved data analysis, visualization, and modeling, using tools such as Python, R, and SQL.
As a data scientist, I believe that data should be used to drive informed decisions and create positive impact. I'm passionate about exploring new ways to

Prompt: The capital of France is
Generated text:  also a top destination for tourists due to its history, art, fashion, food, and romance. Visit the iconic Eiffel Tower, Notre Dame Cathedral, Louvre Museum, and Champs-Élysées. Enjoy a Seine River cruise, sample French cuisine, and soak up the city's charming atmosphere.
Travel to the City of Light and explore its famous art, architecture, fashion, and cuisine. From the Eiffel Tower to the Louvre Museum, discover the history and culture of Paris. Visit famous landmarks like Notre Dame Cathedral, Arc de Triomphe, and Montmartre. Enjoy a Seine River cruise

Prompt: The future of AI is
Generated text:  more about collaboration than competition

In recent years, there has been a growing narrative around the AI industry that emphasizes competition and one-upmanship. We've seen the rise of AI "wars" between tech giants, with each company trying to outdo the others in AI capabilities and deployment. This narrative has led to a focus on proprietary and secretive approaches to AI development, with a emphasis on intellectual property and competitive advantage.
However, this approach is starting to show its limitations. As AI becomes more ubiquitous and integrated into various industries and sectors, the need for collaboration and cooperation is becoming increasingly clear. Here are a few reasons why the future of




In [6]:
llm.shutdown()
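In a script, an exception raised between engine creation and `llm.shutdown()` would skip the shutdown call. One way to guard against that is a small context manager, sketched below. The name `engine_session` and its `make_engine` argument are hypothetical; the only engine method it relies on is `shutdown()`, as used in the cell above.

```python
from contextlib import contextmanager


@contextmanager
def engine_session(make_engine):
    """Create an engine, hand it to the caller, and always shut it down."""
    engine = make_engine()
    try:
        yield engine
    finally:
        # Runs on normal exit and when the body raises.
        engine.shutdown()
```

With SGLang this could be used as `with engine_session(lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")) as llm:` followed by the generation calls in the indented body.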