# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A complete example in a Python script is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.
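The examples below pass the whole prompt list to a single `generate` call and leave scheduling to the engine. For very large datasets you might still feed prompts in chunks, for instance to checkpoint results between chunks. The following is a minimal sketch of that pattern, not an SGLang feature: `iter_batches`, `generate_in_batches`, `batch_size`, and `fake_generate` are hypothetical names, and `fake_generate` only mimics the return shape of `llm.generate` (a list of dicts with a `"text"` key) so the snippet runs without a model.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` prompts."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start : start + batch_size])


def generate_in_batches(
    prompts: Sequence[str],
    sampling_params: dict,
    generate_fn: Callable[[List[str], dict], List[dict]],
    batch_size: int = 256,
) -> List[str]:
    """Call `generate_fn` (e.g. llm.generate) chunk by chunk, keeping input order."""
    texts: List[str] = []
    for batch in iter_batches(prompts, batch_size):
        outputs = generate_fn(batch, sampling_params)
        # Each output is assumed to be a dict with a "text" key.
        texts.extend(output["text"] for output in outputs)
        # A real pipeline could checkpoint `texts` to disk at this point.
    return texts


# Hypothetical stand-in for llm.generate so the sketch runs without a model.
def fake_generate(batch: List[str], sampling_params: dict) -> List[dict]:
    return [{"text": f" <completion of {p!r}>"} for p in batch]


demo_prompts = [f"Prompt {i}" for i in range(5)]
texts = generate_in_batches(demo_prompts, {"temperature": 0.8}, fake_generate, batch_size=2)
print(len(texts), texts[0])
```

Keeping chunks large leaves the engine more requests to schedule together; the default of 256 here is an arbitrary illustrative value.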

```python
# Launch the offline engine
import asyncio

import sglang as sgl

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
```




### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

```text
Prompt: Hello, my name is
Generated text:  Kelly and I am a postdoctoral researcher in the Malsch lab here at the University of California, San Diego. My research focuses on understanding the molecular mechanisms of neurodegenerative diseases, particularly amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD). I am excited to share my research experiences with you and hope that you will join me on this journey of discovery.
My research journey began with my undergraduate studies at the University of California, Berkeley, where I majored in molecular and cellular biology. During my undergraduate studies, I became fascinated with the complexities of neurodegenerative diseases and the potential
Prompt: The president of the United States is
Generated text: , of course, a vital part of the country's government. The head of state and head of government, the president serves as both the commander-in-chief of the armed forces and the leader of the executive branch of the federa
```
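Offline batch jobs usually persist their results. Below is a minimal sketch that writes each prompt and its generated text to a JSON Lines file, assuming `outputs` has the shape returned by `llm.generate` above (one dict with a `"text"` key per prompt, in prompt order). The helpers `save_results_jsonl` and `load_results_jsonl` and the file name `results.jsonl` are hypothetical; the demo uses hand-written outputs and a temporary directory instead of a real engine.

```python
import json
import os
import tempfile
from typing import List, Sequence


def save_results_jsonl(prompts: Sequence[str], outputs: Sequence[dict], path: str) -> int:
    """Write one JSON object per prompt/output pair and return the record count."""
    if len(prompts) != len(outputs):
        raise ValueError("expected exactly one output per prompt")
    with open(path, "w", encoding="utf-8") as f:
        for prompt, output in zip(prompts, outputs):
            record = {"prompt": prompt, "text": output["text"]}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(prompts)


def load_results_jsonl(path: str) -> List[dict]:
    """Read the records back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo with hand-written outputs shaped like llm.generate's return value.
demo_prompts = ["The capital of France is", "The future of AI is"]
demo_outputs = [{"text": " Paris."}, {"text": " bright."}]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.jsonl")
    count = save_results_jsonl(demo_prompts, demo_outputs, path)
    records = load_results_jsonl(path)  # read before the directory is removed

print(count, records[0])
```

With a real engine you would call `save_results_jsonl(prompts, outputs, "results.jsonl")` right after `llm.generate`.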

### Streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()
```


```text
=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Lana and I'm the new kid in town. I'm a 23-year-old aspiring artist and I'm eager to make a name for myself in the art world. I've recently moved to a small town in the countryside, and I'm excited to explore the area and get to know the locals.
However, I'm a bit of an outsider, and I'm struggling to fit in. Everyone seems to know each other, and I feel like a stranger in a strange land. I've tried to strike up conversations with the locals, but they seem hesitant to talk to me.
I'm starting to feel a bit discouraged, but I

Prompt: The capital of France is
Generated text:  a city that has been in the news a lot lately, and not always for the best reasons. Paris, the City of Light, has been the subject of protests, terrorism, and economic struggles in recent years. However, Paris is also a city that is steeped in history, art, and culture, and is home to some of the most famous landmarks in the world, such as the Eiffel Tower, the Louvre Museum, and Notre Dame Cathedral. In this article, we will take a closer look at the city of Paris and explore its history, culture, and attractions.
History of Paris

Paris has a rich and

Prompt: The future of AI is
Generated text:  here, and it’s going to revolutionize the way we work, live, and interact with one another. From virtual assistants to autonomous vehicles, AI is already transforming numerous industries and aspects of our lives. However, as AI continues to advance, we must also consider the potential risks and challenges associated with its development and deployment. In this talk, we will explore the current state of AI, its potential applications, and the key considerations for its responsible development and use.
Artificial intelligence (AI) is a broad field of research and application that involves developing machines and systems that can perform tasks that would typically require human intelligence. AI systems can learn
```
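When you need the complete text as well as live output, you can accumulate the chunks while printing them. This sketch assumes, as the printed output above suggests, that each streamed chunk's `"text"` field carries only the newly generated piece. `stream_to_string` and `fake_stream` are hypothetical helpers; `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)` so the snippet runs without a model.

```python
from typing import Callable, Iterable, Iterator, Optional


def stream_to_string(
    chunks: Iterable[dict],
    on_piece: Optional[Callable[[str], None]] = None,
) -> str:
    """Concatenate chunk["text"] pieces, optionally forwarding each piece as it arrives."""
    pieces = []
    for chunk in chunks:
        piece = chunk["text"]
        pieces.append(piece)
        if on_piece is not None:
            on_piece(piece)  # e.g. print it immediately for live output
    return "".join(pieces)


# Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True).
def fake_stream(pieces: Iterable[str]) -> Iterator[dict]:
    for piece in pieces:
        yield {"text": piece}


full_text = stream_to_string(
    fake_stream([" Paris", ",", " the", " City", " of", " Light"]),
    on_piece=lambda p: print(p, end="", flush=True),
)
print()
```

Collecting pieces in a list and joining once at the end avoids repeated string concatenation inside the loop.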




### Non-streaming Asynchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
```


```text
=== Testing asynchronous batch generation ===

Prompt: Hello, my name is
Generated text:  Sharon!
I am a health and wellness coach, specializing in digestive health, food sensitivities, and stress management. I have a degree in Nutrition and a certification in Health Coaching from the Institute for Integrative Nutrition. I am also a certified Functional Diagnostic Nutritionist.
I'm passionate about empowering people to take control of their health and live their best lives. My approach is holistic and non-judgmental, focusing on creating a personalized plan that addresses the root causes of your health issues.
I'd love to connect with you and explore how we can work together to achieve your health and wellness goals. Let's chat!
What are your top

Prompt: The capital of France is
Generated text:  Paris, and the official language is French. French is a Romance language that originated in the Roman province of Gaul, which is now modern-day France and Belgium. French is spoken by over 274 million people worldwide, making it one of th
```
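The call above awaits the whole prompt list at once. If prompts arrive one at a time instead, a common asyncio pattern is to issue one call per prompt and cap how many are in flight with a semaphore. This is a sketch of that general pattern, not a documented SGLang recommendation: `generate_bounded`, `max_concurrency`, and `fake_async_generate` are hypothetical, and `fake_async_generate` stands in for `llm.async_generate(prompt, sampling_params)` so the snippet runs without a model.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def generate_bounded(
    prompts: Sequence[str],
    generate_one: Callable[[str], Awaitable[dict]],
    max_concurrency: int = 8,
) -> List[str]:
    """Run one request per prompt, at most `max_concurrency` at a time, preserving order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(prompt: str) -> str:
        async with semaphore:
            output = await generate_one(prompt)
            return output["text"]

    # gather returns results in the order the coroutines were passed in.
    return list(await asyncio.gather(*(run(p) for p in prompts)))


# Hypothetical stand-in for llm.async_generate that also records peak concurrency.
in_flight = 0
peak = 0


async def fake_async_generate(prompt: str) -> dict:
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulate decoding time
    in_flight -= 1
    return {"text": f" <reply to {prompt!r}>"}


demo_prompts = [f"Prompt {i}" for i in range(10)]
texts = asyncio.run(generate_bounded(demo_prompts, fake_async_generate, max_concurrency=3))
print(texts[0], "| peak concurrency:", peak)
```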

### Streaming Asynchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())
```


```text
=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Beatrice and I'm a sophomore at Hillsdale College. I'm a double major in Economics and Business Administration. In my free time, I enjoy playing the guitar, hiking, and volunteering.
I'm originally from a small town in Michigan, and I've always been drawn to the values of Hillsdale College. The institution's commitment to academic rigor, intellectual freedom, and character development aligns perfectly with my own values and goals.
As a student at Hillsdale, I've had the opportunity to take a wide range of courses, from Microeconomics and Macroeconomics to Accounting and Finance. My favorite subjects are Economics and History,

Prompt: The capital of France is
Generated text:  full of art, history, fashion, and cuisine. There are plenty of things to do in Paris, and here are the top 10 things to do in Paris that you won’t want to miss:
1. Visit the Eiffel Tower

The Eiffel Tower is one of the most iconic landmarks in the world, and it’s a must-see when visiting Paris. You can take the elevator to the top for breathtaking views of the city.
2. Explore the Louvre Museum

The Louvre Museum is home to some of the world’s most famous artworks, including the Mona Lisa. The museum is located in a beautiful

Prompt: The future of AI is
Generated text:  bright

By Benoit Maurice | January 18, 2023

Artificial intelligence (AI) is rapidly evolving and transforming various industries, from healthcare to finance, transportation to education. With its growing importance, the future of AI looks bright.
AI is no longer a buzzword; it's a reality that is reshaping the world we live in. From predictive maintenance to personalized medicine, AI is empowering businesses to make data-driven decisions, improve customer experiences, and drive innovation.
Some of the key areas where AI is expected to make a significant impact in the future include:
Healthcare: AI is being used to develop personalized medicine,
```
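The loop above streams the prompts one after another. To consume several streams at once, you can run one consumer coroutine per prompt under `asyncio.gather` and give each its own buffer, so interleaved chunks never mix. This is a sketch assuming each stream yields dicts whose `"text"` field is the newly generated piece; `fake_async_stream`, `consume`, and `stream_many` are hypothetical, and `fake_async_stream` stands in for the async generator obtained from `await llm.async_generate(prompt, sampling_params, stream=True)`.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Sequence


async def fake_async_stream(pieces: Sequence[str]) -> AsyncIterator[dict]:
    """Hypothetical stand-in for the engine's async stream."""
    for piece in pieces:
        await asyncio.sleep(0)  # hand control back to the loop so streams interleave
        yield {"text": piece}


async def consume(name: str, stream: AsyncIterator[dict], buffers: Dict[str, List[str]]) -> None:
    """Append each streamed piece to the buffer that belongs to `name`."""
    async for chunk in stream:
        buffers[name].append(chunk["text"])


async def stream_many(streams: Dict[str, AsyncIterator[dict]]) -> Dict[str, str]:
    """Consume all streams concurrently and return the joined text per stream."""
    buffers: Dict[str, List[str]] = {name: [] for name in streams}
    await asyncio.gather(*(consume(name, s, buffers) for name, s in streams.items()))
    return {name: "".join(parts) for name, parts in buffers.items()}


results = asyncio.run(
    stream_many(
        {
            "france": fake_async_stream([" Paris", " is", " lovely", "."]),
            "ai": fake_async_stream([" bright", "."]),
        }
    )
)
print(results)
```

With a real engine, you would build the dictionary inside a coroutine, awaiting `llm.async_generate(..., stream=True)` once per prompt before passing the generators to `stream_many`.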




```python
llm.shutdown()
```
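`llm.shutdown()` releases the engine once you are done with it. In standalone scripts it can help to guarantee that call even when generation raises, for example with a small context manager. The `engine_session` helper and the `FakeEngine` class are hypothetical; with SGLang you would pass a factory such as `lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")`.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Protocol


class SupportsShutdown(Protocol):
    def shutdown(self) -> None: ...


@contextmanager
def engine_session(factory: Callable[[], SupportsShutdown]) -> Iterator[SupportsShutdown]:
    """Create an engine and always call shutdown() on exit, even after an error."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine that only tracks shutdown()."""

    def __init__(self) -> None:
        self.closed = False

    def shutdown(self) -> None:
        self.closed = True


engine_ref = None
try:
    with engine_session(FakeEngine) as engine:
        engine_ref = engine
        raise RuntimeError("simulated failure during generation")
except RuntimeError:
    pass  # the error still propagates, but shutdown() has already run

print(engine_ref.closed)
```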