# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A complete example in the form of a Python script can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
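
As a rough illustration of the idea (not a copy of the linked example), the sketch below wraps an engine-like object in a small HTTP handler using only the Python standard library. `StubEngine`, `handle_generate`, `make_handler`, `serve`, and the `/generate` route are hypothetical names introduced here. The stub assumes that a single prompt yields a single dict with a `"text"` key; in real use you would pass the `sgl.Engine` instance created later in this document.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a model."""

    def generate(self, prompt, sampling_params=None):
        # Assumed output shape: a dict with a "text" key.
        return {"text": f" (echo) {prompt}"}


def handle_generate(engine, body: bytes):
    """Turn a JSON request body into (status_code, JSON response bytes)."""
    try:
        payload = json.loads(body)
        prompt = payload["prompt"]
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected JSON with a 'prompt' field"}
        return 400, json.dumps(error).encode()
    output = engine.generate(prompt, payload.get("sampling_params"))
    return 200, json.dumps({"text": output["text"]}).encode()


def make_handler(engine):
    """Build a request handler class bound to the given engine."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/generate":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            status, data = handle_generate(engine, self.rfile.read(length))
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return Handler


def serve(engine, port=8000):
    """Blocking call; deliberately not invoked in this snippet."""
    HTTPServer(("127.0.0.1", port), make_handler(engine)).serve_forever()
```

Keeping the request logic in a plain function (`handle_generate`) makes it testable without opening a socket; the handler class only deals with HTTP framing.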

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")


### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Marie and I’m a freelance writer based in the beautiful countryside of rural France. I specialise in writing travel articles, blog posts, and copy for the travel and tourism industry. I’m also passionate about creating engaging and informative content for websites, social media, and publications that inspire people to explore new destinations and experiences.

With over 10 years of experience in writing and travel, I have developed a unique voice and style that brings destinations to life. My writing is descriptive, engaging, and informative, making it perfect for travel enthusiasts, tour operators, and travel companies looking to showcase their products and services.

My areas of expertise include:

*
Prompt: The president of the United States is
Generated text:  the head of state and head of government of the United States. The president leads the executive branch of the federal government and is the commander-in-chief of the United States A
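
For offline batch jobs the outputs are usually stored rather than printed. The following is a minimal sketch, assuming each output is a dict with a `"text"` key as in the cell above. `to_jsonl` and `fake_outputs` are hypothetical names, used only so the snippet runs without loading a model.

```python
import json


def to_jsonl(prompts, outputs):
    """Pair each prompt with its generated text, one JSON object per line."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    lines = [
        json.dumps({"prompt": p, "text": o["text"]}, ensure_ascii=False)
        for p, o in zip(prompts, outputs)
    ]
    return "\n".join(lines)


# Hypothetical stand-in for the list returned by llm.generate(prompts, sampling_params).
fake_outputs = [{"text": " Marie."}, {"text": " Paris."}]
jsonl = to_jsonl(["Hello, my name is", "The capital of France is"], fake_outputs)
print(jsonl)
```

The length check guards against silently dropping results, since `zip` stops at the shorter of its inputs.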

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Maria and I'm a freelance makeup artist with 10+ years of experience in the beauty industry. I specialize in Bridal Makeup, Special Occasion Makeup, and Beauty Coaching. I am passionate about helping my clients look and feel their best on their special day. I have a keen eye for detail and a love for experimenting with different makeup looks and techniques. I am also a skilled educator and have taught makeup classes and workshops for various companies and individuals. When I'm not working, you can find me trying out new makeup products or spending time with my family.

I am available to travel for weddings and special occasions. I have worked with

Prompt: The capital of France is
Generated text:  a city of great beauty and history, with a wide range of attractions and activities to suit all interests. From the iconic Eiffel Tower to the world-class museums and art galleries, there's something for everyone in Paris. Take a romantic river cruise along the Seine, visit the famous Notre Dame Cathedral, or explore the trendy neighborhoods of Montmartre and Le Marais. You can also indulge in the city's renowned cuisine and wine, from croissants and baguettes to fine dining and Michelin-starred restaurants.

Where to Go in Paris

The Eiffel Tower, the iconic symbol of Paris and one of the

Prompt: The future of AI is
Generated text:  bright, but its impact on jobs is uncertain

Artificial intelligence (AI) is changing the world in profound ways. From virtual assistants like Siri and Alexa to self-driving cars and personalized medicine, AI is transforming industries and revolutionizing the way we live and work. However, as AI becomes increasingly sophisticated, it also raises concerns about its impact on jobs and the economy. Will AI create new opportunities, or will it displace human workers?

The debate about the impact of AI on jobs is complex and multifaceted. On one hand, AI has the potential to augment human capabilities, freeing us from mundane and repetitive tasks and enabling us to
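
When the full completion is needed after streaming, the pieces can be accumulated while they are displayed. The following is a minimal sketch, assuming each chunk's `"text"` carries only the newly generated piece, which is how the loop above treats it. `collect_stream` and `fake_stream` are hypothetical names; `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
def collect_stream(chunks):
    """Print each piece as it arrives and return the concatenated text."""
    pieces = []
    for chunk in chunks:
        print(chunk["text"], end="", flush=True)
        pieces.append(chunk["text"])
    print()
    return "".join(pieces)


def fake_stream():
    # Hypothetical stand-in for a streaming generate call.
    for piece in [" Paris", ",", " a", " city", " of", " light", "."]:
        yield {"text": piece}


full_text = collect_stream(fake_stream())
```

If a given SGLang version instead returns the cumulative text in every chunk, the last chunk alone would already hold the full completion, so it is worth checking the chunk contents before relying on concatenation.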




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Daniel. I am a mechanical engineer. My main area of expertise is vibration analysis and structural dynamics. I have worked in the aerospace industry for many years and have written several papers on the subject of vibration testing and analysis. I have also taught classes on vibration analysis and structural dynamics at a university. I am interested in speaking with someone about a potential collaboration on a project involving vibration analysis of a complex structure. Can you tell me a little bit about your background and how you think you might be able to contribute to this project? Thank you.
Daniel,
It sounds like you have a strong background in vibration analysis and structural dynamics, particularly in the

Prompt: The capital of France is
Generated text:  Paris, a city of romance, fashion and fine cuisine. Paris is one of the world's top tourist destinations, attracting over 23 million visitors each year. It is known as the City of Li
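
The asynchronous API is mainly useful when requests arrive independently and should overlap. The following is a minimal sketch of fanning out several awaits with `asyncio.gather`. `StubAsyncEngine` and `generate_concurrently` are hypothetical names; the stub's `async_generate` only mimics the call shape used above, so treat the timing behaviour as illustrative rather than as a statement about SGLang's scheduler.

```python
import asyncio


class StubAsyncEngine:
    """Hypothetical stand-in for sgl.Engine; sleeps instead of running a model."""

    async def async_generate(self, prompt, sampling_params=None):
        await asyncio.sleep(0.01)  # simulate generation latency
        return {"text": f" <completion for: {prompt}>"}


async def generate_concurrently(engine, prompts, sampling_params):
    # One coroutine per prompt; gather returns results in argument order.
    tasks = [engine.async_generate(p, sampling_params) for p in prompts]
    return await asyncio.gather(*tasks)


results = asyncio.run(
    generate_concurrently(
        StubAsyncEngine(),
        ["Hello, my name is", "The capital of France is"],
        {"temperature": 0.8, "top_p": 0.95},
    )
)
for result in results:
    print(result["text"])
```

Because `asyncio.gather` preserves argument order, the results can still be zipped with the original prompt list even though the calls run concurrently.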

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Rachel. I am a jewelry designer and artist, based in Berlin, Germany. My work combines traditional craftsmanship with modern techniques and a strong sense of individuality.

I focus on creating unique, handmade pieces that tell a story, reflect a moment, or simply bring joy to the wearer. My style is eclectic and ever-evolving, drawing inspiration from various cultures, historical periods, and personal experiences.

Each piece is carefully handcrafted in my studio, where I experiment with different materials, colors, and textures to bring my visions to life. I work with a range of materials, including precious metals, gems, and alternative materials like wood, stone

Prompt: The capital of France is
Generated text:  a city of grandeur, with some of the most iconic landmarks in the world. From the Eiffel Tower to the Louvre Museum, Paris is a must-visit destination for anyone interested in history, art, architecture, and culture.

In this article, we'll explore the top 10 things to do in Paris, covering a range of activities and attractions that will help you make the most of your trip.

1. Visit the Eiffel Tower

The Eiffel Tower is one of the most recognizable landmarks in the world, and a visit to Paris wouldn't be complete without seeing it up close. Take the elevator to

Prompt: The future of AI is
Generated text:  bright and uncertain. Here's what to expect.

From assisting with everyday tasks to making life-or-death decisions, artificial intelligence (AI) is increasingly becoming a part of our lives. As AI technology advances, it will likely have a profound impact on various aspects of society, from the workplace to healthcare and beyond. Here are some potential developments to anticipate in the future of AI:

1. Increased Integration into Daily Life

AI will continue to seep into every aspect of our daily lives, from personal assistants like Alexa and Google Assistant to smart home devices and self-driving cars. We can expect AI to become even more intuitive and user-friendly, making

In [6]:
llm.shutdown()