# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
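
To give a rough idea of what sits between a custom server and the engine, here is a minimal sketch of a request handler. It is not the linked `custom_server` script: the JSON field names (`prompts`, `sampling_params`, `texts`) and the `FakeEngine` stand-in are hypothetical, and the only engine behavior assumed is the `generate(prompts, sampling_params)` call that returns one dict with a `"text"` key per prompt.

```python
import json


def handle_generate(engine, raw_body):
    """Decode a JSON request, run it through the engine, and encode a JSON reply."""
    request = json.loads(raw_body)
    prompts = request["prompts"]
    # Hypothetical schema: sampling parameters are optional in the request body.
    sampling_params = request.get("sampling_params", {})
    outputs = engine.generate(prompts, sampling_params)
    reply = {"texts": [output["text"] for output in outputs]}
    return json.dumps(reply).encode("utf-8")


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a model."""

    def generate(self, prompts, sampling_params):
        return [{"text": f" (completion of {prompt!r})"} for prompt in prompts]


body = json.dumps({"prompts": ["The capital of France is"]}).encode("utf-8")
reply = handle_generate(FakeEngine(), body)
print(reply.decode("utf-8"))
```

Keeping the handler a plain function of `(engine, bytes)` means it can be wired into whichever web framework you prefer and tested without starting a server.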

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

INFO 12-01 03:07:32 weight_utils.py:243] Using model weights format ['*.safetensors']


Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.44it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Katie and I am a professional organizer and owner of Serenity & Space. I help busy families and individuals get their homes and lives organized and clutter-free. I specialize in home organization, time management, and productivity. I have a degree in Interior Design and a certification in Professional Organizing. I believe that a well-organized home and life can bring so much more joy, productivity, and peace to our daily lives. I would love to help you achieve your organizing goals!
What sparked your interest in professional organizing?
I have always been a very organized person, but it wasn’t until I had my first child that I realized how much more
Prompt: The president of the United States is
Generated text: , in fact, the head of state and the head of government of the United States. The president is both the chief executive of the federal government and the commander-in-chief of the armed forces. This means that the president is responsib
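
In an offline batch job you will usually want to persist the results rather than print them. The sketch below pairs each prompt with its completion and serializes the pairs as JSON Lines. The helper name `results_to_jsonl` and the sample data are illustrative; the only thing taken from the cell above is that `llm.generate` returns one dict per prompt with a `"text"` key.

```python
import json


def results_to_jsonl(prompts, outputs):
    """Serialize prompt/completion pairs as JSON Lines, one record per prompt."""
    if len(prompts) != len(outputs):
        raise ValueError("expected exactly one output per prompt")
    lines = [
        json.dumps({"prompt": prompt, "text": output["text"]}, ensure_ascii=False)
        for prompt, output in zip(prompts, outputs)
    ]
    return "\n".join(lines)


# Stand-in data shaped like the return value of llm.generate(prompts, sampling_params).
example_prompts = ["Hello, my name is", "The capital of France is"]
example_outputs = [{"text": " Katie."}, {"text": " Paris."}]

jsonl = results_to_jsonl(example_prompts, example_outputs)
print(jsonl)
```

With the real engine, you would pass `prompts` and `outputs` from the cell above and write the returned string to a file.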

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Bao and I am a photographer. I am a Brisbane based photographer specializing in family, children and baby photography. I am passionate about capturing the love and joy of family life and would love to capture your special moments.
I offer a variety of photography services including in-home photo shoots, outdoor photo shoots and even studio photo shoots. I also offer a variety of different packages to suit your budget and needs.
My style is natural and relaxed, I believe that this is the best way to capture the true personality and spirit of your family. I love working with children and babies and I have a special talent for making them feel comfortable and at ease


Prompt: The capital of France is
Generated text:  located in the northern part of the country, along the Seine River. Paris is famous for its stunning architecture, art museums, fashion industry, and romantic atmosphere. Visitors can explore iconic landmarks such as the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum. The city is also known for its culinary delights, including croissants, cheese, and wine.
Where is the capital of France located?
The capital of France is located in the northern part of the country.
What is the main attraction in Paris?
The main attraction in Paris is the Eiffel Tower.
What is the name of the famous museum in Paris



Prompt: The future of AI is
Generated text:  bright, and we're already seeing it in action across various industries. From healthcare and finance to education and transportation, AI is being used to improve efficiency, accuracy, and decision-making. Here are some of the most significant advancements in AI and how they're transforming the world.
1. Healthcare: AI-Powered Diagnosis and Treatment

AI is revolutionizing the healthcare industry by helping doctors diagnose diseases more accurately and quickly. AI-powered algorithms can analyze medical images, such as X-rays and MRIs, to detect abnormalities and provide diagnoses. Additionally, AI can help personalize treatment plans based on individual patient data.
2. Finance: AI-Driven Investment
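
When you need the complete text as well as the live printout, the streamed chunks can be accumulated as they arrive. The sketch below assumes that each chunk's `"text"` field holds only the newly generated piece, which is what the output of this cell suggests; if your SGLang version returns cumulative text in each chunk, keep the last chunk instead of joining. `fake_stream` is a hypothetical stand-in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
def collect_stream(chunks):
    """Concatenate the "text" field of every streamed chunk into one string."""
    pieces = []
    for chunk in chunks:
        pieces.append(chunk["text"])
    return "".join(pieces)


def fake_stream():
    """Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True)."""
    for piece in [" B", "ao", " and", " I", " am", " a", " photographer", "."]:
        yield {"text": piece}


full_text = collect_stream(fake_stream())
print(repr(full_text))
```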




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Andy! I'm a 25-year-old dude from the UK, living in the beautiful city of Prague, Czech Republic. I'm a huge fan of sports, music, and photography. When I'm not working, you can find me at the local gym, trying out new recipes in the kitchen, or exploring the city with my camera in hand.
What is your favorite place you have ever traveled to? And why?
So far, I've been fortunate enough to travel to a few amazing destinations, but if I had to pick just one, it would be New Zealand. The landscapes, the people, the food – everything about it

Prompt: The capital of France is
Generated text:  a city that's steeped in history and culture. From the iconic Eiffel Tower to the world-renowned Louvre Museum, Paris has something to offer every type of traveler. Here are some of the top things to do in Paris:
Visit the Eiffel Tower: The Eiffel Tower is one of the most iconic landmarks in the world and a must-see when visiting Paris. You can take the stair
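
The asynchronous API becomes more useful when several independent requests are in flight at once, for example with different sampling parameters per prompt. The following sketch uses `asyncio.gather` for that. `FakeAsyncEngine` is a hypothetical stand-in so the code runs without a model, and it assumes that `async_generate` called with a single prompt string resolves to a single dict with a `"text"` key; the cell above only demonstrates the list form, so verify this against your SGLang version.

```python
import asyncio


class FakeAsyncEngine:
    """Hypothetical stand-in for sgl.Engine; only mimics the awaitable call shape."""

    async def async_generate(self, prompt, sampling_params):
        await asyncio.sleep(0.01)  # simulate inference latency
        temperature = sampling_params["temperature"]
        return {"text": f" [T={temperature}] reply to {prompt!r}"}


async def generate_concurrently(engine, requests):
    """Submit one request per (prompt, sampling_params) pair and await them together."""
    pending = [engine.async_generate(prompt, params) for prompt, params in requests]
    # gather returns results in the order of its inputs, so they line up with requests.
    return await asyncio.gather(*pending)


requests = [
    ("Hello, my name is", {"temperature": 0.8, "top_p": 0.95}),
    ("The capital of France is", {"temperature": 0.0, "top_p": 1.0}),
]
results = asyncio.run(generate_concurrently(FakeAsyncEngine(), requests))
for (prompt, _), result in zip(requests, results):
    print(prompt, "->", result["text"])
```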

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Sean and I'm a representative of the Top Ten Review website. Our website provides a comprehensive review of products and services available in the market, including finance, tech, and online services. Recently, we've been looking into various virtual private network (VPN) services and we'd like to invite you to participate in our review process.
We're looking for a VPN service that offers a high level of security, fast connection speeds, and a user-friendly interface. We've shortlisted a few VPN services that we believe meet our requirements, but we'd like to hear from you and get your honest opinion on the services we're considering.
The VPN



Prompt: The capital of France is
Generated text:  a city that needs no introduction, but the city of Paris is indeed one of the most famous cities in the world. This beautiful city is known for its stunning architecture, rich history, and its incredible art collections. It is a city that has been a major destination for tourists for centuries, and it's no surprise that it's still one of the most visited cities in the world today.
The Eiffel Tower, the Arc de Triomphe, and the Louvre Museum are just a few of the many famous landmarks that Paris has to offer. The city also has a vibrant nightlife, with a wide range of restaurants, cafes,



Prompt: The future of AI is
Generated text:  complex and intertwined with various emerging technologies such as blockchain, the Internet of Things (IoT), and the cloud. Here are some future trends in AI that could be crucial in shaping the world of tomorrow.
1. Increased AI Adoption in the Healthcare Sector

Artificial intelligence is transforming healthcare by providing more accurate diagnoses, improving patient care, and streamlining clinical workflows. AI-powered chatbots can help patients with medical queries, and predictive analytics can help identify high-risk patients.
2. Rise of Autonomous Machines

Autonomous machines, such as drones, self-driving cars, and robots, will become increasingly prevalent in various industries like transportation, logistics,
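
The cell above streams one prompt at a time. If you instead want several streams to make progress concurrently, one option is to drain each async generator in its own coroutine and combine them with `asyncio.gather`. This is a sketch of that pattern only: `fake_async_stream` is a hypothetical stand-in for the generator obtained from `llm.async_generate(prompt, sampling_params, stream=True)`, and whether concurrent streams suit your workload is something to check against the real engine.

```python
import asyncio


async def fake_async_stream(pieces, delay):
    """Hypothetical stand-in for the async generator returned with stream=True."""
    for piece in pieces:
        await asyncio.sleep(delay)
        yield {"text": piece}


async def consume(name, stream, sink):
    """Drain one stream, appending (name, piece) to sink in arrival order."""
    async for chunk in stream:
        sink.append((name, chunk["text"]))


async def main():
    sink = []
    await asyncio.gather(
        consume("a", fake_async_stream([" Sean", " and", " I"], 0.01), sink),
        consume("b", fake_async_stream([" a", " city"], 0.015), sink),
    )
    return sink


events = asyncio.run(main())

# Reassemble each stream's text from the interleaved events.
texts = {}
for name, piece in events:
    texts[name] = texts.get(name, "") + piece
print(texts)
```

Because pieces from different prompts interleave, tag each piece with its source (as `consume` does) rather than printing directly to the same line.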




In [6]:
llm.shutdown()
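
In a script, as opposed to a notebook, an exception between engine creation and `llm.shutdown()` would skip the shutdown call. A small context manager can guarantee that shutdown still runs. The sketch below is one way to do it; `running_engine` and `FakeEngine` are hypothetical names, and the only engine method relied on is the `shutdown()` call shown above.

```python
from contextlib import contextmanager


@contextmanager
def running_engine(factory):
    """Create an engine via factory() and always call shutdown() on exit."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


class FakeEngine:
    """Hypothetical stand-in exposing the generate/shutdown methods used above."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts, sampling_params):
        return [{"text": " ..."} for _ in prompts]

    def shutdown(self):
        self.is_shut_down = True


with running_engine(FakeEngine) as engine:
    outputs = engine.generate(["Hello, my name is"], {"temperature": 0.8})

print(engine.is_shut_down)
```

With the real engine, the factory could be something like `lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")`.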