# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
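
To give a rough idea of what "a custom server on top of the engine" can look like, here is a minimal standard-library sketch. It is not the linked `custom_server.py` script. The `EchoEngine` class, the `/generate` route, and the request fields are all hypothetical; the only thing borrowed from this document is the call shape `engine.generate(prompt, sampling_params)` returning a dict with a `"text"` key.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoEngine:
    """Hypothetical stand-in for an engine so the sketch runs without a GPU."""

    def generate(self, prompt, sampling_params=None):
        # Mirrors the output shape used in this document: a dict with a "text" key.
        return {"text": f" (echo of: {prompt})"}


def handle_generate(engine, raw_body: bytes):
    """Turn a JSON request body into an (HTTP status, JSON response body) pair."""
    try:
        payload = json.loads(raw_body)
        prompt = payload["prompt"]
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected a JSON object with a 'prompt' field"}
        return 400, json.dumps(error).encode()
    output = engine.generate(prompt, payload.get("sampling_params"))
    return 200, json.dumps({"text": output["text"]}).encode()


def make_handler(engine):
    """Build a request handler class bound to one engine instance."""

    class GenerateHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/generate":  # hypothetical route name
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            status, body = handle_generate(engine, self.rfile.read(length))
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return GenerateHandler


def serve(engine, host="127.0.0.1", port=8000):
    """Serve requests one at a time until interrupted."""
    HTTPServer((host, port), make_handler(engine)).serve_forever()
```

Calling `serve(EchoEngine())` starts the toy server; in practice the stand-in would be replaced by a real engine instance. Keeping the request logic in `handle_generate` makes it possible to test it without opening a socket.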

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")


### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Nicole. I have a passion for mathematics, problem-solving, and understanding the world around me. I believe that mathematics is not just a subject, but a way of thinking and a tool to analyze and solve real-world problems. As a tutor, I strive to help my students develop a deep understanding of mathematical concepts and build their problem-solving skills. I am patient, encouraging, and enthusiastic, and I love seeing the light bulb go off in my students' heads when they grasp a new concept.

I specialize in tutoring mathematics for high school and college students, including algebra, geometry, trigonometry, calculus, statistics, and linear algebra.
Prompt: The president of the United States is
Generated text:  a position of great power and influence, but it is also a position that requires a great deal of hard work and dedication. The president must be a leader who can inspire and motivate the American people, make tough decisions, and work wi
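
In an offline batch job the results are usually saved rather than printed. The sketch below pairs each prompt with its completion and writes one JSON object per line. It assumes, as the `zip(prompts, outputs)` loop above does, that outputs come back in prompt order and that each output is a dict with a `"text"` key. The helper name `write_jsonl`, the record field names, and the sample data are illustrative only.

```python
import io
import json


def write_jsonl(prompts, outputs, fp):
    """Write one JSON object per line, pairing each prompt with its completion."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    for prompt, output in zip(prompts, outputs):
        record = {"prompt": prompt, "completion": output["text"]}
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")


# Placeholder data standing in for the result of llm.generate(prompts, sampling_params).
sample_prompts = ["Hello, my name is", "The capital of France is"]
sample_outputs = [{"text": " Nicole."}, {"text": " Paris."}]

buffer = io.StringIO()  # any writable text file object works here
write_jsonl(sample_prompts, sample_outputs, buffer)
print(buffer.getvalue(), end="")
```

With a real run, `buffer` would typically be a file opened with `open(path, "w", encoding="utf-8")`.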

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Alex and I'm a software engineer with a passion for technology and innovation. I have a strong background in software development, with experience in programming languages such as Java, C++, Python, and JavaScript. I'm also familiar with various frameworks and libraries, including Spring, React, and Node.js.

I'm a creative problem solver with a keen eye for detail, and I enjoy working on complex projects that require critical thinking and analytical skills. I'm also a strong communicator and team player, with experience in collaborating with cross-functional teams to deliver high-quality software products.

In my free time, I enjoy learning new programming languages and technologies, as well

Prompt: The capital of France is
Generated text:  a beautiful and romantic city that is steeped in history and culture. Paris is known as the City of Light, and it's a must-visit destination for anyone who loves art, architecture, fashion, and food.
Here are some of the top things to do in Paris:
Visit the Eiffel Tower: This iconic landmark is a must-see when visiting Paris. You can take the stairs or elevator to the top for stunning views of the city.
Explore the Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts from around the world, including the

Prompt: The future of AI is
Generated text:  exciting, but also poses challenges for workers. Here are some key considerations and potential solutions.
The increasing use of artificial intelligence (AI) in the workplace has sparked both excitement and concern. While AI has the potential to greatly improve productivity and efficiency, it also poses significant challenges for workers, including job displacement and changes to the nature of work.
Here are some key considerations and potential solutions for the future of AI in the workplace:
1. Job displacement: As AI takes over routine and repetitive tasks, some jobs may become obsolete. This can lead to significant job displacement, especially for low-skilled and low-wage workers.
2. Changes to work

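When streaming, it is often useful to keep the full text as well as displaying it piece by piece. The helper below is a small sketch that does both. It assumes each chunk's `"text"` field holds only the newly generated piece, which is what the streamed output above suggests; if a given version returns cumulative text instead, the join would need to change. `collect_stream` and `fake_stream` are hypothetical names, and `fake_stream` merely stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
def collect_stream(chunks, on_text=None):
    """Consume an iterable of {"text": ...} chunks and return the joined text."""
    parts = []
    for chunk in chunks:
        piece = chunk["text"]
        if on_text is not None:
            on_text(piece)  # e.g. print the piece as soon as it arrives
        parts.append(piece)
    return "".join(parts)


def fake_stream():
    """Hypothetical stand-in for a streaming generate call."""
    for piece in [" Alex", " and", " I", "'m", " a", " software", " engineer"]:
        yield {"text": piece}


full_text = collect_stream(
    fake_stream(), on_text=lambda s: print(s, end="", flush=True)
)
print()
```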

### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Jen and I am a freelance writer and editor. I specialize in writing high-quality, well-researched content for businesses and organizations. My expertise includes writing for websites, social media, blogs, and print materials such as brochures, flyers, and newsletters. I can also help with content strategy and editing.
I have a strong background in research and writing, with a degree in English and a passion for learning. I am a skilled writer and editor, and I am confident that I can help you achieve your content goals.
Some of the types of content I can create include:
Website content (homepages, about pages, product descriptions

Prompt: The capital of France is
Generated text:  Paris. The famous Eiffel Tower, one of the world's most iconic landmarks, is located in Paris. Paris is known as the "City of Light" and is famous for its beautiful gardens, museums, and art galleries. The city has a population of around 2.1 million people and is ho
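
One reason to use the asynchronous call is that other coroutines can keep running while generation is in flight. The sketch below illustrates this with a heartbeat task that ticks while a batch is awaited. `FakeAsyncEngine` is a hypothetical stand-in that only imitates the `await engine.async_generate(prompts, sampling_params)` call shape used above, with a short sleep in place of real model work.

```python
import asyncio


class FakeAsyncEngine:
    """Hypothetical stand-in exposing the async_generate call shape used above."""

    async def async_generate(self, prompts, sampling_params=None):
        await asyncio.sleep(0.05)  # pretend the model is busy
        return [{"text": f" <completion for {p!r}>"} for p in prompts]


async def heartbeat(stop: asyncio.Event, ticks: list):
    """Record a tick periodically until asked to stop."""
    while not stop.is_set():
        ticks.append(1)
        await asyncio.sleep(0.01)


async def run(engine, prompts):
    """Await one batch while a background task keeps making progress."""
    stop = asyncio.Event()
    ticks = []
    beat = asyncio.create_task(heartbeat(stop, ticks))
    try:
        outputs = await engine.async_generate(prompts, {"temperature": 0.8})
    finally:
        stop.set()
        await beat
    return outputs, len(ticks)


outputs, tick_count = asyncio.run(
    run(FakeAsyncEngine(), ["Hello, my name is", "The capital of France is"])
)
print(f"{len(outputs)} outputs, heartbeat ticked {tick_count} time(s)")
```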

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Sarah and I am a senior at Xavier University. I am majoring in Psychology with a minor in Education. I am from a small town in Kentucky called Somerset. I have always been passionate about working with children and promoting education in my community. This summer I was lucky enough to intern with the National Honor Society at Xavier University. I worked closely with the advisor, Dr. Wicker, to coordinate events and activities for the students. This experience not only gave me the opportunity to develop my leadership skills but also allowed me to connect with students and create lasting memories. This fall I am looking forward to volunteering at the local elementary school, where

Prompt: The capital of France is
Generated text:  a city that needs no introduction. It is the epicenter of fashion, art, and cuisine that defines the very essence of the French culture. But Paris is not just a tourist destination; it is also a city of contrasts – it is a place where elegance and grime, tradition and innovation, blend together in a beautiful harmony.
Here are some of the things you can do and experience when you visit Paris:
Explore the City’s Iconic Landmarks

Paris is home to some of the world’s most famous landmarks, including the Eiffel Tower, Notre-Dame Cathedral, the Louvre Museum, and the Arc de Triomp

Prompt: The future of AI is
Generated text:  looking bright, but what does it mean for the world of work?
The current pace of technological change is unprecedented. The World Economic Forum estimates that by 2022, more than a third of the desired skills for most jobs will be comprised of skills that are not yet considered crucial to the job today. This suggests that workers will need to continually upskill and reskill to stay relevant in the job market.
AI is transforming industries in a way that has never been seen before. The automation of jobs is real, but the good news is that AI is also creating new jobs and industries that we can’t even imagine yet. The future of

In [6]:
llm.shutdown()
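
In a script, an exception raised between engine creation and the final `shutdown()` call would skip the cleanup. One possible pattern, sketched below, wraps the engine in a context manager so that `shutdown()` always runs. The `engine_session` helper and `FakeEngine` class are hypothetical; only the `shutdown()` method name comes from this document.

```python
from contextlib import contextmanager


@contextmanager
def engine_session(factory):
    """Create an engine with factory() and always call shutdown() on exit."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


class FakeEngine:
    """Hypothetical stand-in; only the shutdown() call mirrors the document."""

    def __init__(self):
        self.closed = False

    def generate(self, prompts, sampling_params=None):
        return [{"text": " ..."} for _ in prompts]

    def shutdown(self):
        self.closed = True


with engine_session(FakeEngine) as engine:
    outputs = engine.generate(["Hello, my name is"])

print("engine closed:", engine.closed)
```

With a real engine, the factory could be a lambda that builds it, for example `lambda: sgl.Engine(model_path=...)` as in the first cell.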