# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which suits use cases where a server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
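As a rough illustration of the custom-server use case, the sketch below wraps an engine in a small request handler. It is not the linked `custom_server.py` script: `handle_request`, the JSON payload fields, and the `StubEngine` stand-in are hypothetical names chosen for this example. The only engine call assumed is `async_generate(prompt, sampling_params)`, which appears later in this document.

```python
import asyncio
import json


class StubEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a GPU."""

    async def async_generate(self, prompt, sampling_params):
        # A real engine would return model output; this stub echoes the prompt.
        return {"text": f" <completion for {prompt!r}>"}


async def handle_request(engine, raw_body: str) -> str:
    """Parse a JSON request body, call the engine, and return a JSON response."""
    try:
        payload = json.loads(raw_body)
        prompt = payload["prompt"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return json.dumps({"error": "body must be JSON with a 'prompt' field"})

    # Fall back to the sampling parameters used throughout this document.
    sampling_params = payload.get(
        "sampling_params", {"temperature": 0.8, "top_p": 0.95}
    )
    output = await engine.async_generate(prompt, sampling_params)
    return json.dumps({"text": output["text"]})


response = asyncio.run(
    handle_request(StubEngine(), '{"prompt": "The capital of France is"}')
)
print(response)
```

In a real server, a handler like this would typically be attached to a route of a web framework, with the engine created once at startup and shared across requests.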

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

INFO 12-01 06:18:54 weight_utils.py:243] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:00<00:02,  1.35it/s]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:01<00:01,  1.23it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:02<00:00,  1.23it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.67it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.49it/s]

### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Carla and I'm a passionate planner and designer. I've been working in the field of event planning for over 10 years, and I'm so excited to be a part of your special day. My goal is to create an unforgettable experience for you and your loved ones.
I specialize in creating unique and personalized wedding designs that reflect your style and vision. From intimate gatherings to grand celebrations, I'll work closely with you to bring your ideas to life.
I'm all about creating a seamless experience for my clients, from the initial consultation to the final farewell. My attention to detail, creativity, and expertise ensure that every aspect of your event
Prompt: The president of the United States is
Generated text:  the chief executive of the federal government, and the leader of the country. The president is responsible for enforcing the laws of the land and executing the duties of the government. The president is also the commander-in-chief of the 
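
When a prompt list is very long, one option is to submit it in fixed-size chunks, for example to checkpoint results between calls. The sketch below assumes only that `generate` accepts a list of prompts and returns one dict with a `"text"` key per prompt, in the same order, as in the cell above. `generate_in_chunks`, `chunk_size`, and `StubEngine` are hypothetical names introduced for illustration.

```python
from typing import Iterator, List, Tuple


class StubEngine:
    """Hypothetical stand-in for sgl.Engine; returns one dict per prompt."""

    def generate(self, prompts, sampling_params):
        return [{"text": f" [{len(p)} chars]"} for p in prompts]


def generate_in_chunks(
    engine, prompts: List[str], sampling_params: dict, chunk_size: int = 2
) -> Iterator[Tuple[str, str]]:
    """Yield (prompt, generated_text) pairs, submitting chunk_size prompts per call."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(prompts), chunk_size):
        chunk = prompts[start:start + chunk_size]
        outputs = engine.generate(chunk, sampling_params)
        # Outputs are assumed to line up one-to-one with the submitted prompts.
        for prompt, output in zip(chunk, outputs):
            yield prompt, output["text"]


prompts = ["Hello, my name is", "The capital of France is", "The future of AI is"]
sampling_params = {"temperature": 0.8, "top_p": 0.95}
results = list(generate_in_chunks(StubEngine(), prompts, sampling_params))
for prompt, text in results:
    print(f"Prompt: {prompt}\nGenerated text: {text}")
```

Larger chunks give the engine's scheduler more work per call, so the chunk size is a trade-off rather than a fixed recommendation.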

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Stuart and I am a software developer with a passion for creating innovative and user-friendly applications. I have been working in the industry for over 10 years, and I have a proven track record of delivering high-quality software solutions on time and on budget.
I have experience working with a wide range of technologies, including .NET, Java, JavaScript, and C++, as well as various frameworks and libraries such as Spring, Hibernate, and Angular. I am also proficient in cloud-based technologies such as AWS and Azure.
In addition to my technical skills, I am a strong communicator and team player, with excellent problem-solving and analytical skills. I have

Prompt: The capital of France is
Generated text:  Paris, the official language is French, and the currency is the Euro. France is a member of the United Nations, the European Union, and NATO. The country is divided into 26 regions, 13 metropolitan and 13 overseas. The population of France is over 67 million people, and the capital city, Paris, has a population of over 2.1 million people.
France is a popular tourist destination, known for its rich history, art, fashion, and cuisine. The country is home to many famous landmarks, such as the Eiffel Tower, the Louvre Museum, and Notre Dame Cathedral. France is also

Prompt: The future of AI is
Generated text:  here and it's already shaping our lives in ways we never thought possible. From personal assistants like Alexa and Google Home to self-driving cars and AI-powered healthcare, the technology is advancing at an unprecedented pace. But what does this mean for the future of work? Will AI replace humans or augment our abilities? Join us as we explore the exciting possibilities and challenges of AI in the workplace.
Will AI Replace Human Workers?
The question on everyone's mind: will AI replace human workers? The answer is not a simple yes or no. While AI is certainly going to automate some jobs, it will also create new ones. In fact, a report

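To keep the full completion as well as print it incrementally, the chunks can be accumulated while streaming. This sketch assumes that each chunk's `"text"` holds only the newly generated text, which is how the output above behaves. If a given SGLang version returns cumulative text instead, the concatenation would need adjusting. `stream_and_collect` and `fake_stream` are hypothetical names.

```python
from typing import Dict, Iterable


def stream_and_collect(chunks: Iterable[Dict[str, str]]) -> str:
    """Print each chunk as it arrives and return the concatenated text."""
    pieces = []
    for chunk in chunks:
        print(chunk["text"], end="", flush=True)
        pieces.append(chunk["text"])
    print()
    return "".join(pieces)


def fake_stream():
    """Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True)."""
    for piece in [" Paris", ",", " the", " official", " language", " is", " French"]:
        yield {"text": piece}


full_text = stream_and_collect(fake_stream())
```

With a real engine, the call would look like `stream_and_collect(llm.generate(prompt, sampling_params, stream=True))`.
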
### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Martin
My story is one of resilience and hope in the face of adversity. I have been living in Germany as an asylum seeker since 2016, fleeing conflict and persecution in my home country of Eritrea.
Initially, I found it challenging to adjust to a new environment, culture, and language. However, I was determined to rebuild my life and make a positive contribution to my new community.
Throughout my journey, I have been fortunate to have encountered numerous individuals and organizations who have offered support and guidance. One such organization is the German Refugee Council (DVV), which has provided me with language classes, job training, and counseling

Prompt: The capital of France is
Generated text:  home to some of the world’s most famous landmarks, including the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. But beneath the surface of the City of Light, there are many lesser-known gems waiting to be discovered.
One such gem i
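
Because `async_generate` is awaited, independent requests with different sampling parameters can be issued concurrently with `asyncio.gather`. The sketch below uses a hypothetical `StubEngine` in place of `sgl.Engine` and assumes only the `async_generate(prompt, sampling_params)` call shown above. Whether concurrent calls improve throughput depends on the engine's scheduler, so treat this as a pattern rather than a performance claim.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in for sgl.Engine; sleeps briefly to mimic inference."""

    async def async_generate(self, prompt, sampling_params):
        await asyncio.sleep(0.01)
        return {"text": f" (temperature={sampling_params['temperature']})"}


async def generate_concurrently(engine, requests):
    """Run one async_generate call per (prompt, sampling_params) pair concurrently."""
    tasks = [engine.async_generate(prompt, params) for prompt, params in requests]
    # asyncio.gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*tasks)


requests = [
    ("Hello, my name is", {"temperature": 0.2, "top_p": 0.95}),
    ("The future of AI is", {"temperature": 0.8, "top_p": 0.95}),
]
outputs = asyncio.run(generate_concurrently(StubEngine(), requests))
for (prompt, _), output in zip(requests, outputs):
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```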

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Jack and I am a former addict who has been sober for over 10 years. I have been fortunate enough to be able to help others on their journey of recovery. I was a sponsor for several people in AA and have been working with others in recovery in various capacities ever since. I am also a CADC (Certified Alcohol and Drug Counselor) and have a strong background in the field.
I have been able to help many people get sober and stay sober, and I take pride in knowing that I have made a difference in their lives. I believe that everyone has the capacity to overcome addiction and achieve a life of happiness and

Prompt: The capital of France is
Generated text:  set to host the world’s biggest innovation show, Vivatech, from May 16 to 18, 2023. For the first time, the event will take place at the Grande Halle de la Villette, a popular Parisian venue. With over 1,500 startups, 2,500 investors, and 120,000 visitors expected, Vivatech 2023 promises to be a groundbreaking experience.
Vivatech 2023: A Celebration of Innovation and Technology

Vivatech, a global innovation event, showcases cutting-edge technologies, startups, and industries of the future. The 2023 edition

Prompt: The future of AI is
Generated text:  bright, and it’s happening now

At this year’s World Economic Forum, I had the privilege to engage with the world’s top business and technology leaders on the future of artificial intelligence (AI). The conversation was lively, and the overall sentiment was optimistic.
As AI continues to evolve, its applications are becoming increasingly sophisticated and widespread. We’re witnessing a democratization of AI, where its benefits are no longer limited to tech-savvy entrepreneurs and corporations. AI is now being applied in various sectors, from healthcare to education, transportation to finance, and agriculture to energy.
Here are a few examples of the many exciting advancements:
1.

In [6]:
llm.shutdown()
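
An explicit `shutdown()` call is easy to miss when a script exits early on an error. One way to guard against that is a small context manager that always shuts the engine down. `engine_session` and `StubEngine` below are hypothetical names, and the only engine method assumed is `shutdown()`.

```python
from contextlib import contextmanager


class StubEngine:
    """Hypothetical stand-in for sgl.Engine that records whether it was shut down."""

    def __init__(self):
        self.is_shut_down = False

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def engine_session(engine):
    """Yield the engine and call shutdown() on exit, even if the body raises."""
    try:
        yield engine
    finally:
        engine.shutdown()


engine = StubEngine()
try:
    with engine_session(engine) as llm:
        raise RuntimeError("simulated failure during inference")
except RuntimeError:
    pass

print(engine.is_shut_down)
```

With a real engine, the same pattern would wrap the object returned by `sgl.Engine(model_path=...)`, so that the inference code inside the `with` block does not need its own cleanup path.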