# Offline Engine API

SGLang provides a direct inference engine that does not require an HTTP server. This is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed, working example in a Python script can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
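The linked script is the reference example. The sketch below is a separate, minimal illustration of the idea. It wraps an engine-like object in Python's standard-library `http.server` rather than in any particular web framework.

Everything here is hypothetical and chosen only for this sketch: `FakeEngine`, `make_handler`, `serve_once`, the `/generate` route, and the JSON request shape. `FakeEngine` mimics the `llm.generate(prompt, sampling_params)` call shape used later in this document, so the code runs without a GPU. `HTTPServer` handles one request at a time, which avoids calling a real engine from several threads at once (we have not checked whether that is safe).

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a GPU."""

    def generate(self, prompt, sampling_params=None):
        return {"text": f" <completion of {prompt!r}>"}


def make_handler(llm):
    """Build a request-handler class that forwards POST /generate to llm.generate()."""

    class GenerateHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/generate":
                self.send_error(404)
                return
            # Read the JSON body: {"prompt": ..., "sampling_params": {...}}
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length) or b"{}")
            output = llm.generate(body["prompt"], body.get("sampling_params", {}))
            payload = json.dumps({"text": output["text"]}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            pass  # silence per-request logging for the demo

    return GenerateHandler


def serve_once(llm, request_body):
    """Start the server on a free port, send one request, shut down, return the JSON reply."""
    server = HTTPServer(("127.0.0.1", 0), make_handler(llm))
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=10)
        conn.request(
            "POST",
            "/generate",
            body=json.dumps(request_body).encode(),
            headers={"Content-Type": "application/json"},
        )
        reply = json.loads(conn.getresponse().read())
        conn.close()
        return reply
    finally:
        server.shutdown()
        server.server_close()


print(serve_once(FakeEngine(), {"prompt": "Hello, my name is"}))
```

To try it against a real engine, pass the `sgl.Engine` instance (the `llm` created below) in place of `FakeEngine()`. A long-running server would keep `serve_forever()` running instead of using `serve_once`.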

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

```python
# Launch the offline engine
import asyncio

import sglang as sgl

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")
```

```text
INFO 11-26 01:42:33 weight_utils.py:243] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.50it/s]
```



### Non-streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```

```text
Prompt: Hello, my name is
Generated text:  Sarang Khosla. I am a second-year student at the University of Illinois at Urbana-Champaign. I am pursuing a degree in Computer Science and a minor in Mathematics. I am excited to be a part of the 2023 cohort of the Illinois Data Science for Social Good (IDSSG) program.
I am originally from India and I moved to the United States in 2020 to pursue higher education. Prior to that, I had a passion for programming and mathematics, and I was a part of several hackathons and coding competitions in India.
In my free time, I enjoy reading books on science and philosophy
Prompt: The president of the United States is
Generated text:  the head of state and head of government of the United States. The president serves a four-year term and is limited to two terms. The president is responsible for executing the laws of the land, as well as serving as commander-in-chief of the armed forces.
What is the main role of the President of the United States?
The Pre
```
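The cell above passes the whole prompt list to one `llm.generate` call. The `zip` assumes the outputs come back in the same order as the prompts. For a very large prompt list, you might prefer to submit it in slices, and the sketch below shows one way to do that.

`generate_in_batches` and `batch_size` are hypothetical names, not SGLang API. `FakeEngine` is a stub that mimics the list-in, list-of-dicts-out shape shown above, so the example runs without a GPU. Whether slicing helps at all depends on your workload, because the engine already schedules a single large list.

```python
from typing import Dict, List, Optional


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine; counts the batched generate() calls it receives."""

    def __init__(self) -> None:
        self.calls = 0

    def generate(
        self, prompts: List[str], sampling_params: Optional[Dict] = None
    ) -> List[Dict[str, str]]:
        self.calls += 1
        return [{"text": f" <completion of {p!r}>"} for p in prompts]


def generate_in_batches(
    llm, prompts: List[str], sampling_params: Dict, batch_size: int = 64
) -> List[str]:
    """Submit prompts in fixed-size slices, one list-valued generate() call per slice.

    Each call hands the engine a whole slice to schedule together; batch_size only caps
    how many prompts a single blocking call submits. Output order matches `prompts`.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    texts: List[str] = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start : start + batch_size]
        outputs = llm.generate(batch, sampling_params)
        texts.extend(output["text"] for output in outputs)
    return texts


engine = FakeEngine()
texts = generate_in_batches(
    engine, [f"prompt {i}" for i in range(5)], {"temperature": 0.8, "top_p": 0.95}, batch_size=2
)
print(engine.calls, texts[-1])
```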

### Streaming Synchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()
```


```text
=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Aron and I am a software developer. I live in Vancouver, Canada and enjoy hiking, photography, and playing music.
I have a strong passion for coding and am always looking for new challenges and projects to work on. My expertise lies in backend development using Java and Node.js, but I am always eager to learn and expand my skillset.
In my free time, I enjoy exploring the outdoors and capturing moments with my camera. There's something about being in nature that helps me clear my mind and spark new ideas.
Music is also a big part of my life. I play the guitar and enjoy writing my own songs. It's

Prompt: The capital of France is
Generated text:  a city that's steeped in history, art and romance. It's a city that will leave you enchanted and eager for more. Paris is the City of Light, a place where artists and writers have found inspiration for centuries. From the Eiffel Tower to the Louvre, there's no shortage of iconic landmarks to explore.
Here are some of the top things to do in Paris:
The Eiffel Tower: The most iconic landmark in Paris, the Eiffel Tower is a must-visit attraction. You can take the stairs or elevator to the top for breathtaking views of the city.
The Louvre Museum: One of

Prompt: The future of AI is
Generated text:  not just about smarter machines, but also about more human relationships

The future of AI is not just about smarter machines, but also about more human relationships

In recent years, AI has become increasingly sophisticated, allowing us to interact with it more naturally. However, while AI has made tremendous progress in processing and analyzing vast amounts of data, it still falls short in emulating the complexity and nuances of human relationships. As AI continues to evolve, it will be crucial to focus on developing more human-like interactions and relationships.
The need for more human-like interactions

Currently, AI is primarily designed to perform tasks with precision and speed, but it often
```
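In the run above, each chunk carried only the newly generated text, which is why the loop prints it with `end=""`. If you need the full completion as a string as well as live output, you can accumulate the chunks.

In this sketch, `fake_stream` is a hypothetical stand-in for `llm.generate(prompt, sampling_params, stream=True)` that yields incremental chunks the same way. `collect_stream` is our own helper name. If a given SGLang version instead returned cumulative text in each chunk, you would keep only the last chunk rather than joining them.

```python
from typing import Dict, Iterable, Iterator, List


def fake_stream(text: str) -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for llm.generate(prompt, params, stream=True).

    Yields incremental chunks: each one carries only the newly generated piece.
    """
    for piece in text.split(" "):
        yield {"text": " " + piece}


def collect_stream(chunks: Iterable[Dict[str, str]], echo: bool = False) -> str:
    """Join incremental chunks into the full completion, optionally echoing them live."""
    parts: List[str] = []
    for chunk in chunks:
        if echo:
            print(chunk["text"], end="", flush=True)
        parts.append(chunk["text"])
    if echo:
        print()
    return "".join(parts)


full_text = collect_stream(fake_stream("Paris is the capital"), echo=True)
```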




### Non-streaming Asynchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())
```


```text
=== Testing asynchronous batch generation ===

Prompt: Hello, my name is
Generated text:  Angel and I am a freelance writer and editor. I have been writing for over 15 years, and I have a strong background in academic and technical writing. My specialties include research and data-driven writing, as well as creating engaging content for blogs, websites, and social media platforms.

Over the years, I have had the pleasure of working with a variety of clients across different industries, including education, healthcare, technology, and finance. I have a keen eye for detail, a strong understanding of grammar and punctuation, and a passion for crafting compelling stories that capture the reader's attention.

Some of my notable skills and experience include:

* Research

Prompt: The capital of France is
Generated text:  Paris. France is a popular tourist destination, with millions of visitors each year. The most popular attractions in France are the Eiffel Tower, the Louvre Museum, and Notre Dame Cathedral.
The Eiffel Tower is an iconi
```
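Because `async_generate` is awaitable, several requests can be in flight at once instead of being awaited one after another. The sketch below does this with `asyncio.gather`, assuming the engine accepts overlapping `async_generate` calls (not verified here). `FakeAsyncEngine` and `generate_many` are hypothetical names. The stub only mimics the list-in, list-of-dicts-out shape used in the cell above.

```python
import asyncio
from typing import Dict, List, Optional


class FakeAsyncEngine:
    """Hypothetical stand-in for sgl.Engine exposing only async_generate."""

    async def async_generate(
        self, prompts: List[str], sampling_params: Optional[Dict] = None
    ) -> List[Dict[str, str]]:
        await asyncio.sleep(0.01)  # simulate work without blocking the event loop
        return [{"text": f" <completion of {p!r}>"} for p in prompts]


async def generate_many(
    llm, batches: List[List[str]], sampling_params: Dict
) -> List[List[str]]:
    """Await several batched requests concurrently; results follow the order of `batches`."""
    results = await asyncio.gather(
        *(llm.async_generate(batch, sampling_params) for batch in batches)
    )
    return [[output["text"] for output in outputs] for outputs in results]


texts = asyncio.run(
    generate_many(FakeAsyncEngine(), [["Hello"], ["Paris", "AI"]], {"temperature": 0.8})
)
print(texts)
```

`asyncio.gather` returns results in the order the awaitables were passed, so the outer list lines up with `batches`. In a notebook where `asyncio.run` complains about an already running event loop, `await generate_many(...)` directly instead.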

### Streaming Asynchronous Generation

```python
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())
```


```text
=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Judy, and I am a genealogist. I specialize in European ancestry, particularly in tracing my clients' families to their ancestral villages in Europe. My knowledge includes medieval European history, historical documents, and genealogical research methods. I have been a professional genealogist for over 15 years.
I have worked on numerous cases involving European ancestry and have a strong network of contacts and resources in Europe. I can provide guidance on how to research your family tree, translate documents, and obtain vital records from European archives. My expertise also includes DNA testing and how it can be used to supplement traditional research.
I am a member of several

Prompt: The capital of France is
Generated text:  a vibrant, cosmopolitan city that has something to offer everyone. Paris is a city of stunning beauty, famous landmarks, and romantic atmosphere. The City of Light is famous for its art museums, fashion, cuisine, and historical landmarks, making it a must-visit destination for travelers from around the world.
This guide will provide you with an overview of the best things to do in Paris, from famous landmarks to hidden gems, and will help you plan your trip.
Paris is a large city, and getting around can be challenging. The city has an extensive public transportation system, including the metro, buses, and trains. Taxis and ride

Prompt: The future of AI is
Generated text:  being shaped by the interactions between humans and machines. The intersection of human creativity and artificial intelligence (AI) is producing innovative and impactful results in various fields, including art, music, and writing.
AI-generated content is becoming increasingly sophisticated, allowing artists to collaborate with machines to create unique and imaginative works. This collaboration can lead to new forms of expression and artistic styles that blend the capabilities of humans and machines.
The integration of AI and human creativity also has significant implications for industries such as education, healthcare, and business. AI can help analyze large datasets, identify patterns, and provide insights that humans may miss, leading to more informed decision-making and
```
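The cell above consumes the streams one prompt at a time. To consume several streams concurrently, you can give each prompt its own task and collect each stream into a string. Printing chunks as they arrive would interleave the prompts in the output, which is why the sketch collects instead.

This sketch assumes the call shape shown above: awaiting `async_generate(..., stream=True)` returns an async iterator of `{"text": ...}` chunks. `FakeAsyncEngine`, `stream_one`, and `stream_all` are hypothetical names, and the stub ignores `sampling_params`.

```python
import asyncio
from typing import AsyncIterator, Dict, List


class FakeAsyncEngine:
    """Hypothetical stand-in for sgl.Engine. Awaiting async_generate(..., stream=True)
    returns an async iterator of incremental {"text": ...} chunks."""

    async def async_generate(self, prompt: str, sampling_params: Dict, stream: bool = False):
        if not stream:
            raise NotImplementedError("this stub only models streaming")

        async def chunks() -> AsyncIterator[Dict[str, str]]:
            for i in range(3):
                await asyncio.sleep(0)  # hand control back so concurrent streams interleave
                yield {"text": f" {prompt}-chunk{i}"}

        return chunks()


async def stream_one(llm, prompt: str, sampling_params: Dict) -> str:
    """Consume one stream fully and return the joined incremental text."""
    generator = await llm.async_generate(prompt, sampling_params, stream=True)
    parts: List[str] = []
    async for chunk in generator:
        parts.append(chunk["text"])
    return "".join(parts)


async def stream_all(llm, prompts: List[str], sampling_params: Dict) -> Dict[str, str]:
    """Run one stream per prompt concurrently and map each prompt to its completion."""
    texts = await asyncio.gather(*(stream_one(llm, p, sampling_params) for p in prompts))
    return dict(zip(prompts, texts))


results = asyncio.run(stream_all(FakeAsyncEngine(), ["Hello", "Paris"], {"temperature": 0.8}))
print(results)
```

Duplicate prompts would overwrite each other in the returned dict. Return the list from `asyncio.gather` directly if that matters.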




```python
llm.shutdown()
```
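`llm.shutdown()` shuts the engine down once you are done with it. If a generation call raises first, that line never runs. The sketch below guarantees the call by wrapping engine creation in a context manager with a `finally` block. `running_engine` is a hypothetical helper, and `FakeEngine` is a stub that simulates a failing `generate` call and records whether `shutdown()` ran.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine that fails on generate() and records shutdown()."""

    def __init__(self, model_path: str) -> None:
        self.model_path = model_path
        self.is_shut_down = False

    def generate(self, prompts, sampling_params=None):
        raise RuntimeError("simulated failure during generation")

    def shutdown(self) -> None:
        self.is_shut_down = True


@contextmanager
def running_engine(engine_factory: Callable, **engine_kwargs) -> Iterator:
    """Create an engine and make sure shutdown() runs even if the body raises."""
    llm = engine_factory(**engine_kwargs)
    try:
        yield llm
    finally:
        llm.shutdown()


try:
    with running_engine(FakeEngine, model_path="hypothetical/model") as llm:
        llm.generate(["Hello, my name is"], {"temperature": 0.8})
except RuntimeError as err:
    print(f"generation failed ({err}); engine shut down: {llm.is_shut_down}")
```

With a real engine this would read `with running_engine(sgl.Engine, model_path="meta-llama/Meta-Llama-3.1-8B-Instruct") as llm:` followed by the generation calls.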