# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when a separate server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.40it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Noel and I am a Cyber Security Specialist, I have been working in the field for over 10 years. I have a strong passion for Cyber Security and I am always looking for new ways to improve my skills and knowledge.
In my previous role, I was responsible for protecting an enterprise network from cyber threats. I used a combination of technical and non-technical skills to identify and mitigate potential security risks.
I am well-versed in a variety of security technologies, including firewalls, intrusion detection systems, encryption, and threat intelligence. I am also skilled in incident response and disaster recovery.
I am a strong advocate for security awareness and education
Prompt: The president of the United States is
Generated text:  taking the first steps to honor the life and legacy of the late Supreme Court Justice Ruth Bader Ginsburg, who passed away on Friday. President Trump has ordered the U.S. flag to be flown at half-staff at the Whi
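
Because `llm.generate` returns one result per prompt, in the same order as the input list, the results can be paired with their prompts and saved for later analysis. The following is a minimal sketch, not part of SGLang: `to_records` and `to_jsonl` are hypothetical helper names, and `example_outputs` is a hand-written stand-in for the list returned by `llm.generate(prompts, sampling_params)`, so the snippet runs without a model.

```python
import json
from typing import Any, Dict, Iterable, List


def to_records(prompts: List[str], outputs: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Pair each prompt with the 'text' field of the output at the same index."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    return [{"prompt": p, "text": o["text"]} for p, o in zip(prompts, outputs)]


def to_jsonl(records: Iterable[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Hand-written stand-in for the result of llm.generate(prompts, sampling_params).
example_prompts = ["The capital of France is", "The future of AI is"]
example_outputs = [{"text": " Paris."}, {"text": " bright."}]

records = to_records(example_prompts, example_outputs)
print(to_jsonl(records))
```

With a real engine, you would pass `prompts` and `outputs` from the cell above instead of the example lists.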

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Annie and I am thrilled to be joining the marketing team at PowerObjects! I am a fresh face in the industry and I am excited to bring my passion for marketing and technology to the table.
As a marketing professional, I have a strong background in branding, social media, and content creation. I have a knack for understanding what makes a brand unique and crafting compelling stories that engage and resonate with audiences. My creative vision and attention to detail allow me to develop innovative marketing campaigns that drive results.
As a team player, I thrive in collaborative environments where ideas are shared and success is celebrated. I believe that building strong relationships and fostering open communication



Prompt: The capital of France is
Generated text:  Paris, which is also the country's largest city. Paris is a popular tourist destination, known for its romantic atmosphere, beautiful architecture, art museums, fashion, cuisine, and historical landmarks. Many famous works of literature, art, and fashion have been created in Paris, and the city remains a major cultural and intellectual hub.
Paris has many famous landmarks, including the Eiffel Tower, the Louvre Museum, Notre-Dame Cathedral, and the Arc de Triomphe. The city is also home to many fashion designers and fashion houses, and is considered the global center for fashion. Paris has a rich history, with many buildings and



Prompt: The future of AI is
Generated text:  here, and it's already changing the way we work, live, and interact with each other. Artificial intelligence (AI) is transforming various industries, from healthcare and finance to transportation and education. In this article, we'll explore the exciting developments in AI and how it's shaping the future of work, society, and technology.
The Rise of AI-Powered Tools and Applications

AI-powered tools and applications are becoming increasingly popular, making it easier for people to access and utilize AI capabilities. Some examples of AI-powered tools include:
Chatbots and virtual assistants, like Siri, Alexa, and Google Assistant, which can perform tasks such as setting
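
When streaming, it is often useful to keep the full text in addition to printing it as it arrives. The sketch below is illustrative only: `collect_stream` is a hypothetical helper, and `fake_stream` is a stand-in for `llm.generate(prompt, sampling_params, stream=True)` so the code runs without a model. It assumes each chunk's `"text"` holds only the newly generated piece, as in the output above.

```python
from typing import Any, Dict, Iterable, Iterator, List


def collect_stream(chunks: Iterable[Dict[str, Any]], echo: bool = False) -> str:
    """Concatenate the 'text' of each chunk, optionally printing it as it arrives."""
    parts: List[str] = []
    for chunk in chunks:
        piece = chunk["text"]
        if echo:
            print(piece, end="", flush=True)
        parts.append(piece)
    if echo:
        print()
    return "".join(parts)


def fake_stream(text: str) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for a streaming generate call (assumption, not SGLang code)."""
    # Every piece after the first keeps its leading space, like the tokens above.
    for i, word in enumerate(text.split(" ")):
        yield {"text": word if i == 0 else " " + word}


full_text = collect_stream(fake_stream("Paris is the capital of France."), echo=True)
```

If the chunks in your SGLang version carry the full text generated so far rather than a delta, concatenating them would duplicate content; in that case keep only the last chunk.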




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  XylophoneRhapsody, and I am a non-binary lesbian who is also a member of the LGBTQ+ community. I have been a member of the Discord server "LGBTQ+ Forum" for a few months now, and I must say that it has been a truly eye-opening experience. The community is so supportive and understanding, and I have made some amazing friends who accept me for who I am.

However, I have noticed that there are some individuals on the server who seem to be extremely intolerant and discriminatory towards certain groups within the community. It's really frustrating to see people being hurt and marginalized, and I

Prompt: The capital of France is
Generated text:  known for its stunning architecture, art, and history. Paris, the City of Light, is famous for its iconic landmarks like the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum. Visit the famous Champs-Élysées, a beautiful avenue lined with cafes, restaurants, and shops. Take a river Seine cruise to 
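
The asynchronous API also allows independent requests to be awaited concurrently, for example when prompts arrive from different callers. The following is a hedged sketch rather than SGLang code: `FakeAsyncEngine` is a hypothetical stand-in for `sgl.Engine`, and it assumes that `async_generate` called with a single prompt string returns a single result dictionary. `asyncio.gather` returns results in the order the awaitables were passed.

```python
import asyncio
from typing import Any, Dict, List


class FakeAsyncEngine:
    """Hypothetical stand-in for sgl.Engine; it only mimics the call shape."""

    async def async_generate(self, prompt: str, sampling_params: Dict[str, Any]) -> Dict[str, Any]:
        await asyncio.sleep(0.01)  # simulate inference latency
        return {"text": f" [completion of {prompt!r}]"}


async def generate_concurrently(
    engine: Any, prompts: List[str], sampling_params: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Start one request per prompt and wait for all of them to finish."""
    pending = [engine.async_generate(p, sampling_params) for p in prompts]
    # gather preserves input order, so results line up with prompts.
    return list(await asyncio.gather(*pending))


demo_prompts = ["Hello, my name is", "The capital of France is", "The future of AI is"]
demo_outputs = asyncio.run(
    generate_concurrently(FakeAsyncEngine(), demo_prompts, {"temperature": 0.8, "top_p": 0.95})
)
for prompt, output in zip(demo_prompts, demo_outputs):
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")
```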

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Emily.
I am a makeup artist and hair stylist, and I have been in the industry for over 10 years.
I am based in Sydney and I have worked with a variety of clients, from high end fashion magazines to celebrities and TV personalities.
My experience includes bridal makeup and hair, editorial shoots, commercial work and personal beauty consultations.
I take pride in providing excellent service and ensuring that each client leaves feeling confident and beautiful.
I am a perfectionist and I am passionate about my work, and I am always looking for new and exciting ways to push the boundaries of beauty.
I would be delighted to work with you on your next project



Prompt: The capital of France is
Generated text:  Paris, and the largest city is Paris as well. The country is divided into 13 regions and 96 departments, with a total of 36 metropolitan areas. France has a population of 66.3 million people and a GDP of $2.5 trillion.
France is known for its rich history and culture, and is home to many famous landmarks such as the Eiffel Tower, Notre Dame Cathedral, and the Louvre Museum. The country is also famous for its cuisine, fashion, and wine, and is a popular destination for tourists.
France is a country located in Western Europe, bordered by several countries including Belgium, Germany



Prompt: The future of AI is
Generated text:  bright, but it requires human collaboration

AI has the potential to bring about tremendous advancements in various industries, but its future also depends on human collaboration and consideration of the potential consequences.
The rapid development and deployment of artificial intelligence (AI) has led to significant advancements in various industries, from healthcare and finance to education and transportation. However, as AI becomes increasingly integrated into our lives, concerns about its impact on society have grown. While AI has the potential to bring about tremendous benefits, its future also depends on human collaboration and consideration of the potential consequences.
Collaboration and oversight are essential for responsible AI development

As AI systems become more sophisticated
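
The loop above streams one prompt at a time. With asynchronous generators, several streams can also be consumed concurrently. The sketch below shows the pattern under stated assumptions: `fake_async_stream` is a hypothetical stand-in for the generator obtained from `llm.async_generate(prompt, sampling_params, stream=True)`, and each chunk is assumed to carry only the newly generated text.

```python
import asyncio
from typing import Any, AsyncIterator, Dict


async def fake_async_stream(text: str) -> AsyncIterator[Dict[str, Any]]:
    """Hypothetical stand-in for an async streaming generator (not SGLang code)."""
    for i, word in enumerate(text.split(" ")):
        await asyncio.sleep(0)  # yield control, as a real stream would between chunks
        yield {"text": word if i == 0 else " " + word}


async def consume(name: str, stream: AsyncIterator[Dict[str, Any]], sink: Dict[str, str]) -> None:
    """Append every chunk of one stream to the entry for `name` in `sink`."""
    async for chunk in stream:
        sink[name] = sink.get(name, "") + chunk["text"]


async def stream_all() -> Dict[str, str]:
    """Consume two streams at the same time and return the text per stream."""
    sink: Dict[str, str] = {}
    await asyncio.gather(
        consume("first", fake_async_stream("one two three"), sink),
        consume("second", fake_async_stream("four five"), sink),
    )
    return sink


stream_results = asyncio.run(stream_all())
print(stream_results)
```

Printing chunks from concurrent streams directly would interleave their text, which is why this sketch accumulates each stream under its own key instead.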




In [6]:
llm.shutdown()
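
In a script, as opposed to a notebook, it can help to guarantee that `shutdown()` runs even when generation raises an exception. One way to do this is a small context manager. This is a sketch only: `managed_engine` is a hypothetical helper, not an SGLang API, and `FakeEngine` is a stand-in so the example runs without a model.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, List


@contextmanager
def managed_engine(factory: Callable[[], Any]) -> Iterator[Any]:
    """Create an engine with `factory` and always call its shutdown() on exit."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


class FakeEngine:
    """Hypothetical stand-in for sgl.Engine that records whether it was shut down."""

    def __init__(self) -> None:
        self.closed = False

    def shutdown(self) -> None:
        self.closed = True


created: List[FakeEngine] = []


def make_fake_engine() -> FakeEngine:
    engine = FakeEngine()
    created.append(engine)
    return engine


with managed_engine(make_fake_engine) as engine:
    pass  # call engine.generate(...) here when using a real engine

print(created[0].closed)
```

With a real engine, the factory could be something like `lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")`, reusing the constructor call from the first cell.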