# Ray Serve

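[Ray Serve](https://docs.ray.io/en/latest/serve/index.html) is a scalable model serving library for building online inference APIs. Serve is particularly well suited to system composition: it lets you build a complex inference service made up of multiple chains and business logic, all in Python code.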

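## Goal of this notebook
This notebook shows a simple example of deploying an OpenAI chain to production. You can extend it to deploy your own self-hosted models and define the hardware resources (GPUs and CPUs) each model needs to run efficiently in production. For more on the available options, including autoscaling, see the Ray Serve [documentation](https://docs.ray.io/en/latest/serve/getting_started.html).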
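
As a rough sketch of how hardware resources might be declared, the helper below assembles keyword arguments that Ray Serve's `@serve.deployment` decorator and `Deployment.options()` accept: `num_replicas` and `ray_actor_options` (with `num_cpus` and `num_gpus`). The helper names `build_deployment_options` and `with_resources` and the specific numbers are illustrative assumptions, not part of this notebook.

```python
from typing import Any, Dict


def build_deployment_options(
    num_cpus: float, num_gpus: float = 0, num_replicas: int = 1
) -> Dict[str, Any]:
    """Assemble resource-related keyword arguments for a Serve deployment.

    Hypothetical helper: the keys mirror the `num_replicas` and
    `ray_actor_options` parameters of `@serve.deployment`.
    """
    if num_cpus < 0 or num_gpus < 0:
        raise ValueError("resource amounts must be non-negative")
    if num_replicas < 1:
        raise ValueError("num_replicas must be at least 1")
    return {
        "num_replicas": num_replicas,
        # Resources reserved for each replica of the deployment.
        "ray_actor_options": {"num_cpus": num_cpus, "num_gpus": num_gpus},
    }


def with_resources(deployment, **resources):
    """Return a copy of a Serve deployment with the resource options applied.

    `deployment` is assumed to be a class decorated with `@serve.deployment`.
    """
    return deployment.options(**build_deployment_options(**resources))


# Example: two replicas, each reserving 2 CPUs and 1 GPU.
options = build_deployment_options(num_cpus=2, num_gpus=1, num_replicas=2)
print(options)
```

With Ray installed, calling `with_resources(...)` on a deployment class (such as the `LLMServe` skeleton defined later in this notebook) and then `.bind()` should be equivalent to passing the same arguments to the decorator directly.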

## Set up Ray Serve
Install Ray with `pip install "ray[serve]"` (the quotes keep shells such as zsh from interpreting the square brackets).

## General skeleton

The general skeleton for deploying a service is as follows:

In [None]:
# 0: Import ray serve and request from starlette
from ray import serve
from starlette.requests import Request


# 1: Define a Ray Serve deployment.
@serve.deployment
class LLMServe:
    def __init__(self) -> None:
        # All the initialization code goes here
        pass

    async def __call__(self, request: Request) -> str:
        # You can parse the request here
        # and return a response
        return "Hello World"


# 2: Bind the model to deployment
deployment = LLMServe.bind()

# 3: Run the deployment
serve.run(deployment)

In [None]:
# Shutdown the deployment
serve.shutdown()

## Example of deploying an OpenAI chain with a custom prompt

Get an OpenAI API key from [here](https://platform.openai.com/account/api-keys). Running the following code will prompt you for your API key.

In [None]:
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

In [None]:
from getpass import getpass

OPENAI_API_KEY = getpass()
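
An interactive prompt works in a notebook, but a deployed service usually has no terminal attached. A minimal sketch of one alternative, assuming the key is exported in an environment variable named `OPENAI_API_KEY` (the helper name `load_api_key` is hypothetical and `sk-example` is a placeholder, not a real key), falls back to `getpass` only when the variable is unset:

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def load_api_key(
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
    var_name: str = "OPENAI_API_KEY",
) -> str:
    """Return the API key from the environment, prompting only if it is unset."""
    key = env.get(var_name, "").strip()
    if not key:
        key = prompt(f"{var_name}: ").strip()
    if not key:
        raise ValueError(f"{var_name} was not provided")
    return key


# Example with an in-memory mapping, so nothing is read from the terminal.
print(load_api_key(env={"OPENAI_API_KEY": "sk-example"}))
```

In the notebook this could replace the cell above with `OPENAI_API_KEY = load_api_key()`.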

In [None]:
@serve.deployment
class DeployLLM:
    def __init__(self):
        # We initialize the LLM, template and the chain here
        llm = OpenAI(openai_api_key=OPENAI_API_KEY)
        template = "Question: {question}\n\nAnswer: Let's think step by step."
        prompt = PromptTemplate.from_template(template)
        self.chain = LLMChain(llm=llm, prompt=prompt)

    def _run_chain(self, text: str):
        # `invoke` replaces the deprecated direct call `self.chain(text)`
        return self.chain.invoke({"question": text})

    async def __call__(self, request: Request):
        # 1. Parse the request
        text = request.query_params["text"]
        # 2. Run the chain
        resp = self._run_chain(text)
        # 3. Return the response
        return resp["text"]
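
`request.query_params["text"]` raises a `KeyError` when the caller omits the parameter. As a sketch of more defensive parsing, the hypothetical helper `extract_text` below works on any mapping (Starlette's `query_params` behaves like a read-only mapping) and reports a missing or empty value explicitly:

```python
from typing import Mapping


def extract_text(query_params: Mapping[str, str], key: str = "text") -> str:
    """Return the stripped value of `key`, or raise ValueError if absent or blank."""
    value = query_params.get(key, "").strip()
    if not value:
        raise ValueError(f"missing required query parameter: {key!r}")
    return value


# A plain dict stands in for `request.query_params` here.
print(extract_text({"text": "  What is Ray Serve?  "}))
```

Inside `__call__`, a caught `ValueError` could then be turned into a client-error response instead of letting the request fail with a server error.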

Now we can bind the deployment.

In [None]:
# Bind the model to deployment
deployment = DeployLLM.bind()

We can assign the port number and host when we run the deployment.

In [None]:
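# Example port number
PORT_NUMBER = 8282
# Run the deployment.
# Note: recent Ray releases deprecate the `port` argument of `serve.run`; if it is
# rejected, call `serve.start(http_options={"port": PORT_NUMBER})` first and then
# `serve.run(deployment)`.
serve.run(deployment, port=PORT_NUMBER)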

Now that the service is running at `localhost:8282`, we can send a POST request to get the results back.

In [None]:
import requests

text = "What NFL team won the Super Bowl in the year Justin Bieber was born?"
# Pass the question via `params` so that requests URL-encodes it
response = requests.post(f"http://localhost:{PORT_NUMBER}/", params={"text": text})
print(response.content.decode())
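
Because the question contains spaces and a question mark, it has to be percent-encoded before it can travel in a query string. A small standard-library sketch of what that encoding looks like (the helper name `build_query_url` and the sample question are illustrative assumptions):

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(port: int, text: str, host: str = "localhost") -> str:
    """Build the service URL with `text` safely encoded as a query parameter."""
    return f"http://{host}:{port}/?{urlencode({'text': text})}"


url = build_query_url(8282, "Who won the Super Bowl in 1994?")
print(url)

# Decoding the query string recovers the original text unchanged.
print(parse_qs(urlsplit(url).query)["text"][0])
```

`requests` performs the same kind of encoding when the value is passed through its `params` argument.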