
Helpers (or best practices) for non-streaming API response? #60

Closed
wadefletch opened this issue Jun 15, 2023 · 6 comments

Comments

@wadefletch

wadefletch commented Jun 15, 2023

Thanks for creating this great tool! I started to create something similar last night, and was glad to see this today.

I'm using the new Function Calling API to generate improvement recommendations that are then highlighted in a block of text. (Similar to Grammarly.) I'm not doing a full agent-style loop or anything, just using the function to ensure the results are in the right schema.

With JSON's required closing braces, I can't parse or render a partial (streamed) response. (I'm not even sure function calling can be streamed.) Is there a best practice for returning these non-streaming results to the client in a way that's compatible with useCompletion and useChat? I'm currently using the following, and it seems to work well enough:

const response = await openai.createChatCompletion({
  model: 'gpt-3.5-turbo-0613',
  messages: [...],
  functions: [
    { name: 'makeRecommendation', parameters: recommendationSchema },
  ],
  function_call: { name: 'makeRecommendation' },
  max_tokens: 1000,
  temperature: 0.7,
});

const parsedResponse = await response.json();

const calledFunction = JSON.parse(
  parsedResponse.choices[0].message.function_call.arguments,
);

return new Response(JSON.stringify(calledFunction.recommendations), {
  headers: new Headers({ 'Cache-Control': 'no-cache' }),
});

Is there an equivalent of OpenAIStream and StreamingTextResponse to similarly abstract non-streaming responses? Is that somewhere you'd be open to a contribution?
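Something like this hypothetical helper is what I have in mind (the name `jsonFunctionResponse` is made up, it's not part of the SDK):

```typescript
// Hypothetical helper -- not part of the SDK; the name is made up.
// Mirrors StreamingTextResponse, but for a fully-parsed JSON payload.
function jsonFunctionResponse(payload: unknown): Response {
  return new Response(JSON.stringify(payload), {
    headers: {
      'Content-Type': 'application/json',
      'Cache-Control': 'no-cache',
    },
  });
}

// Usage in the handler above:
// return jsonFunctionResponse(calledFunction.recommendations);
```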

@theluk

theluk commented Jun 15, 2023

I am also checking in here because of this.

Two things I found:

  1. It seems OpenAI does return function calls in streams: https://community.openai.com/t/function-calls-and-streaming/263393/2
  2. This SDK's AIStream uses a customParser that only looks at delta.content, so it swallows any possibility of streaming the function-call response.

To your question about JSON's closing braces: a month ago I implemented a streaming, plugin-aware chatbot. What you need is essentially a parser that, whenever it sees tokens that syntactically look like JSON, buffers them until it has one complete, valid JSON value, and then yields the whole thing at once.

To make this easier, I used prompt engineering to wrap any JSON in a |START| and |END| tag, so I know exactly when to start buffering.

So in the end you will have a stream that looks like

Of
course
let
me
check
the
plugin
|START| -- start buffering now
.buffer: {
.buffer: { function_call
.buffer: { function_call: check_the_weather, arguments ...}
|END| -- set the model's stop sequence to |END|

Anyway, that is a completely different implementation from what OpenAI suggests, since their function calls can't arrive alongside text from the AI. Still, I thought it was worth sharing.
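A minimal sketch of that buffering logic (the |START|/|END| delimiters are my prompt-engineered markers, not anything OpenAI emits):

```typescript
// Sketch of the buffering described above. Tokens between the assumed
// |START| and |END| markers are collected until the JSON is complete,
// then yielded as one chunk; everything else streams through untouched.
function createDelimitedJsonBuffer() {
  let buffering = false;
  let buffer = "";
  return (token: string): string | undefined => {
    if (token === "|START|") {
      buffering = true; // begin collecting JSON tokens
      buffer = "";
      return undefined;
    }
    if (token === "|END|") {
      buffering = false; // buffer now holds one complete JSON value
      return buffer;
    }
    if (buffering) {
      buffer += token; // swallow tokens until the JSON is complete
      return undefined;
    }
    return token; // plain text passes through immediately
  };
}
```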

@theluk

theluk commented Jun 15, 2023

So I got it working with a custom AIStream. I basically copied their OpenAIStream and adapted it slightly; the function calls are now emitted into the stream.

Note that the output would, of course, require further parsing.

import { AIStream, trimStartOfStreamHelper, type AIStreamCallbacks } from "ai";

function parseOpenAIStream(): (data: string) => string | void {
  const trimStartOfStream = trimStartOfStreamHelper();
  return (data) => {
    // TODO: Needs a type
    const json = JSON.parse(data);

    // this can be used for either chat or completion models
    const text = trimStartOfStream(
      json.choices[0]?.delta?.function_call
        ? JSON.stringify(json.choices[0]?.delta?.function_call)
        : json.choices[0]?.delta?.content ?? json.choices[0]?.text ?? ""
    );

    return text;
  };
}

export function OpenAIStream(
  res: Response,
  cb?: AIStreamCallbacks
): ReadableStream {
  return AIStream(res, parseOpenAIStream(), cb);
}

@wadefletch
Author

Thanks! It looks like streaming function calls is relatively easy, but it's still challenging to parse partial arguments.

@theluk

theluk commented Jun 15, 2023

I looked at the results and they handle it quite cleverly. I like that each chunk is always valid JSON on its own, so in the parser you could emit some annotation while the function call is in progress and then merge the pieces afterwards.

Here's what I got out of the stream:

{"name":"get_current_weather","arguments":""}
{"arguments":"{\n"}
{"arguments":" "}
{"arguments":" \""}
{"arguments":"location"}
{"arguments":"\":"}
{"arguments":" \""}
{"arguments":"Berlin"}
{"arguments":"\"\n"}
{"arguments":"}"}

So all you need to do is merge the pieces back together. For the arguments, that just means concatenating the strings in order.
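The merging can be sketched like this (the helper name is made up; each delta mirrors one chunk from the stream above):

```typescript
// Shape of one streamed function_call delta, per the chunks above.
interface FunctionCallDelta {
  name?: string;
  arguments?: string;
}

// Made-up helper: the name arrives in the first chunk; the argument
// fragments only form valid JSON once concatenated in order.
function mergeFunctionCallDeltas(deltas: FunctionCallDelta[]) {
  const name = deltas.find((d) => d.name)?.name ?? "";
  const args = deltas.map((d) => d.arguments ?? "").join("");
  return { name, arguments: JSON.parse(args) };
}
```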

@theluk

theluk commented Jun 15, 2023

Here is an updated version I just monkey-coded together; it might help:

import { AIStream, trimStartOfStreamHelper, type AIStreamCallbacks } from "ai";

function parseOpenAIStream(): (data: string) => string | void {
  const trimStartOfStream = trimStartOfStreamHelper();

  let currentFunctionCall: {
    name: string;
    arguments: string[];
  } | null = null;

  return (data) => {
    // TODO: Needs a type
    const json = JSON.parse(data);

    if (json.choices[0]?.delta?.function_call) {
      if (!currentFunctionCall) {
        currentFunctionCall = {
          name: json.choices[0].delta.function_call.name,
          arguments: [json.choices[0].delta.function_call.arguments],
        };
      } else {
        currentFunctionCall.arguments.push(
          json.choices[0].delta.function_call.arguments
        );
      }
    }

    if (json.choices[0]?.finish_reason === "function_call") {
      const functionCall = currentFunctionCall;
      currentFunctionCall = null;
      return JSON.stringify({
        function_call: functionCall?.name,
        // arguments is a JSON object, so fall back to "{}", not "[]"
        arguments: JSON.parse(functionCall?.arguments.join("") ?? "{}"),
      });
    }

    // this can be used for either chat or completion models
    const text = trimStartOfStream(
      json.choices[0]?.delta?.content ?? json.choices[0]?.text ?? ""
    );

    return text;
  };
}

export function OpenAIStream(
  res: Response,
  cb?: AIStreamCallbacks
): ReadableStream {
  return AIStream(res, parseOpenAIStream(), cb);
}

Unfortunately, worth noting: this parser has side effects, since it holds its buffer outside the returned function; there are better solutions. Anyway, it buffers the full function call in an object, and only once the call is complete is it streamed back to you.
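On the client side, a hypothetical reader for that stream might look like this (it assumes, per the parser above, that the stream is either plain text or ends with one complete function-call JSON chunk):

```typescript
// Hypothetical client-side reader: accumulate the whole stream, then try
// to parse it as the buffered function-call JSON; fall back to plain text.
async function readFunctionCall(stream: ReadableStream<Uint8Array>) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let text = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
  }
  try {
    return JSON.parse(text); // { function_call, arguments } when a call occurred
  } catch {
    return { text }; // otherwise it was plain assistant text
  }
}
```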

@Zakinator123
Contributor

I just put up a PR (#154) that allows function-call responses to be streamed back to clients, who can then parse the JSON once the response finishes.
