Add maxToolRoundtrips option to streamText settings #1943

Closed
baptisteArno opened this issue Jun 13, 2024 · 14 comments
Labels: ai/core, ai/ui, enhancement (New feature or request)

Comments

@baptisteArno

Feature Description

It would be similar to how this setting works with generateText.

Use Case

In the examples, when streaming with a tool, it's the client (useChat) that makes the request again after a tool is called. It would be great if that were handled automatically by the server.
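
For reference, roughly what this already looks like with generateText (a minimal sketch; the model and the weather tool are placeholder assumptions, not part of this issue):

import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'
import { z } from 'zod'

// Sketch of the existing generateText behaviour this issue asks for in
// streamText: tool results are fed back to the model automatically until no
// more tool calls are made or maxToolRoundtrips is reached.
// The model and the weather tool are placeholders.
const { text } = await generateText({
  model: openai('gpt-4o'),
  maxToolRoundtrips: 5,
  tools: {
    getWeather: {
      description: 'Get the current weather for a city (hypothetical example tool)',
      parameters: z.object({ city: z.string() }),
      execute: async ({ city }: { city: string }) => ({ city, temperature: 21 }),
    },
  },
  prompt: 'What is the weather in Berlin?',
})

console.log(text)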

Additional context

No response

@lgrammel
Collaborator

It's planned. The integration with useChat makes this significantly more complex, though.

@baptisteArno
Author

Can I help?

Or is there any known workaround to handle that on the server?

@shaded-blue

I'd also be curious to hear if you have a workaround in mind @lgrammel / others.

Recursion is what I spent yesterday on, and based on the source I really couldn't decide what I needed to do without feeling like I was just applying band-aids that will get washed out by the next update once you push this change.

I essentially want to use streamUI itself as an async generator; any thoughts?

@baptisteArno
Author

Workaround for now (simplified example), wrapping the readable stream in another one:

const maxToolCalls = 5

export const runOpenAIChatCompletionStream = async ({
  credentials: { apiKey },
  options,
  variables,
  config: openAIConfig,
  compatibility,
  totalToolCalls = 0,
  toolResults,
  toolCalls,
}: Props) => {
  const response = await streamText({
    model,
    temperature: options.temperature ? Number(options.temperature) : undefined,
    messages: await parseChatCompletionMessages({
      options,
      variables,
      toolCalls,
      toolResults,
    }),
    tools: parseTools({ tools: options.tools, variables }),
  })


  return new ReadableStream({
    async start(controller) {
      const reader = response.toAIStream().getReader()

      async function pump(reader: ReadableStreamDefaultReader<Uint8Array>) {
        const { done, value } = await reader.read()

        if (done) {
          toolCalls = (await response.toolCalls) as ToolCallPart[]
          toolResults = (await response.toolResults) as
            | ToolResultPart[]
            | undefined
          return
        }

        controller.enqueue(value)
        return pump(reader)
      }

      await pump(reader)

      if (toolCalls && toolCalls.length > 0 && totalToolCalls < maxToolCalls) {
        totalToolCalls += 1
        const newReader = await runOpenAIChatCompletionStream({
          credentials: { apiKey },
          options,
          variables,
          config: openAIConfig,
          compatibility,
          // Pass the counter through, otherwise each recursive call starts
          // back at 0 and the maxToolCalls cap never takes effect.
          totalToolCalls,
          toolCalls,
          toolResults,
        })
        if (newReader) await pump(newReader.getReader())
      }

      controller.close()
    },
  })
}

Am I doing this correctly? I am not super familiar with streams. I tested it out briefly and it seems to work as expected.
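
For anyone wiring this up, a hypothetical route handler around the function above; buildProps is a made-up helper standing in for however the Props values (credentials, options, variables, ...) are assembled from the request:

// Hypothetical wiring: return the wrapped stream as an HTTP streaming response.
// buildProps is a made-up helper; the real Props come from wherever the
// credentials, options, and variables live in your app.
export async function POST(req: Request) {
  const props = await buildProps(req)
  const stream = await runOpenAIChatCompletionStream(props)
  return new Response(stream, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  })
}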

@tjazsilovsek

Do we have any timeline for this?

Also, when the client automatically makes a request, is it possible to control what is sent to the backend, and if so, how?

@wong2
Contributor

wong2 commented Jul 21, 2024

My current solution:

async function* streamTextWithTools(model: LanguageModel, messages: ChatMessage[], maxRounds = 5) {
  for (let round = 0; round < maxRounds; round++) {
    const result = await streamText({
      model,
      messages,
      tools: { web_search },
    })
    for await (const chunk of result.fullStream) {
      if (chunk.type === 'text-delta') {
        yield chunk
      } else if (chunk.type === 'tool-call') {
        messages.push({ role: 'assistant', content: [chunk] })
      } else if (chunk.type === 'tool-result') {
        messages.push({ role: 'tool', content: [chunk] })
      } else if (chunk.type === 'error') {
        throw chunk.error
      } else if (chunk.type === 'finish' && chunk.finishReason !== 'tool-calls') {
        return
      }
    }
  }
}
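
If it helps, consuming that generator could look roughly like this; a sketch that assumes text-delta chunks expose a textDelta string:

// Sketch: pipe the generator's text deltas into a ReadableStream (e.g. to
// return from a route handler). Assumes text-delta chunks carry `textDelta`.
function toReadableStream(gen: AsyncIterable<{ textDelta: string }>) {
  const encoder = new TextEncoder()
  return new ReadableStream({
    async start(controller) {
      for await (const chunk of gen) {
        controller.enqueue(encoder.encode(chunk.textDelta))
      }
      controller.close()
    },
  })
}

// Usage (hypothetical): new Response(toReadableStream(streamTextWithTools(model, messages)))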

@tjazsilovsek

@wong2 how do you then convert the data to get it to work with the useChat hook on the frontend? The only option I see is to copy the logic from the core package.

@wong2
Contributor

wong2 commented Aug 2, 2024

> @wong2 how do you then convert the data to get it to work with the useChat hook on the frontend? The only option I see is to copy the logic from the core package.

I'm not using useChat on the frontend.

@louis030195

louis030195 commented Aug 10, 2024

I can't use useChat personally, because I use Tauri, which is client-side Next.js only (unless I reimplement things in Rust).

My hack:

await generateText({
  model: provider,
  tools: {
    suggest_queries: {
      description: `Suggest queries for the user's question and ask for confirmation. Example: 
        {
          suggested_queries: [
            { content_type: "audio", start_time: "2024-03-01T00:00:00Z", end_time: "2024-03-01T23:59:59Z", q: "screenpipe" },
            { content_type: "ocr", app_name: "arc", start_time: "2024-03-01T00:00:00Z", end_time: "2024-03-01T23:59:59Z", q: "screenpipe" },
          ]
        }
        
        - q contains a single query, again, for example instead of "life plan" just use "life"
        - When using the query_screenpipe tool, respond with only the updated JSON object
        - If you return something else than JSON the universe will come to an end
        - DO NOT add \`\`\`json at the beginning or end of your response
        - Do not use '"' around your response
        - Date & time now is ${new Date().toISOString()}. Adjust start_date and end_date to properly match the user intent time range.
        `,
      parameters: z.object({
        suggested_queries: screenpipeMultiQuery,
        queries_results: z
          .array(z.string())
          .optional()
          .describe(
            "The results of the queries if called after the tool query_screenpipe"
          ),
      }),
      execute: async ({ suggested_queries }) => {
        console.log("Suggested queries:", suggested_queries);
        const confirmation = await askQuestion(
          "Are these queries good? (yes/no): "
        );
        if (confirmation.toLowerCase() === "yes") {
          return { confirmed: true, queries: suggested_queries };
        } else {
          const feedback = await askQuestion(
            "Please provide feedback or adjustments: "
          );
          return { confirmed: false, feedback };
        }
      },
    },
    query_screenpipe: {
      description:
        "Query the local screenpipe instance for relevant information.",
      parameters: screenpipeMultiQuery,
      execute: queryScreenpipeNtimes,
    },
    stream_response: {
      description:
        "Stream the final response to the user. ALWAYS FINISH WITH THIS TOOL",
      parameters: z.object({
        response: z
          .string()
          .describe("The final response to stream to the user"),
      }),
      execute: async ({ response }) => {
        const { textStream } = await streamText({
          model: provider,
          messages: [{ role: "user", content: response }],
        });
        for await (const chunk of textStream) {
          process.stdout.write(chunk);
        }
        console.log("\n");
        throw new Error("STREAM_COMPLETE");
      },
    },
  },
  toolChoice: "required",
  messages: [
    {
      role: "system",
      content: `You are a helpful assistant that uses Screenpipe to answer user questions.
      First, suggest queries to the user and ask for confirmation. If confirmed, proceed with the search.
      If not confirmed, adjust based on user feedback. Use the query_screenpipe tool to search for information,
      and then use the stream_response tool to provide the final answer to the user.
      
      Rules:
      - User's today's date is ${new Date().toISOString().split("T")[0]}
      - Use multiple queries to get more relevant results
      - If the results of the queries are not relevant, adjust the query and ask for confirmation again. Minimize user's effort.
      - ALWAYS END WITH the stream_response tool to stream the final answer to the user
      - In the suggest_queries tool, always tell the user the parameters available to you (e.g. types, etc. Zod given to you) so the user can adjust the query if needed. Suggest few other changes on the arg you used so the user has some ideas.
      - Make sure to use enough data but not too much. Usually 50k+ rows a day.
      
      `,
    },
    {
      role: "user",
      content: input,
    },
  ],
  maxToolRoundtrips: 10,
});

But I suspect this uses more tokens than it should (on the final answer of generateText?).

PS: I hope you won't call the LLM police regarding my prompt engineering techniques...

@RostyslavManko

@lgrammel Any updates on adding this feature, or is it too complex?

@mishushakov

I might add that I'd prefer a sensible default so the behaviour matches 1:1 what I'm getting with the OpenAI SDK; otherwise this might feel like a "downgrade".

@lgrammel
Collaborator

WIP PR: #2836

@lgrammel
Collaborator

Available in ai@3.3.21 https://sdk.vercel.ai/docs/ai-sdk-core/tools-and-tool-calling#example-streamtext
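
A minimal sketch of the released option (the model and tool below are placeholders; see the linked docs for the canonical example):

import { streamText } from 'ai'
import { openai } from '@ai-sdk/openai'
import { z } from 'zod'

// Sketch of the released option: streamText now handles tool roundtrips
// server-side up to the given limit instead of relying on useChat to re-request.
const result = await streamText({
  model: openai('gpt-4o'),
  maxToolRoundtrips: 5,
  tools: {
    weather: {
      description: 'Get the weather in a location (hypothetical example tool)',
      parameters: z.object({ location: z.string() }),
      execute: async ({ location }: { location: string }) => ({ location, temperature: 72 }),
    },
  },
  prompt: 'What is the weather in San Francisco?',
})

for await (const delta of result.textStream) {
  process.stdout.write(delta)
}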

@danielzohar

@lgrammel Thanks for this 🙌
I found an issue with the implementation. I reported the details here
