
VowpalWabbit/rl_chain

VW in a LangChain chain

Install the dependencies listed in requirements.txt, e.g. pip install -r requirements.txt.

The chain uses VowpalWabbit (VW) as its learner.

There is an example notebook (rl_chain.ipynb) with basic usage of the chain.

TL;DR:

  • The chain creates a Vowpal Wabbit instance when it is initialized; only Contextual Bandits and Slates are supported for now
  • The VW arguments can be changed at chain creation time
  • There is a default prompt, but it can be replaced
  • There is a default reward function that is triggered automatically and in turn triggers VW's learn step
    • This can be turned off, and the score can then be specified explicitly
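The behaviours listed above can be sketched as a toy chain in plain Python. This is not the rl_chain API: ToyBanditChain, DEFAULT_PROMPT, default_judge, auto_learn, update_with_score and fake_llm are all hypothetical names, and a simple epsilon-greedy average per (context, action) pair stands in for Vowpal Wabbit's contextual-bandit learner. The LLM is assumed to be any callable that maps a prompt string to a response string.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

LLM = Callable[[str], str]  # assumption: any function from prompt to response

# Hypothetical default prompt; rl_chain's real default template is not reproduced here.
DEFAULT_PROMPT = "Context: {context}\nUse this option: {action}\nResponse:"


def default_judge(llm: LLM, context: str, response: str) -> float:
    """Hypothetical default reward: ask the LLM to rate the response between 0 and 1."""
    rating = llm(f"Rate from 0 to 1 how well this fits '{context}': {response}")
    try:
        return max(0.0, min(1.0, float(rating)))
    except ValueError:
        return 0.0  # an unparseable rating earns no reward


class ToyBanditChain:
    """Toy stand-in for the chain: epsilon-greedy averages replace the VW learner."""

    def __init__(self, llm: LLM, actions: List[str], epsilon: float = 0.1,
                 prompt: str = DEFAULT_PROMPT, auto_learn: bool = True, seed: int = 0):
        self.llm = llm
        self.actions = list(actions)
        self.epsilon = epsilon        # creation-time learner argument (cf. VW arguments)
        self.prompt = prompt          # default prompt, replaceable at creation
        self.auto_learn = auto_learn  # False: the caller reports the score explicitly
        self.rng = random.Random(seed)
        self.totals: Dict[Tuple[str, str], float] = {}
        self.counts: Dict[Tuple[str, str], int] = {}
        self.last: Optional[Tuple[str, str]] = None

    def _mean(self, context: str, action: str) -> float:
        key = (context, action)
        n = self.counts.get(key, 0)
        return self.totals.get(key, 0.0) / n if n else 0.0

    def select(self, context: str) -> str:
        # Explore with probability epsilon, otherwise exploit the best average reward.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self._mean(context, a))

    def learn(self, score: float) -> None:
        if self.last is None:
            raise RuntimeError("call the chain before learning")
        self.totals[self.last] = self.totals.get(self.last, 0.0) + score
        self.counts[self.last] = self.counts.get(self.last, 0) + 1

    def __call__(self, context: str) -> str:
        action = self.select(context)
        self.last = (context, action)
        response = self.llm(self.prompt.format(context=context, action=action))
        if self.auto_learn:
            # Default reward path: judge the response, then learn immediately.
            self.learn(default_judge(self.llm, context, response))
        return response

    def update_with_score(self, score: float) -> None:
        """Explicit scoring path for when auto_learn is False."""
        self.learn(score)


def fake_llm(prompt: str) -> str:
    # Stub LLM for the demo: rates every response 1, otherwise replies "ok".
    return "1" if prompt.startswith("Rate") else "ok"


chain = ToyBanditChain(fake_llm, ["formal", "casual"], epsilon=0.0)
print(chain("greeting"), chain.counts)
```

Keeping the reward function behind an auto_learn flag mirrors the list above: the default judge scores every response and triggers learning, while turning it off defers learning until a score is supplied explicitly.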

Flow:

  • Developer: creates chain
  • Developer: sets actions
  • Developer: calls chain with context and other prompt inputs
  • Chain: calls VW with the context and selects an action
  • Chain: the selected action (and other variables) is passed to the LLM along with the prompt
  • Chain: if the default reward is set, the LLM is called to judge the response and give it a reward score based on the context
  • Chain: VW learn is triggered with that score
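The "calls VW with the context" and "VW learn is triggered with that score" steps can be illustrated with VW's multi-line text format for contextual bandits with action-dependent features (the format used with --cb_explore_adf). This is an assumption about how such a chain could talk to VW, not a description of rl_chain's internals; the helper name to_vw_cb_adf and the namespace names Context and Action are hypothetical. The block only builds strings, so it runs without VW installed.

```python
from typing import Dict, List, Optional


def _clean(text: str) -> str:
    # Spaces, ':' and '|' are special in VW's text format, so replace them.
    for ch in (" ", ":", "|"):
        text = text.replace(ch, "_")
    return text


def to_vw_cb_adf(context: Dict[str, str], actions: List[str],
                 chosen: Optional[int] = None, score: float = 0.0,
                 prob: float = 1.0) -> str:
    """Build a multi-line VW contextual-bandit (ADF) text example.

    Hypothetical helper: the namespaces 'Context' and 'Action' are illustrative.
    Without `chosen`, the example is unlabelled (prediction time). With `chosen`,
    that action's line gets the label 0:cost:probability for learning.
    """
    features = " ".join(f"{_clean(k)}={_clean(v)}" for k, v in context.items())
    lines = [f"shared |Context {features}"]
    for i, action in enumerate(actions):
        # VW minimises cost, so the reward score is negated.
        label = f"0:{-score}:{prob} " if i == chosen else ""
        lines.append(f"{label}|Action {_clean(action)}")
    return "\n".join(lines)


print(to_vw_cb_adf({"user": "Tom", "time_of_day": "morning"},
                   ["politics", "sports", "music"], chosen=1, score=1.0, prob=0.5))
```

Because VW minimises cost, a reward score of 1.0 becomes a cost of -1.0 on the chosen action's line. The probability should be the one VW reported for that action when it was selected, since VW uses it to importance-weight the update.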
