-
Notifications
You must be signed in to change notification settings - Fork 32
W&B Training: Serverless RL launch #1682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
63 commits
Select commit
Hold shift + click to select a range
aad795a
Adds new directory for service
dbrian57 a665dde
Adds WB training and serverless RL section
dbrian57 8342313
Adds Serverless RL reference section with redoc integration
dbrian57 5a5558a
Remove unnecessary files
dbrian57 1365cee
Feedback via D. Corbitt
dbrian57 cda4165
Updates Serverless RL API page with proper spec
dbrian57 d1adc51
Adds back the Training branding
dbrian57 eac7606
Updates front matter with training branding
dbrian57 9d4dfeb
new identifier
dbrian57 cca7cb6
use cases
dbrian57 2d40548
fixes top nav rendering in cloudflare preview
dbrian57 2f908da
Adds static light and dark logos
dbrian57 5a8ba8e
fixes top nav
dbrian57 f9fb079
CSS references assets directory now
dbrian57 da4f30b
new test implementation of redoc using redoc2
dbrian57 326b5c0
force light mode on redoc page
dbrian57 9cc3fed
fix scrolling issue in redoc2
dbrian57 c92099d
Replaces old layout with new one
dbrian57 61aa8a2
Fixes width of redoc page
dbrian57 e13cc47
Updates API spec URL for Redoc
dbrian57 be1ce2f
adds prereqs page
dbrian57 93b8a6d
adds Serverless RL sub-section
dbrian57 8cf9d02
Adds available models section
dbrian57 0b9aa3a
Adds use serverless RL section
dbrian57 82e017d
Adds usage and limits section
dbrian57 69a0b7f
Adds API ref placeholder section
dbrian57 fc4b33f
Updates marketecture diagram
dbrian57 c79029f
Optimised images with calibre/image-actions
github-actions[bot] e9911fc
Removes unnecessary static assets
dbrian57 4ed272c
Merge branch 'docs/training-serverless-rl' of https://github.com/wand…
dbrian57 7e7f230
Optimised images with calibre/image-actions
github-actions[bot] fc2c7ad
remove unnecessary navbar changes
dbrian57 2a4c05d
Merge branch 'docs/training-serverless-rl' of https://github.com/wand…
dbrian57 1af1329
adds back whitespace to navbar partial
dbrian57 c9f55a9
Optimised images with calibre/image-actions
github-actions[bot] 622dff6
Training descriptions
dbrian57 3b6d10a
Merge branch 'docs/training-serverless-rl' of https://github.com/wand…
dbrian57 d9b2c13
Adds Training to homepage
dbrian57 7a2e6e1
new training icons
dbrian57 7e65992
Updates marketecture from official Figma
dbrian57 92e035e
Optimised images with calibre/image-actions
github-actions[bot] d28d38b
Adds public preview verbage
dbrian57 3005832
Merge branch 'docs/training-serverless-rl' of https://github.com/wand…
dbrian57 6532356
Feedback
dbrian57 c747dab
Optimised images with calibre/image-actions
github-actions[bot] b2bb52b
David edits (#1693)
arcticfly cf5db65
updates to David's copy and shuffles some stuff around
dbrian57 63c0a2f
spell check
dbrian57 3c5d387
spelling error
dbrian57 1d8c872
Reweights top level items in left-nav
dbrian57 8c8896e
Makes product layout 2x2 on front page
dbrian57 b2c5a52
Feedback via Noah, Matt, and David
dbrian57 97e56b1
Fixes API reference page
dbrian57 b96d377
remove unnecessary menu frontmatter from reference
dbrian57 a3a2fa5
Adds coreweave link
dbrian57 efe3bcd
rewrite inference doc
dbrian57 db815ab
minor copy changes
dbrian57 87ab4d0
renames file and corrects some grammar
dbrian57 29e1aae
renames file for real
dbrian57 519adc4
Feedback from M. Linville and N. Luna
dbrian57 f2376ad
Updates ToS link and LoRA deletion link, fixes spelling error
dbrian57 8682337
Updates from launch meeting
dbrian57 fc91ecc
link fix
dbrian57 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,7 +3,7 @@ menu: | |
default: | ||
identifier: core | ||
title: W&B Core | ||
weight: 6 | ||
weight: 70 | ||
no_list: true | ||
--- | ||
|
||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,7 +3,7 @@ menu: | |
default: | ||
identifier: models | ||
title: W&B Models | ||
weight: 3 | ||
weight: 30 | ||
no_list: true | ||
--- | ||
|
||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
--- | ||
menu: | ||
default: | ||
identifier: training | ||
title: W&B Training | ||
description: Post-train your models using reinforcement learning. | ||
weight: 60 | ||
--- | ||
|
||
Now in public preview, W&B Training offers serverless reinforcement learning (RL) for post-training large language models (LLMs) to improve their reliability performing multi-turn, agentic tasks while also increasing speed and reducing costs. RL is a training technique where models learn to improve their behavior through feedback on their outputs. | ||
|
||
W&B Training includes integration with: | ||
|
||
* [ART](https://art.openpipe.ai/getting-started/about), a flexible RL fine-tuning framework. | ||
* [RULER](https://openpipe.ai/blog/ruler), a universal verifier. | ||
* A fully-managed backend on [CoreWeave Cloud](https://docs.coreweave.com/docs/platform). | ||
|
||
To get started, satisfy the [prerequisites]({{< relref "/guides/training/prerequisites" >}}) to start using the service and then see [OpenPipe's Serverless RL quickstart](https://art.openpipe.ai/getting-started/quick-start) to learn how to post-train your models. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
--- | ||
dbrian57 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
title: "API Reference" | ||
linkTitle: "API Reference" | ||
weight: 100 | ||
manualLink: "/ref/training" | ||
description: > | ||
Complete API documentation for W&B Training. | ||
--- |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
--- | ||
title: "Prerequisites" | ||
linkTitle: "Prerequisites" | ||
weight: 1 | ||
description: > | ||
Set up your environment to use W&B Training. | ||
--- | ||
|
||
Complete these steps before using W&B Training features through the OpenPipe ART framework or API. | ||
|
||
{{< alert title="Tip" >}} | ||
Before starting, review the [usage information and limits]({{< relref "guides/training/serverless-rl/usage-limits" >}}) to understand costs and restrictions. | ||
{{< /alert >}} | ||
|
||
## Sign up and create an API key | ||
|
||
To authenticate your machine with W&B, you must first generate an API key at [wandb.ai/authorize](https://wandb.ai/authorize). Copy the API key and store it securely. | ||
|
||
## Create a project in W&B | ||
|
||
Create a project in your W&B account to track usage, record training metrics, and save trained models. See the [Projects guide](https://docs.wandb.ai/guides/track/project-page) for more information. | ||
|
||
## Next steps | ||
|
||
After completing the prerequisites: | ||
|
||
* Check the [API reference]({{< relref "/ref/training" >}}) to learn about available endpoints | ||
* Try the [ART quickstart](https://art.openpipe.ai/getting-started/quick-start) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
--- | ||
dbrian57 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
menu: | ||
default: | ||
identifier: serverless-rl | ||
title: Serverless RL | ||
description: Learn about how to more efficiently post-train your models using reinforcement learning. | ||
weight: 5 | ||
--- | ||
|
||
Now in public preview, Serverless RL helps developers post-train LLMs to learn new behaviors and improve reliability, speed, and costs when performing multi-turn agentic tasks. W&B provision the training infrastructure ([on CoreWeave](https://docs.coreweave.com/docs/platform)) for you while allowing full flexibility in your environment's setup. Serverless RL gives you instant access to a managed training cluster that elastically auto-scales to dozens of GPUs. By splitting RL workflows into inference and training phases and multiplexing them across jobs, Serverless RL increases GPU utilization and reduces your training time and costs. | ||
|
||
Serverless RL is ideal for tasks like: | ||
* Voice agents | ||
* Deep research assistants | ||
* On-prem models | ||
* Content marketing analysis agents | ||
|
||
Serverless RL trains low-rank adapters (LoRAs) to specialize a model for your agent's specific task. This extends the original model’s capabilities with on-the-job experience. The LoRAs you train are automatically stored as artifacts in your W&B account, and can be saved locally or to a third party for backup. Models that you train through Serverless RL are also automatically hosted on W&B Inference. | ||
|
||
## Why Serverless RL? | ||
|
||
Reinforcement learning (RL) is a set of powerful training techniques that you can use in many kinds of training setups, including on GPUs that you own or rent directly. Serverless RL can provide the following advantages in your RL post-training: | ||
|
||
* **Lower training costs**: By multiplexing shared infrastructure across many users, skipping the setup process for each job, and scaling your GPU costs down to 0 when you're not actively training, Serverless RL reduces training costs significantly. | ||
* **Faster training time**: By splitting inference requests across many GPUs and immediately provisioning training infrastructure when you need it, Serverless RL speeds up your training jobs and lets you iterate faster. | ||
* **Automatic deployment**: Serverless RL automatically deploys every checkpoint you train, eliminating the need to manually set up hosting infrastructure. Trained models can be accessed and tested immediately in local, staging, or production environments. | ||
|
||
## How Serverless RL uses W&B services | ||
|
||
Serverless RL uses a combination of the following W&B components to operate: | ||
|
||
* [Inference]({{< relref "guides/inference" >}}): To run your models | ||
* [Models]({{< relref "guides/models" >}}): To track performance metrics during the LoRA adapter's training | ||
* [Artifacts]({{< relref "guides/core/artifacts" >}}): To store and version the LoRA adapters | ||
* [Weave (optional)]({{< relref "guides/models" >}}): To gain observability into how the model responds at each step of the training loop | ||
|
||
Serverless RL is in public preview. During the preview, you are charged only for the use of inference and the storage of artifacts. W&B does not charge for adapter training during the preview period. |
17 changes: 17 additions & 0 deletions
17
content/en/guides/training/serverless-rl/available-models.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
--- | ||
title: "Available models" | ||
linkTitle: "Available models" | ||
weight: 40 | ||
description: > | ||
See the models you can train with Serverless RL. | ||
--- | ||
|
||
Serverless RL currently only supports a single open-source foundation model for training. | ||
|
||
To express interest in a particular model, contact [support](mailto:support@wandb.ai). | ||
|
||
## Model catalog | ||
|
||
| Model | Model ID (for API usage) | Type | Context Window | Parameters | Description | | ||
|-------|--------------------------|------|----------------|------------|-------------| | ||
| Qwen2.5 14B | Qwen/Qwen2.5-14B-Instruct | Text | 32K | 14B (Active-Total) | Dense model optimized for throughput and quality | |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
--- | ||
description: Get started using Serverless RL. | ||
title: Use Serverless RL | ||
weight: 10 | ||
--- | ||
|
||
Serverless RL is supported through [OpenPipe's ART framework](https://art.openpipe.ai/getting-started/about) and the [W&B Training API]({{< relref "ref/training" >}}). | ||
|
||
To start using Serverless RL, see the ART [quickstart](https://art.openpipe.ai/getting-started/quick-start) for code examples and workflows. To learn about Serverless RL's API endpoints, see the W&B Training API. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
--- | ||
title: "Usage information and limits" | ||
linkTitle: "Usage & limits" | ||
weight: 30 | ||
description: > | ||
Understand pricing, usage limits, and account restrictions for W&B Serverless RL. | ||
--- | ||
|
||
## Pricing | ||
|
||
Pricing has three components: inference, training, and storage. For specific billing rates, visit our [pricing page](https://wandb.ai/site/pricing/reinforcement-learning). | ||
|
||
### Inference | ||
|
||
Pricing for Serverless RL inference requests matches W&B Inference pricing. See [model-specific costs](https://site.wandb.ai/pricing/reinforcement-learning) for more details. Learn more about purchasing credits, account tiers, and usage caps in the [W&B Inference docs]({{< relref "/guides/inference/usage-limits/#purchase-more-credits" >}}). | ||
|
||
### Training | ||
|
||
At each training step, Serverless RL collects batches of trajectories that include your agent's outputs and associated rewards (calculated by your reward function). The batched trajectories are then used to update the weights of a LoRA adapter that specializes a base model for your task. The training jobs to update these LoRAs run on dedicated GPU clusters managed by Serverless RL. | ||
|
||
Training is free during the public preview period. | ||
|
||
### Model storage | ||
|
||
Serverless RL stores checkpoints of your trained LoRAs so you can evaluate, serve, or continue training them at any time. Storage is billed monthly based on total checkpoint size and your [pricing plan](https://wandb.ai/site/pricing). Every plan includes at least 5GB of free storage, which is enough for roughly 30 LoRAs. We recommend deleting low-performing LoRAs to save space. See the [ART SDK](https://art.openpipe.ai/features/checkpoint-deletion) for instructions on how to do this. | ||
|
||
## Limits | ||
|
||
* **Inference concurrency limits**: By default, Serverless RL currently supports up to 2000 concurrent requests per user and 6000 per project. If you exceed your rate limit, the Inference API returns a `429 Concurrency limit reached for requests` response. To avoid this error, reduce the number of concurrent requests your training job or production workload makes at once. If you need a higher rate limit, you can request one at support@wandb.com. | ||
|
||
* **Personal entities unsupported**: Serverless RL and W&B Inference don't support personal entities (personal accounts). To access Serverless RL, switch to a non-personal account by [creating a Team]({{< relref "/guides/hosting/iam/access-management/manage-organization/#add-and-manage-teams" >}}). Personal entities (personal accounts) were deprecated in May 2024, so this advisory only applies to legacy accounts. | ||
|
||
* **Geographic restrictions**: Serverless RL is only available in supported geographic locations. For more information, see the [Terms of Service](https://site.wandb.ai/terms/). |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.