Merged
63 commits
aad795a
Adds new directory for service
dbrian57 Sep 24, 2025
a665dde
Adds WB training and serverless RL section
dbrian57 Sep 26, 2025
8342313
Adds Serverless RL reference section with redoc integration
dbrian57 Sep 26, 2025
5a5558a
Remove unnecessary files
dbrian57 Sep 26, 2025
1365cee
Feedback via D. Corbitt
dbrian57 Oct 2, 2025
cda4165
Updates Serverless RL API page with proper spec
dbrian57 Oct 3, 2025
d1adc51
Adds back the Training branding
dbrian57 Oct 3, 2025
eac7606
Updates front matter with training branding
dbrian57 Oct 3, 2025
9d4dfeb
new identifier
dbrian57 Oct 3, 2025
cca7cb6
use cases
dbrian57 Oct 3, 2025
2d40548
fixes top nav rendering in cloudflare preview
dbrian57 Oct 3, 2025
2f908da
Adds static light and dark logos
dbrian57 Oct 3, 2025
5a8ba8e
fixes top nav
dbrian57 Oct 3, 2025
f9fb079
CSS references assets directory now
dbrian57 Oct 3, 2025
da4f30b
new test implementation of redoc using redoc2
dbrian57 Oct 3, 2025
326b5c0
force light mode on redoc page
dbrian57 Oct 3, 2025
9cc3fed
fix scrolling issue in redoc2
dbrian57 Oct 3, 2025
c92099d
Replaces old layout with new one
dbrian57 Oct 3, 2025
61aa8a2
Fixes width of redoc page
dbrian57 Oct 3, 2025
e13cc47
Updates API spec URL for Redoc
dbrian57 Oct 3, 2025
be1ce2f
adds prereqs page
dbrian57 Oct 3, 2025
93b8a6d
adds Serverless RL sub-section
dbrian57 Oct 3, 2025
8cf9d02
Adds available models section
dbrian57 Oct 3, 2025
0b9aa3a
Adds use serverless RL section
dbrian57 Oct 3, 2025
82e017d
Adds usage and limits section
dbrian57 Oct 3, 2025
69a0b7f
Adds API ref placeholder section
dbrian57 Oct 3, 2025
fc4b33f
Updates marketecture diagram
dbrian57 Oct 6, 2025
c79029f
Optimised images with calibre/image-actions
github-actions[bot] Oct 6, 2025
e9911fc
Removes unnecessary static assets
dbrian57 Oct 6, 2025
4ed272c
Merge branch 'docs/training-serverless-rl' of https://github.com/wand…
dbrian57 Oct 6, 2025
7e7f230
Optimised images with calibre/image-actions
github-actions[bot] Oct 6, 2025
fc2c7ad
remove unnecessary navbar changes
dbrian57 Oct 6, 2025
2a4c05d
Merge branch 'docs/training-serverless-rl' of https://github.com/wand…
dbrian57 Oct 6, 2025
1af1329
adds back whitespace to navbar partial
dbrian57 Oct 6, 2025
c9f55a9
Optimised images with calibre/image-actions
github-actions[bot] Oct 6, 2025
622dff6
Training descriptions
dbrian57 Oct 6, 2025
3b6d10a
Merge branch 'docs/training-serverless-rl' of https://github.com/wand…
dbrian57 Oct 6, 2025
d9b2c13
Adds Training to homepage
dbrian57 Oct 6, 2025
7a2e6e1
new training icons
dbrian57 Oct 6, 2025
7e65992
Updates marketecture from official Figma
dbrian57 Oct 6, 2025
92e035e
Optimised images with calibre/image-actions
github-actions[bot] Oct 6, 2025
d28d38b
Adds public preview verbage
dbrian57 Oct 6, 2025
3005832
Merge branch 'docs/training-serverless-rl' of https://github.com/wand…
dbrian57 Oct 6, 2025
6532356
Feedback
dbrian57 Oct 6, 2025
c747dab
Optimised images with calibre/image-actions
github-actions[bot] Oct 6, 2025
b2bb52b
David edits (#1693)
arcticfly Oct 6, 2025
cf5db65
updates to David's copy and shuffles some stuff around
dbrian57 Oct 6, 2025
63c0a2f
spell check
dbrian57 Oct 6, 2025
3c5d387
spelling error
dbrian57 Oct 6, 2025
1d8c872
Reweights top level items in left-nav
dbrian57 Oct 6, 2025
8c8896e
Makes product layout 2x2 on front page
dbrian57 Oct 6, 2025
b2c5a52
Feedback via Noah, Matt, and David
dbrian57 Oct 7, 2025
97e56b1
Fixes API reference page
dbrian57 Oct 7, 2025
b96d377
remove unnecessary menu frontmatter from reference
dbrian57 Oct 7, 2025
a3a2fa5
Adds coreweave link
dbrian57 Oct 7, 2025
efe3bcd
rewrite inference doc
dbrian57 Oct 7, 2025
db815ab
minor copy changes
dbrian57 Oct 7, 2025
87ab4d0
renames file and corrects some grammar
dbrian57 Oct 7, 2025
29e1aae
renames file for real
dbrian57 Oct 7, 2025
519adc4
Feedback from M. Linville and N. Luna
dbrian57 Oct 7, 2025
f2376ad
Updates ToS link and LoRA deletion link, fixes spelling error
dbrian57 Oct 8, 2025
8682337
Updates from launch meeting
dbrian57 Oct 8, 2025
fc91ecc
link fix
dbrian57 Oct 8, 2025
Binary file added assets/icons/Name=Training, Mode=Dark.png
60 changes: 60 additions & 0 deletions assets/icons/Name=Training, Mode=Dark.svg
Binary file modified assets/images/general/architecture.png
26 changes: 25 additions & 1 deletion content/en/_index.md
Original file line number Diff line number Diff line change
@@ -46,7 +46,12 @@ Use [W&B Weave](https://weave-docs.wandb.ai/) to manage AI models in your code.
- [Use Weave in your W&B runs]({{< relref "/guides/weave/set-up-weave" >}})

</div>{{% /card %}}
{{< /cardpane >}}

</div>

<div class="bottom-row-cards">
{{< cardpane >}}
{{% card %}}<div onclick="window.location.href='/guides/inference/'" style="cursor: pointer;">

<div className="card-banner-icon" style="float:left;margin-right:10px !important; margin-top: -12px !important">
@@ -63,8 +68,27 @@ Use [W&B Inference]({{< relref "/guides/inference/" >}}) to access leading open-
- [API Reference]({{< relref "/guides/inference/api-reference/" >}})
- [Try in Playground](https://wandb.ai/inference)

</div>{{% /card %}}

{{% card %}}<div onclick="window.location.href='/guides/training/'" style="cursor: pointer;">

<div className="card-banner-icon" style="float:left;margin-right:10px !important; margin-top: -12px !important">
{{< img src="/icons/Name=Training, Mode=Dark.svg" width="60" height="60" >}}
</div>
<h2>W&B Training</h2>

### Post-train your models

Use [W&B Training]({{< relref "/guides/training/" >}}), now in public preview, to post-train large language models using serverless reinforcement learning (RL). Features include fully managed GPU infrastructure, integration with ART and RULER, and automatic scaling for multi-turn agentic tasks.

- [Introduction]({{< relref "/guides/training/" >}})
- [Prerequisites]({{< relref "/guides/training/prerequisites/" >}})
- [Serverless RL]({{< relref "/guides/training/serverless-rl/" >}})
- [API Reference]({{< relref "/ref/training" >}})

</div>{{% /card %}}
{{< /cardpane >}}

</div>

<!-- End max-width constraing -->
@@ -75,7 +99,7 @@ Use [W&B Inference]({{< relref "/guides/inference/" >}}) to access leading open-
p { overflow: hidden; display: block; }
ul { margin-left: 50px; }

/* Make all cards uniform size in 3x2 grid */
/* Make all cards uniform size in 2x2 grid */
.top-row-cards .td-card-group,
.bottom-row-cards .td-card-group {
max-width: 100%;
2 changes: 2 additions & 0 deletions content/en/guides/_index.md
@@ -33,6 +33,8 @@ W&B consists of three major components: [Models]({{< relref "/guides/models.md"

**[W&B Inference]({{< relref "/guides/inference/" >}})** is a set of tools for accessing open-source foundation models through W&B Weave and an OpenAI-compatible API.

**[W&B Training]({{< relref "/guides/training/" >}})** provides serverless reinforcement learning for post-training LLMs to improve reliability on multi-turn agentic tasks.

{{% alert %}}
Learn about recent releases in the [W&B release notes]({{< relref "/ref/release-notes/" >}}).
{{% /alert %}}
2 changes: 1 addition & 1 deletion content/en/guides/core/_index.md
@@ -3,7 +3,7 @@ menu:
default:
identifier: core
title: W&B Core
weight: 6
weight: 70
no_list: true
---

2 changes: 1 addition & 1 deletion content/en/guides/hosting/_index.md
@@ -3,7 +3,7 @@ menu:
default:
identifier: w-b-platform
title: W&B Platform
weight: 7
weight: 80
no_list: true
---
W&B Platform is the foundational infrastructure, tooling and governance scaffolding which supports the W&B products like [Core]({{< relref "/guides/core" >}}), [Models]({{< relref "/guides/models/" >}}) and [Weave]({{< relref "/guides/weave/" >}}).
2 changes: 1 addition & 1 deletion content/en/guides/inference/_index.md
@@ -1,6 +1,6 @@
---
title: "W&B Inference"
weight: 5
weight: 50
description: >
Access open-source foundation models through W&B Weave and an OpenAI-compatible API
---
2 changes: 1 addition & 1 deletion content/en/guides/integrations/_index.md
@@ -3,7 +3,7 @@ menu:
default:
identifier: integrations
title: Integrations
weight: 8
weight: 90
url: guides/integrations
cascade:
- url: guides/integrations/:filename
2 changes: 1 addition & 1 deletion content/en/guides/models/_index.md
@@ -3,7 +3,7 @@ menu:
default:
identifier: models
title: W&B Models
weight: 3
weight: 30
no_list: true
---

2 changes: 1 addition & 1 deletion content/en/guides/models_quickstart.md
@@ -1,6 +1,6 @@
---
title: Get Started with W&B Models
weight: 2
weight: 20
---

Learn when and how to use W&B to track, share, and manage model artifacts in your machine learning workflows. This page covers logging experiments, generating reports, and accessing logged data using the appropriate W&B API for each task.
2 changes: 1 addition & 1 deletion content/en/guides/quickstart.md
@@ -6,7 +6,7 @@ menu:
parent: guides
title: W&B Quickstart
url: quickstart
weight: 1
weight: 10
---
Install W&B to track, visualize, and manage machine learning experiments of any size.

18 changes: 18 additions & 0 deletions content/en/guides/training/_index.md
@@ -0,0 +1,18 @@
---
menu:
default:
identifier: training
title: W&B Training
description: Post-train your models using reinforcement learning.
weight: 60
---

Now in public preview, W&B Training offers serverless reinforcement learning (RL) for post-training large language models (LLMs) to improve their reliability when performing multi-turn, agentic tasks while also increasing speed and reducing costs. RL is a training technique in which models learn to improve their behavior through feedback on their outputs.

W&B Training includes integration with:

* [ART](https://art.openpipe.ai/getting-started/about), a flexible RL fine-tuning framework.
* [RULER](https://openpipe.ai/blog/ruler), a universal verifier.
* A fully-managed backend on [CoreWeave Cloud](https://docs.coreweave.com/docs/platform).

To get started, complete the [prerequisites]({{< relref "/guides/training/prerequisites" >}}), then see [OpenPipe's Serverless RL quickstart](https://art.openpipe.ai/getting-started/quick-start) to learn how to post-train your models.
8 changes: 8 additions & 0 deletions content/en/guides/training/api-reference.md
@@ -0,0 +1,8 @@
---
title: "API Reference"
linkTitle: "API Reference"
weight: 100
manualLink: "/ref/training"
description: >
Complete API documentation for W&B Training.
---
28 changes: 28 additions & 0 deletions content/en/guides/training/prerequisites.md
@@ -0,0 +1,28 @@
---
title: "Prerequisites"
linkTitle: "Prerequisites"
weight: 1
description: >
Set up your environment to use W&B Training.
---

Complete these steps before using W&B Training features through the OpenPipe ART framework or API.

{{< alert title="Tip" >}}
Before starting, review the [usage information and limits]({{< relref "guides/training/serverless-rl/usage-limits" >}}) to understand costs and restrictions.
{{< /alert >}}

## Sign up and create an API key

To authenticate your machine with W&B, you must first generate an API key at [wandb.ai/authorize](https://wandb.ai/authorize). Copy the API key and store it securely.
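
As a minimal sketch of storing the key outside your source code, the following Python helper reads it from the `WANDB_API_KEY` environment variable, which the `wandb` client also reads. The helper name `load_wandb_api_key` is hypothetical and is not part of any W&B or ART API.

```python
import os


def load_wandb_api_key(env=None):
    """Return the W&B API key from the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "WANDB_API_KEY is not set; create a key at https://wandb.ai/authorize"
        )
    return key
```

Calling `load_wandb_api_key()` at the start of a script fails fast with a clear message instead of surfacing an authentication error later in a training run.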

## Create a project in W&B

Create a project in your W&B account to track usage, record training metrics, and save trained models. See the [Projects guide](https://docs.wandb.ai/guides/track/project-page) for more information.

## Next steps

After completing the prerequisites:

* Check the [API reference]({{< relref "/ref/training" >}}) to learn about available endpoints
* Try the [ART quickstart](https://art.openpipe.ai/getting-started/quick-start)
37 changes: 37 additions & 0 deletions content/en/guides/training/serverless-rl/_index.md
@@ -0,0 +1,37 @@
---
menu:
default:
identifier: serverless-rl
title: Serverless RL
description: Learn how to post-train your models more efficiently using reinforcement learning.
weight: 5
---

Now in public preview, Serverless RL helps developers post-train LLMs to learn new behaviors and to improve reliability, increase speed, and reduce costs when performing multi-turn agentic tasks. W&B provisions the training infrastructure ([on CoreWeave](https://docs.coreweave.com/docs/platform)) for you while allowing full flexibility in your environment's setup. Serverless RL gives you instant access to a managed training cluster that elastically auto-scales to dozens of GPUs. By splitting RL workflows into inference and training phases and multiplexing them across jobs, Serverless RL increases GPU utilization and reduces your training time and costs.

Serverless RL is ideal for tasks like:
* Voice agents
* Deep research assistants
* On-prem models
* Content marketing analysis agents

Serverless RL trains low-rank adapters (LoRAs) to specialize a model for your agent's specific task. This extends the original model’s capabilities with on-the-job experience. The LoRAs you train are automatically stored as artifacts in your W&B account, and can be saved locally or to a third party for backup. Models that you train through Serverless RL are also automatically hosted on W&B Inference.

## Why Serverless RL?

Reinforcement learning (RL) is a set of powerful training techniques that you can use in many kinds of training setups, including on GPUs that you own or rent directly. Serverless RL can provide the following advantages in your RL post-training:

* **Lower training costs**: By multiplexing shared infrastructure across many users, skipping the setup process for each job, and scaling your GPU costs down to 0 when you're not actively training, Serverless RL reduces training costs significantly.
* **Faster training time**: By splitting inference requests across many GPUs and immediately provisioning training infrastructure when you need it, Serverless RL speeds up your training jobs and lets you iterate faster.
* **Automatic deployment**: Serverless RL automatically deploys every checkpoint you train, eliminating the need to manually set up hosting infrastructure. Trained models can be accessed and tested immediately in local, staging, or production environments.

## How Serverless RL uses W&B services

Serverless RL uses a combination of the following W&B components to operate:

* [Inference]({{< relref "guides/inference" >}}): To run your models
* [Models]({{< relref "guides/models" >}}): To track performance metrics during the LoRA adapter's training
* [Artifacts]({{< relref "guides/core/artifacts" >}}): To store and version the LoRA adapters
* [Weave (optional)]({{< relref "guides/weave" >}}): To gain observability into how the model responds at each step of the training loop

Serverless RL is in public preview. During the preview, you are charged only for the use of inference and the storage of artifacts. W&B does not charge for adapter training during the preview period.
17 changes: 17 additions & 0 deletions content/en/guides/training/serverless-rl/available-models.md
@@ -0,0 +1,17 @@
---
title: "Available models"
linkTitle: "Available models"
weight: 40
description: >
See the models you can train with Serverless RL.
---

Serverless RL currently supports a single open-source foundation model for training.

To express interest in a particular model, contact [support](mailto:support@wandb.ai).

## Model catalog

| Model | Model ID (for API usage) | Type | Context Window | Parameters | Description |
|-------|--------------------------|------|----------------|------------|-------------|
| Qwen2.5 14B | Qwen/Qwen2.5-14B-Instruct | Text | 32K | 14B (active and total) | Dense model optimized for throughput and quality |
9 changes: 9 additions & 0 deletions content/en/guides/training/serverless-rl/serverless-rl.md
@@ -0,0 +1,9 @@
---
description: Get started using Serverless RL.
title: Use Serverless RL
weight: 10
---

Serverless RL is supported through [OpenPipe's ART framework](https://art.openpipe.ai/getting-started/about) and the [W&B Training API]({{< relref "ref/training" >}}).

To start using Serverless RL, see the ART [quickstart](https://art.openpipe.ai/getting-started/quick-start) for code examples and workflows. To learn about Serverless RL's API endpoints, see the W&B Training API.
33 changes: 33 additions & 0 deletions content/en/guides/training/serverless-rl/usage-limits.md
@@ -0,0 +1,33 @@
---
title: "Usage information and limits"
linkTitle: "Usage & limits"
weight: 30
description: >
Understand pricing, usage limits, and account restrictions for W&B Serverless RL.
---

## Pricing

Pricing has three components: inference, training, and storage. For specific billing rates, visit our [pricing page](https://wandb.ai/site/pricing/reinforcement-learning).

### Inference

Pricing for Serverless RL inference requests matches W&B Inference pricing. See [model-specific costs](https://site.wandb.ai/pricing/reinforcement-learning) for more details. Learn more about purchasing credits, account tiers, and usage caps in the [W&B Inference docs]({{< relref "/guides/inference/usage-limits/#purchase-more-credits" >}}).

### Training

At each training step, Serverless RL collects batches of trajectories that include your agent's outputs and associated rewards (calculated by your reward function). The batched trajectories are then used to update the weights of a LoRA adapter that specializes a base model for your task. The training jobs to update these LoRAs run on dedicated GPU clusters managed by Serverless RL.

Training is free during the public preview period.
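
To make the shape of the data in a training step concrete, here is an illustrative Python sketch that scores trajectories with a reward function and groups them into batches. `Trajectory`, `score_trajectories`, and `make_batches` are hypothetical names chosen for this example, not ART or Serverless RL APIs. The LoRA weight update itself runs on the managed backend and is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Trajectory:
    """One agent rollout: the messages exchanged plus the reward it earned."""
    messages: list[dict]
    reward: float = 0.0


def score_trajectories(
    trajectories: list[Trajectory],
    reward_fn: Callable[[Trajectory], float],
) -> list[Trajectory]:
    """Attach a reward to every trajectory using the caller's reward function."""
    for trajectory in trajectories:
        trajectory.reward = reward_fn(trajectory)
    return trajectories


def make_batches(
    trajectories: list[Trajectory], batch_size: int
) -> Iterator[list[Trajectory]]:
    """Yield consecutive batches; the last batch may be smaller than batch_size."""
    for start in range(0, len(trajectories), batch_size):
        yield trajectories[start:start + batch_size]
```

In a real job, the reward function encodes what "good" means for your task, for example whether the agent's final answer passed a check.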

### Model storage

Serverless RL stores checkpoints of your trained LoRAs so you can evaluate, serve, or continue training them at any time. Storage is billed monthly based on total checkpoint size and your [pricing plan](https://wandb.ai/site/pricing). Every plan includes at least 5GB of free storage, which is enough for roughly 30 LoRAs. We recommend deleting low-performing LoRAs to save space. See the [ART SDK](https://art.openpipe.ai/features/checkpoint-deletion) for instructions on how to do this.
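
As a rough worked example of the storage figures above, 5 GB shared across about 30 LoRAs implies roughly 0.17 GB per checkpoint. The Python sketch below does that arithmetic and picks which checkpoints to delete when you keep only the best performers. The function `checkpoints_to_delete` and the scores passed to it are assumptions for illustration; actual deletion goes through the ART SDK linked above.

```python
FREE_STORAGE_GB = 5.0            # minimum free storage stated above
APPROX_LORAS_IN_FREE_TIER = 30   # "roughly 30 LoRAs" stated above

# Implied approximate size of one LoRA checkpoint, in GB.
approx_gb_per_lora = FREE_STORAGE_GB / APPROX_LORAS_IN_FREE_TIER


def checkpoints_to_delete(scores: dict[str, float], keep: int) -> list[str]:
    """Return checkpoint names to delete, keeping the `keep` highest-scoring ones."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[keep:]
```

For example, `checkpoints_to_delete({"step-1": 0.2, "step-2": 0.9, "step-3": 0.5}, keep=2)` returns `["step-1"]`.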

## Limits

* **Inference concurrency limits**: By default, Serverless RL currently supports up to 2000 concurrent requests per user and 6000 per project. If you exceed this limit, the Inference API returns a `429 Concurrency limit reached for requests` response. To avoid this error, reduce the number of concurrent requests your training job or production workload makes at once. If you need a higher limit, you can request one at support@wandb.com.

* **Personal entities unsupported**: Serverless RL and W&B Inference don't support personal entities (personal accounts). To access Serverless RL, switch to a non-personal account by [creating a Team]({{< relref "/guides/hosting/iam/access-management/manage-organization/#add-and-manage-teams" >}}). Personal entities (personal accounts) were deprecated in May 2024, so this advisory only applies to legacy accounts.

* **Geographic restrictions**: Serverless RL is only available in supported geographic locations. For more information, see the [Terms of Service](https://site.wandb.ai/terms/).
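
For the inference concurrency limit above, one way to stay under the cap on the client side is to bound in-flight requests with a semaphore and retry with backoff when a 429 comes back. The sketch below uses only the Python standard library. `ConcurrencyLimitError` is a hypothetical stand-in for however your HTTP client reports a 429, `send` is any coroutine you supply, and the default `max_concurrent=100` is an arbitrary illustrative value.

```python
import asyncio
import random


class ConcurrencyLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the Inference API."""


async def call_with_limit(send, payload, semaphore, max_retries=5, base_delay=0.5):
    """Send one request under the semaphore, retrying on 429 with backoff."""
    for attempt in range(max_retries + 1):
        async with semaphore:
            try:
                return await send(payload)
            except ConcurrencyLimitError:
                if attempt == max_retries:
                    raise
        # Back off outside the semaphore so the slot is free for other requests.
        await asyncio.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))


async def run_all(send, payloads, max_concurrent=100, **retry_options):
    """Run all requests with at most `max_concurrent` in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [call_with_limit(send, p, semaphore, **retry_options) for p in payloads]
    return await asyncio.gather(*tasks)
```

Results come back in the same order as `payloads`. Choose `max_concurrent` comfortably below your per-user limit if several jobs share the same account.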