+
+### Post-train your models
+
+Now in public preview, use [W&B Training]({{< relref "/guides/training/" >}}) to post-train large language models using serverless reinforcement learning (RL). Features include fully managed GPU infrastructure, integration with ART and RULER, and automatic scaling for multi-turn agentic tasks.
+
+- [Introduction]({{< relref "/guides/training/" >}})
+- [Prerequisites]({{< relref "/guides/training/prerequisites/" >}})
+- [Serverless RL]({{< relref "/guides/training/serverless-rl/" >}})
+- [API Reference]({{< relref "/ref/training" >}})
+
{{% /card %}}
{{< /cardpane >}}
+
@@ -75,7 +99,7 @@ Use [W&B Inference]({{< relref "/guides/inference/" >}}) to access leading open-
p { overflow: hidden; display: block; }
ul { margin-left: 50px; }
-/* Make all cards uniform size in 3x2 grid */
+/* Make all cards uniform size in 2x2 grid */
.top-row-cards .td-card-group,
.bottom-row-cards .td-card-group {
max-width: 100%;
diff --git a/content/en/guides/_index.md b/content/en/guides/_index.md
index 4ceb532872..62ae807b51 100644
--- a/content/en/guides/_index.md
+++ b/content/en/guides/_index.md
@@ -33,6 +33,8 @@ W&B consists of three major components: [Models]({{< relref "/guides/models.md"
**[W&B Inference]({{< relref "/guides/inference/" >}})** is a set of tools for accessing open-source foundation models through W&B Weave and an OpenAI-compatible API.
+**[W&B Training]({{< relref "/guides/training/" >}})** provides serverless reinforcement learning for post-training LLMs to improve reliability on multi-turn agentic tasks.
+
{{% alert %}}
Learn about recent releases in the [W&B release notes]({{< relref "/ref/release-notes/" >}}).
{{% /alert %}}
diff --git a/content/en/guides/core/_index.md b/content/en/guides/core/_index.md
index 4a26ba3e19..da95cd63f0 100644
--- a/content/en/guides/core/_index.md
+++ b/content/en/guides/core/_index.md
@@ -3,7 +3,7 @@ menu:
default:
identifier: core
title: W&B Core
-weight: 6
+weight: 70
no_list: true
---
diff --git a/content/en/guides/hosting/_index.md b/content/en/guides/hosting/_index.md
index 525ec9b574..9ffc064abd 100644
--- a/content/en/guides/hosting/_index.md
+++ b/content/en/guides/hosting/_index.md
@@ -3,7 +3,7 @@ menu:
default:
identifier: w-b-platform
title: W&B Platform
-weight: 7
+weight: 80
no_list: true
---
W&B Platform is the foundational infrastructure, tooling and governance scaffolding which supports the W&B products like [Core]({{< relref "/guides/core" >}}), [Models]({{< relref "/guides/models/" >}}) and [Weave]({{< relref "/guides/weave/" >}}).
diff --git a/content/en/guides/inference/_index.md b/content/en/guides/inference/_index.md
index a6b2df3751..df096d81a3 100644
--- a/content/en/guides/inference/_index.md
+++ b/content/en/guides/inference/_index.md
@@ -1,6 +1,6 @@
---
title: "W&B Inference"
-weight: 5
+weight: 50
description: >
Access open-source foundation models through W&B Weave and an OpenAI-compatible API
---
diff --git a/content/en/guides/integrations/_index.md b/content/en/guides/integrations/_index.md
index 4aa0973507..21a86f0487 100644
--- a/content/en/guides/integrations/_index.md
+++ b/content/en/guides/integrations/_index.md
@@ -3,7 +3,7 @@ menu:
default:
identifier: integrations
title: Integrations
-weight: 8
+weight: 90
url: guides/integrations
cascade:
- url: guides/integrations/:filename
diff --git a/content/en/guides/models/_index.md b/content/en/guides/models/_index.md
index b7605d6cf8..806dcd90fb 100644
--- a/content/en/guides/models/_index.md
+++ b/content/en/guides/models/_index.md
@@ -3,7 +3,7 @@ menu:
default:
identifier: models
title: W&B Models
-weight: 3
+weight: 30
no_list: true
---
diff --git a/content/en/guides/models_quickstart.md b/content/en/guides/models_quickstart.md
index a43e3b1717..d95b8693de 100644
--- a/content/en/guides/models_quickstart.md
+++ b/content/en/guides/models_quickstart.md
@@ -1,6 +1,6 @@
---
title: Get Started with W&B Models
-weight: 2
+weight: 20
---
Learn when and how to use W&B to track, share, and manage model artifacts in your machine learning workflows. This page covers logging experiments, generating reports, and accessing logged data using the appropriate W&B API for each task.
diff --git a/content/en/guides/quickstart.md b/content/en/guides/quickstart.md
index 5110e5f9b4..fab57a57d8 100644
--- a/content/en/guides/quickstart.md
+++ b/content/en/guides/quickstart.md
@@ -6,7 +6,7 @@ menu:
parent: guides
title: W&B Quickstart
url: quickstart
-weight: 1
+weight: 10
---
Install W&B to track, visualize, and manage machine learning experiments of any size.
diff --git a/content/en/guides/training/_index.md b/content/en/guides/training/_index.md
new file mode 100644
index 0000000000..44c8325d3b
--- /dev/null
+++ b/content/en/guides/training/_index.md
@@ -0,0 +1,18 @@
+---
+menu:
+ default:
+ identifier: training
+title: W&B Training
+description: Post-train your models using reinforcement learning.
+weight: 60
+---
+
+Now in public preview, W&B Training offers serverless reinforcement learning (RL) for post-training large language models (LLMs), improving their reliability on multi-turn, agentic tasks while also increasing speed and reducing costs. RL is a training technique in which models learn to improve their behavior through feedback on their outputs.
+
+W&B Training includes integration with:
+
+* [ART](https://art.openpipe.ai/getting-started/about), a flexible RL fine-tuning framework.
+* [RULER](https://openpipe.ai/blog/ruler), a universal verifier.
+* A fully-managed backend on [CoreWeave Cloud](https://docs.coreweave.com/docs/platform).
+
+To get started, complete the [prerequisites]({{< relref "/guides/training/prerequisites" >}}), then see [OpenPipe's Serverless RL quickstart](https://art.openpipe.ai/getting-started/quick-start) to learn how to post-train your models.
\ No newline at end of file
diff --git a/content/en/guides/training/api-reference.md b/content/en/guides/training/api-reference.md
new file mode 100644
index 0000000000..ac0480a9a8
--- /dev/null
+++ b/content/en/guides/training/api-reference.md
@@ -0,0 +1,8 @@
+---
+title: "API Reference"
+linkTitle: "API Reference"
+weight: 100
+manualLink: "/ref/training"
+description: >
+ Complete API documentation for W&B Training.
+---
\ No newline at end of file
diff --git a/content/en/guides/training/prerequisites.md b/content/en/guides/training/prerequisites.md
new file mode 100644
index 0000000000..1023f1cf98
--- /dev/null
+++ b/content/en/guides/training/prerequisites.md
@@ -0,0 +1,28 @@
+---
+title: "Prerequisites"
+linkTitle: "Prerequisites"
+weight: 1
+description: >
+ Set up your environment to use W&B Training.
+---
+
+Complete these steps before using W&B Training features through the OpenPipe ART framework or API.
+
+{{< alert title="Tip" >}}
+Before starting, review the [usage information and limits]({{< relref "guides/training/serverless-rl/usage-limits" >}}) to understand costs and restrictions.
+{{< /alert >}}
+
+## Sign up and create an API key
+
+To authenticate your machine with W&B, you must first generate an API key at [wandb.ai/authorize](https://wandb.ai/authorize). Copy the API key and store it securely.
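For example, you can store the key in the `WANDB_API_KEY` environment variable, which the W&B SDK and the cURL examples in these docs read for authentication. The key value below is a placeholder:

```shell
# Store your W&B API key in the WANDB_API_KEY environment variable.
# Replace the placeholder with your real key from wandb.ai/authorize.
export WANDB_API_KEY="your-api-key-here"

# Confirm the variable is set without printing the key itself.
[ -n "$WANDB_API_KEY" ] && echo "WANDB_API_KEY is set"
```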
+
+## Create a project in W&B
+
+Create a project in your W&B account to track usage, record training metrics, and save trained models. See the [Projects guide](https://docs.wandb.ai/guides/track/project-page) for more information.
+
+## Next steps
+
+After completing the prerequisites:
+
+* Check the [API reference]({{< relref "/ref/training" >}}) to learn about available endpoints
+* Try the [ART quickstart](https://art.openpipe.ai/getting-started/quick-start)
diff --git a/content/en/guides/training/serverless-rl/_index.md b/content/en/guides/training/serverless-rl/_index.md
new file mode 100644
index 0000000000..72e3c4f7a3
--- /dev/null
+++ b/content/en/guides/training/serverless-rl/_index.md
@@ -0,0 +1,37 @@
+---
+menu:
+ default:
+ identifier: serverless-rl
+title: Serverless RL
+description: Learn how to post-train your models more efficiently using reinforcement learning.
+weight: 5
+---
+
+Now in public preview, Serverless RL helps developers post-train LLMs to learn new behaviors, improve reliability and speed, and reduce costs when performing multi-turn agentic tasks. W&B provisions the training infrastructure ([on CoreWeave](https://docs.coreweave.com/docs/platform)) for you while allowing full flexibility in your environment's setup. Serverless RL gives you instant access to a managed training cluster that elastically auto-scales to dozens of GPUs. By splitting RL workflows into inference and training phases and multiplexing them across jobs, Serverless RL increases GPU utilization and reduces your training time and costs.
+
+Serverless RL is ideal for tasks like:
+* Voice agents
+* Deep research assistants
+* On-prem models
+* Content marketing analysis agents
+
+Serverless RL trains low-rank adapters (LoRAs) to specialize a model for your agent's specific task. This extends the original model’s capabilities with on-the-job experience. The LoRAs you train are automatically stored as artifacts in your W&B account, and can be saved locally or to a third party for backup. Models that you train through Serverless RL are also automatically hosted on W&B Inference.
+
+## Why Serverless RL?
+
+Reinforcement learning (RL) is a set of powerful training techniques that you can use in many kinds of training setups, including on GPUs that you own or rent directly. Serverless RL can provide the following advantages in your RL post-training:
+
+* **Lower training costs**: By multiplexing shared infrastructure across many users, skipping the setup process for each job, and scaling your GPU costs down to zero when you're not actively training, Serverless RL reduces training costs significantly.
+* **Faster training time**: By splitting inference requests across many GPUs and immediately provisioning training infrastructure when you need it, Serverless RL speeds up your training jobs and lets you iterate faster.
+* **Automatic deployment**: Serverless RL automatically deploys every checkpoint you train, eliminating the need to manually set up hosting infrastructure. Trained models can be accessed and tested immediately in local, staging, or production environments.
+
+## How Serverless RL uses W&B services
+
+Serverless RL uses a combination of the following W&B components to operate:
+
+* [Inference]({{< relref "guides/inference" >}}): To run your models
+* [Models]({{< relref "guides/models" >}}): To track performance metrics during the LoRA adapter's training
+* [Artifacts]({{< relref "guides/core/artifacts" >}}): To store and version the LoRA adapters
+* [Weave (optional)]({{< relref "guides/weave" >}}): To gain observability into how the model responds at each step of the training loop
+
+Serverless RL is in public preview. During the preview, you are charged only for the use of inference and the storage of artifacts. W&B does not charge for adapter training during the preview period.
\ No newline at end of file
diff --git a/content/en/guides/training/serverless-rl/available-models.md b/content/en/guides/training/serverless-rl/available-models.md
new file mode 100644
index 0000000000..19c66cb460
--- /dev/null
+++ b/content/en/guides/training/serverless-rl/available-models.md
@@ -0,0 +1,17 @@
+---
+title: "Available models"
+linkTitle: "Available models"
+weight: 40
+description: >
+ See the models you can train with Serverless RL.
+---
+
+Serverless RL currently supports a single open-source foundation model for training.
+
+To express interest in a particular model, contact [support](mailto:support@wandb.ai).
+
+## Model catalog
+
+| Model | Model ID (for API usage) | Type | Context Window | Parameters | Description |
+|-------|--------------------------|------|----------------|------------|-------------|
+| Qwen2.5 14B | Qwen/Qwen2.5-14B-Instruct | Text | 32K | 14B | Dense model optimized for throughput and quality |
diff --git a/content/en/guides/training/serverless-rl/serverless-rl.md b/content/en/guides/training/serverless-rl/serverless-rl.md
new file mode 100644
index 0000000000..64a5117639
--- /dev/null
+++ b/content/en/guides/training/serverless-rl/serverless-rl.md
@@ -0,0 +1,9 @@
+---
+description: Get started using Serverless RL.
+title: Use Serverless RL
+weight: 10
+---
+
+Serverless RL is supported through [OpenPipe's ART framework](https://art.openpipe.ai/getting-started/about) and the [W&B Training API]({{< relref "ref/training" >}}).
+
+To start using Serverless RL, see the ART [quickstart](https://art.openpipe.ai/getting-started/quick-start) for code examples and workflows. To learn about Serverless RL's API endpoints, see the [W&B Training API]({{< relref "ref/training" >}}).
\ No newline at end of file
diff --git a/content/en/guides/training/serverless-rl/usage-limits.md b/content/en/guides/training/serverless-rl/usage-limits.md
new file mode 100644
index 0000000000..1a518868a6
--- /dev/null
+++ b/content/en/guides/training/serverless-rl/usage-limits.md
@@ -0,0 +1,33 @@
+---
+title: "Usage information and limits"
+linkTitle: "Usage & limits"
+weight: 30
+description: >
+ Understand pricing, usage limits, and account restrictions for W&B Serverless RL.
+---
+
+## Pricing
+
+Pricing has three components: inference, training, and storage. For specific billing rates, visit our [pricing page](https://wandb.ai/site/pricing/reinforcement-learning).
+
+### Inference
+
+Pricing for Serverless RL inference requests matches W&B Inference pricing. See [model-specific costs](https://wandb.ai/site/pricing/reinforcement-learning) for more details. Learn more about purchasing credits, account tiers, and usage caps in the [W&B Inference docs]({{< relref "/guides/inference/usage-limits/#purchase-more-credits" >}}).
+
+### Training
+
+At each training step, Serverless RL collects batches of trajectories that include your agent's outputs and associated rewards (calculated by your reward function). The batched trajectories are then used to update the weights of a LoRA adapter that specializes a base model for your task. The training jobs to update these LoRAs run on dedicated GPU clusters managed by Serverless RL.
+
+Training is free during the public preview period.
+
+### Model storage
+
+Serverless RL stores checkpoints of your trained LoRAs so you can evaluate, serve, or continue training them at any time. Storage is billed monthly based on total checkpoint size and your [pricing plan](https://wandb.ai/site/pricing). Every plan includes at least 5GB of free storage, which is enough for roughly 30 LoRAs. We recommend deleting low-performing LoRAs to save space. See the [ART SDK](https://art.openpipe.ai/features/checkpoint-deletion) for instructions on how to do this.
+
+## Limits
+
+* **Inference concurrency limits**: By default, Serverless RL currently supports up to 2000 concurrent requests per user and 6000 per project. If you exceed your rate limit, the Inference API returns a `429 Concurrency limit reached for requests` response. To avoid this error, reduce the number of concurrent requests your training job or production workload makes at once. If you need a higher rate limit, you can request one at support@wandb.com.
+
+* **Personal entities unsupported**: Serverless RL and W&B Inference don't support personal entities (personal accounts). To access Serverless RL, switch to a non-personal account by [creating a Team]({{< relref "/guides/hosting/iam/access-management/manage-organization/#add-and-manage-teams" >}}). Personal entities were deprecated in May 2024, so this restriction applies only to legacy accounts.
+
+* **Geographic restrictions**: Serverless RL is only available in supported geographic locations. For more information, see the [Terms of Service](https://wandb.ai/site/terms/).
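When a workload does hit the concurrency limit, one reasonable client-side strategy is to back off and retry on `429` responses. The sketch below is illustrative only; `make_request`, `RateLimitError`, and the retry parameters are placeholders, not part of any W&B SDK:

```python
import random
import time

class RateLimitError(Exception):
    """Stands in for an HTTP 429 'Concurrency limit reached' response."""

def with_backoff(make_request, max_retries=5, base_delay=1.0):
    """Call make_request, retrying with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        try:
            return make_request()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Wait base_delay * 2^attempt seconds, plus a little jitter
            # so concurrent workers don't all retry at the same instant.
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)

# Example: a request that is rate-limited twice before succeeding.
calls = {"n": 0}
def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError()
    return "ok"

print(with_backoff(flaky_request, base_delay=0.01))  # prints "ok"
```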
\ No newline at end of file
diff --git a/content/en/guides/training/serverless-rl/use-trained-models.md b/content/en/guides/training/serverless-rl/use-trained-models.md
new file mode 100644
index 0000000000..c87d037697
--- /dev/null
+++ b/content/en/guides/training/serverless-rl/use-trained-models.md
@@ -0,0 +1,78 @@
+---
+title: "Use your trained models"
+linkTitle: "Use trained models"
+weight: 20
+description: Make inference requests to the models you've trained.
+---
+
+After you train a model with Serverless RL, it is automatically available for inference.
+
+To send requests to your trained model, you need:
+* Your [W&B API key](https://wandb.ai/authorize)
+* The [Training API's]({{< relref "/ref/training" >}}) base URL, `https://api.training.wandb.ai/v1/`
+* Your model's endpoint
+
+The model's endpoint uses the following schema:
+
+```
+wandb-artifact:///<entity>/<project>/<model>:step<step-number>
+```
+
+The schema consists of:
+
+* Your W&B entity (team) name
+* The name of the project associated with your model
+* The trained model's name
+* The training step of the model you want to deploy (this is usually the step where the model performed best in your evaluations)
+
+For example, if your W&B team is named `email-specialists`, your project is called `mail-search`, your trained model is named `agent-001`, and you want to deploy the checkpoint from step 25, the endpoint looks like this:
+
+```
+wandb-artifact:///email-specialists/mail-search/agent-001:step25
+```
+
+Once you have your endpoint, you can integrate it into your normal inference workflows. The following examples show how to make inference requests to your trained model using a cURL request or the [Python OpenAI SDK](https://github.com/openai/openai-python).
+
+### cURL
+
+```shell
+curl https://api.training.wandb.ai/v1/chat/completions \
+ -H "Authorization: Bearer $WANDB_API_KEY" \
+ -H "Content-Type: application/json" \
+ -d '{
+    "model": "wandb-artifact:///<entity>/<project>/<model>:step<step-number>",
+ "messages": [
+ {"role": "system", "content": "You are a helpful assistant."},
+ {"role": "user", "content": "Summarize our training run."}
+ ],
+ "temperature": 0.7,
+ "top_p": 0.95
+ }'
+```
+
+### OpenAI SDK
+
+```python
+from openai import OpenAI
+
+WANDB_API_KEY = "your-wandb-api-key"
+ENTITY = "my-entity"
+PROJECT = "my-project"
+
+client = OpenAI(
+ base_url="https://api.training.wandb.ai/v1",
+ api_key=WANDB_API_KEY
+)
+
+response = client.chat.completions.create(
+ model=f"wandb-artifact:///{ENTITY}/{PROJECT}/my-model:step100",
+ messages=[
+ {"role": "system", "content": "You are a helpful assistant."},
+ {"role": "user", "content": "Summarize our training run."},
+ ],
+ temperature=0.7,
+ top_p=0.95,
+)
+
+print(response.choices[0].message.content)
+```
\ No newline at end of file
diff --git a/content/en/guides/weave/_index.md b/content/en/guides/weave/_index.md
index a9b7c782c0..d6d9b73cf9 100644
--- a/content/en/guides/weave/_index.md
+++ b/content/en/guides/weave/_index.md
@@ -3,7 +3,7 @@ menu:
default:
identifier: weave
title: W&B Weave
-weight: 4
+weight: 40
---
{{% alert %}}
diff --git a/content/en/ref/training.md b/content/en/ref/training.md
new file mode 100644
index 0000000000..c9cf6b31cb
--- /dev/null
+++ b/content/en/ref/training.md
@@ -0,0 +1,5 @@
+---
+title: W&B Training API
+description: Generated documentation for W&B APIs
+layout: redoc
+---
diff --git a/layouts/redoc.html b/layouts/redoc.html
new file mode 100644
index 0000000000..5cd9efb2bf
--- /dev/null
+++ b/layouts/redoc.html
@@ -0,0 +1,113 @@
+
+
+
+ {{ partial "head.html" . }}
+
+
+
+
+
+ {{ partial "navbar.html" . }}
+
+
+
+
+
+ {{ partial "version-banner.html" . }}
+ {{ if not (.Param "ui.breadcrumb_disable") -}}
+ {{ partial "breadcrumb.html" . -}}
+ {{ end -}}
+
+
+
+
+
+
+
+
+ {{ partial "footer.html" . }}
+
+ {{ partial "scripts.html" . }}
+
+
+
+
\ No newline at end of file