skills

Skills to guide Claude Code, Codex, and other coding agents on using the Weights & Biases AI developer platform to train models and build agents.

For model training

  • Log metrics and rich media during model training and fine-tuning
  • Track model training experiments
  • Analyze runs and experiment results to understand how the model is learning
  • Tune hyperparameters

For agent building

  • Trace agentic AI applications
  • Analyze traces and classify them into failure modes
  • Evaluate models with labeled datasets
  • Run online evaluations for production monitoring

Getting Started

npx skills add wandb/skills

Then set your W&B API key:

export WANDB_API_KEY=<your-key>
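
Since a missing key usually surfaces only when an agent first calls W&B, it can help to check for it up front. The following is a minimal sketch, not part of this repository: `require_wandb_key` is a hypothetical helper name, and it only tests whether the variable is non-empty, not whether the key is valid.

```shell
# Hypothetical helper (not shipped with wandb/skills): fail early when the
# key is missing. It never prints the key itself.
require_wandb_key() {
  if [ -z "${WANDB_API_KEY:-}" ]; then
    echo "WANDB_API_KEY is not set; run: export WANDB_API_KEY=<your-key>" >&2
    return 1
  fi
}
```

Calling `require_wandb_key || exit 1` at the top of a setup script stops before any agent session starts without credentials.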

npx skills is a utility for installing skills into major coding agent CLIs. Use --global to install for all projects, or --agent <name> to target a specific agent. See the npx skills docs for more details.
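
The two flags described above can be sketched as a small wrapper. This is an illustration under assumptions: `install_wandb_skills`, the `DRY_RUN` variable, and the `agent:<name>` argument format are hypothetical names invented here, and `claude-code` in the usage note is only an example agent name. The `npx skills add wandb/skills` command and the `--global` and `--agent <name>` flags are the ones named in this README.

```shell
# Hypothetical wrapper: builds the install command for one of three scopes.
# Set DRY_RUN=1 to print the command instead of running it.
install_wandb_skills() {
  scope="${1:-project}"               # project | global | agent:<name>
  set -- npx skills add wandb/skills  # base command from this README
  case "$scope" in
    project) ;;                                        # current project only
    global)  set -- "$@" --global ;;                   # all projects
    agent:*) set -- "$@" --agent "${scope#agent:}" ;;  # one specific agent
    *) echo "unknown scope: $scope" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 install_wandb_skills agent:claude-code` prints the command it would run, which makes it easy to confirm the flags before installing anything.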

Available Skills

| Skill | Description | Status |
|---|---|---|
| wandb-primary | Comprehensive primary skill for agents working with Weights & Biases. Covers both the W&B and Weave SDKs. | claude-code: 32/35 (91%) |

Benchmarks

We maintain a growing internal benchmark suite that evaluates each skill across coding agents and task categories. Skills are evaluated automatically on every merge to main.

| Category | Tasks | Claude Code (sonnet4.6) | Codex (gpt-5.3-codex) |
|---|---|---|---|
| Weave analysis | 26 | 97%* | 63%* |
| Weave tooling | 11 | 95%* | 83%* |
| Model training | 8 | 90%* | 85%* |
| LLM finetuning & RL analysis | 14 | 72%* | 86%* |
| Failure & outlier detection | 8 | 86%* | 63%* |

*Pass rates are +/- 3%. Many tasks span multiple categories.

Contributing

See CONTRIBUTING.md.

About

Official Agent Skills for Weights & Biases Models and Weave