diff --git a/models/tables/evaluate-models.mdx b/models/tables/evaluate-models.mdx index f8f7779dd9..4d712e5148 100644 --- a/models/tables/evaluate-models.mdx +++ b/models/tables/evaluate-models.mdx @@ -1,11 +1,14 @@ --- title: Evaluate models with W&B Weave and W&B Tables description: Learn how to evaluate machine learning models using W&B Weave and Tables. +keywords: [scorers, judges, predictions table, error analysis, model registry] --- +This page shows you two complementary ways to evaluate models tracked in W&B: use W&B Weave for LLM and GenAI evaluations, and use W&B Tables for prediction analysis across runs and epochs. + ## Evaluate models with Weave -[W&B Weave](/weave) is a purpose-built toolkit for evaluating LLMs and GenAI applications. It provides comprehensive evaluation capabilities including scorers, judges, and detailed tracing to help you understand and improve model performance. Weave integrates with W&B Models, allowing you to evaluate models stored in your Model Registry. +[W&B Weave](/weave) is a purpose-built toolkit for evaluating LLMs and GenAI applications. It provides evaluation capabilities including scorers, judges, and detailed tracing to help you understand and improve model performance. Weave integrates with W&B Models so you can evaluate models stored in your Model Registry. 
Weave evaluation dashboard showing model performance metrics and traces @@ -13,13 +16,15 @@ description: Learn how to evaluate machine learning models using W&B Weave and T ### Key features for model evaluation -* **Scorers and judges**: Pre-built and custom evaluation metrics for accuracy, relevance, coherence, and more -* **Evaluation datasets**: Structured test sets with ground truth for systematic evaluation -* **Model versioning**: Track and compare different versions of your models -* **Detailed tracing**: Debug model behavior with complete input/output traces -* **Cost tracking**: Monitor API costs and token usage across evaluations +Weave provides the following capabilities for model evaluation: + +* **Scorers and judges**: Pre-built and custom evaluation metrics for accuracy, relevance, coherence, and more. +* **Evaluation datasets**: Structured test sets with ground truth for systematic evaluation. +* **Model versioning**: Track and compare different versions of your models. +* **Detailed tracing**: Debug model behavior with complete input/output traces. +* **Cost tracking**: Monitor API costs and token usage across evaluations. -### Getting started: Evaluate a model from W&B Registry +### Evaluate a model from W&B Registry Download a model from W&B Models Registry and evaluate it using Weave: @@ -69,12 +74,14 @@ results = await evaluation.evaluate(model) ### Integrate Weave evaluations with W&B Models +To connect Weave evaluation results with the models and runs you track in W&B, use the integration workflow described next. + The [Models and Weave Integration Demo](/weave/cookbooks/Models_and_Weave_Integration_Demo) shows the complete workflow for: -1. **Load models from Registry**: Download fine-tuned models stored in W&B Models Registry -2. **Create evaluation pipelines**: Build comprehensive evaluations with custom scorers -3. **Log results back to W&B**: Connect evaluation metrics to your model runs -4. 
**Version evaluated models**: Save improved models back to the Registry +1. **Load models from Registry**: Download fine-tuned models stored in W&B Models Registry. +2. **Create evaluation pipelines**: Build evaluations with custom scorers. +3. **Log results back to W&B**: Connect evaluation metrics to your model runs. +4. **Version evaluated models**: Save improved models back to the Registry. Log evaluation results to both Weave and W&B Models: @@ -93,17 +100,19 @@ wandb.run.config.update({ ### Advanced Weave features #### Custom scorers and judges -Create sophisticated evaluation metrics tailored to your use case: + +Create evaluation metrics tailored to your use case: ```python @weave.op() -def llm_judge_scorer(expected: str, output: str, judge_model) -> dict: +async def llm_judge_scorer(expected: str, output: str, judge_model) -> dict: prompt = f"Is this answer correct? Expected: {expected}, Got: {output}" judgment = await judge_model.predict(prompt) return {"judge_score": judgment} ``` #### Batch evaluations + Evaluate multiple model versions or configurations: ```python @@ -119,18 +128,21 @@ for model in models: ### Next steps +For more information, see the following: + * [Complete Weave evaluation tutorial](/weave/tutorial-eval/) * [Models and Weave integration example](/weave/cookbooks/Models_and_Weave_Integration_Demo) -## Evaluate models with tables +## Evaluate models with Tables -Use W&B Tables to: -* **Compare model predictions**: View side-by-side comparisons of how different models perform on the same test set -* **Track prediction changes**: Monitor how predictions evolve across training epochs or model versions -* **Analyze errors**: Filter and query to find commonly misclassified examples and error patterns -* **Visualize rich media**: Display images, audio, text, and other media types alongside predictions and metrics +W&B Tables let you log structured predictions and inspect them interactively in the UI. 
Use W&B Tables to: + +* **Compare model predictions**: View side-by-side comparisons of how different models perform on the same test set. +* **Track prediction changes**: Monitor how predictions evolve across training epochs or model versions. +* **Analyze errors**: Filter and query to find commonly misclassified examples and error patterns. +* **Visualize rich media**: Display images, audio, text, and other media types alongside predictions and metrics. ![Example of predictions table showing model outputs alongside ground truth labels](/images/data_vis/tables_sample_predictions.png) @@ -170,6 +182,7 @@ run.log({"evaluation_results": eval_table}) ### Advanced table workflows #### Compare multiple models + Log evaluation tables from different models to the same key for direct comparison: ```python @@ -189,6 +202,7 @@ with wandb.init(project="model-comparison", name="model_b") as run: #### Track predictions over time + Log tables at different training epochs to visualize improvement: ```python @@ -206,11 +220,12 @@ for epoch in range(num_epochs): ### Interactive analysis in the W&B UI -Once logged, you can: -1. **Filter results**: Click on column headers to filter by prediction accuracy, confidence thresholds, or specific classes -2. **Compare tables**: Select multiple table versions to see side-by-side comparisons -3. **Query data**: Use the query bar to find specific patterns (for example, `"correct" = false AND "confidence" > 0.8`) -4. **Group and aggregate**: Group by predicted class to see per-class accuracy metrics +After you log your tables, the W&B UI provides several ways to explore the results. You can: + +* **Filter results**: Click column headers to filter by prediction accuracy, confidence thresholds, or specific classes. +* **Compare tables**: Select multiple table versions to see side-by-side comparisons. +* **Query data**: Use the query bar to find specific patterns (for example, `"correct" = false AND "confidence" > 0.8`). 
+* **Group and aggregate**: Group by predicted class to see per-class accuracy metrics. ![Interactive filtering and querying of evaluation results in W&B Tables](/images/data_vis/wandb_demo_filter_on_a_table.png) @@ -218,6 +233,8 @@ Once logged, you can: ### Example: Error analysis with enriched tables +The following example creates a mutable table, logs initial predictions, then adds confidence and error type columns for deeper analysis: + ```python # Create a mutable table to add analysis columns eval_table = wandb.Table( diff --git a/models/tables/log_tables.mdx b/models/tables/log_tables.mdx index 6f98eeb2df..621ddeda6b 100644 --- a/models/tables/log_tables.mdx +++ b/models/tables/log_tables.mdx @@ -1,14 +1,17 @@ --- title: Log tables description: "Create and log W&B Tables with different logging modes including immutable, mutable, and incremental using the Python SDK." +keywords: ["wandb.Table", "log_mode", "add_column", "add_data", "batch logging"] --- -Visualize and log tabular data with W&B Tables. A W&B Table is a two-dimensional grid of data where each column has a single type of data. Each row represents one or more data points logged to a W&B [run](/models/runs/). W&B Tables support primitive and numeric types, as well as nested lists, dictionaries, and rich media types. +This page shows you how to create and log W&B Tables with the Python SDK so you can visualize and analyze tabular data, including predictions, evaluation results, and batched training output, alongside your ML experiments. + +A W&B Table is a two-dimensional grid of data where each column has a single type of data. Each row represents one or more data points logged to a W&B [run](/models/runs/). W&B Tables support primitive and numeric types, as well as nested lists, dictionaries, and rich media types. A W&B Table is a specialized [data type](/models/ref/python/data-types/) in W&B, logged as an [artifact](/models/artifacts/) object. 
-You [create and log table objects](#create-and-log-a-new-table) using the W&B Python SDK. When you create a table object, you specify the columns and data for the table and a [mode](#table-logging-modes). The mode determines how the table is logged and updated during your ML experiments. +You [create and log table objects](#create-and-log-a-table) using the W&B Python SDK. When you create a table object, you specify the columns and data for the table and a [mode](#logging-modes). The mode determines how the table is logged and updated during your ML experiments, which affects performance, what you can change after logging, and how the table appears in the W&B App. `INCREMENTAL` mode is supported on W&B Server v0.70.0 and above. @@ -16,8 +19,10 @@ You [create and log table objects](#create-and-log-a-new-table) using the W&B Py ## Create and log a table -1. Initialize a new run with `wandb.init()`. -2. Create a Table object with the [`wandb.Table`](/models/ref/python/data-types/table) Class. Specify the columns and data for the table for the `columns` and `data` parameters, respectively. It is recommended to set the optional `log_mode` parameter to one of the three modes: `IMMUTABLE` (the default), `MUTABLE`, or `INCREMENTAL`. See [Table Logging Modes](#logging-modes) in the next section for more information. +Follow these steps to log a table to a run. The resulting table is stored as an artifact in W&B and rendered in the run's workspace. + +1. Initialize a new run with `wandb.init()`. +2. Create a Table object with the [`wandb.Table`](/models/ref/python/data-types/table) class. Specify the columns and data for the table for the `columns` and `data` parameters, respectively. Set the optional `log_mode` parameter to one of the three modes: `IMMUTABLE` (the default), `MUTABLE`, or `INCREMENTAL`, because the mode controls how the table behaves when logged. See [Logging modes](#logging-modes) for more information. 3. Log the table to W&B with `run.log()`. 
The following example shows how to create and log a table with two columns, `a` and `b`, and two rows of data, `["a1", "b1"]` and `["a2", "b2"]`: @@ -41,21 +46,23 @@ with wandb.init(project="table-demo") as run: ## Logging modes +Choose a logging mode based on your workflow: for example, a single end-of-run snapshot, a table you progressively enrich with new columns, or a long-running training table updated in batches. + The [`wandb.Table`](/models/ref/python/data-types/table) `log_mode` parameter determines how a table is logged and updated during your ML experiments. The `log_mode` parameter accepts one of three arguments: `IMMUTABLE`, `MUTABLE`, and `INCREMENTAL`. Each mode has different implications for how a table is logged, how it can be modified, and how it is rendered in the W&B App. -The following describes the three logging modes, the high-level differences, and common use case for each mode: +The following describes the three logging modes, the high-level differences, and common use cases for each mode: -| Mode | Definition | Use Cases | Benefits | +| Mode | Definition | Use cases | Benefits | | ----- | ---------- | ---------- | ----------| | `IMMUTABLE` | Once a table is logged to W&B, you cannot modify it. |- Storing tabular data generated at the end of a run for further analysis | - Minimal overhead when logged at the end of a run<br/>- All rows rendered in UI | -| `MUTABLE` | After you log a table to W&B, you can overwrite the existing table with a new one. | - Adding columns or rows to existing tables<br/>- Enriching results with new information | - Capture Table mutations<br/>- All rows rendered in UI | +| `MUTABLE` | After you log a table to W&B, you can overwrite the existing table with a new one. | - Adding columns or rows to existing tables<br/>- Enriching results with new information | - Capture table mutations<br/>- All rows rendered in UI | | `INCREMENTAL` | Add batches of new rows to a table throughout the machine learning experiment. | - Adding rows to tables in batches<br/>- Long-running training jobs<br/>- Processing large datasets in batches<br/>- Monitoring ongoing results | - View updates on UI during training<br/>- Ability to step through increments | -The next sections show example code snippets for each mode along with considerations when to use each mode. +The following sections show example code snippets for each mode along with considerations for when to use each mode. -### MUTABLE mode +### `MUTABLE` mode -`MUTABLE` mode updates an existing table by replacing the existing table with a new one. `MUTABLE` mode is useful when you want to add new columns and rows to an existing table in a non iterative process. Within the UI, the table is rendered with all rows and columns, including the new ones added after the initial log. +`MUTABLE` mode updates an existing table by replacing it with a new one. `MUTABLE` mode is useful when you want to add new columns and rows to an existing table in a non-iterative process. Within the UI, the table is rendered with all rows and columns, including the new ones added after the initial log. In `MUTABLE` mode, the table object is replaced each time you log the table. Overwriting a table with a new one is computationally expensive and can be slow for large tables. @@ -64,7 +71,7 @@ In `MUTABLE` mode, the table object is replaced each time you log the table. Ove The following example shows how to create a table in `MUTABLE` mode, log it, and then add new columns to it. The table object is logged three times: once with the initial data, once with the confidence scores, and once with the final predictions. -The following example uses a placeholder function `load_eval_data()` to load data and a placeholder function `model.predict()` to make predictions. You will need to replace these with your own data loading and prediction functions. +The following example uses a placeholder function `load_eval_data()` to load data and a placeholder function `model.predict()` to make predictions. Replace these with your own data loading and prediction functions.
```python @@ -84,7 +91,7 @@ with wandb.init(project="mutable-table-demo") as run: for inp, label, pred in zip(inputs, labels, raw_preds): table.add_data(inp, label, pred) - # Step 1: Log initial data + # Step 1: Log initial data run.log({"eval_table": table}) # Log initial table # Step 2: Add confidence scores (e.g. max softmax) @@ -99,20 +106,20 @@ with wandb.init(project="mutable-table-demo") as run: run.log({"eval_table": table}) # Final update with another column ``` -If you only want to add new batches of rows (no columns) incrementally like in a training loop, consider using [`INCREMENTAL` mode](#INCREMENTAL-mode) instead. +If you only want to add new batches of rows (no columns) incrementally, as in a training loop, consider using [`INCREMENTAL` mode](#incremental-mode) instead. -### INCREMENTAL mode +### `INCREMENTAL` mode -In incremental mode, you log batches of rows to a table during the machine learning experiment. This is ideal for monitoring long-running jobs or when working with large tables that would be inefficient to log during the run for updates. Within the UI, the table is updated with new rows as they are logged, allowing you to view the latest data without having to wait for the entire run to finish. You can also step through the increments to view the table at different points in time. +In `INCREMENTAL` mode, you log batches of rows to a table during the machine learning experiment. This mode is ideal for monitoring long-running jobs, or for large tables that would be inefficient to re-log in full each time you update them during a run. Within the UI, the table is updated with new rows as they are logged, so you can view the latest data without waiting for the entire run to finish. You can also step through the increments to view the table at different points in time. Run workspaces in the W&B App have a limit of 100 increments. If you log more than 100 increments, only the most recent 100 are shown in the run workspace.
-The following example creates a table in `INCREMENTAL` mode, logs it, and then adds new rows to it. Note that the table is logged once per training step (`step`). +The following example creates a table in `INCREMENTAL` mode, logs it, and then adds new rows to it. The table is logged once per training step (`step`). -The following example uses a placeholder function `get_training_batch()` to load data, a placeholder function `train_model_on_batch()` to train the model, and a placeholder function `predict_on_batch()` to make predictions. You will need to replace these with your own data loading, training, and prediction functions. +The following example uses a placeholder function `get_training_batch()` to load data, a placeholder function `train_model_on_batch()` to train the model, and a placeholder function `predict_on_batch()` to make predictions. Replace these with your own data loading, training, and prediction functions. ```python @@ -141,7 +148,7 @@ with wandb.init(project="incremental-table-demo") as run: run.log({"training_table": table}, step=step) ``` -Incremental logging is generally more computationally efficient than logging a new table each time (`log_mode=MUTABLE`). However, the W&B App may not render all rows in the table if you log a large number of increments. If your goal is to update and view your table data while your run is ongoing and to have all the data available for analysis, consider using two tables. One with `INCREMENTAL` log mode and one with `IMMUTABLE` log mode. +Incremental logging is more computationally efficient than logging a new table each time (`log_mode="MUTABLE"`). However, the W&B App may not render all rows in the table if you log a large number of increments. To update and view your table data while your run is ongoing and to have all the data available for analysis, consider using two tables: one with `INCREMENTAL` log mode and one with `IMMUTABLE` log mode. 
The following example shows how to combine `INCREMENTAL` and `IMMUTABLE` logging modes to achieve this. @@ -173,11 +180,13 @@ with wandb.init(project="combined-logging-example") as run: run.log({"table": final_table}) ``` -In this example, the `incr_table` is logged incrementally (with `log_mode="INCREMENTAL"`) during training. This allows you to log and view updates to the table as new data is processed. At the end of training, an immutable table (`final_table`) is created with all data from the incremental table. The immutable table is logged to preserve the complete dataset for further analysis and it enables you to view all rows in the W&B App. +In this example, the `incr_table` is logged incrementally (with `log_mode="INCREMENTAL"`) during training. This lets you log and view updates to the table as new data is processed. At the end of training, an immutable table (`final_table`) is created with all data from the incremental table. The immutable table is logged to preserve the complete dataset for further analysis and to let you view all rows in the W&B App. + +## Examples -## Examples +The following examples show end-to-end logging patterns for common workflows: enriching evaluation results, resuming runs that log incremental tables, and incrementally logging batches during training. 
-### Enriching evaluation results with MUTABLE +### Enrich evaluation results with `MUTABLE` ```python import wandb @@ -207,7 +216,7 @@ with wandb.init(project="mutable-logging") as run: run.log({"eval_table": table}) ``` -### Resuming runs with INCREMENTAL tables +### Resume runs with `INCREMENTAL` tables You can continue logging to an incremental table when resuming a run: @@ -216,7 +225,7 @@ You can continue logging to an incremental table when resuming a run: resumed_run = wandb.init(project="resume-incremental", id="your-run-id", resume="must") # Create the incremental table; no need to populate with data from previously logged table -# Increments will be continue to be added to the Table artifact. +# Increments continue to be added to the Table artifact. table = wandb.Table(columns=["step", "metric"], log_mode="INCREMENTAL") # Continue logging @@ -233,7 +242,7 @@ Increments are logged to a new table if you turn off summaries on a key used for -### Training with INCREMENTAL batch training +### Train in batches with `INCREMENTAL` ```python diff --git a/models/tables/tables-download.mdx b/models/tables/tables-download.mdx index 6900e9eabe..cdb69c5637 100644 --- a/models/tables/tables-download.mdx +++ b/models/tables/tables-download.mdx @@ -1,12 +1,14 @@ --- description: "Export W&B Table data to pandas DataFrames and CSV files for offline analysis and data processing." title: Export table data +keywords: ["pandas DataFrame", "CSV export", "get_dataframe", "table to artifact"] --- -Like all W&B Artifacts, Tables can be converted into pandas dataframes for easy data exporting. +Because W&B Tables are stored as artifacts, you can retrieve a Table from its artifact and convert it into a pandas DataFrame. This page shows you how to export the data in a W&B Table to a pandas DataFrame and then to a CSV file, so that you can analyze or process the data outside of W&B. -## Convert `table` to `artifact` -First, you'll need to convert the table to an artifact.
The easiest way to do this using `artifact.get(table, "table_name")`: +## Convert a table to an artifact + +First, add the table to an artifact and log it, then retrieve the table as a `Table` object you can manipulate. Use `artifact.add(table, "my_table")` to add the table to the artifact and `artifact.get("my_table")` to retrieve it later: ```python # Create and log a new table. @@ -24,8 +26,9 @@ with wandb.init() as r: table = artifact.get("my_table") ``` -## Convert `artifact` to Dataframe -Then, convert the table into a dataframe: +## Convert the artifact to a DataFrame + +With the table retrieved from the artifact, convert it into a pandas DataFrame so that you can use DataFrame operations: ```python # Following from the last code example: @@ -33,14 +36,18 @@ df = table.get_dataframe() ``` ## Export data -Now you can export using any method dataframe supports: + +With your data in a DataFrame, you can export it using any method that pandas supports. For example, the following exports the data to a CSV file: ```python -# Converting the table data to .csv +# Convert the table data to .csv df.to_csv("example.csv", encoding="utf-8") ``` -# Next Steps -- Check out the [reference documentation](/models/artifacts/construct-an-artifact/) on `artifacts`. -- Go through our [Tables Walkthrough](/models/tables/tables-walkthrough/) guide. -- Check out the [Dataframe](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) reference docs. \ No newline at end of file +## Next steps + +For more information, see the following resources: + +- [Construct an artifact](/models/artifacts/construct-an-artifact/) for reference documentation on artifacts. +- [Tables walkthrough](/models/tables/tables-walkthrough/) for a guided tutorial. +- [pandas DataFrame reference](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) for DataFrame API documentation. 
\ No newline at end of file diff --git a/models/tables/tables-gallery.mdx b/models/tables/tables-gallery.mdx index 4dbd8867d3..00a55816a3 100644 --- a/models/tables/tables-gallery.mdx +++ b/models/tables/tables-gallery.mdx @@ -1,13 +1,18 @@ --- description: "Explore example W&B Tables projects for image classification, audio, text analysis, and other use cases." title: Example tables +keywords: [data visualization, image classification, semantic segmentation, model comparison] --- +This page showcases example W&B Tables projects. The examples illustrate what you can build with Tables and how teams use them for different data types and workflows. Use them as inspiration for your own projects or as starting points for exploring Tables features. + +## Ways to use tables + The following sections highlight some of the ways you can use tables: ### View your data -Log metrics and rich media during model training or evaluation, then visualize results in a persistent database synced to the cloud, or to your [hosting instance](/platform/hosting). +Log metrics and rich media during model training or evaluation, then visualize results in a persistent database synced to the cloud, or to your [hosting instance](/platform/hosting). Data browsing table @@ -15,9 +20,9 @@ Log metrics and rich media during model training or evaluation, then visualize r For example, check out this table that shows a [balanced split of a photos dataset](https://wandb.ai/stacey/mendeleev/artifacts/balanced_data/inat_80-10-10_5K/ab79f01e007113280018/files/data_split.table.json). -### Interactively explore your data +### Explore your data interactively -View, sort, filter, group, join, and query tables to understand your data and model performance—no need to browse static files or rerun analysis scripts. +View, sort, filter, group, join, and query tables to understand your data and model performance. You don't need to browse static files or rerun analysis scripts. 
Audio comparison @@ -27,7 +32,7 @@ For example, see this report on [style-transferred audio](https://wandb.ai/stace ### Compare model versions -Quickly compare results across different training epochs, datasets, hyperparameter choices, model architectures etc. +Compare results across training epochs, datasets, hyperparameter choices, and model architectures. Model comparison @@ -43,13 +48,15 @@ Zoom in to visualize a specific prediction at a specific step. Zoom out to see t Tracking experiment details -For example, see this example table that analyzes results [after one and then after five epochs on the MNIST dataset](https://wandb.ai/stacey/mnist-viz/artifacts/predictions/baseline/d888bc05719667811b23/files/predictions.table.json#7dd0cd845c0edb469dec). -## Example Projects with W&B Tables -The following highlight some real W&B Projects that use W&B Tables. +For example, see this table that analyzes results [after one and then after five epochs on the MNIST dataset](https://wandb.ai/stacey/mnist-viz/artifacts/predictions/baseline/d888bc05719667811b23/files/predictions.table.json#7dd0cd845c0edb469dec). + +## Example projects with W&B Tables + +The following sections highlight real W&B projects that use Tables, organized by data type and use case. ### Image classification -Read [Visualize Data for Image Classification](https://wandb.ai/stacey/mendeleev/reports/Visualize-Data-for-Image-Classification--VmlldzozNjE3NjA), follow the [data visualization nature Colab](https://wandb.me/dsviz-nature-colab), or explore the [artifacts context](https://wandb.ai/stacey/mendeleev/artifacts/val_epoch_preds/val_pred_gawf9z8j/2dcee8fa22863317472b/files/val_epoch_res.table.json) to see how a CNN identifies ten types of living things (plants, bird, insects, etc) from [iNaturalist](https://www.inaturalist.org/pages/developers) photos. 
+See how a CNN identifies ten types of living things (plants, birds, insects, and more) from [iNaturalist](https://www.inaturalist.org/pages/developers) photos. Read [Visualize Data for Image Classification](https://wandb.ai/stacey/mendeleev/reports/Visualize-Data-for-Image-Classification--VmlldzozNjE3NjA), follow the [data visualization nature Colab](https://wandb.me/dsviz-nature-colab), or explore the [artifacts context](https://wandb.ai/stacey/mendeleev/artifacts/val_epoch_preds/val_pred_gawf9z8j/2dcee8fa22863317472b/files/val_epoch_res.table.json). Compare the distribution of true labels across two different models predictions. @@ -73,28 +80,28 @@ Browse text samples from training data or generated output, dynamically group by ### Video -Browse and aggregate over videos logged during training to understand your models. Here is an early example using the [SafeLife benchmark](https://wandb.ai/safelife/v1dot2/benchmark) for RL agents seeking to [minimize side effects ](https://wandb.ai/stacey/saferlife/artifacts/video/videos_append-spawn/c1f92c6e27fa0725c154/files/video_examples.table.json) +Browse and aggregate over videos logged during training to understand your models. For an example, see the [SafeLife benchmark](https://wandb.ai/safelife/v1dot2/benchmark) for reinforcement learning (RL) agents seeking to [minimize side effects](https://wandb.ai/stacey/saferlife/artifacts/video/videos_append-spawn/c1f92c6e27fa0725c154/files/video_examples.table.json). - Browse easily through the few successful agents + Browse through the few successful agents ### Tabular data -View a report on how to [split and pre-process tabular data](https://wandb.ai/dpaiton/splitting-tabular-data/reports/Tabular-Data-Versioning-and-Deduplication-with-Weights-Biases--VmlldzoxNDIzOTA1) with version control and de-duplication. 
+View a report on how to [split and preprocess tabular data](https://wandb.ai/dpaiton/splitting-tabular-data/reports/Tabular-Data-Versioning-and-Deduplication-with-Weights-Biases--VmlldzoxNDIzOTA1) with version control and deduplication. Tables and Artifacts workflow -### Comparing model variants (semantic segmentation) +### Compare model variants (semantic segmentation) -An [interactive notebook](https://wandb.me/dsviz-cars-demo) and [live example](https://wandb.ai/stacey/evalserver_answers_2/artifacts/results/eval_Daenerys/c2290abd3d7274f00ad8/files/eval_results.table.json#a57f8e412329727038c2$eval_Ada) of logging Tables for semantic segmentation and comparing different models. Try your own queries [in this Table](https://wandb.ai/stacey/evalserver_answers_2/artifacts/results/eval_Daenerys/c2290abd3d7274f00ad8/files/eval_results.table.json). +See an [interactive notebook](https://wandb.me/dsviz-cars-demo) and [live example](https://wandb.ai/stacey/evalserver_answers_2/artifacts/results/eval_Daenerys/c2290abd3d7274f00ad8/files/eval_results.table.json#a57f8e412329727038c2$eval_Ada) that log Tables for semantic segmentation and compare different models. Try your own queries [in this Table](https://wandb.ai/stacey/evalserver_answers_2/artifacts/results/eval_Daenerys/c2290abd3d7274f00ad8/files/eval_results.table.json). Find the best predictions across two models on the same test set -### Analyzing improvement over training time +### Analyze improvement over training time -A detailed report on how to [visualize predictions over time](https://wandb.ai/stacey/mnist-viz/reports/Visualize-Predictions-over-Time--Vmlldzo1OTQxMTk) and the accompanying [interactive notebook](https://wandb.me/dsviz-mnist-colab). 
\ No newline at end of file +Read the detailed report on how to [visualize predictions over time](https://wandb.ai/stacey/mnist-viz/reports/Visualize-Predictions-over-Time--Vmlldzo1OTQxMTk), and explore the accompanying [interactive notebook](https://wandb.me/dsviz-mnist-colab). \ No newline at end of file diff --git a/models/tables/tables-walkthrough.mdx b/models/tables/tables-walkthrough.mdx index bd69cd5b27..1edfcd731f 100644 --- a/models/tables/tables-walkthrough.mdx +++ b/models/tables/tables-walkthrough.mdx @@ -1,18 +1,20 @@ --- -description: Explore how to use W&B Tables with this 5 minute Quickstart. -title: 'Tutorial: Log tables, visualize and query data' +description: Explore how to use W&B Tables with this 5-minute Quickstart. +title: 'Tutorial: Log tables, visualize, and query data' +keywords: ["wandb.Table", "Pandas DataFrame", "cross-run comparison", "table visualization"] --- -The following Quickstart demonstrates how to log data tables, visualize data, and query data. +This Quickstart walks you through logging data tables to W&B, visualizing them in your project workspace, and comparing results across runs. By the end, you have a logged table that you can explore and compare in the W&B App. -Select the button below to try a PyTorch Quickstart example project on MNIST data. +Select the button below to try a PyTorch Quickstart example project on MNIST data. -## 1. Log a table -Log a table with W&B. You can either construct a new table or pass a Pandas Dataframe. +## Log a table + +In this step, you create a table and log it to W&B so that it's available for visualization later in the walkthrough. You can either construct a new table or pass a Pandas DataFrame. -To construct and log a new Table, you will use: +To construct and log a new table, use the following: - [`wandb.init()`](/models/ref/python/functions/init): Create a [run](/models/runs/) to track results. - [`wandb.Table()`](/models/ref/python/data-types/table): Create a new table object. 
- `columns`: Set the column names. @@ -29,8 +31,8 @@ with wandb.init(project="table-test") as run: run.log({"Table Name": my_table}) ``` - -Pass a Pandas Dataframe to `wandb.Table()` to create a new table. + +Pass a Pandas DataFrame to `wandb.Table()` to create a new table. ```python import wandb @@ -50,29 +52,31 @@ For more information on supported data types, see the [`wandb.Table`](/models/re -## 2. Visualize tables in your project workspace +## Visualize tables in your project workspace -View the resulting table in your workspace. +After logging a table, you can view it in the W&B App to confirm W&B recorded it correctly and to explore its contents. 1. Navigate to your project in the W&B App. -2. Select the name of your run in your project workspace. A new panel is added for each unique table key. +2. Select the name of your run in your project workspace. W&B adds a new panel for each unique table key. - Sample table logged + Sample table logged -In this example, `my_table`, is logged under the key `"Table Name"`. +In this example, `my_table` is logged under the key `"Table Name"`. + +## Compare across model versions -## 3. Compare across model versions +After you have logged tables from more than one run, you can use the project workspace to compare results side by side and evaluate how model versions differ. -Log sample tables from multiple W&B Runs and compare results in the project workspace. In this [example workspace](https://wandb.ai/carey/table-test?workspace=user-carey), we show how to combine rows from multiple different versions in the same table. +Log sample tables from multiple W&B Runs and compare results in the project workspace. This [example workspace](https://wandb.ai/carey/table-test?workspace=user-carey) shows how to combine rows from multiple different versions in the same table. - Cross-run table comparison + Cross-run table comparison Use the table filter, sort, and grouping features to explore and evaluate model results. 
- Table filtering + Table filtering \ No newline at end of file diff --git a/models/tables/visualize-tables.mdx b/models/tables/visualize-tables.mdx index 038926ccc8..8a8d814469 100644 --- a/models/tables/visualize-tables.mdx +++ b/models/tables/visualize-tables.mdx @@ -1,20 +1,22 @@ --- description: "Compare, filter, group, sort, and visualize W&B Tables data in merged or side-by-side views for analysis." title: Visualize and analyze tables +keywords: ["merged view", "side-by-side view", "step slider", "query panel", "artifact comparison"] --- -Customize your W&B Tables to answer questions about your machine learning model's performance, analyze your data, and more. +Customize your W&B Tables to answer questions about your machine learning model's performance, analyze your data, and more. This page is for machine learning practitioners who want to inspect logged data interactively in the W&B App. Interactively explore your data to: * [Compare two W&B Tables](#table-comparison-options) logged as artifact versions to analyze changes in your data or model performance. -* Understand higher-level patterns in your data +* Understand higher-level patterns in your data. * [View how values you log to a table change throughout your runs](#visualize-how-values-change-throughout-your-runs). -W&B Tables posses the following behaviors: -1. **Stateless in an artifact context**: any table logged alongside an artifact version resets to its default state after you close the browser window -2. **Stateful in a workspace or report context**: any changes you make to a table in a single run workspace, multi-run project workspace, or Report persists. +W&B Tables have the following behaviors: + +- **Stateless in an artifact context**: any table logged alongside an artifact version resets to its default state after you close the browser window. 
+- **Stateful in a workspace or report context**: any changes you make to a table in a single run workspace, multi-run project workspace, or report persist. For information on how to save your current W&B Table view, see [Save your view](#save-your-view). @@ -23,7 +25,7 @@ For information on how to save your current W&B Table view, see [Save your view] ## Table comparison options -Compare two tables with a [merged view](#merged-view) or a [side-by-side view](#side-by-side-view). For example, the image below demonstrates a table comparison of MNIST data. +This section describes how to start a comparison and the two views you can switch between. Compare two tables with a [merged view](#merged-view) or a [side-by-side view](#side-by-side-view) to spot changes in your data or model output. For example, the following image demonstrates a table comparison of MNIST data. Training epoch comparison @@ -33,22 +35,21 @@ Follow these steps to compare two tables: 1. Go to your project in the W&B App. 2. Select the artifacts icon in the project sidebar. -2. Select an artifact version. - -In the following image we demonstrate a model's predictions on MNIST validation data after each of five epochs ([view interactive example here](https://wandb.ai/stacey/mnist-viz/artifacts/predictions/baseline/d888bc05719667811b23/files/predictions.table.json)). +3. Select an artifact version. - - Click on 'predictions' to view the Table - + The following image shows a model's predictions on MNIST validation data after each of five epochs ([view interactive example here](https://wandb.ai/stacey/mnist-viz/artifacts/predictions/baseline/d888bc05719667811b23/files/predictions.table.json)). + + Click 'predictions' to view the table + -3. Hover over the second artifact version you want to compare in the sidebar and click **Compare** when it appears. For example, in the image below we select a version labeled as "v4" to compare to MNIST predictions made by the same model after 5 epochs of training. +4. 
Hover over the second artifact version you want to compare in the sidebar and click **Compare** when it appears. For example, in the following image, you select a version labeled as "v4" to compare to MNIST predictions made by the same model after 5 epochs of training. Model prediction comparison - +After you select a second artifact version, W&B opens a comparison of the two tables that you can explore in either of the following views. ### Merged view {/* To do, add steps */} @@ -58,31 +59,34 @@ Initially you see both tables merged together. The first table selected has inde Merged view -From the merged view, you can +From the merged view, you can: -* **choose the join key**: use the dropdown at the top left to set the column to use as the join key for the two tables. Typically this is the unique identifier of each row, such as the filename of a specific example in your dataset or an incrementing index on your generated samples. Note that it's currently possible to select _any_ column, which may yield illegible tables and slow queries. -* **concatenate instead of join**: select "concatenating all tables" in this dropdown to _union all the rows_ from both tables into one larger Table instead of joining across their columns -* **reference each Table explicitly**: use 0, 1, and \* in the filter expression to explicitly specify a column in one or both table instances -* **visualize detailed numerical differences as histograms**: compare the values in any cell at a glance +* **Choose the join key**: Use the dropdown at the top left to set the column to use as the join key for the two tables. Typically this is the unique identifier of each row, such as the filename of a specific example in your dataset or an incrementing index on your generated samples. You can select any column, but some choices may yield illegible tables and slow queries. 
+* **Concatenate instead of join**: Select "concatenating all tables" in this dropdown to union all the rows from both tables into one larger table instead of joining across their columns. +* **Reference each table explicitly**: Use `0`, `1`, and `*` in the filter expression to explicitly specify a column in one or both table instances. +* **Visualize detailed numerical differences as histograms**: Compare the values in any cell at a glance. ### Side-by-side view {/* To do */} -To view the two tables side-by-side, change the first dropdown from "Merge Tables: Table" to "List of: Table" and then update the "Page size" respectively. Here the first Table selected is on the left and the second one is on the right. Also, you can compare these tables vertically as well by clicking on the "Vertical" checkbox. +To view the two tables side-by-side, change the first dropdown from **Merge Tables: Table** to **List of: Table** and then update the **Page size**. Here the first table selected is on the left and the second one is on the right. You can also compare these tables vertically by clicking the **Vertical** checkbox. Side-by-side table view -* **compare the tables at a glance**: apply any operations (sort, filter, group) to both tables in tandem and spot any changes or differences quickly. For example, view the incorrect predictions grouped by guess, the hardest negatives overall, the confidence score distribution by true label, etc. -* **explore two tables independently**: scroll through and focus on the side/rows of interest +From the side-by-side view, you can: + +* **Compare the tables at a glance**: Apply any operations (sort, filter, group) to both tables in tandem to spot changes or differences. For example, view the incorrect predictions grouped by guess, the hardest negatives overall, or the confidence score distribution by true label. +* **Explore two tables independently**: Scroll through and focus on the side or rows of interest. 
## Compare artifacts -Compare two W&B Tables logged as artifact versions to analyze changes in your data or model performance. Use the merged view or side-by-side view to compare the tables. +Compare two W&B Tables logged as artifact versions to analyze changes in your data or model performance. Use the merged view or side-by-side view to compare the tables. The following sections describe two common comparison scenarios: comparing the same model over time and comparing different model variants. ### Compare tables across time + Log a table in an artifact for each meaningful step of training to analyze model performance over training time. For example, you could log a table at the end of every validation step, after every 50 epochs of training, or any frequency that makes sense for your pipeline. Use the side-by-side view to visualize changes in model predictions. @@ -112,7 +116,7 @@ For example, compare predictions between a `baseline` and a new model variant, ` ## Visualize how values change throughout your runs -View how values you log to a table change throughout your runs with a step slider. Slide the step slider to view the values logged at different steps. For example, you can view how the loss, accuracy, or other metrics change after each run. +View how values you log to a table change throughout your runs with a step slider. Use this view when you want to inspect how a logged value evolves across training, rather than comparing two snapshots. Slide the step slider to view the values logged at different steps. For example, you can view how the loss, accuracy, or other metrics change from step to step within each run. The slider uses a key to determine the step value. The default key for the slider is `_step`, a special key that W&B automatically logs for you. The `_step` key is an integer that increments by 1 each time you call `wandb.Run.log()` in your code. @@ -124,25 +128,25 @@ To add a step slider to a W&B Table: 4.
Within the query expression editor, select `runs` and press **Enter** on your keyboard. 5. Click the gear icon to view the settings for the panel. 6. Set **Render As** selector to **Stepper**. -7. Set **Stepper Key** to `_step` or the [key to use as the unit](#custom-step-keys) for the step slider. +7. Set **Stepper Key** to `_step` or the [key to use as the unit](#custom-step-key) for the step slider. -The following image shows a query panel with three W&B runs and the values they logged at step 295. +The query panel now displays a step slider that you can use to scrub through the values logged at each step. The following image shows a query panel with three W&B runs and the values they logged at step 295. Step slider feature -Within the W&B App UI you may notice duplicate values for multiple steps. This duplication can occur if multiple runs log the same value at different steps, or if a run does not log values at every step. If a value is missing for a given step, W&B uses the last value that was logged as the slider key. +Within the W&B App UI you may notice duplicate values for multiple steps. This duplication can occur if multiple runs log the same value at different steps, or if a run doesn't log values at every step. If a value is missing for a given step, W&B uses the last logged value as the slider key. ### Custom step key The step key can be any numeric metric that you log in your runs as the step key, such as `epoch` or `global_step`. When you use a custom step key, W&B maps each value of that key to a step (`_step`) in the run. -This table shows how a custom step key `epoch` maps to `_step` values for three different runs: `serene-sponge`, `lively-frog`, and `vague-cloud`. Each row represents a call to `wandb.Run.log()` at a particular `_step` in a run. The columns show the corresponding epoch values, if any, that were logged at those steps. Some `_step` values are omitted to save space. 
+This table shows how a custom step key `epoch` maps to `_step` values for three different runs: `serene-sponge`, `lively-frog`, and `vague-cloud`. Each row represents a call to `wandb.Run.log()` at a particular `_step` in a run. The columns show the corresponding epoch values, if any, logged at those steps. Some `_step` values are omitted to save space. -The first time `wandb.Run.log()` was called, none of the runs logged an `epoch` value, so the table shows empty values for `epoch`. +At `_step = 1`, none of the runs logged an `epoch` value, so the first row shows empty values for `epoch`. -| `_step` | vague-cloud (`epoch`) | lively-frog(`epoch`) | serene-sponge (`epoch`) | +| `_step` | `vague-cloud` (`epoch`) | `lively-frog` (`epoch`) | `serene-sponge` (`epoch`) | | ------- | ------------- | ----------- | ----------- | | 1 | | | | | 2 | | | 1 | @@ -160,15 +164,15 @@ The first time `wandb.Run.log()` was called, none of the runs logged an `epoch` Now, if the slider is set to `epoch = 1`, the following happens: -* `vague-cloud` finds `epoch = 1` and returns the value logged at `_step = 5` -* `lively-frog` finds `epoch = 1` and returns the value logged at `_step = 4` -* `serene-sponge` finds `epoch = 1` and returns the value logged at `_step = 2` +* `vague-cloud` finds `epoch = 1` and returns the value logged at `_step = 5`. +* `lively-frog` finds `epoch = 1` and returns the value logged at `_step = 4`. +* `serene-sponge` finds `epoch = 1` and returns the value logged at `_step = 2`.
If the slider is set to `epoch = 9`: -* `vague-cloud` also doesn't log `epoch = 9`, so W&B uses the latest prior value `epoch = 3` and returns the value logged at `_step = 20` -* `lively-frog` doesn’t log `epoch = 9`, but the latest prior value is `epoch = 5` so it returns the value logged at `_step = 20` -* `serene-sponge` finds `epoch = 9` and return the value logged at `_step = 18` +* `vague-cloud` doesn't log `epoch = 9`, so W&B uses the latest prior value `epoch = 3` and returns the value logged at `_step = 20`. +* `lively-frog` also doesn't log `epoch = 9`, but the latest prior value is `epoch = 5`, so it returns the value logged at `_step = 20`. +* `serene-sponge` finds `epoch = 9` and returns the value logged at `_step = 18`. {/* | | | | ---- | ---- | @@ -179,13 +183,14 @@ If the slider is set to `epoch = 9`: ## Save your view -Tables you interact with in the run workspace, project workspace, or a report automatically saves their view state. If you apply any table operations then close your browser, the table retains the last viewed configuration when you next navigate to the table. +This section describes how table state is preserved across sessions and how to export a table snapshot to a report. Tables you interact with in the run workspace, project workspace, or a report automatically save their view state. If you apply any table operations and then close your browser, the table retains the last viewed configuration when you next navigate to the table. -Tables you interact with in the artifact context remains stateless. +Tables you interact with in the artifact context remain stateless. -To save a table from a workspace in a particular state, export it to a W&B Report. To export a table to report: +To save a table from a workspace in a particular state, export it to a W&B Report. To export a table to a report: + 1. Select the **action ()** menu in the top right corner of your workspace visualization panel. 2.
Select either **Share panel** or **Add to report**. @@ -196,7 +201,7 @@ To save a table from a workspace in a particular state, export it to a W&B Repor ## Examples -These reports highlight the different use cases of W&B Tables: +The following reports highlight the different use cases of W&B Tables and show the techniques on this page applied to real datasets: * [Visualize Predictions Over Time](https://wandb.ai/stacey/mnist-viz/reports/Visualize-Predictions-over-Time--Vmlldzo1OTQxMTk) * [How to Compare Tables in Workspaces](https://wandb.ai/stacey/xtable/reports/How-to-Compare-Tables-in-Workspaces--Vmlldzo4MTc0MTA)