dvc queue: unexpected behaviour #8014

@mattseddon

Bug Report

Description

Whilst checking out the new dvc queue command I ran into some unexpected behaviour. I won't duplicate the steps to reproduce here, but after queueing and running experiments I hit two different issues:

  1. VS Code demo project: dvc queue status returns ERROR: Invalid experiment '{entry.stash_rev[:7]}'. (produced when running with the extension)
  2. example-get-started: dvc queue status returns

Task     Name    Created    Status
f3d69ee          02:17 PM   Success
08ccb05          02:17 PM   Success

ERROR: unexpected error - Extra data: line 1 column 56 (char 55)

(produced without having the extension involved).
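The first error is notable because the revision placeholder appears verbatim in the message. Purely as an illustration (not DVC's actual code), this is exactly what Python produces when a would-be f-string is missing its f prefix; the Entry stand-in below is my own scaffolding:

```python
from types import SimpleNamespace

# Stand-in for whatever object holds the stash revision (hypothetical).
entry = SimpleNamespace(stash_rev="f3d69eedda6b1c051b115523cf5c6c210490d0ea")

# Without the f prefix the braces are kept literally -- matching the reported error text.
broken = "Invalid experiment '{entry.stash_rev[:7]}'"
# With the f prefix the expression is interpolated as presumably intended.
fixed = f"Invalid experiment '{entry.stash_rev[:7]}'"

print(broken)  # Invalid experiment '{entry.stash_rev[:7]}'
print(fixed)   # Invalid experiment 'f3d69ee'
```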

In both instances this resulted in the HEAD baseline entry being dropped from the exp show data:

example-get-started example
❯ dvc exp show --show-json
{
  "workspace": {
    "baseline": {
      "data": {
        "timestamp": null,
        "params": {
          "params.yaml": {
            "data": {
              "prepare": {
                "split": 0.21,
                "seed": 20170428
              },
              "featurize": {
                "max_features": 200,
                "ngrams": 2
              },
              "train": {
                "seed": 20170428,
                "n_est": 50,
                "min_split": 0.01
              }
            }
          }
        },
        "deps": {
          "data/data.xml": {
            "hash": "22a1a2931c8370d3aeedd7183606fd7f",
            "size": 14445097,
            "nfiles": null
          },
          "src/prepare.py": {
            "hash": "f09ea0c15980b43010257ccb9f0055e2",
            "size": 1576,
            "nfiles": null
          },
          "data/prepared": {
            "hash": "153aad06d376b6595932470e459ef42a.dir",
            "size": 8437363,
            "nfiles": 2
          },
          "src/featurization.py": {
            "hash": "e0265fc22f056a4b86d85c3056bc2894",
            "size": 2490,
            "nfiles": null
          },
          "data/features": {
            "hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
            "size": 2232588,
            "nfiles": 2
          },
          "src/train.py": {
            "hash": "c3961d777cfbd7727f9fde4851896006",
            "size": 967,
            "nfiles": null
          },
          "model.pkl": {
            "hash": "46865edbf3d62fc5c039dd9d2b0567a4",
            "size": 1763725,
            "nfiles": null
          },
          "src/evaluate.py": {
            "hash": "44e714021a65edf881b1716e791d7f59",
            "size": 2346,
            "nfiles": null
          }
        },
        "outs": {
          "data/prepared": {
            "hash": "153aad06d376b6595932470e459ef42a.dir",
            "size": 8437363,
            "nfiles": 2,
            "use_cache": true,
            "is_data_source": false
          },
          "data/features": {
            "hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
            "size": 2232588,
            "nfiles": 2,
            "use_cache": true,
            "is_data_source": false
          },
          "model.pkl": {
            "hash": "46865edbf3d62fc5c039dd9d2b0567a4",
            "size": 1763725,
            "nfiles": null,
            "use_cache": true,
            "is_data_source": false
          },
          "data/data.xml": {
            "hash": "22a1a2931c8370d3aeedd7183606fd7f",
            "size": 14445097,
            "nfiles": null,
            "use_cache": true,
            "is_data_source": true
          }
        },
        "queued": false,
        "running": false,
        "executor": null,
        "metrics": {
          "evaluation.json": {
            "data": {
              "avg_prec": 0.9249974999612706,
              "roc_auc": 0.9460213440787918
            }
          }
        }
      }
    }
  },
  "f3d69eedda6b1c051b115523cf5c6c210490d0ea": {
    "baseline": {
      "data": {
        "timestamp": "2022-07-13T14:17:20",
        "params": {
          "params.yaml": {
            "data": {
              "prepare": {
                "split": 0.21,
                "seed": 20170428
              },
              "featurize": {
                "max_features": 200,
                "ngrams": 2
              },
              "train": {
                "seed": 20170428,
                "n_est": 50,
                "min_split": 0.01
              }
            }
          }
        },
        "deps": {
          "data/data.xml": {
            "hash": "22a1a2931c8370d3aeedd7183606fd7f",
            "size": 14445097,
            "nfiles": null
          },
          "src/prepare.py": {
            "hash": "f09ea0c15980b43010257ccb9f0055e2",
            "size": 1576,
            "nfiles": null
          },
          "data/prepared": {
            "hash": "153aad06d376b6595932470e459ef42a.dir",
            "size": 8437363,
            "nfiles": 2
          },
          "src/featurization.py": {
            "hash": "e0265fc22f056a4b86d85c3056bc2894",
            "size": 2490,
            "nfiles": null
          },
          "data/features": {
            "hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
            "size": 2232588,
            "nfiles": 2
          },
          "src/train.py": {
            "hash": "c3961d777cfbd7727f9fde4851896006",
            "size": 967,
            "nfiles": null
          },
          "model.pkl": {
            "hash": "46865edbf3d62fc5c039dd9d2b0567a4",
            "size": 1763725,
            "nfiles": null
          },
          "src/evaluate.py": {
            "hash": "44e714021a65edf881b1716e791d7f59",
            "size": 2346,
            "nfiles": null
          }
        },
        "outs": {
          "data/prepared": {
            "hash": "153aad06d376b6595932470e459ef42a.dir",
            "size": 8437363,
            "nfiles": 2,
            "use_cache": true,
            "is_data_source": false
          },
          "data/features": {
            "hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
            "size": 2232588,
            "nfiles": 2,
            "use_cache": true,
            "is_data_source": false
          },
          "model.pkl": {
            "hash": "46865edbf3d62fc5c039dd9d2b0567a4",
            "size": 1763725,
            "nfiles": null,
            "use_cache": true,
            "is_data_source": false
          },
          "data/data.xml": {
            "hash": "22a1a2931c8370d3aeedd7183606fd7f",
            "size": 14445097,
            "nfiles": null,
            "use_cache": true,
            "is_data_source": true
          }
        },
        "queued": false,
        "running": false,
        "executor": null,
        "metrics": {
          "evaluation.json": {
            "data": {
              "avg_prec": 0.9249974999612706,
              "roc_auc": 0.9460213440787918
            }
          }
        }
      }
    }
  }
}

Reproduce

  1. clone example-get-started
  2. add git+https://github.com/iterative/dvc to src/requirements.txt
  3. create venv, source activate script and install requirements
  4. dvc pull
  5. change params.yaml and queue x2 with dvc exp run --queue
  6. dvc queue start -j 2
  7. dvc exp show
  8. dvc queue status
  9. dvc exp show
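For convenience, steps 4–9 above can be sketched as a small driver script. The command strings are copied from the list; the helper and its dry-run flag are my own scaffolding, and steps 1–3 (clone, requirements edit, venv) are assumed to be done already:

```python
import subprocess

# Steps 4-9 from the repro list (params.yaml is edited by hand before step 5).
REPRO_STEPS = [
    "dvc pull",
    "dvc exp run --queue",
    "dvc exp run --queue",
    "dvc queue start -j 2",
    "dvc exp show",
    "dvc queue status",
    "dvc exp show",
]

def run_steps(steps, dry_run=True):
    """Print each command, or actually execute it when dry_run is False."""
    for cmd in steps:
        if dry_run:
            print(cmd)
        else:
            subprocess.run(cmd.split(), check=True)

run_steps(REPRO_STEPS)
```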

When recreating this, dvc queue status shows that both experiments were successful, but the second one has not made it into the exp show table. Final results:

❯ dvc queue status 
Task     Name    Created    Status
9d22751          02:50 PM   Success
962c834          02:50 PM   Success

Worker status: 0 active, 0 idle

First column of exp show:

  workspace
  bigrams-experiment
  └── 65584bd [exp-c88e8]

and the SHAs don't match the ones reported by dvc queue status?

Expected

It should be possible to run dvc exp show and dvc queue status in parallel with the execution of tasks from the queue.
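One plausible mechanism for the "Extra data" error above, offered purely as an illustration and not as a claim about DVC's internals: if two unsynchronised writers append JSON documents to the same status file, a subsequent json.loads of that file fails with exactly this error. The field names below are made up:

```python
import json

# Two status records, as two concurrent workers might write them
# (illustrative only; this is not DVC's actual schema).
record_a = json.dumps({"rev": "9d22751", "status": "Success"})
record_b = json.dumps({"rev": "962c834", "status": "Success"})

blob = record_a + record_b  # two documents back to back, no separator

try:
    json.loads(blob)
    error = None
except json.JSONDecodeError as exc:
    error = str(exc)

print(error)  # an "Extra data: line 1 column ..." message like the one reported
```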

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.13.1.dev87+gc2668110 
---------------------------------
Platform: Python 3.8.9 on macOS-12.2.1-arm64-arm-64bit
Supports:
        webhdfs (fsspec = 2022.5.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.5.1)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git

Additional Information (if any):

Please let me know if you need anything else from me. Thank you.

Metadata

Labels

A: experiments (Related to dvc exp), p2-medium (Medium priority, should be done, but less important), product: VSCode (Integration with VSCode extension)
