Bug Report
Description
Whilst checking out the new dvc queue command I have run into some unexpected behaviour. I won't duplicate the steps to reproduce here, but after queueing and running experiments I have run into two different issues:
- VS Code demo project: dvc queue status returning ERROR: Invalid experiment '{entry.stash_rev[:7]}'. (produced when running with the extension).
- example-get-started: dvc queue status returning

  Task Name    Created   Status
  f3d69ee      02:17 PM  Success
  08ccb05      02:17 PM  Success
  ERROR: unexpected error - Extra data: line 1 column 56 (char 55)

  (produced without the extension involved).
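Neither message looks like a real experiment failure; both look like output/serialization slips. A minimal sketch of what could produce each symptom (the names below are illustrative, not DVC's actual code):

import json

# Symptom 1: an error template missing its f-prefix prints the
# placeholder literally instead of the interpolated revision.
class Entry:
    stash_rev = "f3d69eedda6b1c051b115523cf5c6c210490d0ea"

entry = Entry()
print("Invalid experiment '{entry.stash_rev[:7]}'.")   # missing f-prefix
print(f"Invalid experiment '{entry.stash_rev[:7]}'.")  # intended message

# Symptom 2: json.loads() raises "Extra data" when two JSON documents
# end up concatenated in one payload, e.g. by concurrent writers.
payload = '{"rev": "f3d69ee"}{"rev": "08ccb05"}'
try:
    json.loads(payload)
except json.JSONDecodeError as exc:
    print(exc)  # Extra data: line 1 column 19 (char 18)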
In both of the instances above this resulted in the HEAD baseline entry being dropped from the exp show data:
example-get-started example
❯ dvc exp show --show-json
{
"workspace": {
"baseline": {
"data": {
"timestamp": null,
"params": {
"params.yaml": {
"data": {
"prepare": {
"split": 0.21,
"seed": 20170428
},
"featurize": {
"max_features": 200,
"ngrams": 2
},
"train": {
"seed": 20170428,
"n_est": 50,
"min_split": 0.01
}
}
}
},
"deps": {
"data/data.xml": {
"hash": "22a1a2931c8370d3aeedd7183606fd7f",
"size": 14445097,
"nfiles": null
},
"src/prepare.py": {
"hash": "f09ea0c15980b43010257ccb9f0055e2",
"size": 1576,
"nfiles": null
},
"data/prepared": {
"hash": "153aad06d376b6595932470e459ef42a.dir",
"size": 8437363,
"nfiles": 2
},
"src/featurization.py": {
"hash": "e0265fc22f056a4b86d85c3056bc2894",
"size": 2490,
"nfiles": null
},
"data/features": {
"hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
"size": 2232588,
"nfiles": 2
},
"src/train.py": {
"hash": "c3961d777cfbd7727f9fde4851896006",
"size": 967,
"nfiles": null
},
"model.pkl": {
"hash": "46865edbf3d62fc5c039dd9d2b0567a4",
"size": 1763725,
"nfiles": null
},
"src/evaluate.py": {
"hash": "44e714021a65edf881b1716e791d7f59",
"size": 2346,
"nfiles": null
}
},
"outs": {
"data/prepared": {
"hash": "153aad06d376b6595932470e459ef42a.dir",
"size": 8437363,
"nfiles": 2,
"use_cache": true,
"is_data_source": false
},
"data/features": {
"hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
"size": 2232588,
"nfiles": 2,
"use_cache": true,
"is_data_source": false
},
"model.pkl": {
"hash": "46865edbf3d62fc5c039dd9d2b0567a4",
"size": 1763725,
"nfiles": null,
"use_cache": true,
"is_data_source": false
},
"data/data.xml": {
"hash": "22a1a2931c8370d3aeedd7183606fd7f",
"size": 14445097,
"nfiles": null,
"use_cache": true,
"is_data_source": true
}
},
"queued": false,
"running": false,
"executor": null,
"metrics": {
"evaluation.json": {
"data": {
"avg_prec": 0.9249974999612706,
"roc_auc": 0.9460213440787918
}
}
}
}
}
},
"f3d69eedda6b1c051b115523cf5c6c210490d0ea": {
"baseline": {
"data": {
"timestamp": "2022-07-13T14:17:20",
"params": {
"params.yaml": {
"data": {
"prepare": {
"split": 0.21,
"seed": 20170428
},
"featurize": {
"max_features": 200,
"ngrams": 2
},
"train": {
"seed": 20170428,
"n_est": 50,
"min_split": 0.01
}
}
}
},
"deps": {
"data/data.xml": {
"hash": "22a1a2931c8370d3aeedd7183606fd7f",
"size": 14445097,
"nfiles": null
},
"src/prepare.py": {
"hash": "f09ea0c15980b43010257ccb9f0055e2",
"size": 1576,
"nfiles": null
},
"data/prepared": {
"hash": "153aad06d376b6595932470e459ef42a.dir",
"size": 8437363,
"nfiles": 2
},
"src/featurization.py": {
"hash": "e0265fc22f056a4b86d85c3056bc2894",
"size": 2490,
"nfiles": null
},
"data/features": {
"hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
"size": 2232588,
"nfiles": 2
},
"src/train.py": {
"hash": "c3961d777cfbd7727f9fde4851896006",
"size": 967,
"nfiles": null
},
"model.pkl": {
"hash": "46865edbf3d62fc5c039dd9d2b0567a4",
"size": 1763725,
"nfiles": null
},
"src/evaluate.py": {
"hash": "44e714021a65edf881b1716e791d7f59",
"size": 2346,
"nfiles": null
}
},
"outs": {
"data/prepared": {
"hash": "153aad06d376b6595932470e459ef42a.dir",
"size": 8437363,
"nfiles": 2,
"use_cache": true,
"is_data_source": false
},
"data/features": {
"hash": "f35d4cc2c552ac959ae602162b8543f3.dir",
"size": 2232588,
"nfiles": 2,
"use_cache": true,
"is_data_source": false
},
"model.pkl": {
"hash": "46865edbf3d62fc5c039dd9d2b0567a4",
"size": 1763725,
"nfiles": null,
"use_cache": true,
"is_data_source": false
},
"data/data.xml": {
"hash": "22a1a2931c8370d3aeedd7183606fd7f",
"size": 14445097,
"nfiles": null,
"use_cache": true,
"is_data_source": true
}
},
"queued": false,
"running": false,
"executor": null,
"metrics": {
"evaluation.json": {
"data": {
"avg_prec": 0.9249974999612706,
"roc_auc": 0.9460213440787918
}
}
}
}
}
}
}
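To confirm the baseline is missing, a quick check like the following (parsing the --show-json output shown above) can list the top-level revisions and test whether HEAD is among them:

import json
import subprocess

# Top-level keys of `dvc exp show --show-json` are the revisions;
# "workspace" and the HEAD baseline SHA should both always be present.
out = subprocess.run(
    ["dvc", "exp", "show", "--show-json"],
    capture_output=True, text=True, check=True,
).stdout
revs = list(json.loads(out))

head = subprocess.run(
    ["git", "rev-parse", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(revs)
print("HEAD baseline present:", head in revs)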
Reproduce
- clone example-get-started
- add git+https://github.com/iterative/dvc to src/requirements.txt
- create venv, source the activate script and install requirements
- dvc pull
- change params.yaml and queue x2 with dvc exp run --queue
- dvc queue start -j 2
- dvc exp show
- dvc queue status
- dvc exp show
When recreating this I can see that both experiments were successful in dvc queue status, but the second one has not made it into the exp show table. Final results:
❯ dvc queue status
Task Name Created Status
9d22751 02:50 PM Success
962c834 02:50 PM Success
Worker status: 0 active, 0 idle
First column of exp show:
workspace
bigrams-experiment
└── 65584bd [exp-c88e8]
and the SHAs don't match the task names from dvc queue status?
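For reference, this is the kind of comparison I am making (a sketch assuming the 2.x --show-json layout shown earlier): collect the experiment revisions nested under each baseline and compare their short forms with the queue task names:

import json
import subprocess

# Gather experiment revs nested under each baseline in the
# `dvc exp show --show-json` output and shorten them for comparison
# with the task names printed by `dvc queue status`.
out = subprocess.run(
    ["dvc", "exp", "show", "--show-json"],
    capture_output=True, text=True, check=True,
).stdout
data = json.loads(out)
exp_revs = [
    rev
    for baseline, exps in data.items()
    if baseline != "workspace"
    for rev in exps
    if rev != "baseline"
]
print([rev[:7] for rev in exp_revs])  # e.g. ['65584bd'] vs 9d22751/962c834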
Expected
Should be able to run exp show & queue status in parallel with the execution of tasks from the queue.
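A minimal way to exercise this (a sketch assuming experiments are already queued in the repo): start the workers, then poll both commands while tasks execute, checking that neither errors:

import subprocess
import time

# Start the queue workers, then poll `dvc queue status` and
# `dvc exp show` while tasks execute; in the expected behaviour
# neither command fails and no baseline entry disappears.
subprocess.run(["dvc", "queue", "start", "-j", "2"], check=True)
for _ in range(30):
    for cmd in (["dvc", "queue", "status"],
                ["dvc", "exp", "show", "--show-json"]):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print("FAILED:", " ".join(cmd))
            print(result.stderr)
    time.sleep(2)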
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 2.13.1.dev87+gc2668110
---------------------------------
Platform: Python 3.8.9 on macOS-12.2.1-arm64-arm-64bit
Supports:
webhdfs (fsspec = 2022.5.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
https (aiohttp = 3.8.1, aiohttp-retry = 2.5.1)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git

Additional Information (if any):
Please let me know if you need anything else from me. Thank you.