Experiments: multi node logging #1246
Deploying docs with
| Latest commit: | 649dcea |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://f63c3744.docodile.pages.dev |
| Branch Preview URL: | https://multi-node-runs.docodile.pages.dev |
Images automagically compressed by Calibre's image-actions ✨ Compression reduced images by 44.6%, saving 235.95 KB.
437 images did not require optimisation.
noaleetz
left a comment
Looks good! I commented on including the console log experience in this shared mode (it is specific to shared mode / distributed setup).
Images automagically compressed by Calibre's image-actions ✨ Compression reduced images by 46.2%, saving 263.42 KB.
438 images did not require optimisation.
noaleetz
left a comment
server release constraint
mdlinville
left a comment
The rest of my feedback -- please ignore anything where I've introduced errors or it doesn't make sense to you. I'm not so familiar with this area of content.
noaleetz
left a comment
missing server release constraint
mdlinville
left a comment
Some small things, which I leave up to you whether to change or not. This is a big improvement to this page.
1. Checks the rank with the `--local_rank` command line argument.
1. If the rank is set to 0, sets up `wandb` logging conditionally in the [`train()`](https://github.com/wandb/examples/blob/master/examples/pytorch/pytorch-ddp/log-ddp.py#L24) function.
| ```python |
I still think it might be better to use the Prism shortcode here, since you explicitly name the script. You can grep around for some examples.
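For reference, the two steps quoted in the diff (read the rank from `--local_rank`, then set up `wandb` only when the rank is 0) could be sketched roughly as below. This is a minimal illustration, not the contents of the linked `log-ddp.py` script: the helper names `parse_local_rank` and `maybe_init_wandb`, the project name `ddp-example`, and the placeholder loss are all assumptions made for the example.

```python
import argparse


def parse_local_rank(argv=None):
    """Read the process rank from the --local_rank command line argument.

    Hypothetical helper; defaults to 0 when the flag is absent.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    # parse_known_args ignores any other flags the launcher may pass.
    args, _ = parser.parse_known_args(argv)
    return args.local_rank


def maybe_init_wandb(local_rank, project="ddp-example"):
    """Start a W&B run on rank 0 only; every other rank gets None.

    Hypothetical helper; the project name is a placeholder.
    """
    if local_rank != 0:
        return None
    import wandb  # imported only on the rank that actually logs

    return wandb.init(project=project)


def train(local_rank, steps=3):
    """Toy training loop that logs only when a run exists (rank 0)."""
    run = maybe_init_wandb(local_rank)
    for step in range(steps):
        loss = 1.0 / (step + 1)  # placeholder value, not a real loss
        if run is not None:
            run.log({"loss": loss})
    if run is not None:
        run.finish()
```

In a real script, each process would call something like `train(parse_local_rank())`, so only the rank 0 process creates a run and the remaining ranks skip logging entirely.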
Cleans up the existing "Distributed logging" doc and adds a section on what the public "Multi node" feature is and how to use it.
Jira ticket: https://wandb.atlassian.net/browse/DOCS-1373