For the AWS integration, we usually run experiments on AWS spot instances to save cost. However, some experiments need to run for a long time, and a spot instance can be terminated before they finish. Real use cases include Montezuma's Revenge runs by @yooceii and certain microrts tasks of my own, so we should look into this issue further.
After consulting this resource, I am considering periodically storing the models on the wandb run associated with a given run_id. Should the AWS instance terminate, we would pull the stored models from the run with that run_id and continue training.
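The save-periodically, reload-on-restart idea could look roughly like the sketch below. It is only an illustration using the standard library: `save_checkpoint`, `load_checkpoint`, `train`, the `state` dictionary and the `stop_at` argument (which simulates a terminated instance) are all hypothetical names, not code from `ppo_autoregressive.py`. Uploading the file to the wandb run and pulling it back would be separate steps, marked in the comments.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write the state to a temp file, then rename it, so a kill mid-write
    never leaves a truncated checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None when no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def train(total_steps, checkpoint_path, save_every=1000, stop_at=None):
    # Resume from the last checkpoint if there is one, else start fresh.
    # (Assumption: on a new instance the file would first be pulled from
    # the wandb run with the matching run_id.)
    state = load_checkpoint(checkpoint_path) or {"step": 0, "weights": 0.0}
    while state["step"] < total_steps:
        if stop_at is not None and state["step"] >= stop_at:
            break  # hypothetical stand-in for a spot-instance termination
        state["step"] += 1
        state["weights"] += 0.1  # placeholder for a real model update
        if state["step"] % save_every == 0:
            save_checkpoint(state, checkpoint_path)
            # Assumption: this is where the file would be synced to the run.
    return state
```

With this shape, a restart loses at most `save_every` steps of work. As far as I know, `wandb.save` and `wandb.restore` are the calls that would handle the upload and download, but their exact behavior should be checked against the installed wandb version.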
# to start
python ppo_autoregressive.py \
--wandb-project-name gym-microrts \
--total-timesteps 100000000 \
--gym-id MicrortsDefeatWorkerRushEnemyShaped-v2 \
--prod-mode True \
--capture-video True
# in case it terminates, use the following to resume
export WANDB_RESUME=must
export WANDB_RUN_ID=2kse3aqy # get your related run id
python ppo_autoregressive.py \
--wandb-project-name gym-microrts \
--total-timesteps 100000000 \
--gym-id MicrortsDefeatWorkerRushEnemyShaped-v2 \
--prod-mode True \
--capture-video True
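Inside the training script, the same two environment variables could also be used to decide whether to load a stored model before training. The helper below is a hypothetical sketch of that check, not part of `ppo_autoregressive.py`. The name `resume_settings` and the rule that `WANDB_RESUME=must` needs a run id are assumptions made for illustration.

```python
import os


def resume_settings(env=None):
    """Read WANDB_RESUME / WANDB_RUN_ID and report whether this launch
    should try to continue an earlier run."""
    env = os.environ if env is None else env
    run_id = env.get("WANDB_RUN_ID")
    resume = env.get("WANDB_RESUME")
    # Assumed policy for this sketch: "must" without a run id is a mistake.
    if resume == "must" and not run_id:
        raise ValueError("WANDB_RESUME=must requires WANDB_RUN_ID to be set")
    return {"resuming": bool(resume and run_id), "run_id": run_id}


if __name__ == "__main__":
    settings = resume_settings()
    if settings["resuming"]:
        print("would load the stored model for run", settings["run_id"])
    else:
        print("starting a fresh run")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.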