
Best way to test an agent ? #191

Open
MoMe36 opened this issue May 20, 2019 · 4 comments

MoMe36 commented May 20, 2019

Hi! Very cool lib, it's super pleasing to see so many RL algorithms within reach! I've recently trained an agent using SAC and I was wondering how to load it for visual inspection in its environment.
Also, are checkpoints implemented? Is it possible to resume training?

Thanks !

zuoxingdong (Owner) commented

Hi @MoMe36, thank you for the positive feedback! Regarding your questions:

  • Can you explain what you mean by visual inspection here? The current implementation checkpoints the model a few times as .pth files, with each file named by the iteration number at which it was saved. If you mean generating a video animation by executing a checkpoint file for one episode, this is done in the baselines/plot.ipynb file: the second cell contains a make_video function.

To make it clearer how the files are structured when one uses run_experiment:
Suppose one defines the experiment name as 'default' in the run_experiment function, and there is no configuration sweeping, i.e. a single configuration (no Grid/Random object in the config). Then there is a single job with ID 0, and it has 3 runs with different random seeds, say [123, 456, 789].
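To illustrate the job-ID numbering, here is a small sketch of how a configuration sweep expands into numbered jobs. This uses plain dicts and a hypothetical `Grid` marker class standing in for the library's config objects, not lagom's actual implementation:

```python
from itertools import product

class Grid(list):
    """Hypothetical marker: sweep over each value in the list."""

def expand_config(config):
    """Expand a config dict into one concrete config per job ID.

    A config with no Grid values yields a single job (ID 0);
    each Grid multiplies the number of jobs (full cross product).
    """
    keys = list(config)
    # For each key, take either its single value or the swept values.
    choices = [v if isinstance(v, Grid) else [v] for v in (config[k] for k in keys)]
    return {job_id: dict(zip(keys, values))
            for job_id, values in enumerate(product(*choices))}

# A single configuration (no Grid) -> exactly one job, with ID 0.
single = expand_config({'lr': 1e-3, 'gamma': 0.99})

# A swept configuration -> job IDs 0..N-1 in grid order.
swept = expand_config({'lr': Grid([1e-3, 3e-4]), 'batch_size': Grid([32, 64])})
```

Each resulting job ID then becomes one folder under the experiment directory, as shown in the tree below.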

Then the file structure under logs looks like:

- logs
    - default  # experiment name
        - 0  # job ID
            - 123  # random seed
            - 456
            - 789
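Given that layout, collecting the checkpoint files for every seed takes only a few lines of `pathlib`. This is a hedged sketch: the directory structure follows the tree above, but the assumption that checkpoints are `.pth` files named by iteration is mine, not a guarantee of the library:

```python
import tempfile
from pathlib import Path

def collect_checkpoints(logdir, experiment='default', job_id=0):
    """Return {seed: sorted list of .pth checkpoint paths} for one job.

    Assumes the logs/<experiment>/<job_id>/<seed>/ layout described above,
    with checkpoints saved as .pth files inside each seed folder.
    """
    job_dir = Path(logdir) / experiment / str(job_id)
    return {seed_dir.name: sorted(seed_dir.glob('*.pth'))
            for seed_dir in sorted(job_dir.iterdir()) if seed_dir.is_dir()}

# Build a tiny fake log tree to demonstrate the layout.
root = Path(tempfile.mkdtemp())
for seed in ('123', '456', '789'):
    seed_dir = root / 'default' / '0' / seed
    seed_dir.mkdir(parents=True)
    for iteration in (100, 200):
        (seed_dir / f'{iteration}.pth').touch()

ckpts = collect_checkpoints(root)
```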

Under each seed leaf folder, all loggings and checkpoints are stored. To generate a video animation from a checkpoint file, one can simply call the make_video function in the plot.ipynb file with the corresponding arguments; file loading, action selection, and mp4 generation are all done for you internally.

  • The checkpointing mechanism is implemented: in the experiment.py file for each algorithm, you can set checkpoint.num in the config object to define how many checkpoint files to generate during the entire training.
    • However, resuming training is not supported yet. It should not be too painful to add, but the philosophy of this repo is to provide research-friendly code that is minimalist and easy to modify, so I am still hesitating about the best way to implement it with minimal change, good user experience, and low coding complexity. It would definitely be a nice piece of functionality to have, though.
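For the curious, here is one minimal shape such resume support could take. This is emphatically not lagom's API, just a toy sketch: a training loop that persists its state dict each iteration (with stdlib `pickle` standing in for `torch.save`) and can pick up where it left off:

```python
import os
import pickle
import tempfile

def train(num_iters, resume_from=None, checkpoint_path='resume.pkl'):
    """Toy resumable training loop (a sketch, not lagom's API).

    The state dict carries everything needed to continue: the iteration
    counter and, here, a stand-in accumulator for training progress.
    """
    state = {'iteration': 0, 'total_reward': 0.0}
    if resume_from is not None:
        with open(resume_from, 'rb') as f:
            state = pickle.load(f)  # pick up exactly where we left off
    for i in range(state['iteration'], num_iters):
        state['total_reward'] += 1.0  # stand-in for one training iteration
        state['iteration'] = i + 1
        with open(checkpoint_path, 'wb') as f:
            pickle.dump(state, f)  # overwrite the resume file each iteration
    return state

path = os.path.join(tempfile.mkdtemp(), 'resume.pkl')
first = train(3, checkpoint_path=path)                       # run 3 iterations
resumed = train(5, resume_from=path, checkpoint_path=path)   # continue to 5
```

A real implementation would also need to restore the optimizer state and RNG seeds, which is part of why doing this cleanly is non-trivial.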

Hope it helps, and don't hesitate to discuss further if you have more questions.


MoMe36 commented May 21, 2019

Hi! Thanks for the quick and detailed answer! I did figure out how to use the make_video method, which is exactly what I was looking for. However, I'm not able to find the checkpoints and logging.
I do have the folder structure you're describing, but when I check within a seed folder, I find only agent_1.pth (along with obs_moment.pth in the case of PPO). And the agent really doesn't perform as it should, given its episode returns.

I checked the config file, in which I specified the log freq to be 10, but I don't see any checkpoint besides the one I mentioned, even after 200 iterations.

Do you have any idea what I'm doing wrong? Thanks a lot (:


MoMe36 commented May 21, 2019

Alright, I get it! Checkpoints are saved only at the end of training, it seems. Thanks anyway!


zuoxingdong commented May 21, 2019

Hi @MoMe36, for PPO it's better to use the logs/default folder, because other logging folders are temporary (i.e. they might not contain all checkpoints).

  • log_freq: it only controls how frequently loggings are dumped to the screen.
  • Checkpointing is controlled by checkpoint.num, e.g. for checkpoint.num=3 it checkpoints first before training, second in the middle of training, and third at the end. If you want more checkpoints, simply increase the integer in the config object.
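The spacing described above can be sketched as evenly spaced points over the training run. This is just my reading of that description, not the library's exact arithmetic:

```python
def checkpoint_iterations(total_iters, num_checkpoints):
    """Evenly spaced checkpoint points over training (a sketch of the
    behaviour described above, not lagom's exact code).

    num_checkpoints=3 over 200 iterations -> [0, 100, 200]:
    one before training, one in the middle, one at the end.
    """
    if num_checkpoints < 2:
        return [total_iters]  # at minimum, checkpoint the final model
    step = total_iters / (num_checkpoints - 1)
    return [round(i * step) for i in range(num_checkpoints)]
```

Under this reading, seeing only one checkpoint file would suggest the run used a small checkpoint.num (or that intermediate files landed in a temporary folder, as noted above for PPO).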

Hope it helps !
