[REQUEST] Run time benchmarks, #225

Closed
ajam74001 opened this issue Sep 19, 2022 · 9 comments
Labels
enhancement New feature or request

Comments

@ajam74001

Hello dear @takuseno, thank you very much for sharing this amazing library. I am training CQL and DQN models on Atari Breakout on a V100 GPU. However, training is very slow (it takes a day to run 50 episodes). Do you have a benchmark for run times?

@ajam74001 ajam74001 added the enhancement New feature or request label Sep 19, 2022
@takuseno
Owner

Hello @ajam74001, sorry for the late reply. It's hard to tell without a minimal reproducible example. In my environment, this reproduction script finishes within a day on a 1080 Ti.
https://github.com/takuseno/d3rlpy/blob/master/reproductions/offline/discrete_cql.py

I suspect that the GPU is not turned on.
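
One quick way to rule this out is to ask PyTorch (which d3rlpy is built on) whether it can see a CUDA device at all. The following is only a minimal sketch: `describe_gpu` is a hypothetical helper name, not part of d3rlpy, and it reports visibility only, not whether a given training run actually uses the device.

```python
import importlib.util


def describe_gpu() -> str:
    """Return a short summary of whether PyTorch can see a CUDA device."""
    # Without PyTorch installed there is nothing to check.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"

    import torch

    # False here usually points at a CPU-only build or a driver problem.
    if not torch.cuda.is_available():
        return "PyTorch cannot see a CUDA device"

    return f"CUDA device 0: {torch.cuda.get_device_name(0)}"


print(describe_gpu())
```

If this reports no CUDA device, training falls back to (or fails on) the CPU regardless of the flags passed to the algorithm.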

@ajam74001
Author

Hello dear @takuseno, many thanks for your reply. Here is the piece of code that I am running:

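# Imports were missing from the snippet; these paths assume the d3rlpy v1.x API used below.
import d3rlpy
from d3rlpy.metrics.scorer import (
    average_value_estimation_scorer, discounted_sum_of_advantage_scorer,
    evaluate_on_environment, td_error_scorer,
)

# Expert Breakout dataset and the matching evaluation environment
dataset, env = d3rlpy.datasets.get_atari('breakout-expert-v0')

cql = d3rlpy.algos.DiscreteCQL(use_gpu=True)

cql.fit(dataset, n_steps=2000000, eval_episodes=dataset, scorers={
        'environment': evaluate_on_environment(env),
        'advantage': discounted_sum_of_advantage_scorer,
        'td_error': td_error_scorer,
        'value_scale': average_value_estimation_scorer
    }, tensorboard_dir='runs-atari-1')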

@takuseno
Owner

takuseno commented Sep 21, 2022

Can you run an experiment based on the reproduction script above? It's designed to reproduce the paper results. If you still have the same issue, we'll need to investigate further. You can change the dataset.

@ajam74001
Author

Hello @takuseno, so sorry for the late reply; due to the situation in Iran I didn't have a proper internet connection. I tried your code and confirmed that it is running on the GPU, but it is still very slow. Here is a screenshot of the GPU usage.
[Image: Screen Shot 2022-10-03 at 17 33 45]

@takuseno
Owner

takuseno commented Oct 4, 2022

Can you share how slow it is, including the number of iterations per second? According to your screenshot, GPU usage is high enough to say it's running correctly.

@ajam74001
Author

Sure, each epoch takes half an hour, so 100 epochs take 50 hours. Here is a screenshot.
[Image: Screen Shot 2022-10-05 at 15 32 18]

@takuseno
Owner

takuseno commented Oct 5, 2022

Thanks for sharing. I just remembered that the runtime metrics are all recorded in the d3rlpy-benchmarks repository. Here is a sample:
https://github.com/takuseno/d3rlpy-benchmarks/blob/main/atari/DiscreteBCQ_asterix_10_20220504124857/time_step.csv

0.013 sec/iter x 125000 iter/epoch / 3600 sec/h ≈ 0.45 h/epoch
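
The same arithmetic can be written as a small sketch, using only figures from this thread: 0.013 sec per iteration, 125000 iterations per epoch, and the 100 epochs mentioned earlier. The function name `hours_per_epoch` is made up for illustration.

```python
def hours_per_epoch(sec_per_iter: float, iters_per_epoch: int) -> float:
    """Convert a per-iteration time into wall-clock hours per epoch."""
    return sec_per_iter * iters_per_epoch / 3600.0


SEC_PER_ITER = 0.013       # from the benchmark time_step.csv sample
ITERS_PER_EPOCH = 125_000  # iterations per epoch in the reproduction setup
N_EPOCHS = 100             # epoch count quoted earlier in this thread

per_epoch = hours_per_epoch(SEC_PER_ITER, ITERS_PER_EPOCH)
print(f"{per_epoch:.2f} h/epoch")                      # 0.45 h/epoch
print(f"{per_epoch * N_EPOCHS:.1f} h for all epochs")  # 45.1 h for all epochs
print(f"{1 / SEC_PER_ITER:.0f} iterations/s")          # 77 iterations/s
```

At about 77 iterations per second, the reported half hour per epoch is in the expected range.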

So it's roughly the same as what I had. It's actually sane because the number of iterations is equivalent to the standard 200M-step online Atari training. Even DeepMind takes 3-4 days to finish it.
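
A rough sketch of why the iteration counts line up. It assumes that the 200M figure refers to the common DQN-style protocol of 200M emulator frames, with each action repeated for 4 frames and one gradient update every 4 agent steps. Those two constants are assumptions and are not stated in this thread; the offline numbers are the ones quoted above.

```python
# Online protocol (assumed DQN-style settings, not stated in the thread)
ONLINE_FRAMES = 200_000_000
FRAME_SKIP = 4    # assumption: each action is repeated for 4 frames
UPDATE_EVERY = 4  # assumption: one gradient update per 4 agent steps

agent_steps = ONLINE_FRAMES // FRAME_SKIP
online_updates = agent_steps // UPDATE_EVERY

# Offline run, using the numbers quoted in this thread
ITERS_PER_EPOCH = 125_000
N_EPOCHS = 100
offline_updates = ITERS_PER_EPOCH * N_EPOCHS

print(online_updates)   # 12500000
print(offline_updates)  # 12500000
```

Under these assumptions both setups perform 12.5M gradient updates, so a multi-day run is expected.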

@ajam74001
Author

My pleasure!
Thank you for the link, I will check it out.

@takuseno
Owner

It seems resolved now? Feel free to reopen this for further discussion.
