[REQUEST] Run time benchmarks, #225

ajam74001 · 2022-09-19T16:19:12Z

Hello dear @takuseno, thank you very much for sharing this amazing library. I am training CQL and DQN models for Breakout (Atari) on a V100 GPU. However, training is very slow (it takes a day to run 50 episodes). Do you have a benchmark for run times?

takuseno · 2022-09-21T10:39:27Z

Hello @ajam74001, sorry for the late reply. It's hard to tell without a minimal reproducible example. In my environment, this reproduction script finishes within a day on a 1080 Ti.
https://github.com/takuseno/d3rlpy/blob/master/reproductions/offline/discrete_cql.py

I suspect that you didn't enable the GPU.
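One quick way to rule this out is to ask PyTorch, which d3rlpy is built on, whether it can see a CUDA device. The following is a minimal sketch, not part of d3rlpy: `describe_device` is a hypothetical helper name, and the code assumes PyTorch may or may not be installed in the environment.

```python
import importlib.util


def describe_device() -> str:
    """Report whether PyTorch can see a CUDA device (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    if torch.cuda.is_available():
        # Name of the first GPU visible to PyTorch.
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "CUDA is not available to PyTorch"


print(describe_device())
```

If this reports that CUDA is unavailable, the slowdown is likely a driver or installation problem rather than anything specific to the algorithm.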

ajam74001 · 2022-09-21T13:39:10Z

Hello dear @takuseno, many thanks for your reply. Here is the piece of code that I am running:

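# Imports assumed from the d3rlpy v1.x API (not shown in the original post).
import d3rlpy
from d3rlpy.metrics.scorer import (
    average_value_estimation_scorer,
    discounted_sum_of_advantage_scorer,
    evaluate_on_environment,
    td_error_scorer,
)

dataset, env = d3rlpy.datasets.get_atari('breakout-expert-v0')

cql = d3rlpy.algos.DiscreteCQL(use_gpu=True)

cql.fit(dataset, n_steps=2000000, eval_episodes=dataset, scorers={
    'environment': evaluate_on_environment(env),
    'advantage': discounted_sum_of_advantage_scorer,
    'td_error': td_error_scorer,
    'value_scale': average_value_estimation_scorer,
}, tensorboard_dir='runs-atari-1')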

takuseno · 2022-09-21T13:57:26Z

Can you run an experiment based on the reproduction script above? It's designed to reproduce the paper results. If you still have the same issue, we need to investigate this further. You can change the dataset.

ajam74001 · 2022-10-03T15:40:58Z

Hello @takuseno , so sorry for the late reply, due to the Iran situation I didn’t have a proper internet connection. I have tried your code and I observed that I am running the code on the GPU and it is still very very slow. Here is a screenshot of the GPU usage.

takuseno · 2022-10-04T01:39:28Z

Can you share how slow it is and the number of iterations per second? According to your screenshot, GPU usage is high enough to say it's running correctly.
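For reference, iterations per second can be measured by timing a fixed number of steps. This is only a rough sketch: `iterations_per_second` and `dummy_step` are hypothetical names, and the dummy workload stands in for one real training update.

```python
import time
from typing import Callable


def iterations_per_second(step_fn: Callable[[], None], n_iters: int = 100) -> float:
    """Time n_iters calls of step_fn and return the average rate."""
    start = time.perf_counter()
    for _ in range(n_iters):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed


# Stand-in workload; replace with one real gradient step when profiling.
def dummy_step() -> None:
    sum(i * i for i in range(10_000))


rate = iterations_per_second(dummy_step, n_iters=50)
print(f"{rate:.1f} it/s, {1.0 / rate:.4f} s per iteration")
```

GPU kernels run asynchronously, so timing a real training step this way may need a device synchronization (for example `torch.cuda.synchronize()`) before reading the clock to be accurate.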

ajam74001 · 2022-10-05T12:35:08Z

Sure, each epoch takes half an hour, so 100 epochs take 50 hours. Here is a screenshot.

takuseno · 2022-10-05T13:01:18Z

Thanks for sharing. I just remembered that the runtime metrics are all recorded in the d3rlpy-benchmarks repository. Here is a sample:
https://github.com/takuseno/d3rlpy-benchmarks/blob/main/atari/DiscreteBCQ_asterix_10_20220504124857/time_step.csv

0.013 sec x 125000 iter / 3600 = 0.45 h/epoch

So it's roughly the same as what I had. That's actually reasonable, because the number of iterations is equivalent to the standard 200M-step online Atari training. Even DeepMind takes 3-4 days to finish it.
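The estimate above can be reproduced and extended to a full run with a few lines of arithmetic. This sketch uses only numbers quoted in this thread (0.013 s per iteration, 125000 iterations per epoch, 100 epochs); the variable names are illustrative.

```python
SECONDS_PER_ITERATION = 0.013   # from the time_step.csv sample above
ITERATIONS_PER_EPOCH = 125_000  # from the calculation above
N_EPOCHS = 100                  # epoch count mentioned earlier in the thread

hours_per_epoch = SECONDS_PER_ITERATION * ITERATIONS_PER_EPOCH / 3600
total_hours = hours_per_epoch * N_EPOCHS

print(f"{hours_per_epoch:.2f} h/epoch")
print(f"{total_hours:.1f} h for {N_EPOCHS} epochs")
```

That gives about 0.45 hours per epoch and about 45 hours for 100 epochs, in line with the reported half hour per epoch.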

ajam74001 · 2022-10-05T13:07:09Z

My pleasure!
Thank you for the link, I will check it out.

takuseno · 2022-11-19T06:06:16Z

It seems resolved now? Feel free to reopen this for further discussion.

ajam74001 added the "enhancement" (New feature or request) label on Sep 19, 2022

takuseno closed this as completed Nov 19, 2022
