Describe the bug
In the getting started tutorial, I run into a ValueError:

```
ValueError: operands could not be broadcast together with shapes (1156,) (34,)
```
To Reproduce

```python
!pip install d3rlpy

from d3rlpy.datasets import get_cartpole
from d3rlpy.algos import DQNConfig
from d3rlpy.metrics import TDErrorEvaluator
from d3rlpy.metrics import EnvironmentEvaluator

dataset, env = get_cartpole()

dqn = DQNConfig().create(device="cuda:0")
dqn.build_with_dataset(dataset)

td_error_evaluator = TDErrorEvaluator(episodes=dataset.episodes)
env_evaluator = EnvironmentEvaluator(env)
rewards = env_evaluator(dqn, dataset=None)

dqn.fit(
    dataset,
    n_steps=10000,
    evaluators={
        'td_error': td_error_evaluator,
        'environment': env_evaluator,
    },
)
```
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-59911ad4f4e1> in <cell line: 13>()
     11 rewards = env_evaluator(dqn, dataset=None)
     12
---> 13 dqn.fit(
     14     dataset,
     15     n_steps=10000,

2 frames
/usr/local/lib/python3.10/dist-packages/d3rlpy/algos/qlearning/base.py in fit(self, dataset, n_steps, n_steps_per_epoch, experiment_name, with_timestamp, logger_adapter, show_progress, save_interval, evaluators, callback, epoch_callback)
    402             List of result tuples (epoch, metrics) per epoch.
    403         """
--> 404         results = list(
    405             self.fitter(
    406                 dataset,

/usr/local/lib/python3.10/dist-packages/d3rlpy/algos/qlearning/base.py in fitter(self, dataset, n_steps, n_steps_per_epoch, experiment_name, with_timestamp, logger_adapter, show_progress, save_interval, evaluators, callback, epoch_callback)
    546             if evaluators:
    547                 for name, evaluator in evaluators.items():
--> 548                     test_score = evaluator(self, dataset)
    549                     logger.add_metric(name, test_score)
    550

/usr/local/lib/python3.10/dist-packages/d3rlpy/metrics/evaluators.py in __call__(self, algo, dataset)
    117                 rewards = algo.reward_scaler.transform_numpy(rewards)
    118             y = rewards + algo.gamma * cast(np.ndarray, next_values) * mask
--> 119             total_errors += ((values - y) ** 2).tolist()

ValueError: operands could not be broadcast together with shapes (1156,) (34,)
```
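For context, the failure at `evaluators.py` line 119 is an ordinary NumPy broadcasting error: `values` and `y` are 1-D arrays of different lengths, so `values - y` cannot be computed. A minimal standalone sketch of the same failure (illustrative shapes only, not d3rlpy code):

```python
import numpy as np

# Two 1-D arrays whose lengths differ (and where neither length is 1)
# cannot be broadcast together -- the same (1156,) vs (34,) mismatch
# the TDErrorEvaluator hits when values and y come from different
# episode slices.
values = np.zeros(1156)
y = np.zeros(34)

try:
    _ = (values - y) ** 2
except ValueError as e:
    print(e)  # operands could not be broadcast together with shapes (1156,) (34,)
```

This is why the error only surfaces inside the evaluator: the shapes depend on episode lengths in the dataset, not on anything in the user's script.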
Expected behavior
No ValueError should occur, and evaluation should continue successfully.

Additional context
This can be reproduced in a Google Colab notebook.
@jdesman1 Thanks for testing this! I found a critical bug in computing Q-values of DQN-based algorithms. I'll release an emergency patch to fix this.
The issue has been fixed in commit 5ba050d, and v2.0.3, which includes the fix, has been released as well. Thank you for your help!