Commit

Create UBE.md
tigerneil committed Sep 19, 2017
1 parent 98f9420 commit 7174891
Showing 1 changed file with 19 additions and 0 deletions.
19 changes: 19 additions & 0 deletions UBE.md
@@ -0,0 +1,19 @@
The Uncertainty Bellman Equation and Exploration

Brendan O’Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih
DeepMind
{bodonoghue, iosband, munos, vmnih}@google.com
September 19, 2017

Abstract

We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is
well known that the Bellman equation connects the value at any time-step to the expected value at
subsequent time-steps. In this paper we consider a similar uncertainty Bellman equation (UBE), which
connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby
extending the potential exploratory benefit of a policy beyond individual time-steps. We prove that the
unique fixed point of the UBE yields an upper bound on the variance of the estimated value of any fixed
policy. This bound can be much tighter than traditional count-based bonuses that compound standard
deviation rather than variance. Importantly, and unlike several existing approaches to optimism, this
method scales naturally to large systems with complex generalization. Substituting our UBE-exploration
strategy for ε-greedy improves DQN performance on 51 out of 57 games in the Atari suite.
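
The abstract does not state the equation itself, so the following is only a minimal tabular sketch of the idea, not the authors' implementation. It assumes a discounted recursion of the form u(s, a) = ν(s, a) + γ² Σ P(s' | s, a) π(a' | s') u(s', a'), where ν is a hypothetical non-negative local uncertainty term; the function names, the γ² discounting, and the fixed-point iteration are all assumptions made for illustration. The second helper illustrates, under an assumed independence of per-step errors, why compounding variance can give a smaller bonus than compounding standard deviation.

```python
import math


def solve_ube(local_var, trans, policy, gamma, tol=1e-12, max_iter=10_000):
    """Iterate u <- nu + gamma^2 * P_pi u until it stops changing.

    Assumed (hypothetical) inputs, not taken from the abstract:
    local_var[s][a]  : local uncertainty nu(s, a) >= 0
    trans[s][a][s2]  : transition probability P(s2 | s, a)
    policy[s][a]     : probability pi(a | s) of the fixed policy
    """
    n_s, n_a = len(local_var), len(local_var[0])
    u = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(max_iter):
        # Expected uncertainty at each state when actions follow the policy.
        u_state = [
            sum(policy[s][a] * u[s][a] for a in range(n_a)) for s in range(n_s)
        ]
        # One application of the assumed uncertainty Bellman operator.
        new_u = [
            [
                local_var[s][a]
                + gamma ** 2
                * sum(trans[s][a][s2] * u_state[s2] for s2 in range(n_s))
                for a in range(n_a)
            ]
            for s in range(n_s)
        ]
        delta = max(
            abs(new_u[s][a] - u[s][a]) for s in range(n_s) for a in range(n_a)
        )
        u = new_u
        if delta < tol:
            break
    return u


def compounded_bonus(sigma, horizon):
    """Return (sum of standard deviations, square root of summed variances).

    Assumes the same local standard deviation sigma at every step and
    independent per-step errors, purely for illustration.
    """
    std_sum = horizon * sigma
    var_sum_root = math.sqrt(horizon * sigma ** 2)
    return std_sum, var_sum_root
```

For a single self-looping state with ν = 1 and γ = 0.5, the iteration converges to 1 / (1 − 0.25) = 4/3. With unit local standard deviation over 100 steps, summing standard deviations gives a bonus of 100, while the square root of the summed variances gives 10.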
