Reflections on results added (#278)
yhn112 authored and dniku committed Sep 4, 2019
1 parent 8d46fba commit 017febc
Showing 1 changed file with 10 additions and 25 deletions.
35 changes: 10 additions & 25 deletions week01_intro/crossentropy_method.ipynb
@@ -386,34 +386,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"### Reflecting on results\n",
"\n",
"```\n",
"You may have noticed that the taxi problem quickly converges from <-1000 to a near-optimal score and then descends back into -50/-100. This is in part because the environment has some innate randomness. Namely, the starting points of passenger/driver change from episode to episode.\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"In case CEM failed to learn how to win from one distinct starting point, it will simply discard it because no sessions from that starting point will make it into the \"elites\".\n",
"\n",
"To mitigate that problem, you can either reduce the threshold for elite sessions (duct tape way) or change the way you evaluate strategy (theoretically correct way). You can first sample an action for every possible state and then evaluate this choice of actions by running _several_ games and averaging rewards."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"### You're not done yet!\n",
"\n",
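
The evaluation scheme described in the "Reflecting on results" cell (sample one action per state, then average the reward over several games) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the notebook's code: `ToyCorridor` is a hypothetical stand-in for the Taxi environment, its `step` returns a simplified `(state, reward, done)` tuple rather than Gym's real signature, and the helper names `sample_deterministic_strategy` and `evaluate_strategy` are made up for this example.

```python
import random


class ToyCorridor:
    """Hypothetical stand-in for Taxi: walk to the right end of a corridor.

    The start cell is random, mimicking Taxi's random starting positions.
    """

    n_states = 6
    n_actions = 2  # 0 = step left, 1 = step right

    def __init__(self, rng):
        self.rng = rng
        self.state = 0

    def reset(self):
        # Start anywhere except the goal cell (the last one).
        self.state = self.rng.randrange(self.n_states - 1)
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        return self.state, -1.0, done  # every step costs 1


def sample_deterministic_strategy(policy, rng):
    """Draw one action per state from a table of action probabilities."""
    return [rng.choices(range(len(probs)), weights=probs)[0] for probs in policy]


def evaluate_strategy(env, actions, n_games=10, t_max=50):
    """Average total reward of a fixed state -> action mapping over several games."""
    total = 0.0
    for _ in range(n_games):
        state = env.reset()
        for _ in range(t_max):
            state, reward, done = env.step(actions[state])
            total += reward
            if done:
                break
    return total / n_games


rng = random.Random(0)
env = ToyCorridor(rng)
uniform_policy = [[0.5, 0.5] for _ in range(env.n_states)]
actions = sample_deterministic_strategy(uniform_policy, rng)
score = evaluate_strategy(env, actions, n_games=20)
```

Because each sampled strategy is scored across many random starting points, a strategy that only works from some of them gets a visibly worse average instead of silently dropping out of the elite set. The elite selection would then pick strategies by `score` rather than individual sessions by their single-game reward.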