Reflections on results added (#278)

yandexdataschool · Sep 4, 2019 · 017febc
1 parent 8d46fba
Showing 1 changed file with 10 additions and 25 deletions.
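The note this commit adds to `crossentropy_method.ipynb` proposes a "theoretically correct" fix: sample one action for every state, then score that fixed choice of actions by averaging rewards over several games. The following is a minimal sketch of that idea, assuming a tabular policy stored as an `(n_states, n_actions)` probability matrix. `play_episode` is a hypothetical callable that plays one game (with its own random starting point) and returns the total reward. The helper names `sample_deterministic_policy`, `evaluate_policy`, and `cem_step` and their parameters are illustrative and are not part of the notebook.

```python
import numpy as np


def sample_deterministic_policy(policy, rng):
    """Pick one action per state from a stochastic policy matrix.

    policy: array of shape (n_states, n_actions) whose rows sum to 1.
    Returns an int array of shape (n_states,): the chosen action for each state.
    """
    n_states, n_actions = policy.shape
    return np.array([rng.choice(n_actions, p=policy[s]) for s in range(n_states)])


def evaluate_policy(actions, play_episode, n_games, rng):
    """Average total reward of a fixed action-per-state choice over several games.

    play_episode(actions, rng) is a hypothetical callable (an assumption, not a
    notebook function) that plays one game and returns its total reward.
    """
    return float(np.mean([play_episode(actions, rng) for _ in range(n_games)]))


def cem_step(policy, play_episode, n_samples=50, n_games=5, percentile=50, rng=None):
    """One CEM iteration that selects elite *policies* by their averaged reward,
    instead of selecting elite single sessions."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = [sample_deterministic_policy(policy, rng) for _ in range(n_samples)]
    scores = np.array([evaluate_policy(a, play_episode, n_games, rng) for a in candidates])

    # Keep candidates at or above the percentile threshold (never empty,
    # because the best score is always >= any percentile of the scores).
    threshold = np.percentile(scores, percentile)
    elites = [a for a, s in zip(candidates, scores) if s >= threshold]

    # New policy: per-state frequency of each action among the elite policies.
    n_states, _ = policy.shape
    new_policy = np.zeros_like(policy)
    for actions in elites:
        new_policy[np.arange(n_states), actions] += 1
    return new_policy / len(elites)
```

Because every candidate is scored over `n_games` games with different random starting points, a policy is no longer dropped just because one unlucky starting point produced a bad session. In practice you would likely also blend `new_policy` with the old one, as the notebook does with its own update.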
diff --git a/week01_intro/crossentropy_method.ipynb b/week01_intro/crossentropy_method.ipynb
@@ -386,34 +386,19 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "```\n",
+    "### Reflecting on results\n",
     "\n",
-    "```\n",
+    "You may have noticed that on the taxi problem the score quickly climbs from below -1000 to near-optimal and then drops back to around -50/-100. This is partly because the environment has some innate randomness: the starting positions of the passenger and the driver change from episode to episode.\n",
     "\n",
-    "```\n",
-    "\n",
-    "```\n",
-    "\n",
-    "```\n",
-    "\n",
-    "```\n",
-    "\n",
-    "```\n",
-    "\n",
-    "```\n",
-    "\n",
-    "```\n",
-    "\n",
-    "```\n",
-    "\n",
-    "```\n",
-    "\n",
-    "```\n",
-    "\n",
-    "```\n",
-    "\n",
-    "```\n",
+    "If CEM fails to learn how to win from one particular starting point, it will simply discard that starting point, because no sessions that begin there make it into the \"elites\".\n",
     "\n",
+    "To mitigate that problem, you can either reduce the threshold for elite sessions (the duct-tape way) or change the way you evaluate a strategy (the theoretically correct way). For the latter, first sample an action for every possible state, then evaluate this choice of actions by running _several_ games and averaging their rewards."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
     "\n",
     "### You're not done yet!\n",
     "\n",