Created using Colaboratory

tensorflow · Nov 21, 2019 · 94b2273 · 94b2273
1 parent 9e32727
commit 94b2273
Showing 1 changed file with 319 additions and 0 deletions.
diff --git a/community/en/swift/open_spiel/Introduction_to_GridMaze_Environment.ipynb b/community/en/swift/open_spiel/Introduction_to_GridMaze_Environment.ipynb
@@ -0,0 +1,319 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "name": "Introduction to GridMaze Environment",
+      "provenance": [],
+      "private_outputs": true,
+      "collapsed_sections": []
+    },
+    "kernelspec": {
+      "name": "swift",
+      "display_name": "Swift"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "5qXhi3slajqW",
+        "colab_type": "text"
+      },
+      "source": [
+        "##### Copyright 2019 The TensorFlow Authors\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "2eCrLoRYbJuV",
+        "colab_type": "text"
+      },
+      "source": [
+        "##### Licensed under the Apache License, Version 2.0 (the \"License\");"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "1L-ZwYrwbCeZ",
+        "colab_type": "text"
+      },
+      "source": [
+        "```\n",
+        "// Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+        "// you may not use this file except in compliance with the License.\n",
+        "// You may obtain a copy of the License at\n",
+        "//\n",
+        "// https://www.apache.org/licenses/LICENSE-2.0\n",
+        "//\n",
+        "// Unless required by applicable law or agreed to in writing, software\n",
+        "// distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+        "// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+        "// See the License for the specific language governing permissions and\n",
+        "// limitations under the License.\n",
+        "```"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "l9AP6la-bPuI",
+        "colab_type": "text"
+      },
+      "source": [
+        "# GridMaze Environment"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "U_g59bfqbUPy",
+        "colab_type": "text"
+      },
+      "source": [
+        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/examples/blob/master/community/en/swift/open_spiel/Introduction_to_GridMaze_Environment.ipynb \"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/tensorflow/examples/blob/master/community/en/swift/open_spiel/Introduction_to_GridMaze_Environment.ipynb \"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "KoLzyHV2JWtw",
+        "colab_type": "text"
+      },
+      "source": [
+        "\n",
+        "This environment enables you to create and learn to solve grid mazes.\n",
+        "Solving mazes is a nice problem to start with when learning the basics of reinforcement learning. \n",
+        "\n",
+        "## Introduction\n",
+        "The problem description of the GridMaze environment is that you have a Robot located on a starting cell in a grid maze world. \n",
+        "By moving Up, Down, Left, or Right, the robot should figure out the best strategy to move from the start cell to the goal cell of the maze.\n",
+        "The robot gets a reward (positive value) or a cost (negative value) when moving to a new cell. The purpose of the game is for the robot to figure out how to move from the start to goal position and getting as much reward (or as little cost) as possible.\n",
+        "\n",
+        "Each cell in the grid world have different functions:\n",
+        "* **SPACE**: The robot is free to enter this cell, but there is a at a certain cost for doing so (e.g. -1).\n",
+        "* **START**: This is where the robot starts, like SPACE it can also be entered (returned to), at a cost. A maze must only have one START cell.\n",
+        "* **GOAL**: The robot successfully solves the maze and typically gets a reward by walking to this cell. A maze can have more than one goal cell.\n",
+        "* **WALL**: This cell cannot be entered.\n",
+        "* **BOUNCE**: The robot can attempt to enter this cell, but doing so will make it bounce back to the cell it came from, and get the associated reward/cost of that cell.\n",
+        "* **HOLE**: The robot enters this cell at a typically big cost, and the game is over. In other words, it fails to reach the GOAL cell.\n",
+        "\n",
+        "Let's look at an example:\n",
+        "![ExampleImage](Images/GridMazeExample.png)\n",
+        "\n",
+        "Describing each location in the maze by [row,column]:\n",
+        "* Robot starts out in cell [1,1]. From this cell it can only move right or down, since a wall cell cannot be entered. Going right or down will cost -1.\n",
+        "* The goal cell is at [4,4]. Entering this cell gives the robot 10 in reward, and the robot has successfully solve the maze (the T menas terminal cell).\n",
+        "* Entering the hole in [2,2] will cost the robot -100 and it is a terminal state (game over). Since robot did not reach goal cell, it lost. Likewise for the hole at [3,4].\n",
+        "* Entering one of the space cells at position [2,3] and [3,2] each has a cost of -2.\n",
+        "* The space cell at [4,0] is a bit special. Since the cell at [4,5] is also space, it means the robot can go left in [4,0] and end up in [4,5]. Had [4,5] been a wall, going left from [4,0] would not have been a legal action.\n",
+        "* Cell [4,2] is a bounce cell. This means attempting to enter it from [3,2], [4,1], or [4,3] results in robot bouncing back to those cells. Cost will be -2 from [3,2] and -1 from [4,1] and [4,3].\n",
+        "\n",
+        "Now what about the white arrows with 50%? This indicates that cell [2,4] has random behavior.\n",
+        "If the robot attempts to enter cell [2,4], it will with 50% probability end up in that cell, but it might also well end up in cell [3,4] with 50% probability.\n",
+        "\n",
+        "That's it for an overview of the environment, let's look at how interact with the environment through code!\n",
+        "\n",
+        "## Installing OpenSpiel and imports\n",
+        "\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "sSui_vb7LIV1",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "%install '.package(url: \"https://github.com/deepmind/open_spiel\", .branch(\"master\"))' OpenSpiel\n",
+        "import TensorFlow\n",
+        "import OpenSpiel"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "ObZCC6zsLElE",
+        "colab_type": "text"
+      },
+      "source": [
+        "## Creating a GridMaze Environment\n",
+        "This is how you create the maze environment in the picture above"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "kZRlD4utdPuX",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "var mazeEnvExample = GridMaze(rowCount: 6, colCount: 6)\n",
+        "mazeEnvExample[1, 1] = START()\n",
+        "mazeEnvExample[2, 2] = HOLE(reward: -100)\n",
+        "mazeEnvExample[2, 3] = SPACE(reward: -2.0)\n",
+        "mazeEnvExample[3, 2] = SPACE(reward: -2.0)\n",
+        "mazeEnvExample[3, 4] = HOLE(reward: -100)\n",
+        "mazeEnvExample[4, 0] = SPACE(reward: -1.0)\n",
+        "mazeEnvExample[4, 2] = BOUNCE()\n",
+        "mazeEnvExample[4, 4] = GOAL(reward: 1.0)\n",
+        "mazeEnvExample[4, 5] = SPACE(reward: -1.0)\n",
+        "mazeEnvExample[2, 4].entryJumpProbabilities = [(.Relative(1, 0), 0.5), (.Welcome, 0.5)]"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "wpGeqhR3L0gI",
+        "colab_type": "text"
+      },
+      "source": [
+        "As you can see, access to the cells in the environment is possible using the ```[row,column]``` syntax (GridMaze has overloaded the subscript operator).\n",
+        "By default, when you create and environment the cells will be surrounded by WALLs, and the free space inside it will be SPACE with cost -1 (see below for more information on changing this default behavior).\n",
+        "\n",
+        "This code is pretty straightforward, except the last statement regarding the ```entryJumpProbabilities``` that needs some more explanation.\n",
+        "For any grid cell, you can specify the jump probabilities as follows\n",
+        "```\n",
+        "mazeEnvExample[row,col].entryJumpProbabilities = [\n",
+        "   // This is a list of jump probabilities\n",
+        "   // Each element is a pair, the first element specifies the jump behavior and the second the probability of it happening\n",
+        "   \n",
+        "   (.Welcome, 0.25),        // With 0.25 probability, welcome the robot to this cell\n",
+        "   (.BounceBack, 0.25),     // With 0.25 probability, make the robot bounce back to cell it was standing on\n",
+        "   (.Relative(1,0), 0.25),  // With 0.25 probability, jump to the next row (1), same column (0) in the matrix\n",
+        "   (.Absolute(1,1), 0.25),  // With 0.25 probability, jumpt to cell at row 1, column 1  \n",
+        "]\n",
+        "```\n",
+        "\n",
+        "You can also omit specifying the ```entryJumpProbabilities```, in case the default for the cell type will apply, i.e.\n",
+        "* SPACE, START, GOAL, and HOLE will be ```[(.Welcome, 1.0)]```\n",
+        "* BOUNCE will be ```[(.BounceBack), 1.0]```\n",
+        "\n",
+        "A couple of rules related to entryJumpProbabilities\n",
+        "* If you do specify this field, it must be either have 0 elements (== 100% ```.Welcome```), or probabilities in list must sum to 1.0.\n",
+        "* When using ```.Relative``` or ```.Absolute```, the target cell must be 100% ```.Welcome```. The environment current does not support sequenced jumps.\n",
+        "\n",
+        "There is also currently limilted environment validation (contributions are welcome).\n",
+        "In other words, make sure you index cells that exist, and if you're doing ```entryJumpProbabilities```, that any ```.Relative``` or ```.Absolute``` target cell exist (and does not in turn have ```entryJumpProbabilities```).\n",
+        "\n",
+        "The ```init``` function of GridMaze looks like this\n",
+        "\n",
+        "```\n",
+        "public init(\n",
+        "  rowCount: Int,\n",
+        "  colCount: Int,\n",
+        "  cellsLeftSide: GridCell = WALL(),\n",
+        "  cellsTopSide: GridCell = WALL(),\n",
+        "  cellsBottomSide: GridCell = WALL(),\n",
+        "  cellsRightSide: GridCell = WALL(),\n",
+        "  cellsTopLeft: GridCell = WALL(),\n",
+        "  cellsTopRight: GridCell = WALL(),\n",
+        "  cellsBottomLeft: GridCell = WALL(),\n",
+        "  cellsBottomRight: GridCell = WALL(),\n",
+        "  cellsAllOthers: GridCell = SPACE(reward: -1.0)\n",
+        "```\n",
+        "\n",
+        "As you can see, the size of the environment must be specified through the rowCount and colCount paramaters.\n",
+        "Using default parameter values, the environment will be surrounded by WALLs, and the other cells will be SPACE with a cost of -1.0.\n",
+        "You can change the default by specifying these parameters, for example surround it with BOUNCE by providing ```BOUNCE()``` instead of the default ```WALL()```.\n",
+        "\n",
+        "## Printing a Maze\n",
+        "After a maze has been created, it can be printed by calling printMazeAndTable"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "b-ZFX8K5OJ6H",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "mazeEnvExample.printMazeAndTable(header: \"Maze Environment\")"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "vKj75ioyTvyW",
+        "colab_type": "text"
+      },
+      "source": [
+        "This function prints:\n",
+        "* Maze with all cells and their rewards (WALL and BOUNCE do not have rewards, since they cannot be entered).\n",
+        "* Letter 'T' indicates that the cell is terminal\n",
+        "* A cell that contains '(JS)' means it has a JumpSpecification, which is detailed before the maze printout.\n",
+        "\n",
+        "### Printing V-/Q-Values and Policy\n",
+        "(if you are not familiar with these concepts, then please see the [OpenSpiel Swift tutorials](https://github.com/tensorflow/examples/blob/master/community/en/swift/open_spiel)).\n",
+        "\n",
+        "It is also possible to print a v-/q-/policy-table together with the maze.\n",
+        "\n",
+        "In the below code:\n",
+        "* **vtable** is a mapping from state to the value in that state\n",
+        "* **qtable** is a mapping from state to a value for each of the legal actions from that state\n",
+        "* **ptable** is a mapping from state to the action the policy takes from that state"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {
+        "id": "NZchEFliUenc",
+        "colab_type": "code",
+        "colab": {}
+      },
+      "source": [
+        "var vtable = [String: Double]()\n",
+        "var qtable = [String: [(action: GridMaze.Action, qvalue: Double)]]()\n",
+        "var policyTable = [String: [GridMaze.Action]]()\n",
+        "mazeEnvExample.printMazeAndTable(header: \"--- Printing Everything\", vtable: vtable, qtable: qtable, ptable: policyTable)"
+      ],
+      "execution_count": 0,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "R1v6Eqe_WqaJ",
+        "colab_type": "text"
+      },
+      "source": [
+        "Since these tables do not contain any data for the cells\n",
+        "* **vtable** will print the value 0 for each state\n",
+        "* **qtable** will print a uniform distribution across the legal actions from each cell\n",
+        "* **ptable** will print '?' since there is no policy for any of the states\n",
+        "\n",
+        "## Creating Custom Cells\n",
+        "The cell types we've looked at so far (START, GOAL, SPACE, HOLE, BOUNCE) are all just functions that GridCell objects with certain parameters.\n",
+        "You can create your own cell behaviors by calling GridCell.init with the desired behavior.\n",
+        "See the GridMaze.swift source code to understand how this works.\n",
+        "\n",
+        "## Putting it all together\n",
+        "The file ```open_spiel/swift/Examples/GridMaze/main.swift``` is a complete example of a Robot walking randomly through the maze trying to reach the goal cell.\n",
+        "The [OpenSpiel Swift tutorials](https://github.com/tensorflow/examples/blob/master/community/en/swift/open_spiel) also  have more advanced examples, where the robot learns the optimal ways to navigate through the maze.\n",
+        "\n",
+        "## Join the community!\n",
+        "If you have any questions about Swift for TensorFlow, Swift in OpenSpiel, or would like to share\n",
+        "your work or research with the community, please join our mailing list\n",
+        "[`swift@tensorflow.org`](https://groups.google.com/a/tensorflow.org/forum/#!forum/swift).\n"
+      ]
+    }
+  ]
+}