Commit

Better docs
Zachary Marion committed Dec 29, 2018
1 parent fe05560 commit f3fde7e
Showing 36 changed files with 495 additions and 39 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -9,4 +9,4 @@ test:
python -m pytest

docs:
sphinx-build -b html sphynx docs
sphinx-build -b html sphinx docs
Binary file modified docs/.doctrees/api/agents.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/core.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/games.doctree
Binary file not shown.
Binary file modified docs/.doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/.doctrees/getting-started.doctree
Binary file not shown.
Binary file modified docs/.doctrees/index.doctree
Binary file not shown.
8 changes: 8 additions & 0 deletions docs/_modules/agents/agent.html
@@ -158,6 +158,14 @@ <h1>Source code for agents.agent</h1><div class="highlight"><pre>
<div class="viewcode-block" id="Agent.action"><a class="viewcode-back" href="../../api/agents.html#agents.Agent.action">[docs]</a> <span class="k">def</span> <span class="nf">action</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">s</span><span class="p">,</span> <span class="n">p</span><span class="p">):</span>
<span class="sd">&#39;&#39;&#39;</span>
<span class="sd"> Given a game, a state of the game, and the current player, return an action</span>

<span class="sd"> Args:</span>
<span class="sd"> g (Game): The game the agent is competing in</span>
<span class="sd"> s (any): The state of the game</span>
<span class="sd"> p (int): The current player (either 0 or 1)</span>

<span class="sd"> Returns:</span>
<span class="sd"> int: The index of the action within the returned action space</span>
<span class="sd"> &#39;&#39;&#39;</span>
<span class="k">raise</span> <span class="ne">NotImplementedError</span></div></div>
</pre></div>
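
The `action(g, s, p)` contract documented above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Agent` class is a stand-in for the library's base class, `ToyGame` and `UniformAgent` are hypothetical names, and the `action_space(s)` method is assumed (the docstring only mentions "the returned action space").

```python
import random


class Agent:
    """Stand-in for the library's Agent base class (assumed shape)."""

    def action(self, g, s, p):
        raise NotImplementedError


class ToyGame:
    """Hypothetical game: the state is a list and open cells are marked -1."""

    def action_space(self, s):
        # Assumed helper: the list of legal moves for state s
        return [i for i, cell in enumerate(s) if cell == -1]


class UniformAgent(Agent):
    """Hypothetical agent that picks uniformly among the legal actions."""

    def action(self, g, s, p):
        # Return an index into the action space, as the docstring describes
        return random.randrange(len(g.action_space(s)))


game = ToyGame()
state = [-1, 0, -1, 1]
index = UniformAgent().action(game, state, 0)
chosen_cell = game.action_space(state)[index]
```

Note that the return value is an index into the action space rather than the move itself, so the caller looks the move up afterwards.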
13 changes: 13 additions & 0 deletions docs/_modules/agents/mcts.html
@@ -165,6 +165,16 @@ <h1>Source code for agents.mcts</h1><div class="highlight"><pre>
<span class="sd"> by propagating whether or not the current player won the game back up through</span>
<span class="sd"> the game history. After enough iterations of game simulations we can choose</span>
<span class="sd"> the best move based on this stored information</span>

<span class="sd"> Attributes:</span>
<span class="sd"> wins (dict): A dictionary where the key is a tuple :code:`(player, state_hash)`</span>
<span class="sd"> and the value is the number of wins that occurred at that state for the</span>
<span class="sd"> player. Note that the player represents whoever *played* the move in the state.</span>
<span class="sd"> plays (dict): A dictionary of the same format as wins which represents the</span>
<span class="sd"> number of times the player made a move in the given state</span>

<span class="sd"> Examples:</span>
<span class="sd"> &gt;&gt;&gt; MCTSAgent().train(game, num_iters=10000, num_episodes=100, verbose=True)</span>
<span class="sd"> &#39;&#39;&#39;</span>

<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
@@ -177,6 +187,9 @@ <h1>Source code for agents.mcts</h1><div class="highlight"><pre>
<span class="sd"> counts for any state that we visit during the game. As we continue to</span>
<span class="sd"> play, num_wins / num_plays for a given state should begin to converge on</span>
<span class="sd"> the true optimality of a state</span>

<span class="sd"> Args:</span>
<span class="sd"> g (Game): Game to train on</span>
<span class="sd"> &#39;&#39;&#39;</span>
<span class="n">num_iters</span> <span class="o">=</span> <span class="n">kwargs</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;num_iters&#39;</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
<span class="n">num_episodes</span> <span class="o">=</span> <span class="n">kwargs</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;num_episodes&#39;</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
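
The `wins` and `plays` dictionaries described in the docstring can be sketched as follows. This is an illustration under stated assumptions, not the agent's actual code: `backpropagate` and `win_rate` are hypothetical helper names, and state hashes are plain strings here.

```python
from collections import defaultdict

wins = defaultdict(int)   # (player, state_hash) -> wins recorded for that player
plays = defaultdict(int)  # (player, state_hash) -> times the player moved into that state


def backpropagate(history, winner):
    """Update the counts for every (player, state_hash) visited in one game.

    history is a list of (player, state_hash) pairs, where player is whoever
    *played* the move that produced the state. winner is 0, 1, or -1 for a draw.
    """
    for player, state_hash in history:
        plays[(player, state_hash)] += 1
        if player == winner:
            wins[(player, state_hash)] += 1


def win_rate(player, state_hash):
    """Empirical num_wins / num_plays, or 0.0 for a state never visited."""
    key = (player, state_hash)
    count = plays.get(key, 0)
    return wins.get(key, 0) / count if count else 0.0


# Two simulated games with hypothetical state hashes
backpropagate([(0, "a"), (1, "b"), (0, "c")], winner=0)
backpropagate([(0, "a"), (1, "d")], winner=1)
rate_a = win_rate(0, "a")        # visited twice, won once
rate_d = win_rate(1, "d")        # visited once, won once
rate_unseen = win_rate(0, "zz")  # never visited
```

As more simulated games are folded in, the ratio for each key is what the docstring describes as converging on the quality of the state.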
14 changes: 13 additions & 1 deletion docs/_modules/agents/trainable_agent.html
@@ -162,20 +162,32 @@ <h1>Source code for agents.trainable_agent</h1><div class="highlight"><pre>

<div class="viewcode-block" id="TrainableAgent.train"><a class="viewcode-back" href="../../api/agents.html#agents.TrainableAgent.train">[docs]</a> <span class="k">def</span> <span class="nf">train</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="sd">&#39;&#39;&#39;</span>
<span class="sd"> Train the agent. As a convenience this should return self.training_params()</span>
<span class="sd"> Train the agent. As a convenience this should return :code:`self.training_params()`</span>
<span class="sd"> at the end of training</span>

<span class="sd"> Args:</span>
<span class="sd"> g (Game): The game the agent is training on</span>

<span class="sd"> Returns:</span>
<span class="sd"> tuple: The training params of the agent</span>
<span class="sd"> &#39;&#39;&#39;</span>
<span class="k">raise</span> <span class="ne">NotImplementedError</span></div>

<div class="viewcode-block" id="TrainableAgent.train_episode"><a class="viewcode-back" href="../../api/agents.html#agents.TrainableAgent.train_episode">[docs]</a> <span class="k">def</span> <span class="nf">train_episode</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="sd">&#39;&#39;&#39;</span>
<span class="sd"> Single training iteration</span>

<span class="sd"> Args:</span>
<span class="sd"> g (Game): The game the agent is training on</span>
<span class="sd"> &#39;&#39;&#39;</span>
<span class="k">raise</span> <span class="ne">NotImplementedError</span></div>

<div class="viewcode-block" id="TrainableAgent.training_params"><a class="viewcode-back" href="../../api/agents.html#agents.TrainableAgent.training_params">[docs]</a> <span class="k">def</span> <span class="nf">training_params</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">g</span><span class="p">):</span>
<span class="sd">&#39;&#39;&#39;</span>
<span class="sd"> Return the params that result from training</span>

<span class="sd"> Args:</span>
<span class="sd"> g (Game): The game the agent is training on</span>
<span class="sd"> &#39;&#39;&#39;</span>
<span class="k">raise</span> <span class="ne">NotImplementedError</span></div></div>
</pre></div>
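
A minimal sketch of how a subclass might satisfy the three documented methods, with `train` returning the training params as the docstring suggests. The `TrainableAgent` class below is a stand-in with the same method signatures, `CountingAgent` is a hypothetical name, and the `num_episodes` keyword with a default of 100 is borrowed from the `MCTSAgent` source shown earlier.

```python
class TrainableAgent:
    """Stand-in for the library's base class (assumed shape)."""

    def train(self, g, **kwargs):
        raise NotImplementedError

    def train_episode(self, g, **kwargs):
        raise NotImplementedError

    def training_params(self, g):
        raise NotImplementedError


class CountingAgent(TrainableAgent):
    """Hypothetical agent whose only 'learned' parameter is an episode count."""

    def __init__(self):
        self.episodes_seen = 0

    def train(self, g, **kwargs):
        num_episodes = kwargs.get('num_episodes', 100)
        for _ in range(num_episodes):
            self.train_episode(g, **kwargs)
        # Convenience described in the docstring: hand back the params
        return self.training_params(g)

    def train_episode(self, g, **kwargs):
        self.episodes_seen += 1

    def training_params(self, g):
        # The docstring says a tuple is returned
        return (self.episodes_seen,)


# No real game is needed for this toy agent, so None is passed for g
params = CountingAgent().train(None, num_episodes=5)
```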
43 changes: 35 additions & 8 deletions docs/_modules/core/arena.html
@@ -148,7 +148,7 @@
<div itemprop="articleBody">

<h1>Source code for core.arena</h1><div class="highlight"><pre>
<span></span><span class="kn">import</span> <span class="nn">random</span>
<span></span><span class="kn">from</span> <span class="nn">random</span> <span class="k">import</span> <span class="n">choice</span>

<span class="kn">from</span> <span class="nn">.player</span> <span class="k">import</span> <span class="n">Player</span>

@@ -157,12 +157,25 @@ <h1>Source code for core.arena</h1><div class="highlight"><pre>
<span class="sd">&#39;&#39;&#39;</span>
<span class="sd"> Place where two agents are pitted against each other in a series of games.</span>
<span class="sd"> Statistics on the win rates are recorded and can be displayed.</span>

<span class="sd"> Attributes:</span>
<span class="sd"> game (Game): The game that is being played</span>
<span class="sd"> players (list): List of Player objects. Note that there should only be two, and</span>
<span class="sd"> the ids of the player should map to the index of the player in the array.</span>
<span class="sd"> games_played (int): The number of games played in the arena</span>
<span class="sd"> wins (list): List of two integers representing the number of wins of each player,</span>
<span class="sd"> with the index being the id of the player</span>
<span class="sd"> &#39;&#39;&#39;</span>

<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">game</span><span class="p">,</span> <span class="n">players</span><span class="p">):</span>
<span class="sd">&#39;&#39;&#39;</span>
<span class="sd"> The `players` argument is a list of players to be used. In the future, when</span>
<span class="sd"> more than two players are supported this can be generalized to n players.</span>
<span class="sd"> Note:</span>
<span class="sd"> The `players` argument is a list of players to be used. In the future, when</span>
<span class="sd"> more than two players are supported this can be generalized to n players.</span>

<span class="sd"> Args:</span>
<span class="sd"> game (Game)</span>
<span class="sd"> players (list)</span>
<span class="sd"> &#39;&#39;&#39;</span>
<span class="k">if</span> <span class="ow">not</span> <span class="nb">all</span><span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">Player</span><span class="p">)</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">players</span><span class="p">):</span>
<span class="k">raise</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">&#39;Expected `model` argument to be a list of &#39;</span>
@@ -178,6 +191,11 @@ <h1>Source code for core.arena</h1><div class="highlight"><pre>
<span class="sd">&#39;&#39;&#39;</span>
<span class="sd"> Play a series of games between the players, recording how they did</span>
<span class="sd"> so that we can display statistics on which player performed better</span>

<span class="sd"> Args:</span>
<span class="sd"> num_episodes (int): The number of games to play, defaults to 10</span>
<span class="sd"> verbose (bool): Whether or not to print output from each game.</span>
<span class="sd"> Defaults to false</span>
<span class="sd"> &#39;&#39;&#39;</span>
<span class="n">num_episodes</span> <span class="o">=</span> <span class="n">kwargs</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;num_episodes&#39;</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
<span class="n">verbose</span> <span class="o">=</span> <span class="n">kwargs</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;verbose&#39;</span><span class="p">,</span> <span class="kc">False</span><span class="p">)</span>
@@ -206,13 +224,22 @@ <h1>Source code for core.arena</h1><div class="highlight"><pre>
<span class="sd"> Play a single game, doing the necessary bookkeeping to maintain</span>
<span class="sd"> accurate statistics and returning the winner (or -1 if no winner).</span>

<span class="sd"> NOTE: We always treat the starting player as player 0 from the perspective</span>
<span class="sd"> of the agent. Because of this we pass in a &#39;flip&#39; boolean to the player</span>
<span class="sd"> class in the action method, which flips the board and makes it seem as</span>
<span class="sd"> though player 0 started, even if it was actually player 1</span>
<span class="sd"> Note:</span>
<span class="sd"> We always treat the starting player as player 0 from the perspective</span>
<span class="sd"> of the agent. Because of this we pass in a :code:`flip` boolean to</span>
<span class="sd"> the player class in the action method, which flips the board and</span>
<span class="sd"> makes it seem as though player 0 started, even if it was actually</span>
<span class="sd"> player 1</span>

<span class="sd"> Args:</span>
<span class="sd"> verbose (bool): Whether or not to print the output of the game.</span>
<span class="sd"> Defaults to false</span>

<span class="sd"> Returns:</span>
<span class="sd"> int: The winner of the game</span>
<span class="sd"> &#39;&#39;&#39;</span>
<span class="n">state</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">game</span><span class="o">.</span><span class="n">initial_state</span><span class="p">()</span>
<span class="n">starting_player</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">([</span><span class="n">p</span><span class="o">.</span><span class="n">player_id</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">players</span><span class="p">])</span>
<span class="n">starting_player</span> <span class="o">=</span> <span class="n">choice</span><span class="p">([</span><span class="n">p</span><span class="o">.</span><span class="n">player_id</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">players</span><span class="p">])</span>
<span class="n">player</span> <span class="o">=</span> <span class="n">starting_player</span>

<span class="c1"># Play out the full game</span>
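
The bookkeeping implied by the documented `games_played` and `wins` attributes might look roughly like the sketch below. `ArenaStats`, `record`, and `win_rates` are hypothetical names used only for illustration; the one behaviour taken from the docstrings is that a winner of -1 means no winner.

```python
class ArenaStats:
    """Hypothetical bookkeeping mirroring the documented Arena attributes."""

    def __init__(self):
        self.games_played = 0
        self.wins = [0, 0]  # index is the player id

    def record(self, winner):
        """Record one finished game; winner is 0, 1, or -1 for no winner."""
        self.games_played += 1
        if winner != -1:
            self.wins[winner] += 1

    def win_rates(self):
        """Fraction of all games won by each player (draws count as games)."""
        if self.games_played == 0:
            return [0.0, 0.0]
        return [w / self.games_played for w in self.wins]


stats = ArenaStats()
for result in (0, 1, -1, 0):
    stats.record(result)
rates = stats.win_rates()
```

Counting draws in the denominator is a design choice of this sketch; the win rates then need not sum to 1.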
18 changes: 18 additions & 0 deletions docs/_modules/core/player.html
@@ -154,6 +154,13 @@ <h1>Source code for core.player</h1><div class="highlight"><pre>
<span class="sd"> agent that backs it. Agents learn the optimal play for each player,</span>
<span class="sd"> while players are only concerned about the optimal play for</span>
<span class="sd"> themselves</span>

<span class="sd"> Attributes:</span>
<span class="sd"> player_id (int): The id of the player</span>
<span class="sd"> agent (Agent): The agent associated with the player</span>

<span class="sd"> Raises:</span>
<span class="sd"> ValueError: If the id is not 0 or 1</span>
<span class="sd"> &#39;&#39;&#39;</span>

<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">player_id</span><span class="p">,</span> <span class="n">agent</span><span class="p">):</span>
@@ -167,6 +174,17 @@ <h1>Source code for core.player</h1><div class="highlight"><pre>
<span class="sd"> Take an action with the backing agent. If the starting player is</span>
<span class="sd"> not 0, then we invert the board so that the starting player is still</span>
<span class="sd"> 0 from the perspective of the agent</span>

<span class="sd"> Args:</span>
<span class="sd"> g (Game): The game the player is playing</span>
<span class="sd"> s (any): The state of the game</span>
<span class="sd"> flip (bool): Whether or not to flip the state so that the agent</span>
<span class="sd"> thinks that player 0 started the game. This is necessary since</span>
<span class="sd"> trainable agents like MCTSAgent operate under the assumption that</span>
<span class="sd"> player 0 always starts</span>

<span class="sd"> Returns:</span>
<span class="sd"> int: The index of the action the player will take</span>
<span class="sd"> &#39;&#39;&#39;</span>
<span class="n">state</span> <span class="o">=</span> <span class="n">g</span><span class="o">.</span><span class="n">flip_state</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="k">if</span> <span class="n">flip</span> <span class="k">else</span> <span class="n">s</span>
<span class="n">player</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">player_id</span> <span class="k">if</span> <span class="n">flip</span> <span class="k">else</span> <span class="bp">self</span><span class="o">.</span><span class="n">player_id</span>
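
The flip described in the docstring can be sketched for a board that uses -1 for empty cells and 0 or 1 for the two players. The body of `flip_state` is an assumption (the real `Game.flip_state` is not shown in this diff), and `agent_view` is a hypothetical helper that mirrors the two lines of `Player.action` visible above.

```python
def flip_state(s):
    """Assumed flip: swap the two players' marks and leave empty cells alone."""
    return [cell if cell == -1 else 1 - cell for cell in s]


def agent_view(s, player_id, flip):
    """Return the (state, player) pair the backing agent would see."""
    state = flip_state(s) if flip else s
    player = 1 - player_id if flip else player_id
    return state, player


board = [1, -1, 0, -1, 1, -1, -1, -1, -1]
flipped_board, flipped_player = agent_view(board, player_id=1, flip=True)
same_board, same_player = agent_view(board, player_id=1, flip=False)
```

With this definition, flipping twice returns the original board, so the arena can translate between the two perspectives without losing information.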
6 changes: 6 additions & 0 deletions docs/_modules/games/game.html
@@ -149,6 +149,12 @@

<h1>Source code for games.game</h1><div class="highlight"><pre>
<div class="viewcode-block" id="Game"><a class="viewcode-back" href="../../api/games.html#games.Game">[docs]</a><span></span><span class="k">class</span> <span class="nc">Game</span><span class="p">:</span>
<span class="sd">&#39;&#39;&#39;</span>
<span class="sd"> Game class, which is extended to implement different types of adversarial,</span>
<span class="sd"> zero-sum games. The class itself is stateless and all methods are actually</span>
<span class="sd"> static.</span>
<span class="sd"> &#39;&#39;&#39;</span>

<div class="viewcode-block" id="Game.initial_state"><a class="viewcode-back" href="../../api/games.html#games.Game.initial_state">[docs]</a> <span class="k">def</span> <span class="nf">initial_state</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="sd">&#39;&#39;&#39;</span>
<span class="sd"> Return the initial state of the game</span>
11 changes: 11 additions & 0 deletions docs/_modules/games/tictactoe.html
@@ -157,6 +157,17 @@ <h1>Source code for games.tictactoe</h1><div class="highlight"><pre>
<span class="sd"> Implements a 3x3 game of tictactoe, with state represented as an array of length 9.</span>
<span class="sd"> Currently the implementation is somewhat brittle and cannot be extended to an nxn</span>
<span class="sd"> board easily.</span>

<span class="sd"> Examples:</span>
<span class="sd"> &gt;&gt;&gt; TicTacToe().initial_state()</span>
<span class="sd"> [-1, -1, -1, -1, -1, -1, -1, -1, -1]</span>

<span class="sd"> &gt;&gt;&gt; TicTacToe().to_readable_string([-1, 1, -1, 0, 0, -1, -1, 1, -1])</span>
<span class="sd"> | O |</span>
<span class="sd"> -----------</span>
<span class="sd"> X | X |</span>
<span class="sd"> -----------</span>
<span class="sd"> | O |</span>
<span class="sd"> &#39;&#39;&#39;</span>

<span class="k">def</span> <span class="nf">initial_state</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
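
The state representation in the docstring examples (a length-9 list with -1 for an empty cell, and 0 rendering as X and 1 as O) can be sketched as standalone functions. This is not the library's implementation: the exact column spacing is an assumption, since the rendered example above has its whitespace collapsed.

```python
SYMBOLS = {-1: ' ', 0: 'X', 1: 'O'}  # mapping inferred from the docstring example


def initial_state():
    """Empty 3x3 board stored as a flat list of nine cells."""
    return [-1] * 9


def to_readable_string(s):
    """Render the flat state as three rows separated by dashed lines."""
    rows = []
    for r in range(3):
        cells = [SYMBOLS[c] for c in s[3 * r:3 * r + 3]]
        rows.append(' ' + ' | '.join(cells) + ' ')
    return '\n-----------\n'.join(rows)


text = to_readable_string([-1, 1, -1, 0, 0, -1, -1, 1, -1])
lines = text.splitlines()
```

Printing `text` gives a board whose rows match the docstring example once surrounding whitespace is ignored.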
3 changes: 0 additions & 3 deletions docs/_modules/index.html
@@ -151,15 +151,12 @@ <h1>All modules for which code is available</h1>
<li><a href="agents/limited_depth_minimax.html">agents.limited_depth_minimax</a></li>
<li><a href="agents/mcts.html">agents.mcts</a></li>
<li><a href="agents/minimax.html">agents.minimax</a></li>
<li><a href="agents/random.html">agents.random</a></li>
<li><a href="agents/random_agent.html">agents.random_agent</a></li>
<li><a href="agents/trainable_agent.html">agents.trainable_agent</a></li>
<li><a href="core/arena.html">core.arena</a></li>
<li><a href="core/player.html">core.player</a></li>
<li><a href="games/game.html">games.game</a></li>
<li><a href="games/tictactoe.html">games.tictactoe</a></li>
<li><a href="trainers/mcts.html">trainers.mcts</a></li>
<li><a href="trainers/trainer.html">trainers.trainer</a></li>
</ul>

</div>
6 changes: 3 additions & 3 deletions docs/_sources/getting-started.rst.txt
@@ -16,9 +16,9 @@ Basic Example

.. code-block:: python
from games import TicTacToe
from agents import RandomAgent, MCTSAgent
from core import Arena, Player
from gameai.games import TicTacToe
from gameai.agents import RandomAgent, MCTSAgent
from gameai.core import Arena, Player
# Create our game
game = TicTacToe()
