Remove lab instructions from readme, updating update notes, environment formatting.

michaelschaarschmidt committed Jan 6, 2018
1 parent 10c8b4e commit 8147353
Showing 7 changed files with 120 additions and 101 deletions.
8 changes: 4 additions & 4 deletions BUILD
@@ -2,17 +2,17 @@ package(default_visibility = ["//visibility:public"])

tensorforce_args = [
"--agent VPGAgent",
"--agent-config /configs/vpg_agent.json",
"--network-config /configs/vpg_network_visual.json",
"--agent-config /configs/vpg_baseline_visual.json",
"--network-config /configs/cnn_dqn_network.json",
"--episodes 1000",
"--max-timesteps 1000"
]

py_library(
name = "tensorforce",
imports = [":tensorforce"],
data = ["//tensorforce:examples/configs/vpg_agent.json",
"//tensorforce:examples/configs/vpg_network_visual.json"],
data = ["//tensorforce:examples/configs/vpg_baseline_visual.json",
"//tensorforce:examples/configs/cnn_dqn_network.json"],
srcs = glob(["tensorforce/**/*.py"])
)

56 changes: 8 additions & 48 deletions README.md
@@ -13,23 +13,21 @@ TensorForce is an open source reinforcement learning library focused on
providing clear APIs, readability and modularisation to deploy
reinforcement learning solutions both in research and practice.
TensorForce is built on top of TensorFlow and compatible with Python 2.7
-and ≥3.5 and supports multiple state inputs and multi-dimensional
-actions to be compatible with Gym, Universe, and DeepMind lab. It further
-provides an easily extensible interface to implement new environments.
+and >3.5 and supports multiple state inputs and multi-dimensional
+actions to be compatible with any type of simulation or application environment.

-Finally, TensorForce aims to move all reinforcement learning logic into the
+TensorForce also aims to move all reinforcement learning logic into the
TensorFlow graph, including control flow. This both reduces dependencies
on the host language (Python), thus enabling portable computation graphs that
can be used in other languages and contexts, and improves performance.
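As a minimal, hypothetical sketch of what in-graph control flow looks like (plain TensorFlow 1.x for illustration only; this is not TensorForce's actual update logic):

```python
import tensorflow as tf

# Toy example: compute a discounted return entirely inside the graph,
# so no Python loop runs between timesteps.
rewards = tf.constant([1.0, 0.0, 2.0, 1.0])
gamma = 0.99

def cond(t, acc):
    # Keep looping while timesteps remain.
    return t >= 0

def body(t, acc):
    # Walk backwards through time, discounting as we go.
    return t - 1, rewards[t] + gamma * acc

_, discounted_return = tf.while_loop(
    cond=cond, body=body,
    loop_vars=(tf.shape(rewards)[0] - 1, tf.constant(0.0)))

with tf.Session() as sess:
    print(sess.run(discounted_return))  # ~3.93
```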

More information on architecture can also be found [on our blog](https://reinforce.io/blog/).
Please also read the [TensorForce FAQ](https://github.com/reinforceio/tensorforce/blob/master/FAQ.md)
-if you encounter problems or have questions. Finally, you can sign up to our [Gitter channel](https://docs.google.com/forms/d/1_UD5Pb5LaPVUviD0pO0fFcEnx_vwenvuc00jmP2rRIc/).
+if you encounter problems or have questions.

+Finally, read the latest update notes (UPDATE_NOTES.md) for an idea of
+how the project is evolving, especially concerning major API-breaking updates.


The main difference to existing libraries is a strict separation of
environments, agents and update logic that facilitates usage in
non-simulation environments. Further, research code often relies on
@@ -51,7 +49,7 @@ Features
--------

TensorForce currently integrates with the OpenAI Gym API, OpenAI
-Universe, the Unreal Engine (game engine), DeepMind lab, ALE and Maze explorer. The following algorithms are available (all
+Universe, DeepMind lab, ALE and Maze explorer. The following algorithms are available (all
policy methods both continuous/discrete and using a Beta distribution for bounded actions).

- A3C using distributed TensorFlow or a multithreaded runner - now as part of our generic Model
@@ -201,53 +199,15 @@ Please refer to the [tensorforce-benchmark](https://github.com/reinforceio/tenso
for more information.


-Use with DeepMind lab
----------------------
-
-Since DeepMind lab is only available as source code, a manual install
-via bazel is required. Further, due to the way bazel handles external
-dependencies, cloning TensorForce into lab is the most convenient way to
-run it using the bazel BUILD file we provide. To use lab, first download
-and install it according to instructions
-<https://github.com/deepmind/lab/blob/master/docs/build.md>:
-
-```bash
-git clone https://github.com/deepmind/lab.git
-```
-
-Add to the lab main BUILD file:
-
-```
-package(default_visibility = ["//visibility:public"])
-```
-
-Clone TensorForce into the lab directory, then run the TensorForce bazel runner. Note that using any specific configuration file
-currently requires changing the Tensorforce BUILD file to adjust environment parameters.
-
-```bash
-bazel run //tensorforce:lab_runner
-```
-
-Please note that we have not tried to reproduce any lab results yet, and
-these instructions just explain connectivity in case someone wants to
-get started there.


-Community and contribution guidelines
--------------------------------------
+Community and contributions
+---------------------------

TensorForce is developed by [reinforce.io](https://reinforce.io), a new
project focused on providing reinforcement learning software
infrastructure. For any questions, get in touch at
<contact@reinforce.io>.

-Please file bug reports and feature discussions as GitHub issues in first instance.
-Please read the FAQ before creating an issue.

-Please appreciate that we do not have the resources to help you find the right configuration
-for your problem, so unless you are reasonably convinced there is a bug (e.g. by testing known hyper-parameters),
-please do not create issues such as 'Algorithm X is not working on environment Y with Configuration Z' without
-showing you have done some research (again, please read the FAQ on why).

There is also a developer chat you are welcome to join. For joining, we ask to provide
some basic details how you are using TensorForce so we can learn more about applications and our
@@ -270,5 +230,5 @@ If you use TensorForce in your academic research, we would be grateful if you co
```

We are also very grateful for our open source contributors (listed according to github): Islandman93, wassname,
-lefnire, Mazecreator, trickmeyer, mryellow, ImpulseAdventure, vwxyzjn, beflix, tms1337, BorisSchaeling, ngoodger,
+Mazecreator, lefnire, sven1977, trickmeyer, mryellow, ImpulseAdventure, vwxyzjn, beflix, tms1337, BorisSchaeling, ngoodger,
ekerazha, Davidnet, nikoliazekter, AdamStelmaszczyk, 10nagachika, petrbel, Kismuz.
10 changes: 10 additions & 0 deletions UPDATE_NOTES.md
@@ -5,6 +5,16 @@ This file tracks all major updates and new features. As TensorForce is still in
we are continuously implementing small updates and bug fixes, which will not
be tracked here in detail but through github issues.

+6th January 2018
+
+- In December, a number of bugs regarding exploration and a numerical issue in generalised
+advantage estimation were fixed; these fixes seem to increase performance, so an update is recommended.
+- Agent structure saw major refactoring to remove redundant code; introduced a ```LearningAgent```
+to hold common fields and distinguish from non-learning agents (e.g. ```RandomAgent```, sketched below).
+- We are preparing to move memories into the TensorFlow graph, which will fix sequences and allow subsampling
+in the optimizers. Further, new episode/batch semantics will be enabled (e.g. episode-based instead of
+timestep-based batching).
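As a rough sketch of the distinction this refactoring draws (hypothetical, heavily simplified classes; these are not the actual TensorForce signatures):

```python
import random

class Agent(object):
    """Anything that maps states to actions."""

    def act(self, states):
        raise NotImplementedError

class LearningAgent(Agent):
    """Holds fields common to all agents that update a model."""

    def __init__(self, batch_size=64, discount=0.99):
        self.batch_size = batch_size
        self.discount = discount

    def observe(self, reward, terminal):
        # Learning agents consume experience in order to update a model.
        raise NotImplementedError

class RandomAgent(Agent):
    """A non-learning agent: acts uniformly at random and never updates."""

    def __init__(self, num_actions):
        self.num_actions = num_actions

    def act(self, states):
        return random.randrange(self.num_actions)
```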

9th December 2017

- Renamed LSTM to InternalLSTM and created a new LSTM layer which implements more standard
35 changes: 25 additions & 10 deletions tensorforce/contrib/ale.py
@@ -55,20 +55,20 @@ def __init__(self, rom, frame_skip=1, repeat_action_probability=0.0,
self.ale.setBool(b'color_averaging', False)
self.ale.setInt(b'frame_skip', frame_skip)

-# all set commands must be done before loading the ROM
+# All set commands must be done before loading the ROM
self.ale.loadROM(rom.encode())

-# setup gamescreen object
+# Setup gamescreen object
width, height = self.ale.getScreenDims()
self.gamescreen = np.empty((height, width, 3), dtype=np.uint8)

self.frame_skip = frame_skip

-# setup action converter
+# Setup action converter
# ALE returns legal action indexes, convert these to just numbers
self.action_inds = self.ale.getMinimalActionSet()

-# setup lives
+# Setup lives
self.loss_of_life_reward = loss_of_life_reward
self.cur_lives = self.ale.lives()
self.loss_of_life_termination = loss_of_life_termination
@@ -84,15 +84,15 @@ def reset(self):
self.ale.reset_game()
self.cur_lives = self.ale.lives()
self.life_lost = False
-# clear gamescreen
+# Clear gamescreen
self.gamescreen = np.empty(self.gamescreen.shape, dtype=np.uint8)
return self.current_state

def execute(self, actions):
-# convert action to ale action
+# Convert action to ale action
ale_actions = self.action_inds[actions]

-# get reward and process terminal & next state
+# Get reward and process terminal & next state
rew = self.ale.act(ale_actions)
if self.loss_of_life_termination or self.loss_of_life_reward != 0:
new_lives = self.ale.lives()
@@ -128,8 +128,23 @@ def is_terminal(self):
@property
def action_names(self):
action_names = [
-'No-Op', 'Fire', 'Up', 'Right', 'Left', 'Down', 'Up Right', 'Up Left', 'Down Right',
-'Down Left', 'Up Fire', 'Right Fire', 'Left Fire', 'Down Fire', 'Up Right Fire',
-'Up Left Fire', 'Down Right Fire', 'Down Left Fire'
+'No-Op',
+'Fire',
+'Up',
+'Right',
+'Left',
+'Down',
+'Up Right',
+'Up Left',
+'Down Right',
+'Down Left',
+'Up Fire',
+'Right Fire',
+'Left Fire',
+'Down Fire',
+'Up Right Fire',
+'Up Left Fire',
+'Down Right Fire',
+'Down Left Fire'
]
return np.asarray(action_names)[self.action_inds]
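For orientation, a hedged usage sketch of the environment above (the import path and the order of the tuple returned by `execute` are assumptions, and the ROM path is a placeholder):

```python
from tensorforce.contrib.ale import ALE  # import path assumed from the file location

# Hypothetical rollout; constructor arguments mirror the __init__ shown above.
env = ALE(rom='breakout.bin', frame_skip=4, loss_of_life_reward=-1.0)

state = env.reset()
terminal = False
while not terminal:
    action = 0  # index into the minimal action set; an agent would choose here
    state, terminal, reward = env.execute(action)  # tuple order is an assumption
```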
52 changes: 32 additions & 20 deletions tensorforce/contrib/deepmind_lab.py
@@ -27,27 +27,39 @@ class DeepMindLab(Environment):
DeepMind Lab Integration:
https://arxiv.org/abs/1612.03801
https://github.com/deepmind/lab
+Since DeepMind lab is only available as source code, a manual install
+via bazel is required. Further, due to the way bazel handles external
+dependencies, cloning TensorForce into lab is the most convenient way to
+run it using the bazel BUILD file we provide. To use lab, first download
+and install it according to instructions
+<https://github.com/deepmind/lab/blob/master/docs/build.md>:
+```bash
+git clone https://github.com/deepmind/lab.git
+```
+Add to the lab main BUILD file:
+```
+package(default_visibility = ["//visibility:public"])
+```
+Clone TensorForce into the lab directory, then run the TensorForce bazel runner.
+Note that using any specific configuration file currently requires changing the Tensorforce
+BUILD file to adjust environment parameters.
+```bash
+bazel run //tensorforce:lab_runner
+```
+Please note that we have not tried to reproduce any lab results yet, and
+these instructions just explain connectivity in case someone wants to
+get started there.
"""
#
# @staticmethod
# def state_spec(level_id):
# """
# Returns a list of dicts with keys 'dtype', 'shape' and 'name', specifying the available observations this DeepMind Lab environment supports.
#
# :param level_id: string with id/descriptor of the level
# """
# level = deepmind_lab.Lab(level_id, ())
# return level.observation_spec()
#
# @staticmethod
# def action_spec(level_id):
# """
# Returns a list of dicts with keys 'min', 'max' and 'name', specifying the shape of the actions expected by this DeepMind Lab environment.
#
# :param level_id: string with id/descriptor of the level
# """
# level = deepmind_lab.Lab(level_id, ())
# return level.action_spec()

def __init__(
self,
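The commented-out helpers above suggest how a raw lab level can be queried for its specs. A hedged sketch mirroring that commented code (it requires the bazel-built `deepmind_lab` Python module, and the level id is just an example):

```python
import deepmind_lab  # only importable inside a bazel-built lab workspace

# Hypothetical: instantiate a level and inspect its specs, as the
# commented-out state_spec/action_spec helpers above would do.
level = deepmind_lab.Lab('seekavoid_arena_01', ())
print(level.observation_spec())  # list of dicts with 'dtype', 'shape', 'name'
print(level.action_spec())       # list of dicts with 'min', 'max', 'name'
```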
6 changes: 4 additions & 2 deletions tensorforce/contrib/state_settable_environment.py
@@ -18,14 +18,16 @@

class StateSettableEnvironment(Environment):
"""
-An Environment that implements the set_state method to set the current state to some new state using setter instructions.
+An Environment that implements the set_state method to set the current state
+to some new state using setter instructions.
"""
def set_state(self, **kwargs):
"""
Sets the current state of the environment manually to some other state and returns a new observation.
Args:
-**kwargs: The set instruction(s) to be executed by the environment. A single set instruction usually set a single property of the
+**kwargs: The set instruction(s) to be executed by the environment.
+A single set instruction usually sets a single property of the
state/observation vector to some new value.
Returns: The observation dictionary of the Environment after(!) setting it to the new state.
"""
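As a minimal sketch of a concrete subclass (a hypothetical environment, assuming only the set_state contract described in the docstring above):

```python
from tensorforce.contrib.state_settable_environment import StateSettableEnvironment  # path assumed

class GridWorld(StateSettableEnvironment):
    """Hypothetical environment whose agent position can be set directly."""

    def __init__(self, width=4, height=4):
        self.width = width
        self.height = height
        self.position = (0, 0)

    def set_state(self, **kwargs):
        # Each keyword sets one property of the state, as the docstring
        # describes; this toy example only supports 'position'.
        if 'position' in kwargs:
            self.position = kwargs['position']
        # Return the observation dictionary after(!) applying the change.
        return {'position': self.position}

env = GridWorld()
print(env.set_state(position=(2, 3)))  # {'position': (2, 3)}
```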
