This repository has been archived by the owner on Jan 5, 2024. It is now read-only.

Commit: initial commit
Adrien Ecoffet committed Jan 31, 2019
0 parents commit 953ac7c
Showing 24 changed files with 2,684 additions and 0 deletions.
19 changes: 19 additions & 0 deletions .gitignore
@@ -0,0 +1,19 @@
__pycache__
*.pyc
*~
.idea
results
.ipynb_checkpoints
.DS_Store
demos_*
*.png
*.mp4
to_do
to_kill
*.demo
*.tar.gz
*.jobs
*.monitor.csv
.unison*
*.orig
*~master
41 changes: 41 additions & 0 deletions LICENSE
@@ -0,0 +1,41 @@
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by the text below.

"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.

"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or Object form, made available under this License.

This License governs use of the accompanying Work, and your use of the Work constitutes acceptance of this License.

You may use this Work for any non-commercial purpose, subject to the restrictions in this License. Some purposes which can be non-commercial are teaching, academic research, and personal experimentation. You may also distribute this Work with books or other teaching materials, or publish the Work on websites, that are intended to teach the use of the Work.

You may not use or distribute this Work, or any derivative works, outputs, or results from the Work, in any form for commercial purposes. Non-exhaustive examples of commercial purposes would be running business operations, licensing, leasing, or selling the Work, or distributing the Work for use with commercial products.

You may modify this Work and distribute the modified Work for non-commercial purposes, however, you may not grant rights to the Work or derivative works that are broader than or in conflict with those provided by this License. For example, you may not distribute modifications of the Work under terms that would permit commercial use, or under terms that purport to require the Work or derivative works to be sublicensed to others.

In return, we require that you agree:

1. Not to remove any copyright or other notices from the Work.

2. That if you distribute the Work in Source or Object form, you will include a verbatim copy of this License.

3. That if you distribute derivative works of the Work in Source form, you do so only under a license that includes all of the provisions of this License and is not in conflict with this License, and if you distribute derivative works of the Work solely in Object form you do so only under a license that complies with this License.

4. That if you have modified the Work or created derivative works from the Work, and distribute such modifications or derivative works, you will cause the modified files to carry prominent notices so that recipients know that they are not receiving the original Work. Such notices must state: (i) that you have changed the Work; and (ii) the date of any changes.

5. If you publicly use the Work or any output or result of the Work, you will provide a notice with such use that provides any person who uses, views, accesses, interacts with, or is otherwise exposed to the Work (i) with information of the nature of the Work, (ii) with a link to the Work, and (iii) a notice that the Work is available under this License.

6. THAT THE WORK COMES "AS IS", WITH NO WARRANTIES. THIS MEANS NO EXPRESS, IMPLIED OR STATUTORY WARRANTY, INCLUDING WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE OR ANY WARRANTY OF TITLE OR NON-INFRINGEMENT. ALSO, YOU MUST PASS THIS DISCLAIMER ON WHENEVER YOU DISTRIBUTE THE WORK OR DERIVATIVE WORKS.

7. THAT NEITHER UBER TECHNOLOGIES, INC. NOR ANY OF ITS AFFILIATES, SUPPLIERS, SUCCESSORS, NOR ASSIGNS WILL BE LIABLE FOR ANY DAMAGES RELATED TO THE WORK OR THIS LICENSE, INCLUDING DIRECT, INDIRECT, SPECIAL, CONSEQUENTIAL OR INCIDENTAL DAMAGES, TO THE MAXIMUM EXTENT THE LAW PERMITS, NO MATTER WHAT LEGAL THEORY IT IS BASED ON. ALSO, YOU MUST PASS THIS LIMITATION OF LIABILITY ON WHENEVER YOU DISTRIBUTE THE WORK OR DERIVATIVE WORKS.

8. That if you sue anyone over patents that you think may apply to the Work or anyone's use of the Work, your license to the Work ends automatically.

9. That your rights under the License end automatically if you breach it in any way.

10. Uber Technologies, Inc. reserves all rights not expressly granted to you in this License.
72 changes: 72 additions & 0 deletions README.md
@@ -0,0 +1,72 @@
# Go-Explore

## Requirements

Tested with Python 3.6. `requirements.txt` lists the exact libraries used on a test machine
able to run all phases.

**Required libraries for Phase 1:**
- matplotlib
- loky==2.3.1
- dataclasses
- tqdm
- gym\[atari\]
- opencv-python
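
Assuming a Python 3.6 environment, these can be installed with pip, e.g. (a sketch; adjust versions to match `requirements.txt`):

`pip install matplotlib loky==2.3.1 dataclasses tqdm 'gym[atari]' opencv-python`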

**Additional libraries for demo generation:**
- imageio
- fire

Additionally, to run `gen_demo`, you will need to clone [openai/atari-demo](https://github.com/openai/atari-demo) and
place a copy of (or a link to) its `atari_demo` subfolder at `gen_demo/atari_demo` in this codebase.

For example, you could run:

`git clone https://github.com/openai/atari-demo`

`cp -r atari-demo/atari_demo gen_demo`

**Additional libraries for Phase 2:**
- tensorflow-gpu
- pandas
- horovod
- baselines

Additionally, to run Phase 2, you will need to clone [uber-research/atari-reset](https://github.com/uber-research/atari-reset) (note: this is an improved fork of the original project, which you can find at [openai/atari-reset](https://github.com/openai/atari-reset)) and
place it (or a copy of or link to it) as `atari_reset` in the root folder of this project.
For example, you could run:

`git clone https://github.com/uber-research/atari-reset atari_reset`

## Usage

Running Phase 1 of Go-Explore can be done using the `phase1.sh` script. To see the arguments
for Phase 1, run:

`./phase1.sh --help`

The default arguments for Phase 1 will run a domain-knowledge version of Go-Explore Phase 1 on
Montezuma's Revenge. However, the default parameters do not correspond to any experiment actually
presented in the paper. To reproduce Phase 1 experiments from the paper, run one of
`./phase1_montezuma_domain.sh`, `./phase1_montezuma_no_domain.sh` or `./phase1_pitfall_domain.sh`.

Phase 1 produces a `results` folder containing a subfolder for each experiment, named in the form
`0000_fb6be589a3dc44c1b561336e04c6b4cb`, where the first element is an automatically increasing
experiment id and the second element is a random string that helps prevent race conditions if
two experiments are started at the same time and assigned the same id.
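
For illustration, a minimal sketch of how such a folder name can be generated (the `make_experiment_dir` helper below is hypothetical, not part of this codebase):

```python
import os
import uuid

def make_experiment_dir(root='results'):
    # Hypothetical sketch: pair an automatically increasing experiment id with
    # a random hex string so that two experiments racing to the same id still
    # end up in distinct folders.
    os.makedirs(root, exist_ok=True)
    next_id = len(os.listdir(root))  # simplistic increasing id
    path = os.path.join(root, f'{next_id:04d}_{uuid.uuid4().hex}')
    os.makedirs(path)  # fails loudly in the (unlikely) event of a collision
    return path
```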

To generate demonstrations, call `./gen_demo.sh <phase1_result_folder> <destination> --game <game>`, where `<game>` is one of "montezuma" (default) or "pitfall". The destination
will be a directory containing a `.demo` file and an `.mp4` file with the video of the
demonstration.

To robustify (run Phase 2), put a set of `.demo` files from different runs of Phase 1 into a folder
(we used 10 for Montezuma and 4 for Pitfall; a single demonstration can also work, but is less
likely to succeed). Then run `./phase2.sh <game> <demo_folder> <results_folder>`, where `<game>` is
one of `MontezumaRevenge` or `Pitfall`. This should work with `mpirun` if you are using distributed
training (we used 16 GPUs); see the sample launch below. The indicator of success for Phase 2 is
that one of the `max_starting_point` values displayed in the log reaches a value near 0 (values less
than around 80 are typically good). You may then test the performance of your trained neural network
using `./phase2_test.sh <game> <neural_net> <test_results_folder>`,
where `<neural_net>` is one of the files produced by Phase 2 and printed in the log as `Saving to ...`.
This will produce `.json` files for each possible number of no-ops (from 0 to 30) with the scores,
levels, and exact action sequences produced by the test runs.
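
For the distributed case mentioned above, a sample launch (a sketch assuming Open MPI; the exact
invocation depends on your MPI and cluster setup):

`mpirun -np 16 ./phase2.sh MontezumaRevenge <demo_folder> <results_folder>`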
11 changes: 11 additions & 0 deletions gen_demo.sh
@@ -0,0 +1,11 @@
#!/bin/sh
# Copyright (c) 2018-2019 Uber Technologies, Inc.
#
# Licensed under the Uber Non-Commercial License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at the root directory of this project.
#
# See the License for the specific language governing permissions and
# limitations under the License.

python gen_demo/main.py "$@"
8 changes: 8 additions & 0 deletions gen_demo/__init__.py
@@ -0,0 +1,8 @@
# Copyright (c) 2018-2019 Uber Technologies, Inc.
#
# Licensed under the Uber Non-Commercial License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at the root directory of this project.
#
# See the License for the specific language governing permissions and
# limitations under the License.
196 changes: 196 additions & 0 deletions gen_demo/main.py
@@ -0,0 +1,196 @@
# Copyright (c) 2018-2019 Uber Technologies, Inc.
#
# Licensed under the Uber Non-Commercial License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at the root directory of this project.
#
# See the License for the specific language governing permissions and
# limitations under the License.

import glob
import lzma
import os
import pickle
import sys

import gym
import imageio
import numpy as np
from PIL import Image, ImageFont, ImageDraw

import gen_demo.atari_demo as atari_demo
import gen_demo.atari_demo.wrappers

from goexplore_py.goexplore import *
# from atari_reset.atari_reset.wrappers import Image, MyResizeFrame, WarpFrame

# Old pickles reference the top-level `env` module; alias it to its new
# location inside the `goexplore_py` package so they can be loaded.
sys.modules['env'] = sys.modules['goexplore_py.montezuma_env']

import fire

FOLDER = None
DESTINATION = None
FRAME_SKIP = 4  # emulator steps taken per recorded action

# Number of gray levels kept when quantizing downscaled frames in convert_state.
NUM_PIXELS = 8.0
# NUM_PIXELS = 16.0

class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that remaps old top-level module names to the `goexplore_py` package."""

    def find_class(self, module, name):
        if module in ['basic',
                      'explorers',
                      'goexplore',
                      'import_ai',
                      'montezuma_env',
                      'pitfall_env',
                      'randselectors',
                      'utils']:
            module = 'goexplore_py.' + module
        return super().find_class(module, name)


def my_resize_frame(obs, res):
    # Bilinear-resize an observation to `res` (width, height[, channels]).
    obs = np.array(Image.fromarray(obs).resize((res[0], res[1]), resample=Image.BILINEAR), dtype=np.uint8)
    return obs.reshape(res)


def convert_state(state, shape):
    import cv2
    # Downscale to an 11x8 grayscale image and quantize it to NUM_PIXELS gray
    # levels, then rescale the values back to the 0-255 range.
    small = cv2.resize(cv2.cvtColor(state, cv2.COLOR_RGB2GRAY), (11, 8), interpolation=cv2.INTER_AREA)
    frame = ((small / 255.0 * NUM_PIXELS).astype(np.uint8) * (255.0 / NUM_PIXELS)).astype(np.uint8)
    # Blow the quantized image back up to the requested shape.
    frame = np.transpose(frame)
    frame = cv2.resize(frame, dsize=shape, interpolation=cv2.INTER_NEAREST)
    frame = np.transpose(frame)
    return frame


def with_domain_knowledge(key):
    # Domain-knowledge cell keys are custom objects (e.g. with a `level`
    # field); keys from the downscaled-frame representation are plain tuples.
    return not isinstance(key, tuple)


class ScoreTrajectories:
def __init__(self, chosen_demos, data, max_level=float('inf'), max_trajectory=float('inf'), max_score=float('inf')):
self.chosen_demos = chosen_demos
self.data = data
self.select_longest_trajectory = False
self.max_level = max_level
self.max_trajectory = max_trajectory
self.max_score = max_score

    def compute_similarity_weight(self, cell):
        # Weight a candidate cell by how different its action sequence is from
        # the demos already chosen, so that successive demos stay diverse.
        weight = 1.0
        for k in self.chosen_demos:
            cell2 = self.data[k]
            total = 0
            different = 0
            if len(cell.trajectory) == 0 or len(cell2.trajectory) == 0:
                continue
            for a1, a2 in zip(cell.trajectory, cell2.trajectory):
                # Stop comparing once both trajectories have passed the level cap.
                if not isinstance(a1, tuple) and a1.from_.exact.level >= self.max_level and a2.from_.exact.level >= self.max_level:
                    break
                total += 1
                different += a1.action != a2.action
            if total > 0:
                weight = min(weight, different / total)

        return weight

    def __call__(self, key):
        cell = self.data[key]

        weight = self.compute_similarity_weight(cell)

        # Rank cells lexicographically by (level, score, trajectory length),
        # each scaled by the diversity weight; by default, shorter trajectories
        # are preferred, so the length term enters with a negative sign.
        level = key.level if with_domain_knowledge(key) else cell.real_cell.level
        sign = 1 if self.select_longest_trajectory else -1
        return level * weight, cell.score * weight, sign * cell.trajectory_len * weight


def run(folder, destination, max_level=None, max_trajectory=None, max_score=None, game="montezuma", stop_on_score=False, n_demos=1):
global FOLDER, DESTINATION
FOLDER = folder
DESTINATION = destination
if game == "montezuma":
gym_game = 'MontezumaRevengeNoFrameskip-v4'
elif game == "pitfall":
gym_game = 'PitfallNoFrameskip-v4'
else:
raise NotImplementedError("Unknown game: " + game)

    # Use the lexicographically greatest snapshot archive, ignoring '_set' files.
    file = max(e for e in glob.glob(FOLDER + '/*.7z') if '_set' not in e)
    print(file)
    print('size =', len(lzma.open(file).read()))
    data = RenamingUnpickler(lzma.open(file)).load()

os.makedirs(destination, exist_ok=True)

chosen_demos = []

if max_level is None:
max_level = float('inf')
if max_trajectory is None:
max_trajectory = float('inf')
if max_score is None:
max_score = float('inf')

    # Experimental: truncate each trajectory at the point where it reached its
    # highest cumulative reward, subject to the max trajectory/score limits.
    print("Cell information:")
for key in data.keys():
cell = data[key]
cum_reward = 0
trajectory_length = 0
highest_reward = 0
highest_reward_trajectory_length = 0
for e in cell.trajectory:
cum_reward += e.reward
trajectory_length += 1
if cum_reward > highest_reward:
highest_reward = cum_reward
highest_reward_trajectory_length = trajectory_length
if trajectory_length >= max_trajectory:
break
if cum_reward >= max_score:
break
cell.score = highest_reward
cell.trajectory_len = highest_reward_trajectory_length
cell.trajectory = cell.trajectory[0:highest_reward_trajectory_length]

for idx in range(n_demos):
key = max(data.keys(), key=ScoreTrajectories(chosen_demos, data, max_level, max_trajectory, max_score))
        if with_domain_knowledge(key):
            if key.level < max_level and game != "pitfall":
                print(f'WARNING: Level {max_level} not solved (max={key.level})')
            # Keep only the actions taken before reaching the level cap.
            list_of_actions = [e.action for e in data[key].trajectory if e.to.exact.level < max_level]
        else:
            list_of_actions = [e.action for e in data[key].trajectory]

if hasattr(data[key].real_cell, 'level'):
print('Chosen - score:', data[key].score, "length:", data[key].trajectory_len, 'level:', data[key].real_cell.level)
else:
print('Chosen - score:', data[key].score, "length:", data[key].trajectory_len)
chosen_demos.append(key)

        env = gym.make(gym_game)
        env = atari_demo.wrappers.AtariDemo(env)
        env.reset()
        # Render at double width (each pixel column repeated twice).
        frames = [env.render(mode='rgb_array').repeat(2, axis=1)]
        total = 0

        # Prepend three no-op actions, then replay the demo actions.
        for a in [0] * 3 + list_of_actions:
            # Repeat each action for FRAME_SKIP emulator steps, accumulating
            # reward from every step.
            for _ in range(FRAME_SKIP):
                _, reward, done, _ = env.step(a)
                total += reward
                frame = env.render(mode='rgb_array').repeat(2, axis=1)
                frames.append(np.array(frame))

        # Freeze the final frame for a while at the end of the video.
        frames += [frames[-1]] * 200
        print("Created demo:", len(frames), total, data[key].score)

        env.save_to_file(DESTINATION + f'/{idx}.demo')

        # Subsample to every 7th frame and encode at 24 fps.
        imageio.mimsave(DESTINATION + f'/{idx}.mp4', frames[::7], fps=24)


if __name__ == '__main__':
fire.Fire(run)
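
For reference, since `run` is exposed through `fire`, the script can also be invoked directly; for example (placeholder paths):

`python gen_demo/main.py <phase1_result_folder> <destination> --game montezuma --n_demos 10`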
