Merge with ray master (#36)
* [rllib] Remove dependency on TensorFlow (ray-project#4764)

* remove hard tf dep

* add test

* comment fix

* fix test

* Dynamic Custom Resources - create and delete resources (ray-project#3742)

* Update tutorial link in doc (ray-project#4777)

* [rllib] Implement learn_on_batch() in torch policy graph

* Fix `ray stop` by killing raylet before plasma (ray-project#4778)

* Fatal check if object store dies (ray-project#4763)

* [rllib] fix clip by value issue as TF upgraded (ray-project#4697)

*  fix clip_by_value issue

*  fix typo

* [autoscaler] Fix submit (ray-project#4782)

* Queue tasks in the raylet in between async callbacks (ray-project#4766)

* Add a SWAP TaskQueue so that we can keep track of tasks that are temporarily dequeued

* Fix a bug where tasks that fail to be forwarded don't appear to be local, by adding them to the SWAP queue

* cleanups

* updates

* updates

* [Java][Bazel]  Refine auto-generated pom files (ray-project#4780)

* Bump version to 0.7.0 (ray-project#4791)

* [JAVA] setDefaultUncaughtExceptionHandler to log uncaught exception in user thread. (ray-project#4798)

* Add WorkerUncaughtExceptionHandler

* Fix

* revert bazel and pom

* [tune] Fix CLI test (ray-project#4801)

* Fix pom file generation (ray-project#4800)

* [rllib] Support continuous action distributions in IMPALA/APPO (ray-project#4771)

* [rllib] TensorFlow 2 compatibility (ray-project#4802)

* Change tagline in documentation and README. (ray-project#4807)

* Update README.rst, index.rst, tutorial.rst and _config.yml

* [tune] Support non-arg submit (ray-project#4803)

* [autoscaler] rsync cluster (ray-project#4785)

* [tune] Remove extra parsing functionality (ray-project#4804)

* Fix Java worker log dir (ray-project#4781)

* [tune] Initial track integration (ray-project#4362)

Introduces a minimally invasive utility for logging experiment results. A broad requirement for this tool is that it should integrate seamlessly with Tune execution.
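
A rough usage sketch (the module path and the init/log call names below are assumptions for illustration, not necessarily the exact interface this PR landed):

# Hypothetical usage of a track-style logger; names here are illustrative only.
from ray.tune import track  # assumed import path for the track utility

def train_fn(config):
    track.init()                              # start a standalone logging session
    for step in range(100):
        loss = 1.0 / (step + 1)               # stand-in for a real training metric
        track.log(step=step, mean_loss=loss)  # record results for later analysis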

* [rllib] [RFC] Dynamic definition of loss functions and modularization support (ray-project#4795)

* dynamic graph

* wip

* clean up

* fix

* document trainer

* wip

* initialize the graph using a fake batch

* clean up dynamic init

* wip

* spelling

* use builder for ppo pol graph

* add ppo graph

* fix naming

* order

* docs

* set class name correctly

* add torch builder

* add custom model support in builder

* cleanup

* remove underscores

* fix py2 compat

* Update dynamic_tf_policy_graph.py

* Update tracking_dict.py

* wip

* rename

* debug level

* rename policy_graph -> policy in new classes

* fix test

* rename ppo tf policy

* port appo too

* forgot grads

* default policy optimizer

* make default config optional

* add config to optimizer

* use lr by default in optimizer

* update

* comments

* remove optimizer

* fix tuple actions support in dynamic tf graph

* [rllib] Rename PolicyGraph => Policy, move from evaluation/ to policy/ (ray-project#4819)

This implements some of the renames proposed in ray-project#4813
We leave behind backwards-compatibility aliases for *PolicyGraph and SampleBatch.
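
A minimal sketch of such an alias, assuming a plain subclass shim with a deprecation warning (the warning text and layout are illustrative, not the exact code in this PR):

# Illustrative backwards-compatibility shim: Policy is the new name,
# PolicyGraph stays importable but warns when instantiated.
import warnings

class Policy:
    """New-style policy class (formerly PolicyGraph)."""
    def __init__(self, *args, **kwargs):
        pass  # real initialization lives in the actual class

class PolicyGraph(Policy):
    def __init__(self, *args, **kwargs):
        warnings.warn(
            "PolicyGraph has been renamed to Policy; please update your imports.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)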

* [Java] Dynamic resource API in Java (ray-project#4824)

* Add default values for Wgym flags

* Fix import

* Fix issue when starting `raylet_monitor` (ray-project#4829)

* Refactor ID Serial 1: Separate ObjectID and TaskID from UniqueID (ray-project#4776)

* Enable BaseId.

* Change TaskID and make python test pass

* Remove unnecessary functions and fix test failure and change TaskID to
16 bytes.

* Java code change draft

* Refine

* Lint

* Update java/api/src/main/java/org/ray/api/id/TaskId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/BaseId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/BaseId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/ObjectId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Address comment

* Lint

* Fix SINGLE_PROCESS

* Fix comments

* Refine code

* Refine test

* Resolve conflict

* Fix bug in which actor classes are not exported multiple times. (ray-project#4838)

* Bump Ray master version to 0.8.0.dev0 (ray-project#4845)

* Add section to bump version of master branch and cleanup release docs (ray-project#4846)

* Fix import

* Export remote functions when first used and also fix bug in which rem… (ray-project#4844)

* Export remote functions when first used, and also fix a bug in which remote functions and actor classes are not exported from workers during subsequent Ray sessions.

* Documentation update

* Fix tests.

* Fix grammar

* Update wheel versions in documentation to 0.8.0.dev0 and 0.7.0. (ray-project#4847)

* [tune] Later expansion of local_dir (ray-project#4806)

* [rllib] [RFC] Deprecate Python 2 / RLlib (ray-project#4832)

* Fix a typo in kubernetes yaml (ray-project#4872)

* Move global state API out of global_state object. (ray-project#4857)

* Install bazel in autoscaler development configs. (ray-project#4874)

* [tune] Fix up Ax Search and Examples (ray-project#4851)

* update Ax for cleaner API

* docs update

* [rllib] Update concepts docs and add "Building Policies in Torch/TensorFlow" section (ray-project#4821)

* wip

* fix index

* fix bugs

* todo

* add imports

* note on get ph

* note on get ph

* rename to building custom algs

* add rnn state info

* [rllib] Fix error getting kl when simple_optimizer: True in multi-agent PPO

* Replace ReturnIds with NumReturns in TaskInfo to reduce the size (ray-project#4854)

* Refine TaskInfo

* Fix

* Add a test to print task info size

* Lint

* Refine

* Update deps commits of opencensus to support building with bzl 0.25.x (ray-project#4862)

* Update deps to support bzl 0.25.x

* Fix

* Upgrade arrow to latest master (ray-project#4858)

* [tune] Auto-init Ray + default SearchAlg (ray-project#4815)

* Bump version from 0.8.0.dev0 to 0.7.1. (ray-project#4890)

* [rllib] Allow access to batches prior to postprocessing (ray-project#4871)

* [rllib] Fix Multidiscrete support (ray-project#4869)

* Refactor redis callback handling (ray-project#4841)

* Add CallbackReply

* Fix

* fix linting by format.sh

* Fix linting

* Address comments.

* Fix

* Initial high-level code structure of CoreWorker. (ray-project#4875)

* Drop duplicated string format (ray-project#4897)

This string format is unnecessary; java_worker_options is already appended to the command line later.

* Refactor ID Serial 2: change all ID functions to `CamelCase` (ray-project#4896)

* Hotfix for change of from_random to FromRandom (ray-project#4909)

* [rllib] Fix documentation on custom policies (ray-project#4910)

* wip

* add docs

* lint

* todo sections

* fix doc

* [rllib] Allow Torch policies access to full action input dict in extra_action_out_fn (ray-project#4894)

* fix torch extra out

* preserve setitem

* fix docs

* [tune] Pretty print params json in logger.py (ray-project#4903)

* [sgd] Distributed Training via PyTorch (ray-project#4797)

Implements distributed SGD using distributed PyTorch.
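
For context, a generic data-parallel SGD sketch with torch.distributed (this is not the trainer API added by the PR; the backend, process-group setup, and model are illustrative assumptions):

# Generic torch.distributed sketch; each worker process calls run(rank, world_size).
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel

def run(rank, world_size):
    # gloo keeps this runnable on CPU-only machines; these env vars are assumed defaults.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DistributedDataParallel(nn.Linear(10, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for _ in range(10):
        x, y = torch.randn(32, 10), torch.randn(32, 1)
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()  # gradients are all-reduced across workers
        optimizer.step()

    dist.destroy_process_group()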

* [rllib] Rough port of DQN to build_tf_policy() pattern (ray-project#4823)

* fetching objects in parallel in _get_arguments_for_execution (ray-project#4775)

* [tune] Disallow setting resources_per_trial when it is already configured (ray-project#4880)

* disallow it

* import fix

* fix example

* fix test

* fix tests

* Update mock.py

* fix

* make less convoluted

* fix tests

* [rllib] Rename PolicyEvaluator => RolloutWorker (ray-project#4820)

* Fix local cluster yaml (ray-project#4918)

* [tune] Directional metrics for components (ray-project#4120) (ray-project#4915)

* [Core Worker] implement ObjectInterface and add test framework (ray-project#4899)

* [tune] Make PBT Quantile fraction configurable (ray-project#4912)

* Better organize ray_common module (ray-project#4898)

* Fix error

* [tune] Add requirements-dev.txt and update docs for contributing (ray-project#4925)

* Add requirements-dev.txt and update docs.

* Update doc/source/tune-contrib.rst

Co-Authored-By: Richard Liaw <rliaw@berkeley.edu>

* Unpin everything except for yapf.

* Fix compute actions return value

* Bump version from 0.7.1 to 0.8.0.dev1. (ray-project#4937)

* Update version number in documentation after release 0.7.0 -> 0.7.1 and 0.8.0.dev0 -> 0.8.0.dev1. (ray-project#4941)

* [doc] Update developer docs with bazel instructions (ray-project#4944)

* [C++] Add hash table to Redis-Module (ray-project#4911)

* Flush lineage cache on task submission instead of execution (ray-project#4942)

* [rllib] Add docs on how to use TF eager execution (ray-project#4927)

* [rllib] Port remainder of algorithms to build_trainer() pattern (ray-project#4920)

* Fix resource bookkeeping bug with acquiring unknown resource. (ray-project#4945)

* Update aws keys for uploading wheels to s3. (ray-project#4948)

* Upload wheels on Travis to branchname/commit_id. (ray-project#4949)

* [Java] Fix serializing issues of `RaySerializer` (ray-project#4887)

* Fix

* Address comment.

* fix (ray-project#4950)

* [Java] Add inner class `Builder` to build call options. (ray-project#4956)

* Add Builder class

* format

* Refactor by IDE

* Remove unnecessary dependency

* Make release stress tests work and improve them. (ray-project#4955)

* Use proper session directory for debug_string.txt (ray-project#4960)

* [core] Use int64_t instead of int to keep track of fractional resources (ray-project#4959)

* [core worker] add task submission & execution interface (ray-project#4922)

* [sgd] Add non-distributed PyTorch runner (ray-project#4933)

* Add non-distributed PyTorch runner

* use dist.is_available() instead of checking OS

* Nicer exception

* Fix bug in choosing port

* Refactor some code

* Address comments

* Address comments

* Flush all tasks from local lineage cache after a node failure (ray-project#4964)

* Remove typing from setup.py install_requirements. (ray-project#4971)

* [Java] Fix bug of `BaseID` in multi-threading case. (ray-project#4974)

* [rllib] Fix DDPG example (ray-project#4973)

* Upgrade CI clang-format to 6.0 (ray-project#4976)

* [Core worker] add store & task provider (ray-project#4966)

* Fix bugs in the a3c code template. (ray-project#4984)

* Inherit Function Docstrings and other metadata (ray-project#4985)

* Fix a crash when unknown worker registering to raylet (ray-project#4992)

* [gRPC] Use gRPC for inter-node-manager communication (ray-project#4968)

* Fix Java CI failure (ray-project#4995)

* fix handling of non-integral timeout values in signal.receive (ray-project#5002)

* temp fix for build (ray-project#5006)

* [tune] Tutorial UX Changes (ray-project#4990)

* add integration, iris, ASHA, recursive changes, set reuse_actors=True, and enable Analysis as a return object

* docstring

* fix up example

* fix

* cleanup tests

* experiment analysis

* Fix valgrind build by installing new version of valgrind (ray-project#5008)

* Fix no cpus test (ray-project#5009)

* Fix tensorflow-1.14 installation in jenkins (ray-project#5007)

* Add dynamic worker options for worker command. (ray-project#4970)

* Add fields for fbs

* WIP

* Fix compilation errors

* Add java part

* Fix

* Fix

* Fix

* Fix lint

* Refine API

* address comments and add test

* Fix

* Address comment.

* Address comments.

* Fix linting

* Refine

* Fix lint

* WIP: address comment.

* Fix java

* Fix py

* Refine

* Fix

* Fix

* Fix linting

* Fix lint

* Address comments

* WIP

* Fix

* Fix

* minor refine

* Fix lint

* Fix raylet test.

* Fix lint

* Update src/ray/raylet/worker_pool.h

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/runtime/src/main/java/org/ray/runtime/AbstractRayRuntime.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Address comments.

* Address comments.

* Fix test.

* Update src/ray/raylet/worker_pool.h

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Address comments.

* Address comments.

* Fix

* Fix lint

* Fix lint

* Fix

* Address comments.

* Fix linting

* [docs] docs for running Tensorboard without sudo (ray-project#5015)

* Instructions for running Tensorboard without sudo

When we run TensorBoard to visualize Ray results on multi-user clusters where we don't have sudo access, such as the RISE clusters, a few commands need to be run first to make sure TensorBoard can write to the tmp directory. This is a pretty common use case, so I figured we may as well put it in the Tune documentation.

* Update tune-usage.rst

* [ci] Change Jenkins to py3 (ray-project#5022)

* conda3

* integration

* add nevergrad, remotedata

* pytest 0.3.1

* otherdockers

* setup

* tune

* [gRPC] Migrate gcs data structures to protobuf (ray-project#5024)

* [rllib] Add QMIX mixer parameters to optimizer param list (ray-project#5014)

* add mixer params

* Update qmix_policy.py

* [grpc] refactor rpc server to support multiple io services (ray-project#5023)

* [rllib] Give error if sample_async is used with pytorch for A3C (ray-project#5000)

* give error if sample_async is used with pytorch

* update

* Update a3c.py

* [tune] Update MNIST Example (ray-project#4991)

* Add entropy coeff schedule

* Revert "Merge with ray master"

This reverts commit 108bfa2, reversing
changes made to 2e0eec9.

* Revert "Revert "Merge with ray master""

This reverts commit 92c0f88.

* Remove entropy decay stuff
stefanpantic committed Jun 26, 2019
1 parent b850e14 commit 110aaab
Showing 102 changed files with 2,330 additions and 2,048 deletions.
96 changes: 47 additions & 49 deletions BUILD.bazel
@@ -1,22 +1,55 @@
# Bazel build
# C/C++ documentation: https://docs.bazel.build/versions/master/be/c-cpp.html

load("@com_github_grpc_grpc//bazel:grpc_build_system.bzl", "grpc_proto_library")
load("@com_github_grpc_grpc//bazel:cc_grpc_library.bzl", "cc_grpc_library")
load("@build_stack_rules_proto//python:python_proto_compile.bzl", "python_proto_compile")
load("@com_github_google_flatbuffers//:build_defs.bzl", "flatbuffer_cc_library")
load("@//bazel:ray.bzl", "flatbuffer_py_library")
load("@//bazel:cython_library.bzl", "pyx_library")

COPTS = ["-DRAY_USE_GLOG"]

# Node manager gRPC lib.
grpc_proto_library(
name = "node_manager_grpc_lib",
# === Begin of protobuf definitions ===

proto_library(
name = "gcs_proto",
srcs = ["src/ray/protobuf/gcs.proto"],
visibility = ["//java:__subpackages__"],
)

cc_proto_library(
name = "gcs_cc_proto",
deps = [":gcs_proto"],
)

python_proto_compile(
name = "gcs_py_proto",
deps = [":gcs_proto"],
)

proto_library(
name = "node_manager_proto",
srcs = ["src/ray/protobuf/node_manager.proto"],
)

cc_proto_library(
name = "node_manager_cc_proto",
deps = ["node_manager_proto"],
)

# === End of protobuf definitions ===

# Node manager gRPC lib.
cc_grpc_library(
name = "node_manager_cc_grpc",
srcs = [":node_manager_proto"],
grpc_only = True,
deps = [":node_manager_cc_proto"],
)

# Node manager server and client.
cc_library(
name = "node_manager_rpc_lib",
name = "node_manager_rpc",
srcs = glob([
"src/ray/rpc/*.cc",
]),
@@ -25,7 +58,7 @@ cc_library(
]),
copts = COPTS,
deps = [
":node_manager_grpc_lib",
":node_manager_cc_grpc",
":ray_common",
"@boost//:asio",
"@com_github_grpc_grpc//:grpc++",
@@ -114,7 +147,7 @@ cc_library(
":gcs",
":gcs_fbs",
":node_manager_fbs",
":node_manager_rpc_lib",
":node_manager_rpc",
":object_manager",
":ray_common",
":ray_util",
@@ -422,9 +455,11 @@ cc_library(
"src/ray/gcs/format",
],
deps = [
":gcs_cc_proto",
":gcs_fbs",
":hiredis",
":node_manager_fbs",
":node_manager_rpc",
":ray_common",
":ray_util",
":stats_lib",
@@ -555,46 +590,6 @@ filegroup(
visibility = ["//java:__subpackages__"],
)

flatbuffer_py_library(
name = "python_gcs_fbs",
srcs = [
":gcs_fbs_file",
],
outs = [
"ActorCheckpointIdData.py",
"ActorState.py",
"ActorTableData.py",
"Arg.py",
"ClassTableData.py",
"ClientTableData.py",
"ConfigTableData.py",
"CustomSerializerData.py",
"DriverTableData.py",
"EntryType.py",
"ErrorTableData.py",
"ErrorType.py",
"FunctionTableData.py",
"GcsEntry.py",
"HeartbeatBatchTableData.py",
"HeartbeatTableData.py",
"Language.py",
"ObjectTableData.py",
"ProfileEvent.py",
"ProfileTableData.py",
"RayResource.py",
"ResourcePair.py",
"SchedulingState.py",
"TablePrefix.py",
"TablePubsub.py",
"TaskInfo.py",
"TaskLeaseData.py",
"TaskReconstructionData.py",
"TaskTableData.py",
"TaskTableTestAndUpdate.py",
],
out_prefix = "python/ray/core/generated/",
)

flatbuffer_py_library(
name = "python_node_manager_fbs",
srcs = [
@@ -679,6 +674,7 @@ cc_binary(
linkstatic = 1,
visibility = ["//java:__subpackages__"],
deps = [
":gcs_cc_proto",
":ray_common",
],
)
@@ -688,7 +684,7 @@ genrule(
srcs = [
"python/ray/_raylet.so",
"//:python_sources",
"//:python_gcs_fbs",
"//:gcs_py_proto",
"//:python_node_manager_fbs",
"//:redis-server",
"//:redis-cli",
@@ -710,11 +706,13 @@ genrule(
cp -f $(location //:raylet_monitor) $$WORK_DIR/python/ray/core/src/ray/raylet/ &&
cp -f $(location @plasma//:plasma_store_server) $$WORK_DIR/python/ray/core/src/plasma/ &&
cp -f $(location //:raylet) $$WORK_DIR/python/ray/core/src/ray/raylet/ &&
for f in $(locations //:python_gcs_fbs); do cp -f $$f $$WORK_DIR/python/ray/core/generated/; done &&
mkdir -p $$WORK_DIR/python/ray/core/generated/ray/protocol/ &&
for f in $(locations //:python_node_manager_fbs); do
cp -f $$f $$WORK_DIR/python/ray/core/generated/ray/protocol/;
done &&
for f in $(locations //:gcs_py_proto); do
cp -f $$f $$WORK_DIR/python/ray/core/generated/;
done &&
echo $$WORK_DIR > $@
""",
local = 1,
4 changes: 4 additions & 0 deletions bazel/ray_deps_build_all.bzl
@@ -4,6 +4,8 @@ load("@com_github_jupp0r_prometheus_cpp//:repositories.bzl", "prometheus_cpp_rep
load("@com_github_ray_project_ray//bazel:python_configure.bzl", "python_configure")
load("@com_github_checkstyle_java//:repo.bzl", "checkstyle_deps")
load("@com_github_grpc_grpc//bazel:grpc_deps.bzl", "grpc_deps")
load("@build_stack_rules_proto//java:deps.bzl", "java_proto_compile")
load("@build_stack_rules_proto//python:deps.bzl", "python_proto_compile")


def ray_deps_build_all():
@@ -13,4 +15,6 @@ def ray_deps_build_all():
prometheus_cpp_repositories()
python_configure(name = "local_config_python")
grpc_deps()
java_proto_compile()
python_proto_compile()

11 changes: 9 additions & 2 deletions bazel/ray_deps_setup.bzl
@@ -105,7 +105,14 @@ def ray_deps_setup():
http_archive(
name = "com_github_grpc_grpc",
urls = [
"https://github.com/grpc/grpc/archive/7741e806a213cba63c96234f16d712a8aa101a49.tar.gz",
"https://github.com/grpc/grpc/archive/76a381869413834692b8ed305fbe923c0f9c4472.tar.gz",
],
strip_prefix = "grpc-7741e806a213cba63c96234f16d712a8aa101a49",
strip_prefix = "grpc-76a381869413834692b8ed305fbe923c0f9c4472",
)

http_archive(
name = "build_stack_rules_proto",
urls = ["https://github.com/stackb/rules_proto/archive/b93b544f851fdcd3fc5c3d47aee3b7ca158a8841.tar.gz"],
sha256 = "c62f0b442e82a6152fcd5b1c0b7c4028233a9e314078952b6b04253421d56d61",
strip_prefix = "rules_proto-b93b544f851fdcd3fc5c3d47aee3b7ca158a8841",
)
@@ -9,7 +9,7 @@ pushd "$ROOT_DIR"

python -m pip install pytest-benchmark

pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.8.0.dev1-cp27-cp27mu-manylinux1_x86_64.whl
pip install -U https://ray-wheels.s3-us-west-2.amazonaws.com/latest/ray-0.8.0.dev1-cp36-cp36m-manylinux1_x86_64.whl
python -m pytest --benchmark-autosave --benchmark-min-rounds=10 --benchmark-columns="min, max, mean" $ROOT_DIR/../../../python/ray/tests/perf_integration_tests/test_perf_integration.py

pushd $ROOT_DIR/../../../python
8 changes: 4 additions & 4 deletions ci/jenkins_tests/run_tune_tests.sh
@@ -78,16 +78,16 @@ $SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
--smoke-test

# Runs only on Python3
# docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
# python3 /ray/python/ray/tune/examples/nevergrad_example.py \
# --smoke-test
$SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
python /ray/python/ray/tune/examples/nevergrad_example.py \
--smoke-test

$SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
python /ray/python/ray/tune/examples/tune_mnist_keras.py \
--smoke-test

$SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
python /ray/python/ray/tune/examples/mnist_pytorch.py --smoke-test --no-cuda
python /ray/python/ray/tune/examples/mnist_pytorch.py --smoke-test

$SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
python /ray/python/ray/tune/examples/mnist_pytorch_trainable.py \
15 changes: 1 addition & 14 deletions doc/source/conf.py
@@ -23,20 +23,7 @@
"gym.spaces",
"ray._raylet",
"ray.core.generated",
"ray.core.generated.ActorCheckpointIdData",
"ray.core.generated.ClientTableData",
"ray.core.generated.DriverTableData",
"ray.core.generated.EntryType",
"ray.core.generated.ErrorTableData",
"ray.core.generated.ErrorType",
"ray.core.generated.GcsEntry",
"ray.core.generated.HeartbeatBatchTableData",
"ray.core.generated.HeartbeatTableData",
"ray.core.generated.Language",
"ray.core.generated.ObjectTableData",
"ray.core.generated.ProfileTableData",
"ray.core.generated.TablePrefix",
"ray.core.generated.TablePubsub",
"ray.core.generated.gcs_pb2",
"ray.core.generated.ray.protocol.Task",
"scipy",
"scipy.signal",
6 changes: 6 additions & 0 deletions doc/source/tune-usage.rst
@@ -355,6 +355,12 @@ Then, after you run a experiment, you can visualize your experiment with TensorB
$ tensorboard --logdir=~/ray_results/my_experiment
If you are running Ray on a remote multi-user cluster where you do not have sudo access, you can run the following commands to make sure tensorboard is able to write to the tmp directory:

.. code-block:: bash
$ export TMPDIR=/tmp/$USER; mkdir -p $TMPDIR; tensorboard --logdir=~/ray_results
.. image:: ray-tune-tensorboard.png

To use rllab's VisKit (you may have to install some dependencies), run:
2 changes: 1 addition & 1 deletion docker/base-deps/Dockerfile
@@ -12,7 +12,7 @@ RUN apt-get update \
&& apt-get clean \
&& echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh \
&& wget \
--quiet 'https://repo.continuum.io/archive/Anaconda2-5.2.0-Linux-x86_64.sh' \
--quiet 'https://repo.continuum.io/archive/Anaconda3-5.2.0-Linux-x86_64.sh' \
-O /tmp/anaconda.sh \
&& /bin/bash /tmp/anaconda.sh -b -p /opt/conda \
&& rm /tmp/anaconda.sh \
5 changes: 4 additions & 1 deletion docker/examples/Dockerfile
@@ -5,11 +5,14 @@ FROM ray-project/deploy
# This updates numpy to 1.14 and mutes errors from other libraries
RUN conda install -y numpy
RUN apt-get install -y zlib1g-dev
# The following is needed to support TensorFlow 1.14
RUN conda remove -y --force wrapt
RUN pip install gym[atari] opencv-python-headless tensorflow lz4 keras pytest-timeout smart_open
RUN pip install -U h5py # Mutes FutureWarnings
RUN pip install --upgrade bayesian-optimization
RUN pip install --upgrade git+git://github.com/hyperopt/hyperopt.git
RUN pip install --upgrade sigopt
# RUN pip install --upgrade nevergrad
RUN pip install --upgrade nevergrad
RUN pip install --upgrade scikit-optimize
RUN pip install -U pytest-remotedata>=0.3.1
RUN conda install pytorch-cpu torchvision-cpu -c pytorch
2 changes: 1 addition & 1 deletion docker/stress_test/Dockerfile
@@ -4,7 +4,7 @@ FROM ray-project/base-deps

# We install ray and boto3 to enable the ray autoscaler as
# a test runner.
RUN pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.8.0.dev1-cp27-cp27mu-manylinux1_x86_64.whl boto3
RUN pip install -U https://ray-wheels.s3-us-west-2.amazonaws.com/latest/ray-0.8.0.dev1-cp36-cp36m-manylinux1_x86_64.whl boto3
RUN mkdir -p /root/.ssh/

# We port the source code in so that we run the most up-to-date stress tests.
11 changes: 8 additions & 3 deletions docker/tune_test/Dockerfile
@@ -4,22 +4,27 @@ FROM ray-project/base-deps

# We install ray and boto3 to enable the ray autoscaler as
# a test runner.
RUN pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.8.0.dev1-cp27-cp27mu-manylinux1_x86_64.whl boto3
RUN conda install -y -c anaconda wrapt=1.11.1
RUN conda install -y -c anaconda numpy=1.16.4
RUN pip install -U https://ray-wheels.s3-us-west-2.amazonaws.com/latest/ray-0.8.0.dev1-cp36-cp36m-manylinux1_x86_64.whl boto3
# We install this after the latest wheels -- this should not override the latest wheels.
RUN apt-get install -y zlib1g-dev
# The following is needed to support TensorFlow 1.14
RUN conda remove -y --force wrapt
RUN pip install gym[atari]==0.10.11 opencv-python-headless tensorflow lz4 keras pytest-timeout smart_open
RUN pip install --upgrade bayesian-optimization
RUN pip install --upgrade git+git://github.com/hyperopt/hyperopt.git
RUN pip install --upgrade sigopt
# RUN pip install --upgrade nevergrad
RUN pip install --upgrade nevergrad
RUN pip install --upgrade scikit-optimize
RUN pip install -U pytest-remotedata>=0.3.1
RUN conda install pytorch-cpu torchvision-cpu -c pytorch

# RUN mkdir -p /root/.ssh/

# We port the source code in so that we run the most up-to-date stress tests.
ADD ray.tar /ray
ADD git-rev /ray/git-rev
RUN python /ray/python/ray/rllib/setup-rllib-dev.py --yes
RUN python /ray/python/ray/setup-dev.py --yes

WORKDIR /ray
