Potential memory issue with tf_py_environment #8

ChengshuLi · 2019-02-13T04:47:26Z

Hi,

First of all, my environment is the following:
TensorFlow version: 1.13.0-dev20190205 (pip install tf-nightly-gpu)
tf-agents version: 0.2.0.dev20190123 (pip install tf-agents-nightly)
CUDA version: 10.0
cuDNN version: 7.4.1
Ubuntu version: 16.04

When I wrap my custom Python environment with tf_py_environment, it consumes more and more CPU memory over time until memory runs out and the program hangs. The problem is particularly evident when the observation is large (say, an RGB image or a large vector).

Here is a toy example:

import tensorflow as tf
from tf_agents.environments import tf_py_environment, py_environment
from tf_agents.specs import array_spec
from tf_agents.environments import time_step as ts
import numpy as np

img_size = 5000

class MyEnv(py_environment.Base):
    def __init__(self):
        self._action_spec = array_spec.BoundedArraySpec(shape=(), dtype=np.float32)
        self._observation_spec = array_spec.BoundedArraySpec(shape=(img_size, img_size, 3), dtype=np.float32)

    def action_spec(self):
        return self._action_spec

    def observation_spec(self):
        return self._observation_spec

    def reset(self):
        return ts.restart(np.zeros(shape=(img_size, img_size, 3), dtype=np.float32))

    def step(self, action):
        return ts.transition(np.zeros(shape=(img_size, img_size, 3), dtype=np.float32), reward=0.0, discount=1.0)

tf_py_env = MyEnv()
tf_env = tf_py_environment.TFPyEnvironment(tf_py_env)
i = 0
while True:
    if i % 10000 == 0:
        print(i)
        tf_env.reset()
    action = tf.constant([0.0])
    time_step = tf_env.step(action)
    i += 1

After a few minutes of running, it had consumed almost all the memory and the program hung. The last value printed was 850000.

I have also run tf_agents/agents/dqn/examples/train_eval_atari.py for a while and it has the same symptom.

Memory usage fluctuated between 40% and 90%. Because of time and compute limits, I didn't get the chance to run it until convergence or until it crashed or hung.

In both cases, running the program makes my machine pretty slow. Is this expected?

I am very new to tf-agents, so I suspect I did something wrong (maybe I am supposed to free memory somewhere in my code?). I would really appreciate it if someone could point me in the right direction. Thanks!

Eric


oars · 2019-02-13T16:54:43Z

Hi Eric,

Note that you are using a tf_environment. The way your code is structured, every loop iteration calls tf.constant and tf_env.step again, which adds new TensorFlow ops to the graph without ever evaluating them. That ever-growing graph is what causes the increase in memory usage.

You'll want to change the later segment of your code to be:

tf_py_env = MyEnv()
tf_env = tf_py_environment.TFPyEnvironment(tf_py_env)

# Build the ops once, outside the loop, so the graph stops growing.
action = tf.constant([0.0])
reset_op = tf_env.reset()
step_op = tf_env.step(action)

i = 0

with tf.Session() as sess:
    while True:
        if i % 10000 == 0:
            print(i)
            time_step = sess.run(reset_op)
        # Evaluate the same step op on every iteration.
        time_step = sess.run(step_op)
        i += 1
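
The graph growth behind the original loop can be illustrated with a minimal sketch. It assumes TensorFlow is importable as tf, and it builds ops inside an explicit tf.Graph so that it runs in graph mode under both TF 1.x and 2.x. The tf.constant call is only a stand-in for tf_env.step(action), which likewise adds ops each time it is called in graph mode.

```python
import tensorflow as tf

# Use an explicit graph so the sketch runs in graph mode
# regardless of whether eager execution is the default.
graph = tf.Graph()
with graph.as_default():
    ops_before = len(graph.get_operations())
    for _ in range(100):
        # Each call adds a new op to the graph; nothing is evaluated.
        tf.constant([0.0])
    ops_after = len(graph.get_operations())

# Number of ops added by the loop.
print(ops_after - ops_before)
```

Creating the ops once and calling sess.run on them inside the loop keeps the op count fixed, which is the point of the restructured code above.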

ChengshuLi · 2019-02-13T17:46:02Z

@oars

Thank you so much for your help! It completely makes sense.

I think I was just following agents/tf_agents/colabs/environments_tutorial.ipynb without thinking too much about it and also overlooked tf.enable_eager_execution().

Thanks again.

ChengshuLi · 2019-02-13T19:42:58Z

@oars After running your code, I actually got an error:

RuntimeError: The Session graph is empty.  Add operations to the graph before calling run().

Do you know how I can fix this?

Also, as I mentioned above, when I ran the code tf_agents/agents/dqn/examples/train_eval_atari.py with no modification, the memory usage still fluctuated between 40% to 80+%, in which case my computer also becomes slower. Is that to be expected?

Thanks!

oars · 2019-02-19T18:30:42Z

Can you try updating tf-agents and trying again? I can't reproduce your error. For reference I just re-ran with this code:

import tensorflow as tf
from tf_agents.environments import tf_py_environment, py_environment
from tf_agents.specs import array_spec
from tf_agents.environments import time_step as ts
import numpy as np

img_size = 5000

class MyEnv(py_environment.Base):
    def __init__(self):
        self._action_spec = array_spec.BoundedArraySpec(shape=(), dtype=np.float32)
        self._observation_spec = array_spec.BoundedArraySpec(shape=(img_size, img_size, 3), dtype=np.float32)

    def action_spec(self):
        return self._action_spec

    def observation_spec(self):
        return self._observation_spec

    def _reset(self):
        return ts.restart(np.zeros(shape=(img_size, img_size, 3), dtype=np.float32))

    def _step(self, action):
        return ts.transition(np.zeros(shape=(img_size, img_size, 3), dtype=np.float32), reward=0.0, discount=1.0)

tf_py_env = MyEnv()
tf_env = tf_py_environment.TFPyEnvironment(tf_py_env)
action = tf.constant([0.0])
reset_op = tf_env.reset()
step_op = tf_env.step(action)

i = 0

with tf.Session() as sess:
    while True:
        if i % 100 == 0:
            print(i)
            time_step = sess.run(reset_op)
        time_step = sess.run(step_op)
        i += 1

Regarding the memory fluctuations, I wouldn't expect that to happen either. Note that your image is fairly large: 5000 x 5000 x 3 float32 values is roughly 300 MB (about 286 MiB) of raw data, so if a couple of internal copies of it exist, memory usage will be large.
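
The size of a single observation can be checked with a quick calculation. This is a plain-Python sketch using the img_size from the toy example; the variable names other than img_size are illustrative.

```python
# One observation in the toy example: img_size x img_size x 3 float32 values.
img_size = 5000
channels = 3
bytes_per_float32 = 4

observation_bytes = img_size * img_size * channels * bytes_per_float32
observation_mb = observation_bytes / 1e6      # decimal megabytes
observation_mib = observation_bytes / 2**20   # binary mebibytes

print(f"{observation_bytes} bytes = {observation_mb:.0f} MB = {observation_mib:.0f} MiB")
```

Every extra copy held at the same time adds the same amount again, so a handful of live copies already reaches the gigabyte range.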

ChengshuLi · 2019-02-20T19:58:18Z

It works!

I had called tf.enable_eager_execution() before, and that is why it didn't work.
Sorry, I should have spent more time understanding TF eager mode; this error should have been obvious. Thanks a lot.

liujuncn · 2019-11-16T11:33:56Z

@oars How do I deal with this in TF 2.0, without tf.Session?

oars · 2019-11-18T15:59:07Z

How to deal with what? There are examples using environments in 2.0. Please look at the colabs.

ChengshuLi closed this as completed Feb 13, 2019

ChengshuLi reopened this Feb 13, 2019

oars closed this as completed Feb 13, 2019

oars self-assigned this Feb 19, 2019
