Upgrade Dockerfile assembler system #24051

Merged 4 commits into tensorflow:master on Dec 1, 2018

Conversation

angerson (Contributor)

This is a big upgrade to the Dockerfile assembler I wrote a couple of
months ago. The spec has changed, the script has been rewritten, and
there are new features throughout:

  • The assembler can build and upload images to Docker Hub.
  • The assembler can also run tests (!), although the testing system is
    extremely rudimentary. It could be expanded with parallelism later, if
    execution time becomes a problem.
  • spec.yml is totally different, and now defines both dockerfiles and
    images. It handles the combinatorial explosion of multiple optional features
    without excessive duplication, unlike the previous spec format (see the
    sketch after this list).
  • Partials are the same, but I dumped the extensive dockerfile
    documentation support because I don't think anyone would have used it.
  • Dockerfiles are handled under the same kind of system as images, which
    is neat. The new Dockerfiles aren't so duplicated.
  • I've upgraded the images with new tensorflow tutorial files (jupyter
    only) and fixed some others that didn't actually work.
  • I've improved the development documentation by suggesting aliases.
  • Added "static-dockerfiles" directory to track independent Dockerfiles.

These changes should better support changes like #23194.
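
To make the combinatorial idea concrete, here is a minimal, purely illustrative Python sketch (not the assembler's actual code, and not the real spec.yml schema; the slice names below are hypothetical) of how optional feature "slices" can be crossed into image tags without duplicating any Dockerfile content:

import itertools

# Hypothetical feature slices; the real spec.yml uses its own schema.
slices = {
    "device": ["", "gpu"],
    "python": ["", "py3"],
    "frontend": ["", "jupyter"],
}

# Each tag is one choice from every slice; an empty string means "feature off".
for combo in itertools.product(*slices.values()):
    tag = "-".join(part for part in combo if part) or "latest"
    print(tag)  # e.g. "gpu-py3-jupyter"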

@angerson (Contributor, Author)

The full suite of versioned images is available at https://hub.docker.com/r/angersson/tensorflow/tags/, e.g.:

docker run -it --rm --runtime=nvidia -u $(id -u):$(id -g) angersson/tensorflow:zeus-gpu bash

gunan (Contributor) previously approved these changes on Nov 30, 2018 and left a comment:


Mostly looks good. Some terminology and function names could be better, but apart from one issue, I have no big objections.

@angerson (Contributor, Author)

Thanks for the review. I fixed the problems you noted, plus a few more small issues.

@angerson added the "ready to pull (PR ready for merge process)" label on Nov 30, 2018
@tensorflow-copybara merged commit 9152bfc into tensorflow:master on Dec 1, 2018
tensorflow-copybara pushed a commit that referenced this pull request on Dec 1, 2018
@angerson mentioned this pull request on Dec 3, 2018
@angerson deleted the angerson-tagger branch on December 3, 2018 at 18:51
@maystroh commented on Dec 4, 2018

I created a Singularity image using this command:

singularity build angersson_gpu.simg docker://angersson/tensorflow:zeus-gpu-py3

But when I launch this script:

import tensorflow as tf

# Pin the graph to the first GPU and multiply two small matrices.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# Run the graph and print the 2x2 result.
with tf.Session() as sess:
    print(sess.run(c))

it gives me this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/volume/test_gpu.py", line 1, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

@angerson (Contributor, Author) commented on Dec 4, 2018

@maystroh Are you using the nvidia-docker2 runtime to launch the GPU images? I don't know how singularity works, but that looks like the same kind of failure that occurs when you don't use --runtime=nvidia.
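
As a quick sanity check (a hedged suggestion, using the TF 1.x API of that era), running something like this inside the container should confirm whether the CUDA runtime is actually visible:

import tensorflow as tf

# If libcuda.so.1 isn't mounted into the container (e.g. when the NVIDIA
# runtime isn't enabled), the import itself fails with the error above;
# otherwise this prints whether a GPU device is usable.
print("GPU available:", tf.test.is_gpu_available())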

@maystroh commented on Dec 4, 2018

It works when I add --nv to my Singularity command. Thanks @angerson
