Utils rewrite #10

ludwigschubert · 2018-02-02T00:27:12Z

@colah @znah this is a rewrite of unified io.

It doesn't introduce a lot of new functionality, but it does disentangle our prior abstractions a little:

If that looks obvious—good. :D

It does come with tests, but thorough testing is tough: we'll need to try the writing and reading functionality in environments like /bigstore and /cns, where I can't run automated tests.

I'd love to have a discussion about naming at this point. I went with what felt obvious, but I'm in no way stuck on these names. I also want to discuss what we should expose to users by default. I have put suggested functions into the __init__.py file.

Let me know your thoughts! :-)

colah

This looks like a great start! There's a lot of things I like, such as the general separation of load/read and save/write.

That said, I also have some questions and concerns. I've left these as inline comments.

colah · 2018-02-02T00:50:09Z

lucid/util/show.py

+from lucid.util.array_to_image import _infer_domain_from_array, _normalize_array_and_convert_to_image
+
+
+_last_html_output = None


What's the purpose of all these _last_*_output variables?

It's an idea @znah introduced (with an explanatory comment that I must have removed) to allow testing these methods without needing to mock IPython.display. I may alternatively just attempt to mock out that dependency during testing.
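A minimal sketch of the mocking alternative, for reference. It assumes a hypothetical `show_html` helper and a `fake_ipython_display` namespace standing in for `IPython.display` (neither is the actual code in `show.py`); the point is only that `unittest.mock` can record the call, so no module-level `_last_html_output` is needed.

```python
import types
from unittest import mock

# Hypothetical stand-in for the IPython.display module, so this sketch
# runs without IPython installed.
fake_ipython_display = types.SimpleNamespace(display=lambda obj: print(obj))


def show_html(html_str):
    """Hypothetical helper: hands an HTML string to the display backend."""
    fake_ipython_display.display(html_str)


def check_show_html():
    # Swap the display function for a mock for the duration of the test.
    with mock.patch.object(fake_ipython_display, "display") as mocked:
        show_html("<b>hi</b>")
    # The mock remembers how it was called, so no global state is required.
    mocked.assert_called_once_with("<b>hi</b>")
    return mocked.call_count


check_show_html()
```

On Python 2 the same API is available through the separate `mock` package.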

colah · 2018-02-02T00:55:01Z

lucid/util/show.py

+    pass
+
+
+def _image_url(image, fmt='png', mode="data"):


It feels a bit weird to be passing PIL.Image objects around. From the perspective of lucid, they're this really weird object, that we're only invoking as an intermediary to convert NumPy arrays to image formats. I'd rather have them be entirely encapsulated, lest the abstraction leak.

That's fair, though if we want resizing, PIL seems like a better option than staying in numpy and adding all of scipy just for resizing. I'll take a stab at this with numpy arrays as "canonicalized" images. In the end, though, if we want to avoid duplication we will need either an easily serializable format or already serialized data to pass around (see your comment about not wanting to pass around serialized image data either). An image needs to be serialized in some way for both display and saving/writing.

colah · 2018-02-02T00:56:35Z

lucid/util/show.py

+    _last_html_output = html_str
+    IPythonDisplay(IPythonHTML(html_str))
+
+  def _display_data(image_data, format):


I don't find this function name descriptive: We're displaying an image, given data. Relatedly, does it make sense for image data to be something we pass around like this?

More broadly, it seems like there's a lot of abstraction in this code that's coming from the theory that it's better to use IPythonImage over data urls when possible. Do we know that's actually the case? If so, are the gains worth the abstraction overhead?

I really like your last point. I don't know! I will look into it. It just seemed that the data URL was an additional roundtrip, but I will check if IPython.display.Image doesn't do the same thing…

I will quietly go into a corner and concede your point—IPython.display.Image just converts to data URLs as well.

So data-urls it is from now on; that simplifies things in show.
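For reference, a rough sketch of what "data URLs everywhere" could look like. The helper names `data_url` and `image_tag` are hypothetical and not taken from the PR; the input is assumed to be image bytes that have already been encoded (PNG, JPEG, ...).

```python
import base64


def data_url(image_bytes, fmt="png"):
    """Hypothetical helper: wrap already-encoded image bytes in a data URL."""
    # 'jpg' is a file extension; the MIME subtype is 'jpeg'.
    subtype = "jpeg" if fmt in ("jpg", "jpeg") else fmt
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return "data:image/{};base64,{}".format(subtype, encoded)


def image_tag(image_bytes, fmt="png"):
    """Hypothetical helper: an <img> element that embeds the image inline."""
    return '<img src="{}">'.format(data_url(image_bytes, fmt))
```

The resulting string can be handed to any HTML display mechanism, which is what makes a single code path for display feasible.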

colah · 2018-02-02T01:01:04Z

lucid/util/show.py

+  rank, shape and dtype. rank 4 tensors will be displayed as image grids, rank 2
+  and 3 tensors as images.
+  """
+  if isinstance(thing, np.ndarray):


It should be valid for thing to be a list of NumPy arrays. This is important for two reasons:

(1) Often each image gets generated separately and added to a list, so one would have to deliberately convert them to a NumPy array.
(2) Sometimes the images are of different sizes (eg. activation magnitudes at different layers). This has traditionally been a big pain point, that the images abstraction nicely solves.

In the long run, we may also want to support grids of images and other such things...

Oh, totally! I'll build that in. In general, this is supposed to define the interface and make it reasonably easy and predictable to extend it.

Addressed in e3b6c8f.

colah · 2018-02-02T01:03:30Z

lucid/util/show.py

+    IPythonDisplay(IPythonImage(data=image_data, format=format))
+
+except ImportError:
+  logging.warn('IPython is not present, HTML output from lucid.util.show and '


Is this accounting for the IPython libraries not being installed, or us not being in an IPython environment?

If the former, is there any reason for the IPython libraries to not be installed? (They'll be part of our package requirements, presumably?)

If the latter, I don't think the import test accomplishes this. I also think IPython's libraries may be smart enough to handle the issue for us.

I hear you. At the heart of the issue is a blurring of our tensorflow/mathematical utilities and some helpers for using the above interactively. I will think more about this; one way may be to decouple optvis and things like show entirely. (But, of course, we still want them to be nice and easy to use in a notebook environment without much configuration!)
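To make the two cases from the question concrete, here is a small sketch that distinguishes "IPython is not installed" from "IPython is installed but we are not running inside it". The function name `in_ipython_session` is made up for illustration; `IPython.get_ipython` is the real entry point and returns `None` outside an active shell or kernel.

```python
def in_ipython_session():
    """Return True only when running inside an active IPython shell or kernel."""
    try:
        from IPython import get_ipython
    except ImportError:
        # Case 1: the IPython package is not installed at all.
        return False
    # Case 2: installed, but get_ipython() is None in a plain Python process.
    return get_ipython() is not None
```

An import test alone only covers the first case, which is the gap the review comment points at.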

colah · 2018-02-02T01:16:05Z

lucid/util/array_to_image.py

+
+  low, high = np.min(array), np.max(array)
+  if low < domain[0] or high > domain[1]:
+    message = "Clipping domain from (~{:.2f}, ~{:.2f}) to (~{:.2f}, ~{:.2f})"


These messages are really helpful for catching errors, but can also get annoying. Maybe we should consider having a way to turn off "verbose"? Either as a function flag or global. Maybe that's sufficiently handled by logging -- I don't have any experience with it.

Yeah, logging has a switch for that! We may want to set the threshold higher by default ourselves? Otherwise the standard way to do this in python is:

import logging
logging.getLogger().setLevel(logging.INFO)

We could also use a custom logger? I'll look into this more and propose a way for these messages to be hidden by default but easily enabled. :-)

Addressed in e3b6c8f.

Logging can now be configured at the module level & default logging level is set to WARN.

Default log level is now scoped to our module and can be set like this:

import logging
logging.getLogger('lucid').setLevel(logging.INFO)

I included some documentation in the lucid module's __init__ file, but we may want to move this explanation to a readme or FAQ in the future.
(Though it seems to follow Python best practices, I didn't know about the one-logger-per-module convention either.)
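A small sketch of how the hierarchical setup behaves, using only the standard `logging` module. The logger name `lucid.util.array_to_image` is an assumption based on the file path in this review, not necessarily the name used in the commit.

```python
import logging

# What a package __init__ might do: quiet by default.
# (logging.WARN is an alias for logging.WARNING.)
package_logger = logging.getLogger("lucid")
package_logger.setLevel(logging.WARNING)

# A submodule logger has no level of its own, so it inherits from "lucid".
module_logger = logging.getLogger("lucid.util.array_to_image")


def info_enabled():
    """Would an INFO message such as the clipping notice be emitted?"""
    return module_logger.isEnabledFor(logging.INFO)


before = info_enabled()  # False: the package default hides INFO messages

# User opt-in, scoped to the package rather than the root logger.
logging.getLogger("lucid").setLevel(logging.INFO)
after = info_enabled()   # True: child loggers pick up the new level
```

Because the level is set on the `lucid` logger and not the root logger, other libraries' log output is unaffected.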

colah · 2018-02-02T01:21:04Z

lucid/util/load.py

+from lucid.util.read import reading
+
+
+def load_npy(handle):


It seems weird to me for these to take a handle. I kind of liked that, in the old version, they could be used to override inferring the type from the extension. But on reflection, I never used it.

If we intend for these to take handles, maybe make them private functions (eg. _load_npy). The present version seems potentially confusing to users.

It also seems a bit inconsistent with save_npy(object, url) below.

You're right; these should be private. Savers take handles because more often than not that's what the libraries expect. It's not perfect as this breaks the serialize/write abstraction a little bit.

I believe this can be solved by making those functions private, and potentially adding back load_npy(path) convenience methods should we need them. I believe our goal should be to never need to explicitly call these, though.

Making private mostly resolves this. Maybe also change their names so that they don't look like duals to save_*() functions? _save_to_handle_npy() is a bit clunky, but meh.

Also, just want to check that you're aware of cstringio. :)

Addressed in 04cc1cc by making loaders private for now. If we ever need them to be public as a fallback I will make sure they follow the same interface as save and take a path/url directly.

Yes! I am aware of cStringIO. Unfortunately it's a Python2-only thing
(towards the bottom of the list).
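For in-memory handles that work on both Python 2 and 3, `io.BytesIO` is an option. The sketch below uses JSON instead of npy so that it runs with the standard library only; `_load_json` and `_save_json` are hypothetical names that mirror the private, handle-taking style discussed above.

```python
import io
import json


def _load_json(handle):
    """Hypothetical private loader: parse JSON from an open binary handle."""
    return json.loads(handle.read().decode("utf-8"))


def _save_json(obj, handle):
    """Hypothetical private saver: serialize obj as JSON into a binary handle."""
    handle.write(json.dumps(obj).encode("utf-8"))


# io.BytesIO behaves like a binary file, so the same functions can be
# exercised without touching any file system.
buffer = io.BytesIO()
_save_json({"layer": "mixed4a"}, buffer)
buffer.seek(0)  # rewind before reading back
roundtrip = _load_json(buffer)
```

This also makes the handle-based functions easy to test without access to any particular storage backend.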

colah · 2018-02-02T01:24:09Z

lucid/util/load.py

+    return result
+  else:
+    message = "Unknown extension '{}', supports {}."
+    raise RuntimeError(message.format(ext, loaders))


"supported" and loaders.keys() maybe?

Yeah, that was laziness on my part. .keys() is one of those expressions that's not compatible across python versions, but I will simply use list(…). (Source, and generally handy cheat sheet for compatibility)

Addressed in 04cc1cc.

colah · 2018-02-02T01:26:37Z

lucid/util/read.py

+
+
+@contextmanager
+def reading(path, mode=None):


It appears that reading() only supports gfile right now. Does that mean that load() only supports what gfile does at present?

Correct, but merely WIP. reading/read will have feature parity before I suggest this be merged.

Addressed in 4a91554.

colah · 2018-02-02T01:27:52Z

lucid/util/write.py

+def _supports_make_dirs(path):
+  """Whether this path implies a storage system that supports and requires
+  intermediate directories to be created explicitly."""
+  return not path.startswith("/bigstore")


Note, GFS and bigstore are the same thing.

colah · 2018-02-02T03:12:35Z

Where is the line between utils and misc? Should gradient_override be in utils?

ludwigschubert · 2018-02-02T18:59:11Z

/misc vs /utils.

I wanted to start separating infrastructure/glue code from, you know, "actual math code". I'm not set on the naming, but I felt that utils for utility methods made sense, and that misc would be a place for things like your convenience wrapper for dimensionality reduction, etc.


znah · 2018-02-02T16:34:05Z

lucid/util/array_to_image.py

+  """
+  rank = len(shape)
+
+  if rank == 2:


PIL.Image.fromarray() implements this logic already (except the case ndim=3, depth=1)

Huh, thanks for letting me know! The single image with non-squeezed depth = 1 case tripped me up in a notebook some time ago, but I'll see if squeezing such dimensions allows us to get rid of this method altogether!

znah · 2018-02-02T16:35:16Z

lucid/util/show.py

+    raise ValueError("Unsupported mode '%s'", mode)
+
+
+def _data_from_image(image, fmt='png', quality=95):


_encode_image()?


ludwigschubert · 2018-02-05T20:00:36Z

From sync with @colah:

Tasks concerning logging and interactive usage:

Logging needs may change per function call; how can users specify that? (module-wide log levels may not be precise enough)
"log" images—same or different system? custom log handler? etc.
image output may be surprising, why not display stats about image on hover etc.
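One possible answer to the per-call question, sketched with the standard library only: a context manager that raises or lowers the log level for the duration of a block. The name `log_level` and the default logger name `"lucid"` are assumptions for illustration, not something decided in the sync.

```python
import contextlib
import logging


@contextlib.contextmanager
def log_level(level, name="lucid"):
    """Temporarily set the level of the named logger, then restore it."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore even if the wrapped call raises.
        logger.setLevel(previous)


# Usage: verbose output for one call only.
logging.getLogger("lucid").setLevel(logging.WARNING)
with log_level(logging.DEBUG) as logger:
    logger.debug("only emitted inside this block")
```

A function-level `verbose` flag could be implemented on top of this, though the level change is still process-wide while the block runs.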

Additional considerations

PIL has max image size; can we think of a way around this?
"data-url" as a save target; you save and get back a URL somehow?
less wide interface for turning an arbitrary array into data
can load be made to deal with protobufs?
can save be made to deal with protobufs?


ludwigschubert · 2018-02-06T03:06:30Z

Moving the remaining ToDos here to issues.


Ludwig Schubert added 2 commits February 1, 2018 16:13

Rewrite of util module with tests

abd2df5

Minor setup changes

f4b30a7

ludwigschubert added the help wanted Extra attention is needed label Feb 2, 2018

ludwigschubert requested review from colah and znah February 2, 2018 00:27

colah reviewed Feb 2, 2018


Ludwig Schubert added 3 commits February 2, 2018 17:01

Add hierarchical logging, sets default log level and documents logging behavior.

e3b6c8f

Make loaders private and improve load()

04cc1cc

(Adds invariance to extension CAPITALIZATION and tries opening unknown extensions as images.)

Bring read and reading to permanent feature parity. Adds tests for read.

4a91554

znah reviewed Feb 5, 2018


No more passing around IPython images or straight data, better tests for show using mocks, caching support

42903b6

Move io to lucid.misc.io, enable show in render, fix text mode reading by avoiding it, more structure for tests

148449a

Merge branch 'master' into utils-rewrite

46d913d

# Conflicts: # notebooks/local_development.ipynb

ludwigschubert merged commit 0429933 into master Feb 6, 2018

ludwigschubert deleted the utils-rewrite branch February 6, 2018 03:11

ludwigschubert mentioned this pull request Feb 6, 2018

show images during render_vis() process #7

Closed
