-
Notifications
You must be signed in to change notification settings - Fork 650
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a great start! There's a lot of things I like, such as the general separation of load/read and save/write.
That said, I also have some questions and concerns. I've eft these as comments in line.
lucid/util/show.py
Outdated
from lucid.util.array_to_image import _infer_domain_from_array, _normalize_array_and_convert_to_image | ||
|
||
|
||
_last_html_output = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of all these _last_*_output
variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an idea @znah introduced (with an explanatory comment that I must have removed) to allow testing these methods without needing to mock IPython.display
. I may alternatively just attempt to mock out that dependency during testing.
lucid/util/show.py
Outdated
pass | ||
|
||
|
||
def _image_url(image, fmt='png', mode="data"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit weird to be passing PIL.Image
objects around. From the perspective of lucid, they're this really weird object, that we're only invoking as an intermediary to convert NumPy arrays to image formats. I'd rather have them be entirely encapsulated, lest the abstraction leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair, though if we want resizing it seems like a better option than staying in numpy and adding all of scipy for just resizing. I'll take a stab at this with numpy arrays as "canonicalized" images. At the end, though, we will need either an easily serializable format or already serialized data (see your comment about not wanting to pass around serialized imaeg data either) to pass around if we want to avoid duplication. An image needs to be serialized in some way for both display and saving/writing.
lucid/util/show.py
Outdated
_last_html_output = html_str | ||
IPythonDisplay(IPythonHTML(html_str)) | ||
|
||
def _display_data(image_data, format): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't find this function name descriptive: We're displaying an image, given data. Relatedly, does it make sense for image data to be something we pass around like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More broadly, it seems like there's a lot of abstraction in this code that's coming from the theory that it's better to use IPythonImage over data urls when possible. Do we know that's actually the case? If so, are the gains worth the abstraction overhead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like your last point. I don't know! I will look into it. It just seemed that the data URL was an additional roundtrip, but I will check if IPython.display.Image
doesn't do the same thing…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will quietly go into a corner and concede your point—IPython.display.Image just converts to data URLs as well.
So data-urls
it is from now on; that simplifies things in show.
lucid/util/show.py
Outdated
rank, shape and dtype. rank 4 tensors will be displayed as image grids, rank 2 | ||
and 3 tensors as images. | ||
""" | ||
if isinstance(thing, np.ndarray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be valid for thing
to be a list of NumPy arrays. This is important for two reasons:
(1) Often each image gets generated separately and added to a list, so one would have to deliberately convert them to a NumPy array.
(2) Sometimes the images are of different sizes (eg. activation magnitudes at different layers). This has traditionally been a big pain point, that the images abstraction nicely solves.
In the long run, we may also want to support grids of images and other such things...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, totally! I'll build that in. In general, this is supposed to define the interface and make it reasonably easy and predictable to extend it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in e3b6c8f.
lucid/util/show.py
Outdated
IPythonDisplay(IPythonImage(data=image_data, format=format)) | ||
|
||
except ImportError: | ||
logging.warn('IPython is not present, HTML output from lucid.util.show and ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this accounting for the IPython libraries not being installed, or us not being in an IPython environment?
If the former, is there any reason for the IPython libraries to not be installed? (They'll be part of our package requirements, presumably?)
If the latter, I don't think the import test accomplishes this. I also think IPython's libraries may be smart enough to handle the issue for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hear you. At the heart of the issue is a blurring of our tensorflow/mathematical utilities and some helpers for using the above interactively. I will think more about this; one way may be to decouple optvis
and things like show
entirely. (But, of course, we still want them to be nice and easy to use in a notebook environment without much configuration!)
lucid/util/array_to_image.py
Outdated
|
||
low, high = np.min(array), np.max(array) | ||
if low < domain[0] or high > domain[1]: | ||
message = "Clipping domain from (~{:.2f}, ~{:.2f}) to (~{:.2f}, ~{:.2f})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These messages are really helpful for catching errors, but can also get annoying. Maybe we should consider having a way to turn off "verbose"? Either as a function flag or global. Maybe that's sufficiently handled by logging -- I don't have any experience with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, logging has a switch for that! We may want to set the threshold higher by default ourselves? Otherwise the standard way to do this in python is:
import logging
logging.getLogger().setLevel(logging.INFO)
We could also use a custom logger? I'll look into this more and propose a way for these messages to be hidden by default but easily enabled. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adressed in e3b6c8f.
Logging can now be configured at the module level & default logging level is set to WARN.
Default log level is now scoped to our module and can be set like this:
import logging
logging.getLogger('lucid').setLevel(logging.INFO)
I included some documentation in the lucid module's __init__
file, but we may want to move this explanation to a readme or FAQ in the future.
(Though it seems to follow Python best practices, I didn't know about the one-logger-per module either.)
lucid/util/load.py
Outdated
from lucid.util.read import reading | ||
|
||
|
||
def load_npy(handle): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems weird to me for these to take a handle. I kind of liked that, in the old version, they could be used to override inferring the type from the extension. But on reflection, I never used it.
If we intend for these to take handles, maybe make them private functions (eg. _load_npy
). The present version seems potentially confusing to users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also seems a bit inconsistent with save_npy(object, url)
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right; these should be private. Savers take handles because more often than not that's what the libraries expect. It's not perfect as this breaks the serialize/write abstraction a little bit.
I believe this can be solved by making those functions private, and potentially adding back load_npy(path)
convenience methods should we need them. I believe our goal should be to never need to explicitly call these, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making private mostly resolves this. Maybe also change their names so that they don't look like duals to save_*()
functions? _save_to_handle_npy()
is a bit clunky, but meh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, just want to check that you're aware of cstringio. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 04cc1cc by making loaders private for now. If we ever need them to be public as a fallback I will make sure they follow the same interface as save
and take a path/url directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! I am aware of cStringIO. Unfortunately it's a Python2-only thing
(towards the bottom of the list).
lucid/util/load.py
Outdated
return result | ||
else: | ||
message = "Unknown extension '{}', supports {}." | ||
raise RuntimeError(message.format(ext, loaders)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"supported" and loaders.keys()
maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that was laziness on my part. .keys()
is one of those expressions that's not compatible across python versions, but I will simply use list(…)
. (Source, and generally handy cheat sheet for compatibility)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 04cc1cc.
lucid/util/read.py
Outdated
|
||
|
||
@contextmanager | ||
def reading(path, mode=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears that reading()
only supports gfile right now. Does that mean that load()
only supports what gfile
does at present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, but merely WIP. reading
/read
will have feature parity before I suggest this be merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 4a91554.
lucid/util/write.py
Outdated
def _supports_make_dirs(path): | ||
"""Whether this path implies a storage system that supports and requires | ||
intermediate directories to be created explicitly.""" | ||
return not path.startswith("/bigstore") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, GFS and bigstore are the same thing.
Where is the line between utils and misc? Should gradient_override be in utils? |
I wanted to start separating infrastructure/glue code from, you know, "actual math code". I'm not set on the naming, but I felt that |
(Adds invariance to extension CAPITALIZATION and tries opening unknown extensions as images.)
lucid/util/array_to_image.py
Outdated
""" | ||
rank = len(shape) | ||
|
||
if rank == 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PIL.Image.fromarray() implements this logic already (except the case ndim=3, depth=1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, thanks for letting me know! The single image with non-squeezed depth = 1 case tripped me up in a notebook some timer ago, but I'll see if squeezing such dimensions allows us to get rid of this method altogether!
lucid/util/show.py
Outdated
raise ValueError("Unsupported mode '%s'", mode) | ||
|
||
|
||
def _data_from_image(image, fmt='png', quality=95): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_encode_image()?
…for show using mocks, caching support
From sync with @colah: Tasks concerning logging and interactive usage:
Additional considerations
|
…g by avoiding it, more structure for tests
Moving the remaining ToDos here to issues. |
# Conflicts: # notebooks/local_development.ipynb
@colah @znah this is a rewrite of unified io.
It doesn't introduce a lot of new functionality, but it does disentangle our prior abstractions a little:
If that looks obvious—good. :D
Does come with tests, but thorough testing is tough as we'll need to try the writing and reading functionality in various environments like
/bigstore
,/cns
, on which I can't run automated tests.I'd love to have a discussion about naming at this point. I went with what felt obvious, but I'm in no way stuck on these names. I also want to discuss what we should expose to users by default. I have put suggested functions into the
__init__.py
file.Let me know your thoughts! :-)