Various improvements #101

Merged
merged 25 commits into from Jan 1, 2017

Conversation

tyarkoni
Collaborator

@tyarkoni tyarkoni commented Jan 1, 2017

This PR adds many improvements and fixes. Main changes include:

  • New logging system based around a TransformationLog class that tracks key parameters of every transformation. This includes selected arguments to each Transformer's __init__, and the attributes to log are now specified in each class's _log_attributes attribute.

  • Better handling of Stims that can be iterated (e.g., VideoStim, ComplexTextStim, etc.). Before, the conversion from the container to the children happened implicitly (e.g., there was no way to track the fact that a TextStim came from a ComplexTextStim, rather than being created de novo); now, this conversion is handled just like any other conversion, via a new StimCollectionIterator Converter class. These iterators are now stored in a new stimuli/iterators.py module.

  • New CompoundStim class that serves as a container for an arbitrary set of Stim classes. This is intended to greatly simplify and streamline the way we deal with Transformer classes that need multiple Stim types as input. Every CompoundStim has three properties that define its behavior: _allowed_types, _allow_multiple, and _primary. These allow very succinct yet powerful specifications of compound Stim classes. For example, we previously had a TranscribedAudioStim class that was basically just an AudioStim with an attached ComplexTextStim. This is now represented as a subclass of CompoundStim that has _allowed_types = (AudioStim, ComplexTextStim), _allow_multiple = False, and _primary = AudioStim. This means that instances of this class (i) can only contain AudioStim and ComplexTextStim instances, (ii) cannot have more than one instance of each allowed type, and (iii) treat the AudioStim instance as the primary component, so that key information (e.g., filename) will be taken from that component when needed.

  • Taking advantage of the new CompoundStim approach, the Transformer._input_type attribute can now be a tuple (e.g., _input_type = (AudioStim, ComplexTextStim)), which indicates that the Transformer requires a CompoundStim containing all of the specified types in order to operate.

  • Improved Transformer.transform() logic. The introduction of the StimCollectionIterator pattern means we no longer have to handle StimCollectionMixin-supporting Stims as a special case. This simplifies the logic to only two main cases, and allows us to more easily log all transformations. It also integrates all of the changes above (e.g., a Transformer will fail if it requires multiple input types that are not all passed).

  • The transformation logic should now support generators everywhere--though this should probably be tested more extensively (all current tests pass fine). In principle, this should very substantially reduce a Graph's memory footprint, as there's no need to, e.g., hold all of a movie's VideoFrameStims in memory. In practice, I haven't done any serious benchmarking, and it's possible that there are some overlooked references to old objects that might prevent garbage collection, so we should investigate this more thoroughly at some point. (Note: allowing generators to propagate through a Graph introduced some extra complexities that I finessed for the moment; e.g., generators can't be pickled, so caching via joblib breaks).

  • Improved naming conventions. Rather than just appending names following every conversion, some Stims now use names that more clearly indicate what's going on. E.g., VideoFrameStims now have the convention 'movie.mp4->frame[10]', instead of 'movie.mp4_0'. TextStims now have names like 'text[illuminating]' rather than just 'illuminating'.

  • merge_results now injects additional columns containing conversion history and class type into the results.

  • Caching is now off by default, but can be turned on separately for each kind of Transformer (i.e., Converter, Filter, or Extractor) via the config module.
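To make the CompoundStim rules and the tuple form of _input_type more concrete, here is a minimal, self-contained sketch. It is not the code in this PR: the Stim stand-ins, the get_first and has_types helpers, and the check_input function are hypothetical names used only for illustration. Only the three class attributes (_allowed_types, _allow_multiple, _primary) and the TranscribedAudioStim settings come from the description above.

```python
class Stim:
    """Stand-in for a base Stim; it only tracks a filename."""
    def __init__(self, filename=None):
        self.filename = filename


class AudioStim(Stim):
    pass


class ComplexTextStim(Stim):
    pass


class VideoStim(Stim):
    pass


class CompoundStim(Stim):
    _allowed_types = None    # tuple of permitted Stim classes (None = any)
    _allow_multiple = True   # may a type appear more than once?
    _primary = None          # class whose instance supplies key information

    def __init__(self, elements):
        self.elements = []
        seen = set()
        for el in elements:
            if self._allowed_types and not isinstance(el, self._allowed_types):
                raise ValueError("%s is not an allowed type" % type(el).__name__)
            if not self._allow_multiple and type(el) in seen:
                raise ValueError("multiple %s instances" % type(el).__name__)
            seen.add(type(el))
            self.elements.append(el)
        # Key information (here, the filename) comes from the primary component.
        primary = self.get_first(self._primary) if self._primary else None
        super().__init__(filename=primary.filename if primary else None)

    def get_first(self, stim_type):
        """Hypothetical helper: first element of the given type, or None."""
        for el in self.elements:
            if isinstance(el, stim_type):
                return el
        return None

    def has_types(self, types):
        """Hypothetical helper: True if every listed type is present."""
        return all(self.get_first(t) is not None for t in types)


class TranscribedAudioStim(CompoundStim):
    _allowed_types = (AudioStim, ComplexTextStim)
    _allow_multiple = False
    _primary = AudioStim


def check_input(input_type, stim):
    """Hypothetical check: a tuple input type requires a CompoundStim
    that contains all of the listed types."""
    if isinstance(input_type, tuple):
        return isinstance(stim, CompoundStim) and stim.has_types(input_type)
    return isinstance(stim, input_type)
```

With this sketch, TranscribedAudioStim([AudioStim('a.wav'), ComplexTextStim('a.txt')]) takes its filename from the AudioStim, passing two AudioStims raises a ValueError, and check_input((AudioStim, ComplexTextStim), stim) succeeds only when both types are present.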
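The caveat about generators and caching can be reproduced with the standard library alone. The sketch below uses plain pickle rather than joblib, and the frames and cacheable functions are hypothetical names for illustration; it only shows why a lazily produced result cannot be cached by pickling until it is materialized.

```python
import pickle


def frames(n):
    """Yield frame names lazily instead of building a full list in memory."""
    for i in range(n):
        yield "movie.mp4->frame[%d]" % i


def cacheable(result):
    """Return True if the result can be pickled, which pickle-based caching needs."""
    try:
        pickle.dumps(result)
        return True
    except (TypeError, pickle.PicklingError):
        return False


gen = frames(3)
print(cacheable(gen))        # False: generator objects cannot be pickled
print(cacheable(list(gen)))  # True: a materialized list of names can be
```

Materializing the generator makes the result cacheable but gives up the memory savings, which is the trade-off noted above.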

…th explicit Iterator Converters; fix caching issues; other assorted improvements
@tyarkoni tyarkoni merged commit 4592589 into master Jan 1, 2017
@tyarkoni tyarkoni deleted the various-improvements branch January 3, 2017 23:03