Various improvements #101

merged 25 commits into from Jan 1, 2017

tyarkoni commented Jan 1, 2017

This PR adds many improvements and fixes. Main changes include:

  • New logging system based around a TransformationLog class that tracks key parameters of every transformation. This includes selected arguments to each Transformer's __init__, and the attributes to log are now specified in each class's _log_attributes attribute.

  • Better handling of Stims that can be iterated (e.g., VideoStim, ComplexTextStim, etc.). Before, the conversion from the container to the children happened implicitly (e.g., there was no way to track the fact that a TextStim came from a ComplexTextStim, rather than being created de novo); now, this conversion is handled just like any other conversion, via a new StimCollectionIterator Converter class. These iterators are now stored in a new stimuli/ module.

  • New CompoundStim class that serves as a container for an arbitrary set of Stim classes. This is intended to greatly simplify and streamline the way we deal with Transformer classes that need multiple Stim types as input. Every CompoundStim has three properties that define its behavior: _allowed_types, _allow_multiple, and _primary. These allow very succinct yet powerful specifications of compound Stim classes. For example, we previously had a TranscribedAudioStim class that was basically just an AudioStim with an attached ComplexTextStim. This is now represented as a subclass of CompoundStim that has _allowed_types = (AudioStim, ComplexTextStim), _allow_multiple = False, and _primary = AudioStim. This means that instances of this class (i) can only contain AudioStim and ComplexTextStim instances, (ii) cannot have more than one instance of each allowed type, and (iii) treat the AudioStim instance as the primary component, so that key information (e.g., filename) will be taken from that component when needed.

  • Taking advantage of the new CompoundStim approach, the Transformer._input_type attribute can now be a tuple (e.g., _input_type = (AudioStim, ComplexTextStim)), which indicates that the Transformer requires a CompoundStim containing all of the specified types in order to operate.

  • Improved Transformer.transform() logic. The introduction of the StimCollectionIterator pattern means we no longer have to handle StimCollectionMixin-supporting Stims as a special case. This simplifies the logic to only two main cases, and allows us to more easily log all transformations. It also integrates all of the changes above (e.g., a Transformer will fail if it requires multiple input types that are not all passed).

  • The transformation logic should now support generators everywhere--though this should probably be tested more extensively (all current tests pass fine). In principle, this should very substantially reduce a Graph's memory footprint, as there's no need to, e.g., hold all of a movie's VideoFrameStims in memory. In practice, I haven't done any serious benchmarking, and it's possible that there are some overlooked references to old objects that might prevent garbage collection, so we should investigate this more thoroughly at some point. (Note: allowing generators to propagate through a Graph introduced some extra complexities that I finessed for the moment; e.g., generators can't be pickled, so caching via joblib breaks).

  • Improved naming conventions. Rather than just appending names following every conversion, some Stims now use names that more clearly indicate what's going on. E.g., VideoFrameStims now have the convention 'movie.mp4->frame[10]', instead of 'movie.mp4_0'. TextStims now have names like 'text[illuminating]' rather than just 'illuminating'.

  • merge_results now injects additional columns containing conversion history and class type into the results.

  • Caching is now off by default, but can be turned on separately for each kind of Transformer (i.e., Converter, Filter, or Extractor) via the config module.
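To make the CompoundStim rules and the tuple form of _input_type a bit more concrete, here is a minimal, self-contained sketch. It is not the code in this PR: the Stim stand-ins, the get_stim() and has_types() helpers, and the can_transform() function are hypothetical names used only for illustration. Only _allowed_types, _allow_multiple, and _primary come from the description above.

```python
# Illustrative sketch only; these are simplified stand-ins, not the PR's classes.

class Stim:
    def __init__(self, filename=None):
        self.filename = filename


class AudioStim(Stim):
    pass


class ComplexTextStim(Stim):
    pass


class ImageStim(Stim):
    pass


class CompoundStim:
    _allowed_types = None    # tuple of permitted Stim classes (None = anything)
    _allow_multiple = True   # may a given Stim type appear more than once?
    _primary = None          # Stim class that supplies key info such as filename

    def __init__(self, elements):
        self.elements = []
        seen_types = set()
        for el in elements:
            # (i) only instances of the allowed types may be added
            if self._allowed_types and not isinstance(el, self._allowed_types):
                raise ValueError("%s is not an allowed type" % type(el).__name__)
            # (ii) optionally forbid more than one instance of each type
            if not self._allow_multiple and type(el) in seen_types:
                raise ValueError("duplicate %s not allowed" % type(el).__name__)
            seen_types.add(type(el))
            self.elements.append(el)
        # (iii) key information is taken from the primary component, if any
        primary = self.get_stim(self._primary) if self._primary else None
        self.filename = primary.filename if primary is not None else None

    def get_stim(self, stim_type):
        """Hypothetical helper: first contained Stim of the given type, or None."""
        for el in self.elements:
            if isinstance(el, stim_type):
                return el
        return None

    def has_types(self, types):
        """Hypothetical helper: True if every listed type is present."""
        return all(self.get_stim(t) is not None for t in types)


class TranscribedAudioStim(CompoundStim):
    _allowed_types = (AudioStim, ComplexTextStim)
    _allow_multiple = False
    _primary = AudioStim


def can_transform(input_type, stim):
    """Hypothetical check mirroring the tuple form of _input_type:
    a tuple requires a CompoundStim that contains all of the listed types."""
    if isinstance(input_type, tuple):
        return isinstance(stim, CompoundStim) and stim.has_types(input_type)
    return isinstance(stim, input_type)
```

In this sketch, TranscribedAudioStim([AudioStim('speech.wav'), ComplexTextStim('speech.srt')]) takes its filename from the AudioStim, and a Transformer declaring _input_type = (AudioStim, ComplexTextStim) would accept it, whereas one requiring (AudioStim, ImageStim) would not.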

added some commits Dec 25, 2016
@tyarkoni improved (or at least, altered) naming scheme 20b0f9c
@tyarkoni fix uniqueness constraint that prevented merge_results from working 892ab87
@tyarkoni replace deprecated pd.sort() calls bd0eb35
@tyarkoni add source_stim attribute to all converted stims 514cbbe
@tyarkoni store and access conversion history in Stim c098f9f
@tyarkoni set static history attribute upon conversion 09c2156
@tyarkoni SpeechRecognition now supports Google Cloud Speech API, so let's use …
@tyarkoni remove .history from DerivedVideoStim until we find a more general so…
@tyarkoni move CompoundStim into its own module f93ccf8
@tyarkoni make TranscribedAudioStim a CompoundStim subclass 14c2904
@tyarkoni fix CompoundStim transformer logic 35c5e76
@tyarkoni cache Filter results and ensure type matching a00c352
@tyarkoni move merge flag from Graph init to extract() call c6995d8
@tyarkoni move transformer caching into utils f37a0e6
@tyarkoni simplified properties 77f85bb
@tyarkoni reworked transformation logging 43c0709
@tyarkoni add config module 5e31254
@tyarkoni update transformers with _log_attributes 015ccf9
@tyarkoni minor changes to stim naming conventions a051a7c
@tyarkoni add and fix tests to reflect recent changes cf20da6
@tyarkoni add get_value() method to TransformationHistory 60e6e25
@tyarkoni make sure convert() dispatches to _transform() a6bd660
@tyarkoni reworked TransformationLog, moved stuff around, general improvements 271cd53
@tyarkoni simplified transformer logic--replace CollectionStimMixin handling with explicit Iterator Converters; fix caching issues; other assorted improvements
@tyarkoni fix and add tests to reflect changes
@tyarkoni tyarkoni merged commit 4592589 into master Jan 1, 2017

2 of 3 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
continuous-integration/travis-ci/push The Travis CI build passed
coverage/coveralls First build on various-improvements at 78.866%
@tyarkoni tyarkoni deleted the various-improvements branch Jan 3, 2017