Various improvements #101

tyarkoni · 2017-01-01T21:15:42Z

This PR adds many improvements and fixes. Main changes include:

New logging system based around a TransformationLog class that tracks key parameters of every transformation. This includes selected arguments to each Transformer's __init__, and the attributes to log are now specified in each class's _log_attributes attribute.
Better handling of Stims that can be iterated (e.g., VideoStim, ComplexTextStim, etc.). Before, the conversion from the container to the children happened implicitly (e.g., there was no way to track the fact that a TextStim came from a ComplexTextStim, rather than being created de novo); now, this conversion is handled just like any other conversion, via a new StimCollectionIterator Converter class. These iterators are now stored in a new stimuli/iterators.py module.
New CompoundStim class that serves as a container for an arbitrary set of Stim classes. This is intended to greatly simplify and streamline the way we deal with Transformer classes that need multiple Stim types as input. Every CompoundStim has three properties that define its behavior: _allowed_types, _allow_multiple, and _primary. These allow very succinct yet powerful specifications of compound Stim classes. For example, we previously had a TranscribedAudioStim class that was basically just an AudioStim with an attached ComplexTextStim. This is now represented as a subclass of CompoundStim that has _allowed_types = (AudioStim, ComplexTextStim), _allow_multiple = False, and _primary = AudioStim. This means that instances of this class (i) can only contain AudioStim and ComplexTextStim instances, (ii) cannot have more than one instance of each allowed type, and (iii) treat the AudioStim instance as the primary component, so that key information (e.g., filename) will be taken from that component when needed.
Taking advantage of the new CompoundStim approach, the Transformer._input_type attribute can now be a tuple (e.g., _input_type = (AudioStim, ComplexTextStim)), which indicates that the Transformer requires a CompoundStim containing all of the specified types in order to operate.
Improved Transformer.transform() logic. The introduction of the StimCollectionIterator pattern means we no longer have to handle StimCollectionMixin-supporting Stims as a special case. This simplifies the logic to only two main cases, and allows us to more easily log all transformations. It also integrates all of the changes above (e.g., a Transformer will fail if it requires multiple input types that are not all passed).
The transformation logic should now support generators everywhere--though this should probably be tested more extensively (all current tests pass fine). In principle, this should very substantially reduce a Graph's memory footprint, as there's no need to, e.g., hold all of a movie's VideoFrameStims in memory. In practice, I haven't done any serious benchmarking, and it's possible that there are some overlooked references to old objects that might prevent garbage collection, so we should investigate this more thoroughly at some point. (Note: allowing generators to propagate through a Graph introduced some extra complexities that I finessed for the moment; e.g., generators can't be pickled, so caching via joblib breaks).
Improved naming conventions. Rather than just appending names following every conversion, some Stims now use names that more clearly indicate what's going on. E.g., VideoFrameStims now have the convention 'movie.mp4->frame[10]', instead of 'movie.mp4_0'. TextStims now have names like 'text[illuminating]' rather than just 'illuminating'.
merge_results now injects additional columns containing conversion history and class type into the results.
Caching is now off by default, but can be turned on separately for each kind of Transformer (i.e., Converter, Filter, or Extractor) via the config module.
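
As a rough illustration of the _log_attributes idea described above, here is a minimal standalone sketch. It does not import the library and is not the actual TransformationLog implementation: LogRecord, log_record(), and the FrameSampler class with its every and verbose arguments are hypothetical names chosen for this example. Only the _log_attributes attribute name comes from the description.

```python
from collections import namedtuple

# Hypothetical record type; the real TransformationLog may store more fields.
LogRecord = namedtuple("LogRecord", ["transformer_class", "transformer_params"])


class Transformer:
    # Subclasses list the names of the __init__ arguments worth logging.
    _log_attributes = ()

    def log_record(self):
        # Collect only the attributes the class has opted in to logging.
        params = {name: getattr(self, name) for name in self._log_attributes}
        return LogRecord(type(self).__name__, params)


class FrameSampler(Transformer):
    """Hypothetical transformer used only for this example."""

    _log_attributes = ("every",)

    def __init__(self, every=1, verbose=False):
        self.every = every      # logged
        self.verbose = verbose  # deliberately not logged


record = FrameSampler(every=5, verbose=True).log_record()
print(record.transformer_class, record.transformer_params)
```

The point of the design is that each class declares once which parameters matter for reproducing a transformation, so the logging code stays generic.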
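
The CompoundStim constraints and the tuple form of _input_type can be sketched as follows. This is a standalone toy model, not the library's code: the stand-in Stim classes, the elements constructor argument, the get_stim() helper, and the can_transform() function are assumptions made for illustration. The attribute names _allowed_types, _allow_multiple, _primary, and _input_type are the ones named in the description.

```python
class Stim:
    """Stand-in for the library's Stim base class (illustrative only)."""

    def __init__(self, filename=None):
        self.filename = filename


class AudioStim(Stim):
    pass


class ComplexTextStim(Stim):
    pass


class VideoStim(Stim):
    pass


class CompoundStim(Stim):
    _allowed_types = ()      # empty tuple: any Stim type is accepted
    _allow_multiple = True   # may hold several instances of one type
    _primary = None          # type whose metadata the container inherits

    def __init__(self, elements):
        seen = set()
        for el in elements:
            if self._allowed_types and not isinstance(el, self._allowed_types):
                raise ValueError("%s not allowed here" % type(el).__name__)
            if not self._allow_multiple and type(el) in seen:
                raise ValueError("duplicate %s" % type(el).__name__)
            seen.add(type(el))
        self.elements = list(elements)
        # Key information such as filename comes from the primary component.
        primary = self.get_stim(self._primary) if self._primary else None
        super().__init__(filename=primary.filename if primary else None)

    def get_stim(self, stim_type):
        """Return the first contained element of the given type, or None."""
        for el in self.elements:
            if isinstance(el, stim_type):
                return el
        return None


class TranscribedAudioStim(CompoundStim):
    _allowed_types = (AudioStim, ComplexTextStim)
    _allow_multiple = False
    _primary = AudioStim


def can_transform(input_type, stim):
    """A tuple _input_type requires a CompoundStim holding every listed type."""
    if isinstance(input_type, tuple):
        return isinstance(stim, CompoundStim) and all(
            stim.get_stim(t) is not None for t in input_type
        )
    return isinstance(stim, input_type)


stim = TranscribedAudioStim([AudioStim("speech.wav"), ComplexTextStim()])
print(stim.filename)
print(can_transform((AudioStim, ComplexTextStim), stim))
```

In this sketch a Transformer whose _input_type is (AudioStim, VideoStim) would reject the same stim, which mirrors the "fail if not all required types are passed" behavior described above.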
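
The generator point, and the caching caveat that comes with it, can be seen in a small standalone sketch. Nothing here uses the library or joblib: iter_frames(), extract_lengths(), and is_picklable() are hypothetical helpers, and plain strings stand in for VideoFrameStims. The standard pickle module is used to show why a cache that serializes results cannot store a generator directly.

```python
import pickle


def iter_frames(n_frames):
    """Yield frame names one at a time instead of building a full list."""
    for i in range(n_frames):
        yield "movie.mp4->frame[%d]" % i


def extract_lengths(frames):
    """A toy downstream step that also stays lazy."""
    for frame in frames:
        yield len(frame)


def is_picklable(obj):
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


# Only one frame needs to exist at a time while the chain is consumed.
total = sum(extract_lengths(iter_frames(1000)))

print(is_picklable(iter_frames(3)))        # generator object
print(is_picklable(list(iter_frames(3))))  # materialized list
```

Materializing the generator into a list makes it cacheable again, at the cost of the memory savings, which is the trade-off the note about joblib is pointing at.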


tyarkoni added 25 commits December 24, 2016 22:50

improved (or at least, altered) naming scheme

20b0f9c

fix uniqueness constraint that prevented merge_results from working

892ab87

replace deprecated pd.sort() calls

bd0eb35

add source_stim attribute to all converted stims

514cbbe

store and access conversion history in Stim

c098f9f

set static history attribute upon conversion

09c2156

SpeechRecognition now supports Google Cloud Speech API, so let's use that

f90e6cf

remove .history from DerivedVideoStim until we find a more general solution

fd79e69

move CompoundStim into its own module

f93ccf8

make TranscribedAudioStim a CompoundStim subclass

14c2904

fix CompoundStim transformer logic

35c5e76

cache Filter results and ensure type matching

a00c352

move merge flag from Graph init to extract() call

c6995d8

move transformer caching into utils

f37a0e6

simplified properties

77f85bb

reworked transformation logging

43c0709

add config module

5e31254

update transformers with _log_attributes

015ccf9

minor changes to stim naming conventions

a051a7c

add and fix tests to reflect recent changes

cf20da6

add get_value() method to TransformationHistory

60e6e25

make sure convert() dispatches to _transform()

a6bd660

reworked TransformationLog, moved stuff around, general improvements

271cd53

simplified transformer logic--replace CollectionStimMixin handling with explicit Iterator Converters; fix caching issues; other assorted improvements

7de408b

fix and add tests to reflect changes

a12cd74

tyarkoni merged commit 4592589 into master Jan 1, 2017

tyarkoni mentioned this pull request Jan 1, 2017

Inject Converter records into Graph #92

Closed

tyarkoni deleted the various-improvements branch January 3, 2017 23:03
