New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Various improvements #101
Merged
Merged
Various improvements #101
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…th explicit Iterator Converters; fix caching issues; other assorted improvements
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR adds many improvements and fixes. Main changes include:
New logging system based around a
TransformationLog
class that tracks key parameters of every transformation. This includes selected arguments to eachTransformer
's__init__
, and the attributes to log are now specified in each class's_log_attributes
attribute.Better handling of
Stim
s that can be iterated (e.g.,VideoStim
,ComplexTextStim
, etc.). Before, the conversion from the container to the children happened implicitly (e.g., there was no way to track the fact that aTextStim
came from aComplexTextStim
, rather than being created de novo); now, this conversion is handled just like any other conversion, via a newStimCollectionIterator
Converter
class. These iterators are now stored in a newstimuli/iterators.py
module.New
CompoundStim
class that serves as a container for an arbitrary set ofStim
classes. This is intended to greatly simplify and streamline the way we deal withTransformer
classes that need multipleStim
types as input. EveryCompoundStim
has three properties that define its behavior:_allowed_types
,allow_multiple
, and_primary
. These allow very succinct yet powerful specifications of compoundStim
classes. For example, we previously had aTranscribedAudioStim
class that was basically just anAudioStim
with an attachedComplexTextStim
. This is now represented as a subclass ofCompoundStim
that has_allowed_types=(AudioStim, ComplexTextStim)
,_allow_multiple = False
, and_primary = AudioStim
. This means that instance of this class (i) can only containAudioStim
andComplexTextStim
instances, (ii) cannot have more than one instance of each allowed type, and (iii) treat theAudioStim
instance as the primary component, so that key information (e.g., filename) will be taken from that component when needed.Taking advantage of the new
CompoundStim
approach, theTransformer._input_type
attribute can now be a tuple (e.g.,_input_type = (AudioStim, ComplexTextStim)
), which indicates that theTransformer
requires aCompoundStim
containing all of the specified types in order to operate.Improved
Transformer.transform()
logic. The introduction of theStimCollectionIterator
pattern means we no longer have to handleStimCollectionMixin
-supportingStim
s as a special case. This simplifies the logic to only two main cases, and allows us to more easily log all transformations. It also integrates all of the changes above (e.g., aTransformer
will fail if it requires multiple input types that are not all passed).The transformation logic should now support generators everywhere--though this should probably be tested more extensively (all current tests pass fine). In principle, this should very substantially reduce a
Graph
's memory footprint, as there's no need to, e.g., hold all of a movie'sVideoFrameStim
s in memory. In practice, I haven't done any serious benchmarking, and it's possible that there are some overlooked references to old objects that might prevent garbage collection, so we should investigate this more thoroughly at some point. (Note: allowing generators to propagate through aGraph
introduced some extra complexities that I finessed for the moment; e.g., generators can't be pickled, so caching via joblib breaks).Improved naming conventions. Rather than just appending names following every convention, some
Stim
s now use names that more clearly indicat vge what's going on. E.g.,VideoFrameStim
s now have the convention'movie.mp4->frame[10]'
, instead of'movie.mp4_0'
.TextStim
s now have names like'text[illuminating]'
rather than just'illuminating'
.merge_results
now injects additional columns containing conversion history and class type into the results.Caching is now off by default, but can be turned on separately for each kind of
Transformer
(i.e.,Converter
,Filter
, orExtractor
) via theconfig
module.