BatchFeature should cast to np.float32 by default
#12862
Labels
WIP
Currently the default dtype for Speech Feature Extractors is numpy.float64, which leads to two problems:
[...] (.wav) and are then transformed to float64, which unnecessarily increases RAM by a factor of 4. We should at least stick to float32.
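The memory claim can be checked with a small NumPy sketch. It assumes the audio is held as 16-bit PCM samples (a common .wav encoding, but an assumption here) and uses a synthetic zero-filled buffer rather than a real file.

```python
import numpy as np

# Assumption: one second of 16 kHz audio stored as 16-bit PCM (synthetic data).
pcm = np.zeros(16000, dtype=np.int16)

# Cast to the two float widths under discussion.
as_float64 = pcm.astype(np.float64)
as_float32 = pcm.astype(np.float32)

print(pcm.nbytes)         # 32000 bytes (2 bytes per sample)
print(as_float64.nbytes)  # 128000 bytes (8 bytes per sample, 4x the input)
print(as_float32.nbytes)  # 64000 bytes (4 bytes per sample, 2x the input)
```

Under that assumption, float32 halves the footprint relative to float64.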
transformers/src/transformers/models/wav2vec2/feature_extraction_wav2vec2.py
Line 87 in f6e2544
The main problem is that np.asarray([....]) by default creates a np.float64 array and that we just pass that format along. => We should either always cast to float32 in BatchFeature (see here:
transformers/src/transformers/feature_extraction_utils.py
Line 151 in f6e2544
dtype to BatchFeature.
@patrickvonplaten
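A minimal sketch of the behaviour described above and of the two options, using plain NumPy. The helper `to_float_array` and its `dtype` parameter are hypothetical names for illustration only; this is not the actual BatchFeature code or API.

```python
import numpy as np

def to_float_array(values, dtype=np.float32):
    """Hypothetical helper: build an array with an explicit dtype.

    Defaulting to float32 mirrors the "always cast" option; exposing
    `dtype` mirrors the option of letting the caller choose.
    """
    return np.asarray(values, dtype=dtype)

raw = [0.1, -0.2, 0.3]  # Python floats, as a stand-in for audio samples

print(np.asarray(raw).dtype)                        # float64 (NumPy default for Python floats)
print(to_float_array(raw).dtype)                    # float32
print(to_float_array(raw, dtype=np.float64).dtype)  # float64, if explicitly requested
```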