- Added a combine_analyzer() that supports a user-provided combiner conforming to beam.CombineFn(). This allows users to implement custom combiners (e.g. median) to complement the analyzers (like min, max) that are prepackaged in TFT.
- Quantiles Analyzer (`tft.quantiles`), with a corresponding `tft.bucketize` mapper.
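
As an illustration of the custom-combiner idea mentioned above, here is a minimal sketch of a median combiner. It only mimics the four methods that make up the `beam.CombineFn` protocol (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`) in plain Python. The names `MedianCombiner` and `run_combiner` are hypothetical, and in a real pipeline the class would presumably subclass `apache_beam.CombineFn`. How the combiner is passed to `combine_analyzer()` is not shown, because these notes do not give its signature.

```python
class MedianCombiner(object):
    """Hypothetical combiner following the CombineFn method protocol."""

    def create_accumulator(self):
        # The accumulator is simply the list of values seen so far.
        return []

    def add_input(self, accumulator, element):
        accumulator.append(element)
        return accumulator

    def merge_accumulators(self, accumulators):
        # Concatenate the partial lists produced by different workers.
        merged = []
        for accumulator in accumulators:
            merged.extend(accumulator)
        return merged

    def extract_output(self, accumulator):
        if not accumulator:
            return None
        ordered = sorted(accumulator)
        mid = len(ordered) // 2
        if len(ordered) % 2:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2.0


def run_combiner(combiner, partitions):
    """Hypothetical driver that simulates a runner combining partitions."""
    accumulators = []
    for partition in partitions:
        accumulator = combiner.create_accumulator()
        for value in partition:
            accumulator = combiner.add_input(accumulator, value)
        accumulators.append(accumulator)
    merged = combiner.merge_accumulators(accumulators)
    return combiner.extract_output(merged)
```

This exact median keeps every value in the accumulator, so it is only suitable for small datasets; an approximate sketch would be needed at scale.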
- Depends on `apache-beam[gcp]>=2.2,<3`.
- Fixes some KeyError issues that appeared in certain circumstances when one would call AnalyzeAndTransformDataset (due to a now-fixed Apache Beam [bug](https://issues.apache.org/jira/projects/BEAM/issues/BEAM-2966)).
- Allow all functions that accept and return tensors to accept an optional name scope, in line with TensorFlow coding conventions.
- Update examples to construct input functions by hand instead of using helper functions.
- Change scale_by_min_max/scale_to_0_1 to return the average(min, max) of the range in case all values are identical.
- Added export of serving model to examples.
- Use "core" version of feature columns (tf.feature_column instead of tf.contrib) in examples.
- A few bug fixes and improvements for coders regarding Python 3.
- Requires pre-installed TensorFlow >= 1.4.
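
The scale_by_min_max/scale_to_0_1 change listed above can be sketched in plain Python. This is not the library's implementation: the function name `scale_by_min_max_sketch` is hypothetical, and it assumes that "the range" in the note refers to the output range `[output_min, output_max]`, so that identical inputs map to its midpoint (0.5 for scale_to_0_1).

```python
def scale_by_min_max_sketch(values, output_min=0.0, output_max=1.0):
    """Hypothetical min-max scaling with the all-identical special case."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # Assumption: all values identical, so return the midpoint of the
        # output range instead of dividing by zero.
        midpoint = (output_min + output_max) / 2.0
        return [midpoint for _ in values]
    span = float(hi - lo)
    width = output_max - output_min
    return [output_min + (v - lo) / span * width for v in values]
```

For example, `scale_by_min_max_sketch([7, 7, 7])` yields 0.5 for every element under this assumption, while distinct inputs are scaled linearly as usual.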
- No longer distributing a WHL file in PyPI. Only doing a source distribution which should however be compatible with all platforms (i.e. you are still able to `pip install tensorflow-transform` and use `requirements.txt` or `setup.py` files for environment setup).
- Some functions now introduce a new name scope when they did not before so the names of tensors may change. This will only affect you if you directly lookup tensors by name in the graph produced by tf.Transform.
- Various Analyzer Specs (_NumericCombineSpec, _UniquesSpec, _QuantilesSpec) are now private. Analyzers are accessible only via the top-level TFT functions (min, max, sum, size, mean, var, uniques, quantiles).
- The `serving_input_fn`s in `tensorflow_transform/saved/input_fn_maker.py` will be removed in a future version and should not be used in new code. See the `examples` directory for details on how to migrate your code to define its own serving functions.
- We now provide helper methods for creating `serving_input_receiver_fn` for use with tf.estimator. These mirror the existing functions targeting the legacy tf.contrib.learn.estimators, i.e. for each `*_serving_input_fn()` in input_fn_maker there is now also a `*_serving_input_receiver_fn()`.
- Introduced `tft.apply_vocab`, which allows users to separately apply a single vocabulary (as generated by `tft.uniques`) to several different columns.
- Provide a source distribution tar `tensorflow-transform-X.Y.Z.tar.gz`.
- The default prefix for `tft.string_to_int` `vocab_filename` changed from `vocab_string_to_int` to `vocab_string_to_int_uniques`. To make your pipelines resilient to implementation details, please set `vocab_filename` if you are using the generated vocab_filename in a downstream component.
- Added hash_strings mapper.
- Write vocabularies as asset files instead of constants in the SavedModel.
- `tft.tfidf` now adds 1 to idf values so that terms in every document in the corpus have a non-zero tfidf value.
- Performance and memory usage improvement when running with Beam runners that use multi-threaded workers.
- Performance optimizations in ExampleProtoCoder.
- Depends on `apache-beam[gcp]>=2.1.1,<3`.
- Depends on `protobuf>=3.3,<4`.
- Depends on `six>=1.9,<1.11`.
- Requires pre-installed TensorFlow >= 1.3.
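
To see why the `tft.tfidf` change above matters, consider the following small sketch. It assumes a plain `log(N / df)` definition of idf, which may differ from the exact formula the library uses; the function name `idf_sketch` and its `add_one` parameter are hypothetical. The point is only that a term appearing in every document has `log(N / N) = 0`, so its tf-idf would be zero unless 1 is added.

```python
import math


def idf_sketch(num_docs, doc_freq, add_one=True):
    """Hypothetical idf: log(N / df), optionally shifted by 1."""
    base = math.log(float(num_docs) / doc_freq)
    return base + 1.0 if add_one else base


# A term that occurs in all 10 documents of a 10-document corpus.
term_frequency = 0.2
tfidf_without_shift = term_frequency * idf_sketch(10, 10, add_one=False)
tfidf_with_shift = term_frequency * idf_sketch(10, 10, add_one=True)
```

Under this assumption the unshifted value is exactly zero, while the shifted value equals the term frequency itself.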
- Removed `tft.map`; use `tft.apply_function` instead (as needed).
- Removed `tft.tfidf_weights`; use `tft.tfidf` instead.
- `beam_metadata_io.WriteMetadata` now requires a second `pipeline` argument (see examples).
- A Beam bug will now affect users who call AnalyzeAndTransformDataset in certain circumstances. Roughly speaking, if you call `beam.Pipeline()` at some point (as all our examples do) you will not experience this bug. The bug is characterized by an error similar to `KeyError: (u'AnalyzeAndTransformDataset/AnalyzeDataset/ComputeTensorValues/Extract[Maximum:0]', None)`. This bug will be fixed in Beam 2.2.
- Add json-example serving input functions to tf.Transform.
- Add variance analyzer to tf.Transform.
- Remove duplication in output of `tft.tfidf`.
- Ensure ngrams output dense_shape is greater than or equal to 0.
- Alters the behavior and interface of tensorflow_transform.mappers.ngrams.
- Depends on `apache-beam[gcp]>=2,<3`.
- Making TF Parallelism runner-dependent.
- Fixes issue with csv serving input function.
- Various performance and stability improvements.
- `tft.map` will be removed in version 0.2.0; see the `examples` directory for instructions on how to use `tft.apply_function` instead (as needed).
- `tft.tfidf_weights` will be removed in version 0.2.0; use `tft.tfidf` instead.
- Refactor internals to remove Column and Statistic classes
- Remove collections from graph to avoid warnings
- Return float32 from `tfidf_weights`.
- Update tensorflow_transform to use `tf.saved_model` APIs.
- Add default values on example proto coder.
- Various performance and stability improvements.