-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
filter component commit #158
Conversation
Thanks for the PR! 🚀 Instructions: Approve using |
statistics: types.BaseChannel, | ||
schema: types.BaseChannel, | ||
exclude_splits: Optional[List[str]] = None, | ||
filter_function : () = lambda x :x ): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a bit of trouble understanding how this filter_function is used in executor, seems it's not passed into the spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have pushed a commit with the fix
|
||
See [the ExampleValidator | ||
guide](https://www.tensorflow.org/tfx/guide/exampleval) for more details. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a description for ExampleValidator
component whats the usage and description for ExampleFilter?
_TELEMETRY_DESCRIPTORS = ['Evaluator'] | ||
|
||
|
||
class Executor(base_beam_executor.BaseBeamExecutor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a hard time figuring out how is filtering happenning? Seems there's a lot of code here from model evaluator, can we remove extraneous code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have pushed a commit with the fix
""" | ||
|
||
SPEC_CLASS = standard_component_specs.Filter_FunctionSpec | ||
EXECUTOR_SPEC = executor_spec.BeamExecutorSpec(executor.Executor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are using ExampleValidator Executor
guide](https://www.tensorflow.org/tfx/guide/exampleval) for more details. | ||
""" | ||
|
||
SPEC_CLASS = standard_component_specs.Filter_FunctionSpec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This spec class is incorrect as it doesnt exist in TFX
class ExampleFilter(base_beam_component.BaseBeamComponent): | ||
"""A TFX component to validate input examples. | ||
|
||
The ExampleValidator component uses [Tensorflow Data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation is for ExampleValidator not ExampleFilter
baseline_model: Optional[types.BaseChannel] = None, | ||
feature_slicing_spec: Optional[Union[evaluator_pb2.FeatureSlicingSpec, | ||
data_types.RuntimeParameter]] = None, | ||
fairness_indicator_thresholds: Optional[Union[ | ||
List[float], data_types.RuntimeParameter]] = None, | ||
example_splits: Optional[List[str]] = None, | ||
eval_config: Optional[tfma.EvalConfig] = None, | ||
schema: Optional[types.BaseChannel] = None, | ||
module_file: Optional[str] = None, | ||
exclude_splits: Optional[List[str]] = None, | ||
|
||
module_path: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there a baseline_model argument? Same for fairness_indicator, etc.
tensor_adapter_config=tensor_adapter_config) | ||
|
||
try: | ||
examples_list = filter(input_dict[standard_component_specs.FILTER_FUNCTIONS_KEY][0], examples_list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work. examples_list contains a few Beam pipelines not examples itself.
blessing.set_string_custom_property( | ||
'component_id', exec_properties['current_component_id']) | ||
# Check validation result and write BLESSED file accordingly. | ||
logging.info('Checking validation results.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is exampleFilter doing model validation? Let's remove anything that is not related to the example filter.
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is there a new ipynb?
|
||
|
||
# Output artifact containing required data related to feature selection | ||
class FeatureSelectionArtifact(artifact.Artifact): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats this for?
print('raw_dataset') | ||
print(raw_dataset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove print
] | ||
return file_list | ||
# reads and returns data from TFRecords at URI as a list of dictionaries with values as numpy arrays | ||
def _get_data_from_tfrecords(train_uri): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing type annotations
return [feature_keys, target, input_data] | ||
|
||
# reads and returns data from TFRecords at URI as a list of dictionaries with values as numpy arrays | ||
def _get_data_from_tfrecords(train_uri): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function is defined twice?
filter_function_str : Parameter[str], | ||
filtered_data: OutputArtifact[standard_artifacts.Examples], | ||
) -> OutputDict(list_len=int): | ||
'''My simple trainer component.''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add documentation on how to use the component
import tensorflow as tf | ||
from absl.testing import absltest | ||
from tfx.types import artifact_utils, channel_utils, standard_artifacts | ||
import component |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use relative imports or import from tfx_addons import path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still not fixed ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still not fixed
@@ -0,0 +1,4 @@ | |||
def filter_function(x_list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this go under a folder?
Also generally, why not use a Beam pipeline to do so vs this numpy filter? |
|
||
|
||
@component | ||
def MyTrainerComponent( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the component name updated?
|
||
|
||
# returns data in list and nested list formats compatible with sklearn | ||
def _data_preprocessing(np_dataset, target_feature): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used?
# get a list of files for the specified path | ||
def _get_file_list(dir_path): | ||
file_list = [ | ||
f for f in os.listdir(dir_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this work when dir_path is a GCS path?
|
||
|
||
# update example with selected features | ||
def _update_example(selected_features, orig_example): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used?
records = _get_data_from_tfrecords(input_data.uri + "/Split-train") | ||
filter_function = importlib.import_module(filter_function_str).filter_function | ||
records = filter_function(records) | ||
filtered_data = records |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filtered_data is an artifact, you need to save the data out as tf.example into filtered_data.uri. Check https://www.tensorflow.org/tfx/tutorials/tfx/python_function_component for how to do so.
tfx_addons/example_filter/labels.py
Outdated
EXTERNAL_CONFIG_VERSION = 'external_config_version' | ||
|
||
# Output labels | ||
SCHEMA_DIFF_PATH = 'schema_diff_path' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used?
tfx_addons/example_filter/spec.py
Outdated
SAMPLER_SAMPLE_KEY = 'sampling_strategy' | ||
|
||
|
||
class SamplingStrategy(enum.IntEnum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used?
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
"""Generic TFX model evaluator executor.""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used?
|
||
} | ||
|
||
component.ExampleFilter(**params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a teest to ensure that logic gets executed?
import tensorflow as tf | ||
from absl.testing import absltest | ||
from tfx.types import artifact_utils, channel_utils, standard_artifacts | ||
import component |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still not fixed ^
I don't understand. Iris_example_colab.ipynb is the example notebook from the feature selection component. Why is it included here? |
it will be removed. |
import tensorflow as tf | ||
from absl.testing import absltest | ||
from tfx.types import artifact_utils, channel_utils, standard_artifacts | ||
import component |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still not fixed
tfx_addons/example_filter/gcp.py
Outdated
@@ -0,0 +1,250 @@ | |||
# Copyright 2021 Google LLC. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would suggest instead to write one without GCP which just runs locally. There's a lot of boilderplate here which is not needed
made a new push @casassg |
@vulkomilev Here are the steps to finish the component PR:
Thank you! |
Fixes #<issue_number_goes_here>