Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter component commit #158

Closed
wants to merge 13 commits into from
Closed

filter component commit #158

wants to merge 13 commits into from

Conversation

vulkomilev
Copy link
Contributor

Fixes #<issue_number_goes_here>

It's a good idea to open an issue first for discussion.

  • Tests pass
  • Appropriate changes to README are included in PR

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2022

Thanks for the PR! 🚀

Instructions: Approve using /lgtm and mark for automatic merge by using /merge.

statistics: types.BaseChannel,
schema: types.BaseChannel,
exclude_splits: Optional[List[str]] = None,
filter_function : () = lambda x :x ):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a bit of trouble understanding how this filter_function is used in executor, seems it's not passed into the spec?

Copy link
Contributor Author

@vulkomilev vulkomilev Oct 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pushed a commit with the fix


See [the ExampleValidator
guide](https://www.tensorflow.org/tfx/guide/exampleval) for more details.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a description for ExampleValidator component whats the usage and description for ExampleFilter?

_TELEMETRY_DESCRIPTORS = ['Evaluator']


class Executor(base_beam_executor.BaseBeamExecutor):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a hard time figuring out how is filtering happenning? Seems there's a lot of code here from model evaluator, can we remove extraneous code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pushed a commit with the fix

"""

SPEC_CLASS = standard_component_specs.Filter_FunctionSpec
EXECUTOR_SPEC = executor_spec.BeamExecutorSpec(executor.Executor)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are using ExampleValidator Executor

guide](https://www.tensorflow.org/tfx/guide/exampleval) for more details.
"""

SPEC_CLASS = standard_component_specs.Filter_FunctionSpec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spec class is incorrect as it doesnt exist in TFX

class ExampleFilter(base_beam_component.BaseBeamComponent):
"""A TFX component to validate input examples.

The ExampleValidator component uses [Tensorflow Data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation is for ExampleValidator not ExampleFilter

Comment on lines 70 to 81
baseline_model: Optional[types.BaseChannel] = None,
feature_slicing_spec: Optional[Union[evaluator_pb2.FeatureSlicingSpec,
data_types.RuntimeParameter]] = None,
fairness_indicator_thresholds: Optional[Union[
List[float], data_types.RuntimeParameter]] = None,
example_splits: Optional[List[str]] = None,
eval_config: Optional[tfma.EvalConfig] = None,
schema: Optional[types.BaseChannel] = None,
module_file: Optional[str] = None,
exclude_splits: Optional[List[str]] = None,

module_path: Optional[str] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a baseline_model argument? Same for fairness_indicator, etc.

tensor_adapter_config=tensor_adapter_config)

try:
examples_list = filter(input_dict[standard_component_specs.FILTER_FUNCTIONS_KEY][0], examples_list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work. examples_list contains a few Beam pipelines not examples itself.

blessing.set_string_custom_property(
'component_id', exec_properties['current_component_id'])
# Check validation result and write BLESSED file accordingly.
logging.info('Checking validation results.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is exampleFilter doing model validation? Let's remove anything that is not related to the example filter.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Member

@casassg casassg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is there a new ipynb?



# Output artifact containing required data related to feature selection
class FeatureSelectionArtifact(artifact.Artifact):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats this for?

Comment on lines 41 to 42
print('raw_dataset')
print(raw_dataset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove print

]
return file_list
# reads and returns data from TFRecords at URI as a list of dictionaries with values as numpy arrays
def _get_data_from_tfrecords(train_uri):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing type annotations

return [feature_keys, target, input_data]

# reads and returns data from TFRecords at URI as a list of dictionaries with values as numpy arrays
def _get_data_from_tfrecords(train_uri):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function is defined twice?

filter_function_str : Parameter[str],
filtered_data: OutputArtifact[standard_artifacts.Examples],
) -> OutputDict(list_len=int):
'''My simple trainer component.'''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add documentation on how to use the component

import tensorflow as tf
from absl.testing import absltest
from tfx.types import artifact_utils, channel_utils, standard_artifacts
import component
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use relative imports or import from tfx_addons import path

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still not fixed ^

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still not fixed

@@ -0,0 +1,4 @@
def filter_function(x_list):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this go under a folder?

@casassg
Copy link
Member

casassg commented Oct 12, 2022

Also generally, why not use a Beam pipeline to do so vs this numpy filter?



@component
def MyTrainerComponent(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the component name updated?



# returns data in list and nested list formats compatible with sklearn
def _data_preprocessing(np_dataset, target_feature):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this used?

# get a list of files for the specified path
def _get_file_list(dir_path):
file_list = [
f for f in os.listdir(dir_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this work when dir_path is a GCS path?



# update example with selected features
def _update_example(selected_features, orig_example):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this used?

records = _get_data_from_tfrecords(input_data.uri + "/Split-train")
filter_function = importlib.import_module(filter_function_str).filter_function
records = filter_function(records)
filtered_data = records
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filtered_data is an artifact, you need to save the data out as tf.example into filtered_data.uri. Check https://www.tensorflow.org/tfx/tutorials/tfx/python_function_component for how to do so.

EXTERNAL_CONFIG_VERSION = 'external_config_version'

# Output labels
SCHEMA_DIFF_PATH = 'schema_diff_path'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this used?

SAMPLER_SAMPLE_KEY = 'sampling_strategy'


class SamplingStrategy(enum.IntEnum):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used?

# See the License for the specific language governing permissions and
# limitations under the License.
"""Generic TFX model evaluator executor."""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used?


}

component.ExampleFilter(**params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a teest to ensure that logic gets executed?

import tensorflow as tf
from absl.testing import absltest
from tfx.types import artifact_utils, channel_utils, standard_artifacts
import component
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still not fixed ^

@rcrowe-google
Copy link
Collaborator

I don't understand. Iris_example_colab.ipynb is the example notebook from the feature selection component. Why is it included here?

@vulkomilev
Copy link
Contributor Author

I don't understand. Iris_example_colab.ipynb is the example notebook from the feature selection component. Why is it included here?

it will be removed.

tfx_addons/example_filter/Iris_example_colab.ipynb Outdated Show resolved Hide resolved
tfx_addons/example_filter/component.py Outdated Show resolved Hide resolved
tfx_addons/example_filter/component.py Outdated Show resolved Hide resolved
tfx_addons/example_filter/component.py Outdated Show resolved Hide resolved
tfx_addons/example_filter/component.py Outdated Show resolved Hide resolved
import tensorflow as tf
from absl.testing import absltest
from tfx.types import artifact_utils, channel_utils, standard_artifacts
import component
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still not fixed

@@ -0,0 +1,250 @@
# Copyright 2021 Google LLC. All Rights Reserved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would suggest instead to write one without GCP which just runs locally. There's a lot of boilderplate here which is not needed

@vulkomilev
Copy link
Contributor Author

made a new push @casassg

@hanneshapke
Copy link
Contributor

@vulkomilev Here are the steps to finish the component PR:

  • Update copy right in all files of your component
  • Update tests, make sure they run and remove anything Pandas Transform related
  • run the linter and make sure that the linter is green on Github!
  • update the example notebook with an example csv file, load the csv data, and apply the filter component, and print the content of the TFRecords (the first 4 samples)
  • resolve all conversations above which you have addressed, leave the ones open for which you have questions. That helps to quickly review the PR.

Thank you!

@vulkomilev vulkomilev closed this by deleting the head repository Apr 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants