Consider adding "connector" components #58
Comments
`tf.Example` fields need to be either integer, float, or string.
Thank you for the tip! What are your thoughts on adding connector components? There are use cases where writing a custom component for transformations would be easier than a SQL query, e.g. windowing data and applying some sort of transformation. What I had envisioned would be using a TFX component to fetch data and a custom component to do the required transformations, similar to Apache Flink's supported connectors.
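As an illustration of the windowing use case, a pre-ExampleGen transformation might group records into fixed-size time windows and aggregate them. This is a hypothetical stdlib sketch, not a TFX API; the record shape `(timestamp_s, value)` and the mean aggregation are illustrative assumptions:

```python
from collections import defaultdict

def window_records(records, window_size_s):
    """Group (timestamp_s, value) records into fixed-size time windows
    and average each window. A hypothetical pre-processing step one
    might run before ExampleGen; not part of TFX itself."""
    windows = defaultdict(list)
    for ts, value in records:
        windows[int(ts // window_size_s)].append(value)
    # One aggregated row per window: (window_start_s, mean_value)
    return [
        (key * window_size_s, sum(vals) / len(vals))
        for key, vals in sorted(windows.items())
    ]

records = [(0.5, 10.0), (1.5, 20.0), (61.0, 30.0)]
print(window_records(records, 60))  # [(0, 15.0), (60, 30.0)]
```

In a real pipeline this aggregation would run inside a custom component, with its output rows then fed to ExampleGen.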
We are still trying to figure out our custom component story, but for now, for your need, you can consider writing a custom preprocessor component following the examples under components/, and then adding that component before ExampleGen in your xxx_pipeline.py file.
Hi, we have now added an example of a custom component, as well as one for file-based ExampleGen: you just need to provide a Beam PTransform for converting the input file source to tf.Example; here are the instructions.
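The conversion logic such a PTransform would wrap can be sketched in plain Python. This is a hypothetical example: the CSV layout and field names are assumptions, and a real custom ExampleGen would perform this parsing inside a Beam PTransform that yields `tf.train.Example` protos rather than plain dicts:

```python
import csv
import io

def csv_line_to_features(header, line):
    """Parse one CSV line into a {name: value} dict of tf.Example-compatible
    scalars (int, float, or bytes). Hypothetical helper illustrating the
    per-record conversion a custom ExampleGen's PTransform must do."""
    row = next(csv.reader(io.StringIO(line)))
    features = {}
    for name, raw in zip(header, row):
        # Try the narrowest numeric type first, then fall back to bytes.
        for cast in (int, float):
            try:
                features[name] = cast(raw)
                break
            except ValueError:
                continue
        else:
            features[name] = raw.encode("utf-8")
    return features

print(csv_line_to_features(["id", "score", "label"], "7,0.5,cat"))
# {'id': 7, 'score': 0.5, 'label': b'cat'}
```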
It would be beneficial to have core connector components which connect to sources such as BigQuery, CSV, Kafka, S3, etc. without doing any additional logic, returning either a dict or a PCollection.

The reasoning for this is that data sources are not perfect and some transformations may be required prior to `example_gen`. E.g., `BigQueryExampleGen` fails if the table contains a `TIMESTAMP` field.

Does this go along with what TFX is attempting to solve, or should we assume our data is clean enough for a `tf.Example`?
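One workaround for the `TIMESTAMP` failure described above is to coerce timestamp fields to a supported type before they reach ExampleGen (or to cast them in the query itself). A minimal stdlib sketch, assuming ISO-8601 timestamp strings; `coerce_timestamp` is a hypothetical helper, not part of TFX:

```python
from datetime import datetime, timezone

def coerce_timestamp(value):
    """Convert an ISO-8601 timestamp string to epoch seconds (a float,
    which tf.Example can hold); any other value passes through unchanged.
    Hypothetical pre-processing helper, not part of TFX."""
    try:
        dt = datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return value
    if dt.tzinfo is None:
        # Assume naive timestamps are UTC; adjust for your data.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()

print(coerce_timestamp("1970-01-01T00:01:00+00:00"))  # 60.0
print(coerce_timestamp(42))  # 42 (non-string passes through)
```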