Consider adding "connector" components #58

Closed
DimitrijeManic opened this issue Apr 25, 2019 · 4 comments

Comments

@DimitrijeManic

It would be beneficial to have core connector components that connect to sources such as BigQuery, CSV, Kafka, and S3 without applying any additional logic, and that return either a dict or a PCollection.

The reasoning is that data sources are not perfect, and some transformations may be required before example_gen. For example, BigQueryExampleGen fails if the table contains a TIMESTAMP field.

Does this fit with what TFX is trying to solve, or should we assume our data is already clean enough for a tf.Example?

@1025KB
Collaborator

1025KB commented Apr 25, 2019

tf.Example fields need to be either integer, float, or string.
For a timestamp in BigQuery, just use "UNIX_SECONDS(trip_start_timestamp) AS trip_start_timestamp," in the query; the returned result will then be an integer.
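
As a rough illustration of the suggestion above, here is a minimal sketch. The table name and the `fare` column in the query are placeholders, and `to_feature_value` is a hypothetical helper, not part of TFX. It coerces raw column values into the three kinds mentioned above, and treats naive datetimes as UTC, which is an assumption.

```python
from datetime import datetime, timezone

# Hypothetical query: the table name and the `fare` column are placeholders.
# UNIX_SECONDS turns the TIMESTAMP column into an integer.
QUERY = """
SELECT
  UNIX_SECONDS(trip_start_timestamp) AS trip_start_timestamp,
  fare
FROM `my-project.my_dataset.taxi_trips`
"""


def to_feature_value(value):
    """Coerce a raw column value to int, float, or bytes (hypothetical helper)."""
    if isinstance(value, bool):
        # bool is a subclass of int; store it as 0 or 1.
        return int(value)
    if isinstance(value, (int, float, bytes)):
        return value
    if isinstance(value, datetime):
        # Client-side analogue of UNIX_SECONDS: whole seconds since the epoch.
        # Assumption: naive datetimes are interpreted as UTC.
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
        return int(value.timestamp())
    if isinstance(value, str):
        return value.encode("utf-8")
    raise TypeError(f"unsupported feature type: {type(value).__name__}")
```

Doing the conversion in the query keeps the pipeline unchanged; a helper like this would only be needed if the rows are fetched by other means.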

@DimitrijeManic
Author

Thank you for the tip! What are your thoughts on adding connector components?

There are use cases where writing a custom component for transformations would be easier than writing a SQL query, e.g. windowing data and applying some transformation to each window.

What I had envisioned was using a TFX component to fetch the data and a custom component to do the required transformations, similar to Apache Flink's supported connectors.
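
To make the windowing use case concrete, here is a minimal plain-Python sketch of the kind of transformation that could sit between a connector and example gen. It is not a TFX or Beam API: `fixed_windows` and `mean_per_window` are hypothetical helpers, and the rows are assumed to be dicts with an integer Unix-seconds timestamp.

```python
from collections import defaultdict


def fixed_windows(rows, window_seconds, ts_key="trip_start_timestamp"):
    """Group rows into fixed, non-overlapping windows keyed by window start."""
    windows = defaultdict(list)
    for row in rows:
        # Round the timestamp down to the start of its window.
        start = (row[ts_key] // window_seconds) * window_seconds
        windows[start].append(row)
    return dict(sorted(windows.items()))


def mean_per_window(rows, window_seconds, value_key,
                    ts_key="trip_start_timestamp"):
    """Average one numeric column within each fixed window."""
    return {
        start: sum(r[value_key] for r in group) / len(group)
        for start, group in fixed_windows(rows, window_seconds, ts_key).items()
    }


rows = [
    {"trip_start_timestamp": 0, "fare": 2.0},
    {"trip_start_timestamp": 59, "fare": 4.0},
    {"trip_start_timestamp": 60, "fare": 10.0},
]
print(mean_per_window(rows, 60, "fare"))  # {0: 3.0, 60: 10.0}
```

In a real pipeline the same logic would more likely be expressed as a Beam transform over a PCollection; the dict-based version here only shows the shape of the computation.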

@1025KB
Collaborator

1025KB commented Apr 25, 2019

We are still working out our custom component story. For now, you can write a custom preprocessing component following the examples under components/, then add that component before example gen in your xxx_pipeline.py file.

@1025KB
Collaborator

1025KB commented May 31, 2019

Hi, we have now added an example of a custom component.

Also, for file-based example gen, you only need to provide a Beam PTransform that converts the input file source to tf.Example; here are the instructions.

@1025KB 1025KB closed this as completed May 31, 2019
ruoyu90 pushed a commit to ruoyu90/tfx that referenced this issue Aug 28, 2019
This PR updates release binary build command for sig-io

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

3 participants