generate_statistics_from_pyarrow table or parquet #92

@tanguycdls

/type feature
Hi, since TFRecords are already converted to PyArrow Tables before statistics are computed, how hard would it be to add an option to read a PyArrow Table or a Parquet file directly?

| 'DecodeData' >> tf_example_decoder.DecodeTFExample(

If my understanding of that code is correct, could we replace beam.io.textio.ReadFromText with beam.io.parquetio.ReadFromParquet? If so, would we still need to extract the features, or would the PyArrow schema be enough?
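To make the idea concrete, here is a rough sketch of what I have in mind. It is not how TFDV does it today and I have not run it against a real dataset. It assumes that `ReadFromParquet` emits one dict per row, that `tfdv.GenerateStatistics` accepts a PCollection of `pyarrow.RecordBatch` whose columns are lists of values (which may depend on the TFDV version), and that every Parquet column holds scalar values. The names `rows_to_columns`, `run`, `parquet_pattern` and `stats_path` are made up for illustration.

```python
def rows_to_columns(rows):
    """Pivot row dicts into {column name: [one-element list or None, ...]}.

    Each present value is wrapped in a list because this sketch assumes
    TFDV wants list-typed columns; missing or None values stay None.
    """
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    columns = {name: [] for name in names}
    for row in rows:
        for name in names:
            value = row.get(name)
            columns[name].append(None if value is None else [value])
    return columns


def run(parquet_pattern, stats_path):
    """Hypothetical pipeline: Parquet rows -> RecordBatches -> statistics."""
    # Imported here so the helper above stays usable without these packages.
    import apache_beam as beam
    import pyarrow as pa
    import tensorflow_data_validation as tfdv
    from apache_beam.io import parquetio
    from tensorflow_metadata.proto.v0 import statistics_pb2

    def to_record_batch(rows):
        columns = rows_to_columns(rows)
        arrays = [pa.array(values) for values in columns.values()]
        return pa.RecordBatch.from_arrays(arrays, list(columns.keys()))

    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            | "ReadParquet" >> parquetio.ReadFromParquet(parquet_pattern)
            | "BatchRows" >> beam.BatchElements(min_batch_size=1, max_batch_size=1000)
            | "ToRecordBatch" >> beam.Map(to_record_batch)
            | "GenerateStatistics" >> tfdv.GenerateStatistics()
            | "WriteStats" >> beam.io.WriteToTFRecord(
                stats_path,
                shard_name_template="",
                coder=beam.coders.ProtoCoder(
                    statistics_pb2.DatasetFeatureStatisticsList),
            )
        )
```

In this sketch no separate feature extraction step is needed: the column names and types come from the Parquet data itself. A column that is entirely null within one batch would be inferred as a null-typed array, which I have not checked against TFDV.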

My aim would be to use TFDV to extract data features and visualise them using facets.

Thanks
