Is your feature request related to a problem? Please describe.
Marked records are stored as individual Parquet files. Parquet is an "immutable" binary format that is difficult to view and edit without special tools.
Exporting and importing labels has lots of extra motion: https://docs.zingg.ai/zingg/stepbystep/createtrainingdata/exportlabeleddata
I recently had to remove some records marked as unsure when I learned that they should have been marked as matches. With about 250 marks, it was quite a pain to go through and find the "offending files".
Describe the solution you'd like
The labels are small data and do not need a columnar binary format. Storing all the records in a single plaintext file in a format such as NDJSON would make them self-describing, appendable, universal, and accessible. This probably applies to other files Zingg persists too.
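As a rough sketch of what this could look like, here is how appending and correcting labels might work with one NDJSON file. Everything here is an assumption for illustration: the field names `cluster` and `label`, the label codes, and the helper functions are hypothetical and are not Zingg's actual schema or API.

```python
import json
from pathlib import Path

# Hypothetical label codes, chosen for illustration only.
MATCH = 1
UNSURE = 2


def append_label(path, record):
    """Append one labeled record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_labels(path):
    """Read every non-empty line of the file back as a dict."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def relabel(path, cluster_ids, new_label):
    """Change the label of all records in the given clusters.

    Rewrites the file through a temporary file and returns the
    number of records whose label actually changed.
    """
    records = read_labels(path)
    changed = 0
    for rec in records:
        if rec["cluster"] in cluster_ids and rec["label"] != new_label:
            rec["label"] = new_label
            changed += 1
    tmp = Path(str(path) + ".tmp")
    with open(tmp, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    tmp.replace(path)  # swap the rewritten file into place
    return changed
```

With a layout like this, fixing my 250 marks would have been a single `relabel("labels.ndjson", {...}, MATCH)` call, or just a search and replace in a text editor.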
Describe alternatives you've considered
CSV is problematic because its minimal spec has neither types nor lists. Another alternative could be a database for Zingg training data, labels, stop words, synonyms, models, and a future API for CLIs and web apps, but I think the JSON file would deliver immediate value with a lot less effort.