Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[csv] use header option in batch loader #304

Merged
merged 2 commits into from
Aug 8, 2019
Merged

Conversation

Pessimistress
Copy link
Collaborator

  • Batch loader and sync loader now outputs consistent results (array if no header, object if header).
  • Fix bugs in ColumnarTableBatch
  • Tests

@coveralls
Copy link

coveralls commented Aug 6, 2019

Coverage Status

Coverage increased (+10.4%) to 61.863% when pulling 9ca51e1 on x/csv-batch-header into 5b80b1f on master.

Copy link
Collaborator

@ibgreen ibgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little embarrassed to admit that my recollection of this code is not clear enough to provide definite feedback (I'd need to dig through the details to answer my own questions) so take my comments accordingly.

The primary concern is probably that we support conversion of CSV batches to arrow regardless of whether rows are encoded in array or object format - is that why we now normalize rows to objects?

Not for this PR but I am curious if having the option to keep rows in Array form even when a header is present would be good for perf... Will add some benchmarks down the road.

let rows = this.rows.slice(0, this.length);

if (this._headers) {
rows = rows.map(row => convertRowToObject(row, this._headers));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks expensive!

Feels like this should be done only on read...

modules/csv/src/csv-loader.js Outdated Show resolved Hide resolved
this.pruneColumns();
const columns = Array.isArray(this.schema) ? this.columns : {};

if (!Array.isArray(this.schema)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if schema is an array?

Is schema required or can it be null?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema is an array if there is no header row.

@Pessimistress
Copy link
Collaborator Author

The primary concern is probably that we support conversion of CSV batches to arrow regardless of whether rows are encoded in array or object format

That is what the ColumnarTableBatch class does

@Pessimistress Pessimistress merged commit 4a8c26d into master Aug 8, 2019
@Pessimistress Pessimistress deleted the x/csv-batch-header branch August 8, 2019 00:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants