-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[csv] use header option in batch loader #304
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little embarrassed to admit that my recollection of this code is not clear enough to provide definite feedback (I'd need to dig through the details to answer my own questions) so take my comments accordingly.
The primary concern is probably that we support conversion of CSV batches to arrow regardless of whether rows are encoded in array or object format - is that why we now normalize rows to objects?
Not for this PR but I am curious if having the option to keep rows in Array
form even when a header is present would be good for perf... Will add some benchmarks down the road.
let rows = this.rows.slice(0, this.length); | ||
|
||
if (this._headers) { | ||
rows = rows.map(row => convertRowToObject(row, this._headers)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks expensive!
Feels like this should be done only on read...
this.pruneColumns(); | ||
const columns = Array.isArray(this.schema) ? this.columns : {}; | ||
|
||
if (!Array.isArray(this.schema)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if schema is an array?
Is schema
required or can it be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
schema
is an array if there is no header row.
That is what the |
ColumnarTableBatch