There are times when it would be useful for a chunked callback to return a list, with the per-chunk lists combined into a single list at the end.
One example is processing a large JSONL file with purrr functions: (1) read a chunk of the file, (2) use purrr to transform, filter, reduce, etc. as needed, and (3) combine the results into a final list; call that final object `my_data`. My specific use case is filtering out irrelevant data so that the final object fits in memory for interactive work/exploration.
When downstream functions expect `my_data` to be a data frame, `DataFrameCallback` works just fine.
But if some downstream functions are already written to handle `my_data` as a list (e.g. prepared purrr-enabled pipelines), the `DataFrameCallback` callback must return a data frame with a single list-column; that list-column then has to be extracted, and the extracted list passed to the downstream functions.
It seems silly to return a single-column data frame just to extract that column later, when we know we'd like to have a list in hand at the end of the chunked reading/processing phase.
Minor suggestion, and certainly not a blocker/bug/etc., since the workaround is described above.
In all other aspects, readr's great!
Related to PR #520.