readr::tokenize_melt("1,as\n3.3,2017-01-03,5\n\n6")
#> # A tibble: 6 x 4
#>     row   col value      data_type
#>   <int> <int> <chr>      <chr>
#> 1     0     0 1          integer
#> 2     0     1 as         character
#> 3     1     0 3.3        double
#> 4     1     1 2017-01-03 date
#> 5     1     2 5          integer
#> 6     2     0 6          integer
This form is useful when the source is ragged and/or has mixed data types within columns. Once it has been read into R in this form, it can be traversed and transformed into something tidier.
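As a rough illustration of that traversal step, here is a minimal base-R sketch. It does not call readr; `melted` is a hand-built copy of the output shown above, and the names `cells_per_row`, `numeric_cells` and `wide` are made up for this example.

```r
# Hand-built copy of the melted output above (hypothetical stand-in,
# so the sketch runs without readr).
melted <- data.frame(
  row = c(0, 0, 1, 1, 1, 2),
  col = c(0, 1, 0, 1, 2, 0),
  value = c("1", "as", "3.3", "2017-01-03", "5", "6"),
  data_type = c("integer", "character", "double", "date", "integer", "integer"),
  stringsAsFactors = FALSE
)

# Counting cells per row exposes the raggedness of the source.
cells_per_row <- table(melted$row)

# Pick out only the numeric-looking cells and convert them.
numeric_cells <- melted[melted$data_type %in% c("integer", "double"), ]
numeric_cells$value <- as.numeric(numeric_cells$value)

# Rebuild a rectangular character matrix, padding short rows with NA.
# Row and column numbers are zero-based, so shift them by one.
wide <- matrix(
  NA_character_,
  nrow = max(melted$row) + 1,
  ncol = max(melted$col) + 1
)
wide[cbind(melted$row + 1, melted$col + 1)] <- melted$value
```

Because each cell keeps its own position and guessed type, the decision about how to pad, split or coerce is made after reading rather than by the parser.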
readr's stated purpose is 'Read Rectangular Text Data', but since the implementation would be a thin wrapper around existing C++ code, it would seem to belong here. Otherwise, would it be possible to use the readr headers from another package?
Thanks for opening the PR! I cleaned up some things and I think the interface makes more sense as a series of
If you want to implement the remaining functions and add some tests, this could make it into the next release (which is imminent). Otherwise I will do so at a later date; I generally want this release to focus only on bug fixes and save new features for the next one.
Thanks for reviewing and tidying up my code! Making a series of
I'll keep pushing to this branch and update this comment with progress so far.
The `read_*` functions skip empty rows altogether, and still will by default with this commit. The new `melt_*` functions in the next commit will *not* skip empty rows by default, because they can be meaningful when the data is less regular, e.g. more than one table per file, separated by blank lines.
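To make the "more than one table per file" case concrete, here is a small base-R sketch that works on raw lines rather than on melted output. It is not the `melt_*` implementation, and the sample text and the names `table_id`, `tables` and `parsed` are invented for illustration.

```r
# Two small tables in one text, separated by a blank line (made-up data).
text <- "a,b\n1,2\n\nx,y,z\n7,8,9"
lines <- strsplit(text, "\n", fixed = TRUE)[[1]]

# Each blank line starts a new table; cumsum() turns that into a table id.
blank <- lines == ""
table_id <- cumsum(blank) + 1

# Drop the blank separators and group the remaining lines by table.
tables <- split(lines[!blank], table_id[!blank])

# Parse each group on its own, so the tables may differ in width.
parsed <- lapply(tables, function(x) {
  utils::read.csv(text = paste(x, collapse = "\n"))
})
```

If blank rows were dropped before this step, the boundary between the two tables would be lost and the second header would look like a ragged data row.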
There are now
`melt_delim()`, `melt_csv()`, `melt_csv2()`, `melt_tsv()`, `melt_delim_chunked()`, `melt_csv_chunked()`, `melt_csv2_chunked()`, `melt_tsv_chunked()`, `melt_fwf()`, `melt_table()`, `melt_table2()`
`read_file()`, `read_file_raw()`, `read_lines()`, `read_lines_chunked()`, `read_lines_raw()`, `read_log()`, `read_rds()`
The implementation is almost the same as the
At the C++ level, where
The engine is
Row and column numbers are returned as doubles rather than integers to support long vectors. This looks odd when printed in a tibble, so I'm open to suggestions; maybe attempt conversion to integer where possible?
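One way the "convert if possible" idea could look at the R level is sketched below. `to_integer_if_possible()` is a hypothetical helper, not an existing readr function, and the real change might equally be made in the C++ layer.

```r
# Hypothetical helper: return an integer vector when every value is a whole
# number that fits in R's integer range, otherwise return the doubles as-is.
to_integer_if_possible <- function(x) {
  fits <- all(
    is.na(x) | (x == trunc(x) & abs(x) <= .Machine$integer.max)
  )
  if (fits) as.integer(x) else x
}

small <- to_integer_if_possible(c(0, 1, 2))  # whole numbers in range
large <- to_integer_if_possible(c(0, 2^31))  # 2^31 exceeds integer.max
```

The trade-off is that the column type would then depend on the size of the input, which callers may find surprising.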