read_lines(fname,skip=xxx) skipping empty lines causes unexpected, backward-incompatible behavior #923

@dan-reznik

Let 'file.txt' contain the following 8 lines, with its 4th line empty:

L1
L2
L3
<empty line>
Hello,Weird,Behavior
1,2,3
3,4,5
6,7,8
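
For anyone who wants to reproduce this, the file described above could be created with something like the following. This is a minimal sketch using base R only; the `lines` variable is just an illustrative name, and the file is written to the current working directory.

```r
# Build the 8-line test file; the 4th element is the empty line.
lines <- c("L1", "L2", "L3", "",
           "Hello,Weird,Behavior", "1,2,3", "3,4,5", "6,7,8")
writeLines(lines, "file.txt")

# Sanity check with base R: the file round-trips, empty line included.
stopifnot(identical(readLines("file.txt"), lines))
```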

Calling read_lines() with no skip parameter produces the expected results, with sk0's 4th element being the empty string:

library(tidyverse)
fname <- 'file.txt'
sk0 <- read_lines(fname)
sk0
#> "L1","L2","L3","","Hello,Weird,Behavior","1,2,3","3,4,5","6,7,8"

Skipping the first 3 lines also works, now with the 1st element being an empty string:

sk3 <- read_lines(fname,skip=3)
sk3
#> "","Hello,Weird,Behavior","1,2,3","3,4,5","6,7,8"               

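However, and here lies the bug: skipping 4 lines also drops the 5th line ("Hello,Weird,Behavior") entirely! In fact, skip=4 appears to pass over any run of consecutive empty lines at that position rather than counting them toward the skip.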

sk4 <- read_lines(fname,skip=4)
sk4
#> "1,2,3","3,4,5","6,7,8"

Though skipping empty lines is a documented feature of this version of readr, the behavior seems inconsistent and unnecessary, and it will break existing code that relies on skip counting every line.
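
Until this is resolved, one possible workaround is to read every line and drop the first n by index, so that empty lines are always counted. The sketch below assumes base R's `readLines()`; `skip_lines()` is a hypothetical helper, not part of readr, and the example writes its own copy of the test file to a temporary path.

```r
# Hypothetical helper (not a readr function): skip exactly `n` physical
# lines, counting empty lines like any other line.
skip_lines <- function(path, n) {
  stopifnot(n >= 0)
  all_lines <- readLines(path)
  if (n >= length(all_lines)) {
    return(character(0))
  }
  all_lines[seq.int(n + 1L, length(all_lines))]
}

# Recreate the 8-line file from this report in a temporary location.
tmp <- tempfile(fileext = ".txt")
writeLines(c("L1", "L2", "L3", "",
             "Hello,Weird,Behavior", "1,2,3", "3,4,5", "6,7,8"), tmp)

skip_lines(tmp, 3)
#> "" "Hello,Weird,Behavior" "1,2,3" "3,4,5" "6,7,8"
skip_lines(tmp, 4)
#> "Hello,Weird,Behavior" "1,2,3" "3,4,5" "6,7,8"
```

This reads the whole file into memory, so it is only a stopgap for files of modest size.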

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] data.table_1.11.8 snakecase_0.9.2   bindrcpp_0.2.2    assertthat_0.2.0  stringi_1.2.4    
 [6] jsonlite_1.5      lubridate_1.7.4   glue_1.3.0        readxl_1.1.0      forcats_0.3.0    
[11] stringr_1.3.1     dplyr_0.7.8       purrr_0.2.5       readr_1.2.1       tidyr_0.8.2      
[16] tibble_1.4.2      ggplot2_3.1.0     tidyverse_1.2.1  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0       cellranger_1.1.0 pillar_1.3.0     compiler_3.5.1   plyr_1.8.4      
 [6] bindr_0.1.1      tools_3.5.1      nlme_3.1-137     gtable_0.2.0     lattice_0.20-38 
[11] pkgconfig_2.0.2  rlang_0.3.0.1    cli_1.0.1        rstudioapi_0.8   yaml_2.2.0      
[16] haven_2.0.0      withr_2.1.2      xml2_1.2.0       httr_1.3.1       hms_0.4.2       
[21] grid_3.5.1       tidyselect_0.2.5 R6_2.3.0         modelr_0.1.2     magrittr_1.5    
[26] backports_1.1.2  scales_1.0.0     rvest_0.3.2      colorspace_1.3-2 lazyeval_0.2.1  
[31] munsell_0.5.0    broom_0.5.0      crayon_1.3.4 
