read_delim() : unable to retrieve a zipped CSV over HTTP #720

Closed
neveldo opened this issue Oct 4, 2017 · 3 comments
Labels
bug

neveldo commented Oct 4, 2017

Hello,

First of all, thanks a lot for readr!

I think I have found a weird behaviour when trying to retrieve a zipped CSV over HTTP (from data.gouv.fr).

read_delim('http://files.data.gouv.fr/sirene/sirene_2017002_E_Q.zip', ';')

Returns:

Parsed with column specification:
cols(
  `PK��` = col_character()
)
Warning: 1891 parsing failures.
# A tibble: 5 x 5
    row col          expected  actual        file
  <int> <chr>        <chr>     <chr>         <chr>
1     1 "PK\003\004"           embedded null 'http://files.data.gouv.fr/sirene/sirene_2017002_E_Q.zip'
2     1 <NA>         1 columns 2 columns     'http://files.data.gouv.fr/sirene/sirene_2017002_E_Q.zip'
3     2 "PK\003\004"           embedded null 'http://files.data.gouv.fr/sirene/sirene_2017002_E_Q.zip'
4     3 "PK\003\004"           embedded null 'http://files.data.gouv.fr/sirene/sirene_2017002_E_Q.zip'
5     4 "PK\003\004"           embedded null 'http://files.data.gouv.fr/sirene/sirene_2017002_E_Q.zip'
[... truncated]
Error in rep(space, max_width) : invalid 'times' argument
In addition: Warning message:
In rbind(names(probs), probs_f) :
  number of columns of result is not a multiple of vector length (arg 1)

And read_file('http://files.data.gouv.fr/sirene/sirene_2017002_E_Q.zip') returns strange file content:

[1] "PK\003\004\n"

If I download the file and then open it from my hard drive (read_delim('~/sirene_2017002_E_Q.zip', ';')), it works fine.

Moreover, I have already successfully opened some (non-zipped) CSV files from data.gouv.fr with the read_delim() function.

Thanks in advance!

jimhester added the bug label Dec 7, 2017

jimhester (Member) commented Dec 7, 2017

Only gz-compressed files are supported over connections (using base::gzcon()), not zip archives. So this behavior is expected: you need to download zip files before opening them.
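A minimal sketch of the download-first workaround, assuming readr is installed. The helper name `read_delim_remote()` is hypothetical and not part of readr: it downloads the URL to a temporary file that keeps the original extension, so that readr's extension-based decompression applies when the local copy is read (which matches the report above that the downloaded file opens fine).

```r
library(readr)

# Hypothetical helper (not part of readr): fetch a remote delimited file,
# which may be a .zip archive, and read the local copy instead of the URL.
read_delim_remote <- function(url, delim, ...) {
  # Assumes the URL path ends with the file extension (no query string),
  # since readr picks the decompression method from the extension.
  ext <- tools::file_ext(url)
  tmp <- tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")

  # mode = "wb" keeps binary archives intact on Windows.
  utils::download.file(url, tmp, mode = "wb", quiet = TRUE)

  # Reading from a local path lets readr uncompress .zip, .gz, .bz2 and .xz;
  # the temporary file is removed when the R session ends.
  read_delim(tmp, delim = delim, ...)
}
```

For the file in this issue, the call would be `read_delim_remote('http://files.data.gouv.fr/sirene/sirene_2017002_E_Q.zip', ';')` (not run here).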

neveldo (Author) commented Dec 7, 2017

Hello @jimhester,

Thank you for your explanation. However, I think the documentation should be updated to mention this point; for now, it seems a little misleading:

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed.

lock bot commented Sep 25, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

lock bot locked and limited conversation to collaborators Sep 25, 2018