readr fails to interpret gzip multibyte string #1125

@cboettig

Given a gzipped file that does not have a file extension, readr fails with an "invalid multibyte string" error. In contrast, vroom correctly detects the gzip compression:

readr::write_tsv(mtcars, "mtcars.tsv.gz")
fs::file_copy("mtcars.tsv.gz", "mtcars")
# works
vroom::vroom("mtcars")

# error invalid multi-byte string
readr::read_tsv("mtcars")

# works once we manually use gzfile, ick
readr::read_tsv(gzfile("mtcars"))

# works when the file name carries the .gz extension
readr::read_tsv("mtcars.tsv.gz")

My understanding is that it would be best for readr to identify the compression from the file's contents (its leading magic bytes) instead of relying on the convention of a filename extension.
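
For illustration, here is a minimal sketch of what content-based detection could look like in plain R. It is not how readr or vroom implement it; `is_gzip()` and `read_tsv_auto()` are hypothetical helper names made up for this example. The only fact it relies on is that gzip streams start with the two bytes 0x1f 0x8b.

```r
# Hypothetical helper (not part of readr): TRUE if the file starts with
# the gzip magic number 0x1f 0x8b.
is_gzip <- function(path) {
  # raw = TRUE and mode "rb" so R reads the bytes on disk as-is,
  # without any transparent decompression.
  con <- file(path, open = "rb", raw = TRUE)
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 2L)
  identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Hypothetical wrapper: pick the connection type from the file contents
# rather than from the file name.
read_tsv_auto <- function(path, ...) {
  src <- if (is_gzip(path)) gzfile(path) else path
  readr::read_tsv(src, ...)
}
```

With the files from the example above, `is_gzip("mtcars")` should return TRUE even though the name has no extension, so `read_tsv_auto("mtcars")` would take the `gzfile()` route.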
