Feather: fast, interoperable binary data frame storage for Python, R, and more powered by Apache Arrow
Latest commit 8872294 Jan 2, 2017 @kevinushey kevinushey committed with [R] tweaks for feather + gcc-4.6 (#280)


Feather: fast, interoperable data frame storage


Feather provides binary columnar serialization for data frames. It is designed to make reading and writing data frames efficient, and to make sharing data across data analysis languages easy. This initial version comes with bindings for Python (written by Wes McKinney) and R (written by Hadley Wickham).

Note to users: Feather should be treated as alpha software. In particular, the file format is likely to evolve over the coming year. Do not use Feather for long-term data storage.

Feather uses the Apache Arrow columnar memory specification to represent binary data on disk. Because the on-disk representation follows the in-memory one, read and write operations are very fast. This is particularly important for encoding null/NA values and variable-length types like UTF-8 strings.
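To make the null and variable-length encoding concrete, here is a minimal sketch of an Arrow-style layout for a string column: a validity bitmap (one bit per value, least-significant bit first), an array of 32-bit offsets, and one contiguous buffer of UTF-8 bytes. The function names `encode_utf8_column` and `decode_utf8_column` are hypothetical, and the sketch only illustrates the idea. It is not Feather's actual file format and omits the metadata, padding, and alignment a real file needs.

```python
import struct


def encode_utf8_column(values):
    """Encode a list of str-or-None into (validity bitmap, offsets, data).

    Illustrative only: a simplified Arrow-style layout, not the Feather format.
    """
    bitmap = bytearray((len(values) + 7) // 8)  # one validity bit per value
    offsets = [0]                               # n + 1 offsets for n values
    data = bytearray()                          # all strings, back to back
    for i, value in enumerate(values):
        if value is not None:
            bitmap[i // 8] |= 1 << (i % 8)      # set bit i: value is present
            data += value.encode("utf-8")
        offsets.append(len(data))               # a null occupies zero bytes
    packed_offsets = struct.pack("<%di" % len(offsets), *offsets)
    return bytes(bitmap), packed_offsets, bytes(data)


def decode_utf8_column(bitmap, packed_offsets, data):
    """Rebuild the list of str-or-None from the three buffers."""
    count = len(packed_offsets) // 4
    offsets = struct.unpack("<%di" % count, packed_offsets)
    values = []
    for i in range(count - 1):
        if (bitmap[i // 8] >> (i % 8)) & 1:
            values.append(data[offsets[i]:offsets[i + 1]].decode("utf-8"))
        else:
            values.append(None)
    return values


column = ["a", None, "héllo", ""]
bitmap, packed_offsets, data = encode_utf8_column(column)
restored = decode_utf8_column(bitmap, packed_offsets, data)
```

Note that the bitmap is what distinguishes a null from an empty string: both take zero bytes in the data buffer, but only the empty string has its validity bit set.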

Feather is complementary to Apache Arrow. Because Arrow does not provide a file format, Feather defines its own schemas and metadata for on-disk representation.

Feather currently supports the following column types:

  • A wide range of numeric types (int8, int16, int32, int64, uint8, uint16, uint32, uint64, float, double).
  • Logical/boolean values.
  • Dates, times, and timestamps.
  • Factors/categorical variables that have a fixed set of possible values.
  • UTF-8 encoded strings.
  • Arbitrary binary data.

All column types support NA/null values.
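As an illustration of how a factor column with missing values can be represented, the sketch below stores the fixed set of levels once and keeps one integer code per row, with a separate validity mask for NA. This is a common dictionary-encoding approach and an assumption on our part. The helpers `encode_factor` and `decode_factor` are hypothetical names, and the README does not state that Feather lays out factors exactly this way.

```python
def encode_factor(values):
    """Split a list of str-or-None into (levels, codes, valid).

    Illustrative only: levels are sorted here for determinism, which is an
    assumption of this sketch rather than a documented Feather behavior.
    """
    levels = sorted({v for v in values if v is not None})
    index = {level: code for code, level in enumerate(levels)}
    # Null rows get a placeholder code of 0; the mask marks them as missing.
    codes = [index[v] if v is not None else 0 for v in values]
    valid = [v is not None for v in values]
    return levels, codes, valid


def decode_factor(levels, codes, valid):
    """Rebuild the original values from levels, codes, and the validity mask."""
    return [levels[code] if ok else None for code, ok in zip(codes, valid)]


levels, codes, valid = encode_factor(["low", "high", None, "low"])
restored = decode_factor(levels, codes, valid)
```

Keeping the mask separate from the codes means that no code value has to be reserved as a sentinel for NA.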

Other Languages

Julia: Feather.jl

License and Copyrights

This library is released under the Apache License, Version 2.0.

See NOTICE for details about the library's copyright holders.