This crate provides a fast streaming CSV (comma-separated values) reader and writer that works with the rustc-serialize crate to do type-based encoding and decoding. The project has two primary goals:
- The default mode of parsing should just work. This means the parser prefers to produce some parse of malformed input over rejecting it for not conforming to RFC 4180.
- It should be convenient to use by default, and when performance is needed, the API should provide an escape hatch.
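To make the first goal concrete, here is a minimal sketch in plain Rust (no csv dependency) of a lenient field splitter. The function lenient_split is hypothetical and is not how this crate is implemented; it assumes a single line with no embedded newlines and only illustrates the policy: a stray quote in the middle of a field is kept as a literal character, and an unterminated quote simply runs to the end of the line, where a strict RFC 4180 parser would report an error.

```rust
/// Hypothetical lenient splitter for a single CSV line (not this crate's code).
fn lenient_split(line: &str) -> Vec<String> {
    let mut fields = Vec::new();
    let mut field = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '"' if in_quotes => {
                if chars.peek() == Some(&'"') {
                    // A doubled quote inside a quoted field is a literal quote.
                    field.push('"');
                    chars.next();
                } else {
                    // Closing quote; later characters are appended to the field as usual.
                    in_quotes = false;
                }
            }
            // A quote at the start of a field opens a quoted field.
            '"' if field.is_empty() => in_quotes = true,
            // RFC 4180 forbids a bare quote mid-field; keep it instead of failing.
            '"' => field.push('"'),
            ',' if !in_quotes => fields.push(std::mem::take(&mut field)),
            _ => field.push(c),
        }
    }
    // An unterminated quote is tolerated too: whatever was read becomes the field.
    fields.push(field);
    fields
}

fn main() {
    // Both inputs are malformed under RFC 4180, yet each still yields fields.
    println!("{:?}", lenient_split("say \"hi\",ok"));
    println!("{:?}", lenient_split("\"unterminated,x"));
}
```

A strict parser would stop at the first of these inputs; the lenient one trades correctness guarantees for always returning something usable.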
Benchmarks demonstrating this parser's performance appear later in this README. You can also see how it compares to other parsers in ewanhiggs's CSV game.
Dual-licensed under MIT or the UNLICENSE.
The API is fully documented with lots of examples: http://burntsushi.net/rustdoc/csv/.
Here is a full working Rust program that decodes records from a CSV file. Each record consists of two strings and an integer (the edit distance between the strings):
```rust
extern crate csv;

fn main() {
    let mut rdr = csv::Reader::from_file("./data/simple.csv").unwrap();
    for record in rdr.decode() {
        let (s1, s2, dist): (String, String, usize) = record.unwrap();
        println!("({}, {}): {}", s1, s2, dist);
    }
}
```
Don't like tuples? That's fine. Use a struct instead:
```rust
extern crate csv;
extern crate rustc_serialize;

#[derive(RustcDecodable)]
struct Record {
    s1: String,
    s2: String,
    dist: u32,
}

fn main() {
    let mut rdr = csv::Reader::from_file("./data/simple.csv").unwrap();
    for record in rdr.decode() {
        let record: Record = record.unwrap();
        println!("({}, {}): {}", record.s1, record.s2, record.dist);
    }
}
```
Do some records lack a distance? Use an Option type:
```rust
#[derive(RustcDecodable)]
struct Record {
    s1: String,
    s2: String,
    dist: Option<u32>,
}
```
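As a rough illustration of what the Option field buys you, the hypothetical decode_dist function below spells out the rule by hand in plain Rust. It assumes that an empty field is what maps to None; that is our reading of the decoder's behavior, not something shown above.

```rust
use std::num::ParseIntError;

/// Hypothetical stand-in for decoding the dist column into Option<u32>.
/// Assumption: an empty field means "no distance".
fn decode_dist(field: &str) -> Result<Option<u32>, ParseIntError> {
    if field.is_empty() {
        Ok(None)
    } else {
        // Any non-empty field must still be a valid u32.
        field.parse::<u32>().map(Some)
    }
}

fn main() {
    for field in ["3", "", "x"].iter() {
        // Prints Ok(Some(3)), Ok(None), then an Err for the non-numeric field.
        println!("{:?} -> {:?}", field, decode_dist(field));
    }
}
```

Note that Option only covers missing values: a non-numeric distance is still an error.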
You can also read CSV headers, change the delimiter, use enum types, or just get plain access to records as vectors of strings. The documentation has more detailed examples.
This crate works with Cargo and is on crates.io. The package is regularly updated. Add it to your Cargo.toml like so:
```toml
[dependencies]
csv = "0.14"
```
I claim that this is one of the fastest CSV parsers out there. Its speed should be comparable to or better than libcsv, while providing a more convenient and safer interface. At the lowest level, the parser can decode CSV at over 200 MB/s. Here are some rough benchmarks:
```
raw     ... bench:  5627467 ns/iter (+/- 171958) = 241 MB/s
byte    ... bench:  9307428 ns/iter (+/- 473205) = 146 MB/s
string  ... bench: 11043921 ns/iter (+/- 55845)  = 122 MB/s
decoded ... bench: 16150376 ns/iter (+/- 496846) = 83 MB/s
```
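The ns/iter column converts directly into relative costs. This small program divides each figure by the raw baseline; it assumes every benchmark parses the same input, which the MB/s column suggests. The slowdown helper is just for illustration.

```rust
/// Ratio of a benchmark's ns/iter to the raw baseline.
fn slowdown(ns_per_iter: f64, baseline_ns: f64) -> f64 {
    ns_per_iter / baseline_ns
}

fn main() {
    // ns/iter figures copied from the benchmark table above.
    let benches = [
        ("raw", 5_627_467.0),
        ("byte", 9_307_428.0),
        ("string", 11_043_921.0),
        ("decoded", 16_150_376.0),
    ];
    let raw_ns = benches[0].1;
    for &(name, ns) in benches.iter() {
        println!("{:>8}: {:.2}x the time of raw", name, slowdown(ns, raw_ns));
    }
}
```

By this measure, decoded takes about 2.87 times as long per iteration as raw, and byte about 1.65 times.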
raw corresponds to the zero-allocation parser: no allocations are made for any field or row. For example, this is the fastest way to count the records in a CSV file:
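```rust
extern crate csv;

use csv::NextField;

fn main() {
    // Path to the CSV file, taken from the first command-line argument.
    let fpath = ::std::env::args().nth(1).unwrap();
    let mut rdr = csv::Reader::from_file(fpath).unwrap();
    let mut count = 0;
    // Pull one field at a time; the parser signals record and input boundaries.
    loop {
        match rdr.next_bytes() {
            NextField::EndOfCsv => break,
            // Keep going after a record ends; only EndOfCsv stops the scan.
            NextField::EndOfRecord => count += 1,
            // Field bytes are borrowed from the parser's buffer; ignore them.
            NextField::Data(_) => {}
            NextField::Error(err) => panic!("{}", err),
        }
    }
    println!("{}", count);
}
```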
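The same idea can be sketched without the crate. Counting records only needs a byte scan plus a little state, so nothing is allocated per field or row. count_records is a hypothetical helper, not part of csv: it tracks quotes so that newlines inside quoted fields are not counted, skips blank lines, counts a header row like any other row, and does not otherwise validate the input.

```rust
/// Hypothetical zero-allocation record counter over raw CSV bytes.
fn count_records(data: &[u8]) -> usize {
    let mut count = 0;
    let mut in_quotes = false;
    // True once the current line has any content (so blank lines are skipped).
    let mut in_record = false;
    for &b in data {
        match b {
            // Toggling on every quote also handles escaped "" pairs.
            b'"' => {
                in_quotes = !in_quotes;
                in_record = true;
            }
            b'\n' if !in_quotes => {
                if in_record {
                    count += 1;
                }
                in_record = false;
            }
            // Ignore the \r of a \r\n terminator.
            b'\r' if !in_quotes => {}
            _ => in_record = true,
        }
    }
    // Count a final record that has no trailing newline.
    if in_record {
        count += 1;
    }
    count
}

fn main() {
    let data = b"name,note\nalice,\"line one\nline two\"\nbob,ok";
    // The quoted newline does not start a new record.
    println!("{}", count_records(data));
}
```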
byte corresponds to allocating a fresh byte string for each field and a fresh vector for each row. This is more convenient than the raw API:
```rust
extern crate csv;

fn main() {
    let fpath = ::std::env::args().nth(1).unwrap();
    let mut rdr = csv::Reader::from_file(fpath).unwrap();
    let mut count = 0;
    // Each item is a Result holding one record as a vector of byte strings.
    for record in rdr.byte_records() {
        record.unwrap();
        count += 1;
    }
    println!("{}", count);
}
```
string is just like byte, except each field is decoded from UTF-8 into a Unicode string. The code is the same as above, except it calls records instead of byte_records.
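Decoding to strings adds a UTF-8 check per field, which is presumably part of why string trails byte in the benchmarks. The hypothetical helper below shows that conversion in plain Rust: String::from_utf8 takes ownership of each field's bytes and validates them without copying, and the whole record fails if any field is invalid.

```rust
use std::string::FromUtf8Error;

/// Hypothetical conversion from a byte record to a string record.
fn to_string_record(record: Vec<Vec<u8>>) -> Result<Vec<String>, FromUtf8Error> {
    // collect() into a Result stops at the first field that is not valid UTF-8.
    record.into_iter().map(String::from_utf8).collect()
}

fn main() {
    let ok = vec![b"d".to_vec(), b"e".to_vec()];
    let bad = vec![b"ok".to_vec(), vec![0xff, 0xfe]];
    println!("{:?}", to_string_record(ok));
    println!("{}", to_string_record(bad).is_err());
}
```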
decoded is the slowest approach, but also the most convenient if your CSV contains data other than plain strings, like numbers or booleans.
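For a sense of the per-record work that decoding does, here is a hand-written equivalent for a hypothetical three-column row. Row and decode_row are illustrative names; the real decoder is driven by rustc-serialize rather than written by hand, and whether it accepts exactly the spellings that str::parse does (for example, only true and false for booleans) is an assumption.

```rust
/// Hypothetical row type with non-string columns.
#[derive(Debug, PartialEq)]
struct Row {
    name: String,
    count: u32,
    active: bool,
}

/// Checks the field count, then parses each field into its target type.
fn decode_row(fields: &[&str]) -> Result<Row, String> {
    if fields.len() != 3 {
        return Err(format!("expected 3 fields, found {}", fields.len()));
    }
    let count = fields[1]
        .parse::<u32>()
        .map_err(|e| format!("count: {}", e))?;
    let active = fields[2]
        .parse::<bool>()
        .map_err(|e| format!("active: {}", e))?;
    Ok(Row { name: fields[0].to_string(), count, active })
}

fn main() {
    println!("{:?}", decode_row(&["widget", "7", "true"]));
    println!("{:?}", decode_row(&["widget", "seven", "true"]));
}
```

Every field is parsed and every record may allocate a String, which is consistent with decoded being the slowest mode.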
This library also includes simple CSV indexing support. Once an index is created, you can use it to jump directly to any record without parsing the records before it. In essence, it gives you random access for a modest upfront cost in time and memory.
This example shows how to create an in-memory index and use it to jump to any record in the data. (The indexing interface works with seekable readers and writers, so you can use std::fs::File for this too.)
```rust
extern crate csv;

use std::io::{self, Write};
use csv::index::{Indexed, create_index};

fn main() {
    let data = "
h1,h2,h3
a,b,c
d,e,f
g,h,i
";

    let new_csv_rdr = || csv::Reader::from_string(data);

    let mut index_data = io::Cursor::new(Vec::new());
    create_index(new_csv_rdr(), index_data.by_ref()).unwrap();
    let mut index = Indexed::open(new_csv_rdr(), index_data).unwrap();

    // Seek to the second record and read its data. This is done *without*
    // reading the first record.
    index.seek(1).unwrap();

    // Read the first row at this position (which is the second record).
    // Since `Indexed` derefs to a `csv::Reader`, we can call CSV reader methods
    // on it directly.
    let row = index.records().next().unwrap().unwrap();

    assert_eq!(row, vec!["d", "e", "f"]);
}
```
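This README doesn't describe how the index is stored, so treat the following as an assumption: conceptually, an index can be little more than the byte offset at which each record begins, which turns a lookup into a single seek. The std-only sketch below uses hypothetical helpers build_offsets and read_record, assumes one record per line with no quoted fields, and keeps the header line as entry 0.

```rust
use std::io::{self, BufRead, Cursor, Seek, SeekFrom};

/// Hypothetical index: the byte offset where each line starts.
fn build_offsets(data: &[u8]) -> Vec<u64> {
    let mut offsets = Vec::new();
    let mut pos = 0u64;
    for line in data.split_inclusive(|&b| b == b'\n') {
        offsets.push(pos);
        pos += line.len() as u64;
    }
    offsets
}

/// Seeks straight to entry `i` and splits that one line on commas,
/// without reading any earlier lines.
fn read_record<R: BufRead + Seek>(rdr: &mut R, offsets: &[u64], i: usize) -> io::Result<Vec<String>> {
    let start = *offsets
        .get(i)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "no such record"))?;
    rdr.seek(SeekFrom::Start(start))?;
    let mut line = String::new();
    rdr.read_line(&mut line)?;
    Ok(line
        .trim_end_matches(|c: char| c == '\r' || c == '\n')
        .split(',')
        .map(String::from)
        .collect())
}

fn main() {
    let data = b"h1,h2,h3\na,b,c\nd,e,f\ng,h,i\n";
    let offsets = build_offsets(data);
    let mut rdr = Cursor::new(&data[..]);
    // Entry 0 is the header line, so the second data record is entry 2.
    println!("{:?}", read_record(&mut rdr, &offsets, 2).unwrap());
}
```

Storing fixed-width u64 offsets keeps the index small and makes its own lookup a constant-time array access.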