bugfix; new features; value_converters; 1.1.0
tilo committed Jul 27, 2015
1 parent 49e6629 commit 0b4a1fc
Showing 8 changed files with 165 additions and 7 deletions.
51 changes: 49 additions & 2 deletions README.md
@@ -1,4 +1,4 @@
# SmarterCSV
# SmarterCSV

[![Build Status](https://secure.travis-ci.org/tilo/smarter_csv.png?branch=master)](http://travis-ci.org/tilo/smarter_csv) [![Gem Version](https://badge.fury.io/rb/smarter_csv.svg)](http://badge.fury.io/rb/smarter_csv)

@@ -34,7 +34,7 @@ The two main choices you have in terms of how to call `SmarterCSV.process` are:
* calling `process` with or without a block
* passing a `:chunk_size` to the `process` method, and processing the CSV-file in chunks, rather than in one piece.

Tip: If you are uncertain about what line endings a CSV-file uses, try specifying `:row_sep => :auto` as part of the options.
Tip: If you are uncertain about what line endings a CSV-file uses, try specifying `:row_sep => :auto` as part of the options.
This can be slow, because each CSV file is analyzed first. To speed things up, set `:row_sep` manually. Check out Example 5 for unusual `:row_sep` and `:col_sep` values.

#### Example 1a: How SmarterCSV processes CSV-files as array of hashes:
@@ -128,6 +128,40 @@ and how the `process` method returns the number of chunks when called with a blo
end
=> returns number of chunks

#### Example 6: Using Value Converters

$ cat spec/fixtures/with_dates.csv
first,last,date,price
Ben,Miller,10/30/1998,$44.50
Tom,Turner,2/1/2011,$15.99
Ken,Smith,01/09/2013,$199.99
$ irb
> require 'smarter_csv'
> require 'date'

# define a custom converter class, which implements self.convert(value)
class DateConverter
def self.convert(value)
Date.strptime( value, '%m/%d/%Y') # parses custom date format into Date instance
end
end

class DollarConverter
def self.convert(value)
value.sub('$','').to_f
end
end

options = {:value_converters => {:date => DateConverter, :price => DollarConverter}}
data = SmarterCSV.process("spec/fixtures/with_dates.csv", options)
data[0][:date]
=> #<Date: 1998-10-30 ((2451117j,0s,0n),+0s,2299161j)>
data[0][:date].class
=> Date
data[0][:price]
=> 44.5
data[0][:price].class
=> Float
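
A converter only has to respond to `self.convert(value)`, so it can also guard against blank or malformed cells. The following is a minimal sketch using only the standard library; `LenientDateConverter` is a hypothetical name and not part of SmarterCSV, and returning `nil` for bad input is an assumption about what a caller might want.

```ruby
require 'date'

# Hypothetical converter (not shipped with SmarterCSV): returns nil instead of
# raising when the value is missing or not in month/day/year format.
class LenientDateConverter
  def self.convert(value)
    text = value.to_s.strip
    return nil if text.empty?
    Date.strptime(text, '%m/%d/%Y')
  rescue ArgumentError
    # Date.strptime raises ArgumentError (or a subclass) on unparsable input
    nil
  end
end

LenientDateConverter.convert('10/30/1998') # a Date for 1998-10-30
LenientDateConverter.convert('n/a')        # nil
```

Such a class could be passed as `:value_converters => {:date => LenientDateConverter}` in the same way as `DateConverter` above.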

## Documentation

@@ -165,6 +199,7 @@ The options and the block are optional.
| | | Important if the file does not contain headers, |
| | | otherwise you would lose the first line of data. |
---------------------------------------------------------------------------------------------------------------------------------
| :value_converters | nil | supply a hash of :header => KlassName; the class needs to implement self.convert(val)|
| :remove_empty_values | true | remove values which have nil or empty strings as values |
| :remove_zero_values | false | remove values which have a numeric value equal to zero / 0 |
| :remove_values_matching | nil | removes key/value pairs if value matches given regular expressions. e.g.: |
@@ -238,6 +273,12 @@ Or install it yourself as:

## Changes

#### 1.1.0 (2015-07-26)
* added feature :value_converters, which allows parsing of dates, money, and other things (thanks to Raphaël Bleuse, Lucas Camargo de Almeida, Alejandro)
* added error if :headers_in_file is set to false, and no :user_provided_headers are given (thanks to innhyu)
* added support to convert dashes to underscore characters in headers (thanks to César Camacho)
* fixing automatic detection of \r\n line-endings (thanks to feens)

#### 1.0.19 (2014-10-29)
* added option :keep_original_headers to keep CSV-headers as-is (thanks to Benjamin Thouret)

@@ -342,6 +383,12 @@ Please [open an Issue on GitHub](https://github.com/tilo/smarter_csv/issues) if
Many thanks to people who have filed issues and sent comments.
And a special thanks to those who contributed pull requests:

* [Alejandro](https://github.com/agaviria)
* [Lucas Camargo de Almeida](https://github.com/lcalmeida)
* [Raphaël Bleuse](https://github.com/bleuse)
* [feens](https://github.com/feens)
* [César Camacho](https://github.com/chanko)
* [innhyu](https://github.com/innhyu)
* [Benjamin Thouret](https://github.com/benichu)
* [Chris Hilton](https://github.com/chrismhilton)
* [Sean Duckett](http://github.com/sduckett)
31 changes: 27 additions & 4 deletions lib/smarter_csv/smarter_csv.rb
@@ -9,7 +9,7 @@ def SmarterCSV.process(input, options={}, &block) # first parameter: filename
:remove_empty_values => true, :remove_zero_values => false , :remove_values_matching => nil , :remove_empty_hashes => true , :strip_whitespace => true,
:convert_values_to_numeric => true, :strip_chars_from_headers => nil , :user_provided_headers => nil , :headers_in_file => true,
:comment_regexp => /^#/, :chunk_size => nil , :key_mapping_hash => nil , :downcase_header => true, :strings_as_keys => false, :file_encoding => 'utf-8',
:remove_unmapped_keys => false, :keep_original_headers => false,
:remove_unmapped_keys => false, :keep_original_headers => false, :value_converters => nil,
}
options = default_options.merge(options)
csv_options = options.select{|k,v| [:col_sep, :row_sep, :quote_char].include?(k)} # options.slice(:col_sep, :row_sep, :quote_char)
@@ -40,13 +40,15 @@ def SmarterCSV.process(input, options={}, &block) # first parameter: filename
file_headerA.map!{|x| x.gsub(%r/#{options[:quote_char]}/,'') } # interpolate the quote character into the regexp
file_headerA.map!{|x| x.strip} if options[:strip_whitespace]
unless options[:keep_original_headers]
file_headerA.map!{|x| x.gsub(/\s+/,'_')}
file_headerA.map!{|x| x.gsub(/\s+|-+/,'_')}
file_headerA.map!{|x| x.downcase } if options[:downcase_header]
end

# puts "HeaderA: #{file_headerA.join(' , ')}" if options[:verbose]

file_header_size = file_headerA.size
else
raise SmarterCSV::IncorrectOption , "ERROR [smarter_csv]: If :headers_in_file is set to false, you have to provide :user_provided_headers" unless options[:user_provided_headers] # the key always exists after merging defaults, so test the value
end
if options[:user_provided_headers] && options[:user_provided_headers].class == Array && ! options[:user_provided_headers].empty?
# use user-provided headers
@@ -135,6 +137,15 @@ def SmarterCSV.process(input, options={}, &block) # first parameter: filename
end
end
end

if options[:value_converters]
hash.each do |k,v|
converter = options[:value_converters][k]
next unless converter
hash[k] = converter.convert(v)
end
end

next if hash.empty? if options[:remove_empty_hashes]

if use_chunks
@@ -212,11 +223,23 @@ def self.guess_line_ending( filehandle, options )

# count how many of the pre-defined line-endings we find
# ignoring those contained within quote characters
last_char = nil
filehandle.each_char do |c|
quoted_char = !quoted_char if c == options[:quote_char]
next if quoted_char || c !~ /\r|\n|\r\n/
counts[c] += 1
next if quoted_char

if last_char == "\r"
if c == "\n"
counts["\r\n"] += 1
else
counts["\r"] += 1 # a lone \r is counted one character late, once we know no \n follows it
end
elsif c == "\n"
counts["\n"] += 1
end
last_char = c
end
counts["\r"] += 1 if last_char == "\r"
# find the key/value pair with the largest counter:
k,v = counts.max_by{|k,v| v}
return k # the most frequent one is it
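
The counting logic in this hunk can be tried in isolation. The sketch below is a standalone approximation under stated assumptions: `guess_row_sep` is a hypothetical name (the gem's method is `guess_line_ending`), it takes the quote character directly instead of an options hash, and it returns `nil` when no line ending is found.

```ruby
require 'stringio'

# Hypothetical standalone version of the line-ending guesser shown in the diff.
def guess_row_sep(io, quote_char = '"')
  counts = Hash.new(0)
  in_quotes = false
  last_char = nil
  io.each_char do |c|
    in_quotes = !in_quotes if c == quote_char
    next if in_quotes # ignore line endings inside quoted fields

    if last_char == "\r"
      # a "\r" can only be classified once we see the character after it
      if c == "\n"
        counts["\r\n"] += 1
      else
        counts["\r"] += 1
      end
    elsif c == "\n"
      counts["\n"] += 1
    end
    last_char = c
  end
  counts["\r"] += 1 if last_char == "\r" # trailing "\r" at end of input
  best = counts.max_by { |_sep, n| n }   # most frequent separator wins
  best && best.first
end

guess_row_sep(StringIO.new("a,b\r\n1,2\r\n")) # "\r\n"
```

Tracking `last_char` is what lets `"\r\n"` be counted as one separator rather than as a `"\r"` plus a `"\n"`.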
2 changes: 1 addition & 1 deletion lib/smarter_csv/version.rb
@@ -1,3 +1,3 @@
module SmarterCSV
VERSION = "1.0.19"
VERSION = "1.1.0"
end
3 changes: 3 additions & 0 deletions spec/fixtures/money.csv
@@ -0,0 +1,3 @@
item,price
Book,$9.99
Mug,$14.99
8 changes: 8 additions & 0 deletions spec/fixtures/with_dashes.csv
@@ -0,0 +1,8 @@
First-Name,Last-Name,Dogs,Cats,Birds,Fish
Dan,McAllister,2,0,,
Lucy,Laweless,,5,0,
,,,,,
Miles,O'Brian,0,0,0,21
Nancy,Homes,2,0,1,
Hernán,Curaçon,3,0,0,
,,,,,
4 changes: 4 additions & 0 deletions spec/fixtures/with_dates.csv
@@ -0,0 +1,4 @@
first,last,date,price
Ben,Miller,10/30/1998,$44.50
Tom,Turner,2/1/2011,€15
Ken,Smith,01/09/2013,$0.11
21 changes: 21 additions & 0 deletions spec/smarter_csv/header_transformation_spec.rb
@@ -0,0 +1,21 @@
require 'spec_helper'

fixture_path = 'spec/fixtures'

describe 'be_able_to' do
it 'loads_file_with_dashes_in_header_fields as strings' do
options = {:strings_as_keys => true}
data = SmarterCSV.process("#{fixture_path}/with_dashes.csv", options)
data.flatten.size.should == 5
data[0]['first_name'].should eq 'Dan'
data[0]['last_name'].should eq 'McAllister'
end

it 'loads_file_with_dashes_in_header_fields as symbols' do
options = {:strings_as_keys => false}
data = SmarterCSV.process("#{fixture_path}/with_dashes.csv", options)
data.flatten.size.should == 5
data[0][:first_name].should eq 'Dan'
data[0][:last_name].should eq 'McAllister'
end
end
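
The header handling exercised by this spec can be sketched without the gem. `normalize_headers` below is a hypothetical helper, not SmarterCSV's API; it assumes the default options (strip whitespace, downcase) and applies the same `/\s+|-+/` substitution as the diff above.

```ruby
# Hypothetical helper mirroring the default header transformation:
# strip, replace runs of whitespace or dashes with "_", downcase,
# then return symbols unless string keys are requested.
def normalize_headers(headers, strings_as_keys: false)
  headers.map do |header|
    key = header.strip.gsub(/\s+|-+/, '_').downcase
    strings_as_keys ? key : key.to_sym
  end
end

normalize_headers(['First-Name', 'Last-Name', 'Dogs'])
# => [:first_name, :last_name, :dogs]
```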
52 changes: 52 additions & 0 deletions spec/smarter_csv/value_converters_spec.rb
@@ -0,0 +1,52 @@
require 'spec_helper'

fixture_path = 'spec/fixtures'

require 'date'
class DateConverter
def self.convert(value)
Date.strptime( value, '%m/%d/%Y')
end
end

class CurrencyConverter
def self.convert(value)
value.sub(/[$€]/,'').to_f # would be nice to add a computed column :currency => '€'
end
end

describe 'be_able_to' do
it 'convert date values into Date instances' do
options = {:value_converters => {:date => DateConverter}}
data = SmarterCSV.process("#{fixture_path}/with_dates.csv", options)
data.flatten.size.should == 3
data[0][:date].class.should eq Date
data[0][:date].to_s.should eq "1998-10-30"
data[1][:date].to_s.should eq "2011-02-01"
data[2][:date].to_s.should eq "2013-01-09"
end

it 'converts dollar prices into float values' do
options = {:value_converters => {:price => CurrencyConverter}}
data = SmarterCSV.process("#{fixture_path}/money.csv", options)
data.flatten.size.should == 2
data[0][:price].class.should eq Float
data[0][:price].should eq 9.99
data[1][:price].should eq 14.99
end

it 'convert can use multiple value converters' do
options = {:value_converters => {:date => DateConverter, :price => CurrencyConverter}}
data = SmarterCSV.process("#{fixture_path}/with_dates.csv", options)
data.flatten.size.should == 3
data[0][:date].class.should eq Date
data[0][:date].to_s.should eq "1998-10-30"
data[1][:date].to_s.should eq "2011-02-01"
data[2][:date].to_s.should eq "2013-01-09"

data[0][:price].class.should eq Float
data[0][:price].should eq 44.50
data[1][:price].should eq 15.0
data[2][:price].should eq 0.11
end
end
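
What these specs check is, in essence, a per-key lookup and call. The following sketch approximates that step outside the gem; `apply_converters` and `PriceConverter` are hypothetical names, and unlike the loop in `SmarterCSV.process` it builds a new hash rather than updating the row in place.

```ruby
# Hypothetical converter in the style of the spec's CurrencyConverter.
class PriceConverter
  def self.convert(value)
    value.sub(/[$€]/, '').to_f
  end
end

# Hypothetical stand-in for the conversion step: each key that has a
# converter gets its value replaced by converter.convert(value).
def apply_converters(row, converters)
  return row unless converters
  row.each_with_object({}) do |(key, value), out|
    converter = converters[key]
    out[key] = converter ? converter.convert(value) : value
  end
end

row = { :first => 'Ben', :price => '$44.50' }
apply_converters(row, { :price => PriceConverter })
# => {:first=>"Ben", :price=>44.5}
```

Keys without a converter pass through unchanged, which matches the `next unless converter` guard in the library code.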
