Rows not processed when `:remove_empty_hashes => true` and last row has only empty values #18

Closed
jrunning opened this Issue Sep 26, 2013 · 3 comments


Say you have a CSV file that has some rows with only empty values, e.g. test.csv:

foo,bar,baz
1,2,3
,,
4,5,6
7,8,9
,,

If you call SmarterCSV.process with :remove_empty_hashes => true (which is the default), the line "7,8,9" won't be processed:

puts SmarterCSV.process('test.csv', chunk_size: 2).map(&:inspect)
# => [{:foo=>1, :bar=>2, :baz=>3}, {:foo=>4, :bar=>5, :baz=>6}]

As you can see, the line 7,8,9 is never processed (this is true whether or not you pass SmarterCSV.process a block). However, with :remove_empty_hashes => false the problem goes away:

puts SmarterCSV.process('test.csv', chunk_size: 2, remove_empty_hashes: false).map(&:inspect)
# => [{:foo=>1, :bar=>2, :baz=>3}, {}]
#    [{:foo=>4, :bar=>5, :baz=>6}, {:foo=>7, :bar=>8, :baz=>9}]
#    [{}]

jrunning added a commit to jrunning/smarter_csv that referenced this issue Sep 26, 2013

Wrote MiniTest specs for tilo/smarter_csv#18
Last chunk isn't processed if its last line has no non-empty values and
`:remove_empty_hashes => true` (which is the default).
Owner

tilo commented Sep 26, 2013

I cannot reproduce this; see below:

 $ cat /tmp/test.csv 
 a,b,c
 1,2,3
 ,,
 4,5,6 
 7,8,9
 ,,


# irb
# require 'smarter_csv'
>  data = SmarterCSV.process('/tmp/test.csv')
    => [{:a=>1, :b=>2, :c=>3}, {:a=>4, :b=>5, :c=>6}, {:a=>7, :b=>8, :c=>9}] 

Clearly the row with 7,8,9 is showing up.

jrunning pushed a commit to jrunning/smarter_csv that referenced this issue Sep 26, 2013

I was mistaken; the bug doesn't manifest with the default options. It only appears when the :chunk_size option is used:

$ cat > /tmp/test.csv
a,b,c
1,2,3
,,
4,5,6
7,8,9
,,
^D
irb> SmarterCSV.process('/tmp/test.csv')
#=> [{:a=>1, :b=>2, :c=>3}, {:a=>4, :b=>5, :c=>6}, {:a=>7, :b=>8, :c=>9}]

irb> SmarterCSV.process('/tmp/test.csv', chunk_size: 2)
#=> [{:a=>1, :b=>2, :c=>3}, {:a=>4, :b=>5, :c=>6}]

irb> SmarterCSV.process('/tmp/test.csv', chunk_size: 5)
#=> []

I'll send a pull request shortly.
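
For anyone trying to picture how this can happen, here is a minimal sketch of one way a chunking loop could drop the trailing rows. It is not smarter_csv's actual source; chunk_rows, its flush_leftover flag, and the ROWS constant are hypothetical names made up for illustration. The assumption is that the "flush the last chunk" check sits inside the loop, so skipping an empty final row with next also skips that check:

```ruby
# Hypothetical sketch, NOT smarter_csv's real implementation.
# rows: parsed rows as hashes; a line like ",," is assumed to become {}.
def chunk_rows(rows, chunk_size:, remove_empty_hashes: true, flush_leftover: false)
  chunks = []
  chunk = []
  rows.each_with_index do |hash, i|
    eof = (i == rows.size - 1)
    # Skipping the row here also skips the end-of-input flush below.
    next if remove_empty_hashes && hash.empty?
    chunk << hash
    if chunk.size >= chunk_size || eof
      chunks << chunk
      chunk = []
    end
  end
  # One possible remedy (an assumption, not necessarily the actual fix):
  # flush whatever is still buffered once the loop is done.
  chunks << chunk if flush_leftover && !chunk.empty?
  chunks
end

# Same shape as the test.csv above: data, empty, data, data, empty.
ROWS = [
  { a: 1, b: 2, c: 3 },
  {},
  { a: 4, b: 5, c: 6 },
  { a: 7, b: 8, c: 9 },
  {}
].freeze

p chunk_rows(ROWS, chunk_size: 2)                       # row 7,8,9 is missing
p chunk_rows(ROWS, chunk_size: 5)                       # nothing is returned
p chunk_rows(ROWS, chunk_size: 2, flush_leftover: true) # row 7,8,9 is kept
```

Under these assumptions the sketch shows the same symptoms as the report: with chunk_size 2 the row 7,8,9 is lost, with chunk_size 5 nothing comes back, and with remove_empty_hashes: false every row is returned.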

Owner

tilo commented Sep 28, 2013

I have already fixed the issue.

@tilo tilo closed this Sep 28, 2013

tilo added a commit that referenced this issue Sep 28, 2013
