Profiling-based improvement to import#values_sql_for_columns_and_attributes - 80% speedup for large imports #45

Closed

saizai commented Feb 10, 2012

See https://plus.google.com/u/0/103112149634414554669/posts/EGghnZ1icVY for details.

Three issues corrected:

  1. Calling ActiveRecord::Base#connection within the loop. This may well be an ActiveRecord WTF, but connection actually isn't memoized; every time you call it, it goes back to the connection pool and fetches things all over again. Yikes. So I memoized it in a local variable within the function (see the sketch below).
  2. Calling sequence_name needlessly, for non-id columns. This is a result of putting the operands of `&&` in the wrong order. Boolean expressions short-circuit left-to-right, so you should put the check that's cheapest to fail on the left, and the expensive stuff (or the bits whose real return values you want preserved) on the right. So I just swapped the order (see the sketch below).
    There's a deeper WTF here, in that sequence_name should be memoized within ActiveRecord, but something was breaking that. I'm not sure what, and I haven't fixed it, so it'll still be slow if you're using a primary key column. Serves you right for trying to insert explicit primary key values instead of leaving that up to the database at insertion time. :-P
  3. Double-escaping. connection.quote already does the appropriate type conversion and escaping; calling connection.type_cast first provides no better security, but slows things down significantly by converting every value twice. So I just got rid of it.

These three changes resulted in an 80% performance improvement for the import call.
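
For illustration, here is a minimal Ruby sketch of the reworked loop with all three changes applied. The method shape is reconstructed from the description above rather than taken from the actual patch, so treat the exact conditional and the `next_value_for_sequence` adapter helper as assumptions:

```ruby
def values_sql_for_columns_and_attributes(columns, array_of_attributes)
  # (1) Memoize the connection in a local variable: ActiveRecord::Base#connection
  # goes back to the connection pool on every call, so hoist it out of the loop
  # instead of calling it once per value.
  conn = connection

  array_of_attributes.map do |attributes|
    values = attributes.each_with_index.map do |value, index|
      column = columns[index]

      # (2) Short-circuit ordering: the cheap nil/primary-key checks run first,
      # so the expensive sequence_name call only happens for an explicit id
      # column, not once per column per row.
      if value.nil? && column.name == primary_key && !sequence_name.blank?
        conn.next_value_for_sequence(sequence_name)
      else
        # (3) No connection.type_cast before quoting: connection.quote already
        # does the appropriate type conversion and escaping, so casting first
        # would just convert everything twice.
        conn.quote(value, column)
      end
    end
    "(#{values.join(',')})"
  end
end
```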

poggs commented on b479882 Feb 18, 2012

Excellent commit - using this, an import that previously took around 2 hours now takes about 40 minutes!

saizai commented Feb 20, 2012

@poggs Damn dood, wtf are you importing? I'm doing 10k records in ~3s. I'm having difficulty imagining what could take that long that would even fit in memory.

Is it the import itself that's taking that long, or some other stuff?

poggs commented Feb 20, 2012

Railway timetable data. It's this function here - https://github.com/poggs/tsdbexplorer/blob/master/lib/tsdbexplorer/cif.rb#L63 - which reads ~500 MB of data, chops it up, pushes it into a hash called 'pending', and then, every 1000 records, imports them.

I haven't really looked at speeding that up, or where it's hideously inefficient :)
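
For context, the batch-flush pattern described above boils down to something like this. This is a condensed, hypothetical sketch: `parse_cif_record` and the layout of the `pending` hash are stand-ins, since the real parser at the URL above handles many CIF record types:

```ruby
require 'activerecord-import'

BATCH_SIZE = 1000

# The real code accumulates parsed records in a hash called 'pending';
# here it is keyed by model class (an assumption about its layout).
pending = Hash.new { |hash, key| hash[key] = [] }

# Stream the ~500 MB CIF file one line at a time rather than slurping it.
File.foreach('timetable.cif') do |line|
  record = parse_cif_record(line)   # hypothetical one-record parser
  pending[record.class] << record

  # Every 1000 accumulated records of a type, flush them in a single
  # multi-row INSERT via activerecord-import.
  batch = pending[record.class]
  if batch.size >= BATCH_SIZE
    record.class.import(batch)
    batch.clear
  end
end

# Import whatever is left over after the file ends.
pending.each { |klass, records| klass.import(records) if records.any? }
```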

Contributor

Empact commented May 2, 2012

FYI I added this change to my fork, published to rubygems as Empact-activerecord-import.

saizai commented May 20, 2012

@Empact Would be nice if I got contributor credit. ;-)

Owner

zdennis commented Dec 14, 2012

@saizai, thank you for your pull request. I believe the first two issues have been resolved as a part of #71. I will look into the third issue.

Contributor

chewi commented May 22, 2014

@zdennis The third issue was also dealt with by #71 so you can close this.

Owner

zdennis commented May 22, 2014

Thanks @chewi

zdennis closed this May 22, 2014

saizai added a commit to saizai/activerecord-import that referenced this pull request Jul 17, 2017: c99f3ca ("Crediting copied work")

saizai referenced this pull request Jul 17, 2017: "Crediting copied work" #438 (closed)

jkowens reopened this Jul 17, 2017

jkowens closed this Jul 17, 2017
