Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

return the inserted ids for postgres bulk insert #178

Merged
merged 13 commits into from Apr 21, 2015

Conversation

johnnaegle
Copy link
Contributor

per #104 and #105, I've rebased the commits onto the latest zdennis master and got the adapter tests working for mysql, postgres and sqlite3.

insert_many now returns the number of imports and the ids of the objects inserted (if the adapter provides them). Those ids are then set on the model and the model is marked as clean. For auto-save associations, a recursive bulk import is performed. There are postgresql adapter specific tests that show the ids getting set and recursive import of autosave associations.

end
end

def add_objects(hash, parent)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want to hide this method and import_helper in the Import module. Right now it looks like it's being added to ActiveRecord::Base as a class method. We should only add the public methods that we want to expose to a user there so we don't unnecessarily pollute or add potential conflicts w/ other things on ActiveRecord::Base.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will take a look tomorrow

On Sun, Feb 22, 2015 at 12:42 PM, Zach Dennis notifications@github.com
wrote:

In lib/activerecord-import/import.rb
#178 (comment)
:

  •    #    should probably take a hash to associations to follow.
    
  •    hash={}
    
  •    models.each {|model| add_objects(hash, model) }
    
  •    hash.each_pair do |class_name, assocs|
    
  •      assocs.each_pair do |assoc_name, subobjects|
    
  •        subobjects.first.class.import(subobjects, options) unless subobjects.empty?
    
  •      end
    
  •    end
    
  •    result
    
  •  else
    
  •    import_helper(*args)
    
  •  end
    
  • end
  • def add_objects(hash, parent)

We'll want to hide this method and import_helper in the Import module.
Right now it looks like it's being added to ActiveRecord::Base as a class
method. We should only add the public methods that we want to expose to a
user there so we don't unnecessarily pollute or add potential conflicts w/
other things on ActiveRecord::Base.


Reply to this email directly or view it on GitHub
https://github.com/zdennis/activerecord-import/pull/178/files#r25135051.

@johnnaegle johnnaegle force-pushed the set_ids_in_bulk_import branch 3 times, most recently from ad422b6 to fd244d9 Compare February 23, 2015 04:15
place the id return into every model and mark the model as clean? (since it has been saved)
make the patch work for non-standard primary keys
modification showing how save a tree of objects efficiently.
pass options down into sub-objects that are imported
set the foreign key on a sub-collection when the parent is imported
tests for importing 3 levels of objects
new model for testing 3 levels of objects: chapter
move the setting of ids down into the postgresql adapter.
@alainmeier
Copy link

👍

@johnnaegle
Copy link
Contributor Author

if you have been using the GoodMeasuresLLC fork, you'll need to add :recursive => true if you want to import autosave associations with the root objects

@johnnaegle
Copy link
Contributor Author

@zdennis - thoughts?

@alainmeier
Copy link

I have tested this on millions of records and it's incredibly useful. Would love to see it in the main project.

@zdennis
Copy link
Owner

zdennis commented Mar 2, 2015

@johnnaegle @alainmeier, this is on the path to being merged. I will review the changes again in the next few days. If no other questions or suggestions then we'll get this in very soon. Sorry for the wait and thank you for your patience and work on this.

@robmathews
Copy link
Contributor

@zdennis, that will be awesome!

@timadler
Copy link

timadler commented Mar 5, 2015

Hi @robmathews and @johnnaegle

I am running some tests on this branch - works great but I am noticing in the logs that in my test case a number of select queries between the bulk insert of the parent objects and the bulk inserts of the child objects. I was wondering why these are necessary? I haven't run the tests you have checked in, but I expect that give your test case of 6 books, you would see 6 selects on table chapters after the bulk book insert but before the bulk chapters insert Why are these necessary?

Or perhaps I haven't done set something correctly in my case..

Thanks for all your work on this,

Tim.

@johnnaegle
Copy link
Contributor Author

I see this in the logs as well, curious

This is from the test case:

--------------------------------------------------------------------------------
   (0.2ms)  SELECT COUNT(*) FROM "chapters" 
  Class Create Many Without Validations Or Callbacks (0.3ms)  INSERT INTO "topics" ("id","title","author_name","author_email_address","written_on","bonus_time","last_read","content","approved","replies_count","parent_id","type","created_at","created_on","updated_at","updated_on") VALUES (nextval('topics_id_seq'),'Title 24','Author 24',NULL,NULL,NULL,NULL,NULL,'t',NULL,NULL,NULL,'2015-03-05 18:14:02.536744','2015-03-05 18:14:02.536730','2015-03-05 18:14:02.536782','2015-03-05 18:14:02.536751'),(nextval('topics_id_seq'),'Title 25','Author 25',NULL,NULL,NULL,NULL,NULL,'t',NULL,NULL,NULL,'2015-03-05 18:14:02.536744','2015-03-05 18:14:02.536730','2015-03-05 18:14:02.536782','2015-03-05 18:14:02.536751'),(nextval('topics_id_seq'),'Title 26','Author 26',NULL,NULL,NULL,NULL,NULL,'t',NULL,NULL,NULL,'2015-03-05 18:14:02.536744','2015-03-05 18:14:02.536730','2015-03-05 18:14:02.536782','2015-03-05 18:14:02.536751') RETURNING id
  Book Load (0.2ms)  SELECT "books".* FROM "books" WHERE "books"."topic_id" IS NULL
  Book Load (0.2ms)  SELECT "books".* FROM "books" WHERE "books"."topic_id" IS NULL
  Book Load (0.2ms)  SELECT "books".* FROM "books" WHERE "books"."topic_id" IS NULL
  Class Create Many Without Validations Or Callbacks (0.3ms)  INSERT INTO "books" ("id","title","publisher","author_name","created_at","created_on","updated_at","updated_on","publish_date","topic_id","for_sale") VALUES (nextval('books_id_seq'),'Book 7','Default Publisher','Stephen King','2015-03-05 18:14:02.539747','2015-03-05 18:14:02.539737','2015-03-05 18:14:02.539776','2015-03-05 18:14:02.539753',NULL,25,'t'),(nextval('books_id_seq'),'Book 8','Default Publisher','Stephen King','2015-03-05 18:14:02.539747','2015-03-05 18:14:02.539737','2015-03-05 18:14:02.539776','2015-03-05 18:14:02.539753',NULL,25,'t'),(nextval('books_id_seq'),'Book 9','Default Publisher','Stephen King','2015-03-05 18:14:02.539747','2015-03-05 18:14:02.539737','2015-03-05 18:14:02.539776','2015-03-05 18:14:02.539753',NULL,26,'t'),(nextval('books_id_seq'),'Book 10','Default Publisher','Stephen King','2015-03-05 18:14:02.539747','2015-03-05 18:14:02.539737','2015-03-05 18:14:02.539776','2015-03-05 18:14:02.539753',NULL,26,'t'),(nextval('books_id_seq'),'Book 11','Default Publisher','Stephen King','2015-03-05 18:14:02.539747','2015-03-05 18:14:02.539737','2015-03-05 18:14:02.539776','2015-03-05 18:14:02.539753',NULL,27,'t'),(nextval('books_id_seq'),'Book 12','Default Publisher','Stephen King','2015-03-05 18:14:02.539747','2015-03-05 18:14:02.539737','2015-03-05 18:14:02.539776','2015-03-05 18:14:02.539753',NULL,27,'t') RETURNING id
  Chapter Load (0.2ms)  SELECT "chapters".* FROM "chapters" WHERE "chapters"."book_id" IS NULL
  Chapter Load (0.2ms)  SELECT "chapters".* FROM "chapters" WHERE "chapters"."book_id" IS NULL
  Chapter Load (0.2ms)  SELECT "chapters".* FROM "chapters" WHERE "chapters"."book_id" IS NULL
  Chapter Load (0.1ms)  SELECT "chapters".* FROM "chapters" WHERE "chapters"."book_id" IS NULL
  Chapter Load (0.1ms)  SELECT "chapters".* FROM "chapters" WHERE "chapters"."book_id" IS NULL
  Chapter Load (0.2ms)  SELECT "chapters".* FROM "chapters" WHERE "chapters"."book_id" IS NULL
  Class Create Many Without Validations Or Callbacks (0.4ms)  INSERT INTO "chapters" ("id","title","book_id","created_at","updated_at") VALUES (nextval('chapters_id_seq'),'Chapter 19',27,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 20',27,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 21',27,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 22',28,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 23',28,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 24',28,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 25',29,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 26',29,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 27',29,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 28',30,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 29',30,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 30',30,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 31',31,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 32',31,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 33',31,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 34',32,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 35',32,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439'),(nextval('chapters_id_seq'),'Chapter 36',32,'2015-03-05 18:14:02.544426','2015-03-05 18:14:02.544439') RETURNING id
   (0.2ms)  SELECT COUNT(*) FROM "chapters" 
--------------------------------------------------------------------------------

…ulk import, we want to bypass the proxy association and use the underlying target. Do this by marking each association as loaded
@johnnaegle
Copy link
Contributor Author

I've marked all the associations that get recursed on as loaded. This prevents a lot of extra queries. Great catch @timadler.

@timadler
Copy link

timadler commented Mar 5, 2015

Wow - thanks @johnnaegle. Fast turn around! Works great.

…lling some SQL. I think pulling the autofollowing of associations out helps that
@johnnaegle
Copy link
Contributor Author

@zdennis - thoughts?


That would be about 4M SQL insert statements vs 3, which results in vastly improved performance. In our case, it converted
an 18 hour batch process to <2 hrs.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this. I've created a local copy of the branch and have formatted and massaged this a little bit, but it's mostly intact.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey thanks - I noticed you have stuff over in the wiki section, but it is important that folks realize why you bothered to write this in the first place.

This is important stuff. I think it should really make its way into the rails core ....

@robmathews
Copy link
Contributor

zach, thanks so much for looking at this. we've been wanting to get back on the main branch for so long....

…de do its job.

This removes the chance of name collision and ensures we can call the original method with super.
@johnnaegle
Copy link
Contributor Author

I've incorporated @zdennis's changes by cherry-picking your commits into this branch. I made a few renames that hoepfully make things clearer. I have one PR that I wanted @robmathews to review before I merge it in, but I think it simplifies the importing of associations: GoodMeasuresLLC#1

@chewi
Copy link
Contributor

chewi commented Mar 27, 2015

Looking good here. I've been waiting for this a long time but we really need it now so I gave it a try. I take it this depends on PostgreSQL returning the ids in the same order the rows were given. There has been some debate over whether this is guaranteed. The developers refuse to be held on it but it seems doubtful that they will be able to change it now given how many people rely on it. Many have said, and I agree, that the returning feature isn't very useful unless the order is maintained. You're probably aware of this but I just want to make sure.

@robmathews
Copy link
Contributor

I've relied on the order of the returned ids being the same for over 15 yrs
between SQL Server and Postgres. I can't see this realistically being a
problem.

On Fri, Mar 27, 2015 at 11:52 AM, James Le Cuirot notifications@github.com
wrote:

Looking good here. I've been waiting for this a long time but we really
need it now so I gave it a try. I take it this depends on PostgreSQL
returning the ids in the same order the rows were given. There has been
some debate over whether this is guaranteed. The developers refuse to be
held on it but it seems doubtful that they will be able to change it now
given how many people rely on it. Many have said, and I agree, that the
returning feature isn't very useful unless the order is maintained. You're
probably aware of this but I just want to make sure.


Reply to this email directly or view it on GitHub
#178 (comment)
.

@johnnaegle
Copy link
Contributor Author

@zdennis thoughts about getting this merged in? Thanks!

@sebastianvera
Copy link

@johnnaegle +1

@warmwaffles
Copy link

bump

@stevenspiel
Copy link

👍

@zdennis
Copy link
Owner

zdennis commented Apr 21, 2015

@alainmeier, thanks for posting. @johnnaegle / @robmathews , any thoughts?

@johnnaegle
Copy link
Contributor Author

I reverted a change and added a test for multiple has_many associations on a single model.

@zdennis
Copy link
Owner

zdennis commented Apr 21, 2015

@alainmeier, would you be able to update and run your code with @johnnaegle's latest changes?

@alainmeier
Copy link

💥 everything works great again. Thanks @johnnaegle

@zdennis
Copy link
Owner

zdennis commented Apr 21, 2015

Awesome! Okay, I think it's about time we get this in there.

zdennis added a commit that referenced this pull request Apr 21, 2015
return the inserted ids for postgres bulk insert
@zdennis zdennis merged commit e7adac0 into zdennis:master Apr 21, 2015
@zdennis
Copy link
Owner

zdennis commented Apr 21, 2015

Thank you to @johnnaegle, @robmathews, and everyone that contributed this work. I know it's been a long time coming.

@stevenspiel
Copy link

Thank you all! This is a great feature 👍

@alainmeier
Copy link

Thanks everyone! @zdennis will you be releasing a new RubyGems version soon by any chance?

@zdennis
Copy link
Owner

zdennis commented Apr 22, 2015

@alainmeier, yeah, tonight or tomorrow night.

@mockdeep
Copy link

Very useful, and very timely. We were just looking into how to import a many-to-many relationship and this will help a lot.

@mdemare
Copy link

mdemare commented May 15, 2015

@zdennis Could you release a new gem version? It would be much appreciated!

@robmathews
Copy link
Contributor

He accepted the pull request a few weeks back.

On Fri, May 15, 2015 at 7:44 AM, Michiel de Mare notifications@github.com
wrote:

@zdennis https://github.com/zdennis Could you release a new gem
version? It would be much appreciated!


Reply to this email directly or view it on GitHub
#178 (comment)
.

@zdennis
Copy link
Owner

zdennis commented May 18, 2015

Sorry for the wait. A new gem (0.8.0) has been published just now. This is included in that.

@stevenspiel
Copy link

👍

@mdemare
Copy link

mdemare commented May 18, 2015

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet