Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrates tree_id from an Integer to Unicode(32) #53

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

markroper
Copy link

@markroper markroper commented Jul 18, 2016

The tree_id column is currently an Integer thats managed in application code. New trees are created by querying for the max value in the table and incrementing it by one, which isn't thread safe. An an example, a multithreaded application may try to create two different new trees and assign them both the same tree_id. I've made the tree_id a Unicode(32) so that tree IDs can be generated as GUIDs in application code without the risk of ID collisions.

I like the lib a lot!

@coveralls
Copy link

coveralls commented Jul 18, 2016

Coverage Status

Coverage increased (+0.04%) to 99.559% when pulling 953560b on markroper:master into e351c9c on uralbash:master.

@markroper markroper changed the title Migrates tree_id from an integer to Unicode(32) Migrates tree_id from an Integer to Unicode(32) Jul 18, 2016
@uralbash
Copy link
Owner

@markroper Thank you for your interest in the library.
When I studied this theme for writing libraries I have not found a mathematical basis for mptt together with RDBMS. So I did it as I understand and focusing on the structure of the models in django-mptt. dj-mptt uses integer tree_id field, sa-mptt too, it is necessary not only to separate tree but also for sorting and order trees. Therefore, when the manipulation of top nodes it change tree_id too, see:

If we choose the UUID4 it will take up more space (32-4=28bytes per row) in the database and not allow trees to keep order. I think that is wrong. I think it is possible figure out how to make this field is re-defined for more flixibility of this lib.

On the other hand, but is it necessary?
DB request in the different thread sends in the separate transaction and ACID must ensure the integrity of data. Could you give an example of how to reproduce this problem.

@markroper
Copy link
Author

@uralbash Thanks for the reply! In the use case I have, I do not care about the ordering of the different trees and so maintaining an integer tree id is not a requirement. I understand that you've supported it here as part of the port. Your point about the additional space required for a UUID is a good one, but for my use case, the additional size is not a concern.

The issue I see is on https://github.com/uralbash/sqlalchemy_mptt/blob/master/sqlalchemy_mptt/events.py#L77
Even though the RDBMS has ACID transactions, two threads concurrently querying for [func.max(table.c.tree_id) + 1] to define a new tree ID will resolve the same value and assign that value to the row that each thread goes to insert. One way to prevent this occurrence would be to use a UUID or GUID (at the expense of ordering) as I've shown here. Another would be to create a separate table to maintain a thread safe auto-incrementing value. Something like: https://www.percona.com/blog/2010/10/04/sharing-an-auto_increment-value-across-multiple-mysql-tables/ and to use an insert into this table to resolve new tree_ids.

@uralbash
Copy link
Owner

I would like to save the sort feature because I use it in pyramid_pages module. But the problem with multi threading is also necessary to solve. Do you think if I add table lock on time whether when insert/update top node(tree) it will solve the problem? I mean SELECT FOR UPDATE query when select max tree_id http://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=lock#sqlalchemy.orm.query.Query.with_for_update

@markroper
Copy link
Author

The select for update query will lock for the duration of that select query, but will not lock after that, so I don't think it would eliminate this vulnerability. I think if you want to support this you'll need to create a table in the database that has an auto-incrementing PK. Instead of generating the new PK with [func.max(table.c.tree_id) + 1], you'll have to insert into that table and then select SELECT LAST_INSERT_ID(). http://stackoverflow.com/questions/17112852/get-the-new-record-primary-key-id-from-mysql-insert-query

If you did this, you could still maintain integer-based tree_ids and eliminate the multithreading bug.

@uralbash
Copy link
Owner

Items that must be completed:

  • write test for multithreding
  • add a tree_id type selection option: Integer, ForeignKey, UUID (It's will be more flexible)
  • write docs
  • make 0.2.2 release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants