Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with collision check in merge_tablemaps #518

Closed
nursix opened this issue Feb 9, 2018 · 37 comments
Closed

Problem with collision check in merge_tablemaps #518

nursix opened this issue Feb 9, 2018 · 37 comments

Comments

@nursix
Copy link
Contributor

nursix commented Feb 9, 2018

I'm struggling with this:
https://github.com/web2py/pydal/blob/master/pydal/helpers/methods.py#L83

...as it requires identity of aliased Table instances, and hence fails if the same aliasing happens twice, i.e.:

table.with_alias("alias") is not table.with_alias("alias")

...even though it's the same table, and the same alias, and consequently the same SQL.

Why does it have to be this way - wouldn't it be good enough to ensure that the DAL name of both tables is the same (i.e. that they refer to the same original table)? Do they really have to be exactly the same Python object?

@nursix
Copy link
Contributor Author

nursix commented Feb 9, 2018

Or actually,

shouldn't table.with_alias("alias") deliver the same instance for the same table and same alias?

Why copy.copy again if db[alias] already exists for the same original Table instance?

@BuhtigithuB
Copy link
Contributor

Could this be related somehow with #521 ??

@BuhtigithuB
Copy link
Contributor

Or says differently should we make sure that each time we pass a second (dict) argument to merge_tablemaps() we make sure that we make a "deepcopy"?? Since we can't control that the second argument get update (modified), I think it would make more sens...

@BuhtigithuB
Copy link
Contributor

@nursix, can you provide a failing example for me to check if this would fix your issue :
https://groups.google.com/forum/#!topic/web2py-developers/5v1Orm93_3U

Thanks

@nursix
Copy link
Contributor Author

nursix commented Feb 28, 2018

Ah - okay, I should probably have described the problem better.

I have two functions - the first one produces a Query, like:

query = db.my_table.with_alias("my_aliased_table").some_field == "example"

...and passes this query on to another function, like:

result = other_function(query)

Then, that other_function performs a select from a left join that includes the same aliased table:

left = [db.my_table.with_alias("my_aliased_table").on(...join expression...), ...other join...]
rows = db(query & ...other_query).select(..., left=left)

Now, this raises a "Name Conflict" error - because there seem to be two tables "my_aliased_table".

Obviously, this is exactly the same table with exactly the same alias (so perfectly the same resulting SQL), but merge_tablemaps doesn't recognize that because table.with_alias() produces a new instance of the Table every time it is called, and merge_tablemaps checks for object identity.

As I see it, with_alias stores the aliased table instance in the DAL instance (self._db), so it should be able to detect when it is called again for the same table with the same alias, and in that case return the same instance from self._db.

I can work around the issue by wrapping table.with_alias in a function that does exactly that: checking whether db already has an aliased instance of the same table with the same alias - and in that case, return that instance rather than creating a new one with with_alias.

That wrapper function should though not be necessary - I think, Table.with_alias should do that, so that this would succeed (if you need a unit test):

assertTrue(table.with_alias("alias") is table.with_alias("alias"))

@nursix
Copy link
Contributor Author

nursix commented Feb 28, 2018

Here's my workaround for the issue, which may help you to understand the problem:

    def get_aliased(table, alias):
        """
            Helper method to get a Table instance with alias; prevents
            re-instantiation of an already existing alias for the same
            table (which can otherwise lead to name collisions in PyDAL).

            @param table: the original table
            @param alias: the alias

            @return: the aliased Table instance
        """

        db = current.db

        if hasattr(db, alias):
            aliased = ogetattr(db, alias)
            if original_tablename(aliased) == original_tablename(table):
                return aliased

        aliased = table.with_alias(alias)
        if aliased._id.table != aliased:
            # Older PyDAL not setting _id attribute correctly
            aliased._id = aliased[table._id.name]

        return aliased

(Yeah, it contains another workaround for a problem in older PyDAL versions since we do not always know which version our software is deployed with - bit ignore that, just look at the first part here)

@nursix
Copy link
Contributor Author

nursix commented Feb 28, 2018

You could also consider this as an inconsistency.

...because, when I do this:

aliased = db.my_table.with_alias("my_aliased_table")

...then every subsequent db.my_aliased_table will return exactly the same object, but another db.my_table.with_alias("my_aliased_table") will return a new object.

I can see how this is debatable, of course - the new with_alias call could indeed be intended to create a new object even when called for the same table and with the same alias, and since there is no way to remove an aliased table from a DAL instance, this would be the only (non-intrusive) way to achieve that.

So, the question may be what is the more common case: re-using the same instance, or creating a new instance?

I for my part am struggling to imagine a case where I would want to override an alias with a new instance of the same table - but then again that's a very narrow perspective, perhaps. That's why I raised this issue rather than providing a fix.

@BuhtigithuB
Copy link
Contributor

BuhtigithuB commented Mar 1, 2018

No luck, my fix doesn't help at all... And you know what the issue is even worse than the duplicated object you talk about... The query doesn't even filter out the records of the alias table specified... My bad I had a typo...

This work in 2.14.6 :

db.auth_user.with_alias('user_table').username == 'USERNAME'  
db(query).select(db.auth_user.with_alias('user_table').username)
len(rows)
1

And don't with trunk and fall on :

---> 88                 raise ValueError('Name conflict in table list: %s' % key)
     89         # Merge
     90         big.update(small)

In this function :

def merge_tablemaps(*maplist):
    """Merge arguments into a single dict, check for name collisions.
    Arguments may be modified in the process."""
    ret = maplist[0]
    for item in maplist[1:]:
        if len(ret) > len(item):
            big, small = ret, item
        else:
            big, small = item, ret
        # Check for name collisions
        for key, val in small.items():
            if big.get(key, val) is not val:
                raise ValueError('Name conflict in table list: %s' % key)
        # Merge
        big.update(small)
        ret = big
    return ret

This function had been introduced with pyDAL 17.01

v16.11...v17.01#diff-114ce07f361177e0669ec9a374ef7d6aR71

Which occurs with this bunch of changes : #408

@BuhtigithuB
Copy link
Contributor

@nextghost, You be of some help here...

@BuhtigithuB
Copy link
Contributor

I had limited understanding what was all the underlying science of all these changes.

:D

@nextghost
Copy link
Contributor

Why does it have to be this way - wouldn't it be good enough to ensure that the DAL name of both tables is the same (i.e. that they refer to the same original table)? Do they really have to be exactly the same Python object?

It would be good enough. And it would also be much much MUCH easier to screw up that check. Which is why I've decided to not support that particular edge case and use simple and safe object identity.

Table.with_alias() returns new object every time because one of my goals for later was to eliminate alias caching on the DAL instance entirely. Relying on cached aliases is a great way to shoot yourself in the foot. You should create the table alias once and then pass it whereever you need to use it.

@nursix
Copy link
Contributor Author

nursix commented Mar 1, 2018

That's not a nice thing to say :/ because it's virtually impossible to do what you suggest.

I think it's time to abandon web2py/PyDAL altogether.

@nursix
Copy link
Contributor Author

nursix commented Mar 1, 2018

And to make the point: I don't see the slightest reason why combining two of the same (table.with_alias("alias") and table.with_alias("alias")) should crash a query. It doesn't make any sense - it's the same table, the same alias. Where exactly is the "simple and safe" in that?

I'm sorry, but getting brushed off like that doesn't make PyDAL any more attractive :( We've been using it for a long time (as long as it exists), but it's being developed away from us.

@BuhtigithuB
Copy link
Contributor

Cool down guys, there is surely a way...

@nursix
Copy link
Contributor Author

nursix commented Mar 1, 2018

And what would possibly the risk of caching aliases that refer to the same original table, anyway?

I can see how there is a risk if the same alias name is used for a /different/ table - but for the same table? Where would it make a difference?

We have long had a separation of filter construction and actual querying - and we can not possibly change all of that just because PyDAL is suddenly unable to combine two completely equal aliases in the same query.

@nursix
Copy link
Contributor Author

nursix commented Mar 1, 2018

@BuhtigithuB apparently the way is for us to rewrite our entire software and make it much more complicated in order to match PyDAL's quirks.

I'm sorry, but I just can't see how this change in PyDAL is helpful with anything. We're not stupid here, we've been doing this for a very long time - yet I get near-absurd suggestions to rewrite thousands of lines of code for something with very little perceivable value.

@BuhtigithuB
Copy link
Contributor

I too question myself about the value added of the nested select stuff if it render the DAL backward incompatible... But I don't think we should loose our temper... There is surely a way to solve the problem that we can not understand for the moment...

@nextghost
Copy link
Contributor

nextghost commented Mar 1, 2018

I don't see the slightest reason why combining two of the same (table.with_alias("alias") and table.with_alias("alias")) should crash a query. It doesn't make any sense - it's the same table, the same alias. Where exactly is the "simple and safe" in that?

Because in order to check that two different instances of Table point to the same database table, you have to check that:

  • The alias is identical
  • The table name is identical
  • The parent DAL object points to the same database (and you could easily ask for connection string comparison, which is a whole world of pain in itself, instead of simple object identity)

And someone could easily make some change in the future that will complicate this check even further.

I can see how there is a risk if the same alias name is used for a /different/ table - but for the same table?

That's exactly the risk: Redefining the alias to a different table between two uses that assume it's still the same table.

The easiest way you can upgrade your separation of filters and queries to the new API is to redefine the filters to a function that takes the table (or dict of table aliases) as an argument. Then just pass the function around instead of the original filter object and call it while building the final query with the right table instances for argument.

Anyway, I've quit Web2Py/PyDAL more than a year ago because (to put it mildly) I'm not happy with the core team's approach to code quality. If you want to revert all my contributions, I don't care in the least.

@BuhtigithuB
Copy link
Contributor

BuhtigithuB commented Mar 1, 2018

Here the merge commit :
b04b33e

@nursix
Copy link
Contributor Author

nursix commented Mar 1, 2018

I still struggle to see the problem, to be honest.

It seems you're merely looking at merge_tablemaps, not at Table.with_alias.

Since with_alias is an instance method of Table, you already have the DAL object (self._db), so all you need to check is whether the cached alias in that DAL object refers to the same dalname - which is trivial, as proven by my workaround.

It only ever becomes complicated when you abandon alias caching, because then indeed you would need to compare the entire object hierarchy - so that is why it seems absurd to do that.

It would make a lot more sense to move the alias caching into the Table object itself, so that you can skip the dalname check and only ever look at the alias. Then, whenever you call Table.with_alias with an alias name for which there already is a cached instance in /that table/, then return that same instance.

Your proposal is still for us to rewrite our code rather than fixing this newly introduced issue in PyDAL? This isn't really the most helpful advice, but if you say there is no other way and you're not going to fix this, then okay, thanks for your time.

@nursix nursix closed this as completed Mar 1, 2018
@nursix
Copy link
Contributor Author

nursix commented Mar 1, 2018

Yeah - and thanks for clarifying that you don't care about us.

@BuhtigithuB
Copy link
Contributor

@nursix Please leave this issue open...

@nursix
Copy link
Contributor Author

nursix commented Mar 1, 2018

If you wish...sorry, I understood it as resolved with "won't fix", so I closed it.

@nursix nursix reopened this Mar 1, 2018
@BuhtigithuB
Copy link
Contributor

To me it a regression as it break backward compatibility, I won't see why we would allow a new feature break backward compatibility... I am not qualified (and don't have the time to wrap my head around this issue), so I prefer let others look in to that. I had documented the issue the best I can and thank you for your collaboration too. Hope @mdipierro and @gi0baro will pass by here soon and help clear that out.

Richard

@nextghost
Copy link
Contributor

I said I've quit. I've explained my reasons for writing the code the way I did and now it's up to the core team to deal with this issue as they see fit.

@BuhtigithuB
Copy link
Contributor

No prob thank you @nextghost this the whole point to leave the issue open...

:D

@leonelcamara
Copy link
Contributor

It would make a lot more sense to move the alias caching into the Table object itself, so that you can skip the dalname check and only ever look at the alias. Then, whenever you call Table.with_alias with an alias name for which there already is a cached instance in /that table/, then return that same instance.

This makes sense to me as well, and would be my preference if it doesn't cause memory leaks/thread safety problems. Either way, this is definitely a bug, you do not break people's applications like this.

@nursix
Copy link
Contributor Author

nursix commented Mar 1, 2018

Ahm - please take my proposal with a grain of salt. Mind my app developer perspective ;)

I'm not saying it's a bug - it's rather somewhat implausible from the app perspective.

If I create two Table.with_alias() instances from the same original table with the same alias, then it isn't obvious why they should collide in a query.

To me as the app developer, they both constitute one and the same TABLE AS expression, and hence I don't see why the app should take care that they also are the same Python objects. That's a bit counter-intuitive, and not really obvious from the documentation.

If it is a bug, like you say, then it should be fixed in some way - but I don't claim that my proposal is necessarily the correct way to fix it, it's just how I could imagine it to work.

But if it is a feature, as has also been suggested here, then I would be interested to learn what the requirement behind it might be.

@leonelcamara
Copy link
Contributor

I'm not saying it's a bug - it's rather somewhat implausible from the app perspective.

It's definitely a bug, I can sympathize with nextghost wanting to simplify the way the DAL works to make it more manageable but you cannot break applications in this way. It's not like your application is using some kind of private methods or monkey patching a class, you're just calling regular functions documented in the book. Everything in software could be made simpler and more elegant if no one used it.

@BuhtigithuB
Copy link
Contributor

I see it a bug too... DAL should allow you to construct SQL without having to know all the complexity behind it. Aliasing in SQL is something used all the time to save a few key stroke... In web2py it has not the same usage as it not save any key stroke at all, but it should be supported and we shouldn't break backward compatibility without notice, so it definitely a bug as it use to work before and it something you expect it gonna work logically if you start to user aliases.

@BuhtigithuB
Copy link
Contributor

I will have try ASAP...

@BuhtigithuB
Copy link
Contributor

@leonelcamara, I test it, seems to solve the issue... Could it has any drawback??

@nursix, can you try with your app if it solves all the issues you were having?

Just apply the change here : leonelcamara@62d0948#diff-68220c912f02a03fd8bce58415479334R1053

Over : gluon/packages/pydal/objects.py

Think to reboot web2py...

:)

@nursix
Copy link
Contributor Author

nursix commented Mar 7, 2018

This works (is equivalent to my workaround), but having the standard case in the "except" branch seems a bit expensive.

It's not my call to make - but wouldn't it be better to store aliases in the Table instance (e.g. as self._aliases) rather than in the DAL, so that a simple self._aliases.get(alias) would do?

@nursix
Copy link
Contributor Author

nursix commented Mar 7, 2018

Btw - I'm also wondering that because it seems to me that currently, with_alias() with an existing tablename could accidentally detach (i.e. overwrite) a genuine Table instance that happens to have a table name equal to the chosen alias.

I'm not a PyDAL expert developer, but /this/ seems even more serious to me than the original issue here.

@nursix
Copy link
Contributor Author

nursix commented Mar 7, 2018

I mean this:

>>> db.pr_person._rname
'`pr_person`'
>>> aliased = db.org_organisation.with_alias("pr_person")
>>> db.pr_person._rname
'`org_organisation`'

i.e. a (poorly chosen) alias for the org_organisation table overwrites the pr_person Table, and no way to restore the original.

@leonelcamara
Copy link
Contributor

@nursix I could easily change this caching to be done at the table level, the point is that it was already being made at the DAL level, so the shadowing was already being made, I assumed it was for some valid reason, so I simply took advantage of it. Perhaps, @mdipierro or @gi0baro can explain why are the aliases being stored in self._db, otherwise I will simply have to read more of the DAL code and probably start commenting it.

but having the standard case in the "except" branch seems a bit expensive.

True, may be worth it to change this to an "if" statement as the try/except only becomes an optimization if you call the same with_alias like 10 times or more and I doubt that's common.

@mdipierro
Copy link
Contributor

mdipierro commented Mar 7, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants