New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Another joinWith() approach #1618

Closed
creocoder opened this Issue Dec 25, 2013 · 25 comments

Comments

Projects
None yet
9 participants
@creocoder
Contributor

creocoder commented Dec 25, 2013

Suggest joinWith() removing and making one with() method with following signature:

with($with, $eagerLoading = true, $joinType = 'LEFT OUTER JOIN', $together = false);

to make API more logical.

@qiangxue

This comment has been minimized.

Show comment
Hide comment
@qiangxue

qiangxue Dec 25, 2013

Member

with() is an interface method shared by all DB solutions. However, not all DB solutions support join. So it's better we separate joinWith() from with().

Member

qiangxue commented Dec 25, 2013

with() is an interface method shared by all DB solutions. However, not all DB solutions support join. So it's better we separate joinWith() from with().

@qiangxue qiangxue closed this Dec 25, 2013

@creocoder

This comment has been minimized.

Show comment
Hide comment
@creocoder

creocoder Dec 25, 2013

Contributor

So as i see the aspect of separation is:

However, not all DB solutions support join.

Ok, fine. Such DB solutions can ignore $together param for example. What other advantage of separating?

Contributor

creocoder commented Dec 25, 2013

So as i see the aspect of separation is:

However, not all DB solutions support join.

Ok, fine. Such DB solutions can ignore $together param for example. What other advantage of separating?

@qiangxue

This comment has been minimized.

Show comment
Hide comment
@qiangxue

qiangxue Dec 25, 2013

Member

The advantage of separation is that it is clearer - when you see the term joinWith(), your immediate response is that it will create a JOIN SQL and you should be extra careful. Using a $together parameter to indicate this is not as clear and could be easily overlooked.

Member

qiangxue commented Dec 25, 2013

The advantage of separation is that it is clearer - when you see the term joinWith(), your immediate response is that it will create a JOIN SQL and you should be extra careful. Using a $together parameter to indicate this is not as clear and could be easily overlooked.

@creocoder

This comment has been minimized.

Show comment
Hide comment
@creocoder

creocoder Dec 25, 2013

Contributor

@qiangxue Ok, anyway joinWith() looks for me like a fast hack to close #1581 issue. Dont you think issue was born by eliminating "together" feature from relations? How about more radical approach? I mean something like Yii 1 approach with dinamic relation options.

Contributor

creocoder commented Dec 25, 2013

@qiangxue Ok, anyway joinWith() looks for me like a fast hack to close #1581 issue. Dont you think issue was born by eliminating "together" feature from relations? How about more radical approach? I mean something like Yii 1 approach with dinamic relation options.

@qiangxue

This comment has been minimized.

Show comment
Hide comment
@qiangxue

qiangxue Dec 25, 2013

Member

We don't want to go for the 1.1 approach because it involves complicated table/column aliasing, which is not trivial to implement and is also tricky to use. The issue pops up because people are accustomed to the 1.1 approach which mixes the concept of data fetching and data filtering. In practice, it's very possible that you join with some relations while eager loading some other relations. Data fetching and filtering should be separated as designed in 2.0. The joinWith() implementation isn't a hack to #1581. I consider it as an important complement to with().

Member

qiangxue commented Dec 25, 2013

We don't want to go for the 1.1 approach because it involves complicated table/column aliasing, which is not trivial to implement and is also tricky to use. The issue pops up because people are accustomed to the 1.1 approach which mixes the concept of data fetching and data filtering. In practice, it's very possible that you join with some relations while eager loading some other relations. Data fetching and filtering should be separated as designed in 2.0. The joinWith() implementation isn't a hack to #1581. I consider it as an important complement to with().

@slavcodev

This comment has been minimized.

Show comment
Hide comment
@slavcodev

slavcodev Dec 25, 2013

Contributor

I did not understand the need joinWith so this post does not make sense

Contributor

slavcodev commented Dec 25, 2013

I did not understand the need joinWith so this post does not make sense

@Ragazzo

This comment has been minimized.

Show comment
Hide comment
@Ragazzo

Ragazzo Dec 25, 2013

Contributor

@slavcodev lazyWith and eagerWith is more confusing. I think that people who knows that Yii2 AR first select all id's than select related AR by this ids would find more confusing eagerWith than joinWith, because second one is exactly pointing to what sql condition will be executed.

Contributor

Ragazzo commented Dec 25, 2013

@slavcodev lazyWith and eagerWith is more confusing. I think that people who knows that Yii2 AR first select all id's than select related AR by this ids would find more confusing eagerWith than joinWith, because second one is exactly pointing to what sql condition will be executed.

@creocoder

This comment has been minimized.

Show comment
Hide comment
@creocoder

creocoder Dec 25, 2013

Contributor

@qiangxue So the joinWith() role is strictly practical reasons, right? To simplify joins with relations.

Contributor

creocoder commented Dec 25, 2013

@qiangxue So the joinWith() role is strictly practical reasons, right? To simplify joins with relations.

@nineinchnick

This comment has been minimized.

Show comment
Hide comment
@nineinchnick

nineinchnick Dec 25, 2013

Contributor

I agree that this should be a separate method but I don't like @qiangxue explanation. You shouldn't state that data fetching and filtering should be separated. I think most database system, SQL or not, are faster in performing those tasks than PHP. Even NoSQL systems are advertised as providing an efficient map-reduce filtering method.

I think the most important cons are:

  • creating duplicate values in the result set
  • making the result set bigger
  • could create a large temporary data set, sometimes a subquery is faster than a join

The pros are:

  • easier to build conditions as one object
  • allow to use aggregate functions
  • very fast processing and little overhead (no primary key value lists, single query)
  • database planner can optimize the execution

I'm not listing aliasing as a con. Personally I never had trouble with that. Even if it's hard to implement it doesn't state anything about good design.

I like when three types of data loading are defined:

  • lazy loading
  • eager loading using multiple queries
  • eager loading using single query

Actually the terms lazy and eager should be well recognized by devs, as it is common to a lot of ORMs. Now introducing them to with and together creates the confusion. I'd say that eagerWith would be better than joinWith, even gramatically.

Recently I extensively audited the ActiveFinder from Yii 1 to add support for MANY_MANY through relations. It's complex, but complete. It's a shame to throw it away. I find it very useful and I build some tools around it.

To conclude, I think it's a very important feature and if it's not done well it can drive a lot devs away from Yii 2.

PS. Actually, the best solution is to use subqueries instead of separate queries and passing list of PK values. That allows cross-referencing tables with some limitations. This is just a crazy idea, I'm going to think about it more in near future when browsing the code.

Contributor

nineinchnick commented Dec 25, 2013

I agree that this should be a separate method but I don't like @qiangxue explanation. You shouldn't state that data fetching and filtering should be separated. I think most database system, SQL or not, are faster in performing those tasks than PHP. Even NoSQL systems are advertised as providing an efficient map-reduce filtering method.

I think the most important cons are:

  • creating duplicate values in the result set
  • making the result set bigger
  • could create a large temporary data set, sometimes a subquery is faster than a join

The pros are:

  • easier to build conditions as one object
  • allow to use aggregate functions
  • very fast processing and little overhead (no primary key value lists, single query)
  • database planner can optimize the execution

I'm not listing aliasing as a con. Personally I never had trouble with that. Even if it's hard to implement it doesn't state anything about good design.

I like when three types of data loading are defined:

  • lazy loading
  • eager loading using multiple queries
  • eager loading using single query

Actually the terms lazy and eager should be well recognized by devs, as it is common to a lot of ORMs. Now introducing them to with and together creates the confusion. I'd say that eagerWith would be better than joinWith, even gramatically.

Recently I extensively audited the ActiveFinder from Yii 1 to add support for MANY_MANY through relations. It's complex, but complete. It's a shame to throw it away. I find it very useful and I build some tools around it.

To conclude, I think it's a very important feature and if it's not done well it can drive a lot devs away from Yii 2.

PS. Actually, the best solution is to use subqueries instead of separate queries and passing list of PK values. That allows cross-referencing tables with some limitations. This is just a crazy idea, I'm going to think about it more in near future when browsing the code.

@qiangxue

This comment has been minimized.

Show comment
Hide comment
@qiangxue

qiangxue Dec 25, 2013

Member

@creocoder Yes.

@nineinchnick The biggest drawback of AR in 2.0 is the potential performance issue for eager loading (even this may not be very true). Other than this, it has all the pros you listed for AR 1.1.

As you may have found out, the current AR 2.0 design allows it to be supported by NoSQL and other non-relational DBMS, and it even supports relation between ARs from different types of DBMS.

Regarding the term "eager" and "lazy", the rule for AR 2.0 is very simple: if you use with() or joinWith(), it is eager loading; otherwise it is lazy loading. For this reason, the terms eagerWith() and lazyWith() are not any more expressive. On the contrary, they may bring confusion (e.g. what does lazyWith() do?)

Member

qiangxue commented Dec 25, 2013

@creocoder Yes.

@nineinchnick The biggest drawback of AR in 2.0 is the potential performance issue for eager loading (even this may not be very true). Other than this, it has all the pros you listed for AR 1.1.

As you may have found out, the current AR 2.0 design allows it to be supported by NoSQL and other non-relational DBMS, and it even supports relation between ARs from different types of DBMS.

Regarding the term "eager" and "lazy", the rule for AR 2.0 is very simple: if you use with() or joinWith(), it is eager loading; otherwise it is lazy loading. For this reason, the terms eagerWith() and lazyWith() are not any more expressive. On the contrary, they may bring confusion (e.g. what does lazyWith() do?)

@nineinchnick

This comment has been minimized.

Show comment
Hide comment
@nineinchnick

nineinchnick Dec 25, 2013

Contributor

You're right, it's just a question what terms should we use for eager and eager-er, that is the two eager loading methods I mentioned.

I'd just like to address that the current method of eager loading by passing a list of PK values to a second query feels dirty. I will look into it and try to find a solution.

Contributor

nineinchnick commented Dec 25, 2013

You're right, it's just a question what terms should we use for eager and eager-er, that is the two eager loading methods I mentioned.

I'd just like to address that the current method of eager loading by passing a list of PK values to a second query feels dirty. I will look into it and try to find a solution.

@creocoder

This comment has been minimized.

Show comment
Hide comment
@creocoder

creocoder Dec 25, 2013

Contributor

@nineinchnick

I'd just like to address that the current method of eager loading by passing a list of PK values to a second query feels dirty. I will look into it and try to find a solution.

You confuse eager loading with query strategy. Eager loading is wide term which related not only to ORMs. Both with() and joinWith() use eager loading methodology, but different query strategies. So your comment is related to query strategy of with() and not related to eager loading methodology at all.

Contributor

creocoder commented Dec 25, 2013

@nineinchnick

I'd just like to address that the current method of eager loading by passing a list of PK values to a second query feels dirty. I will look into it and try to find a solution.

You confuse eager loading with query strategy. Eager loading is wide term which related not only to ORMs. Both with() and joinWith() use eager loading methodology, but different query strategies. So your comment is related to query strategy of with() and not related to eager loading methodology at all.

@creocoder

This comment has been minimized.

Show comment
Hide comment
@creocoder

creocoder Dec 25, 2013

Contributor

@nineinchnick

Actually, the best solution is to use subqueries instead of separate queries and passing list of PK values.

Not all DBs support subqueries. Also there is some troubles with subqueries in some legacy mysql versions for example.

Contributor

creocoder commented Dec 25, 2013

@nineinchnick

Actually, the best solution is to use subqueries instead of separate queries and passing list of PK values.

Not all DBs support subqueries. Also there is some troubles with subqueries in some legacy mysql versions for example.

@qiangxue

This comment has been minimized.

Show comment
Hide comment
@qiangxue

qiangxue Dec 25, 2013

Member

I'd just like to address that the current method of eager loading by passing a list of PK values to a second query feels dirty.

We had the similar thought initially when we were redesigning AR. But our tests and investigation found that it is not that bad. It actually has many benefits, including the support for relational queries for NoSQL DBs. Performance wise, there is no proof that it would be worse than join (it may be faster in fact.) The only limitation is when you have too many PKs returned, the queries for related records could be too long to be accepted by DBMS. However, this is very uncommon.

Member

qiangxue commented Dec 25, 2013

I'd just like to address that the current method of eager loading by passing a list of PK values to a second query feels dirty.

We had the similar thought initially when we were redesigning AR. But our tests and investigation found that it is not that bad. It actually has many benefits, including the support for relational queries for NoSQL DBs. Performance wise, there is no proof that it would be worse than join (it may be faster in fact.) The only limitation is when you have too many PKs returned, the queries for related records could be too long to be accepted by DBMS. However, this is very uncommon.

@creocoder

This comment has been minimized.

Show comment
Hide comment
@creocoder

creocoder Jan 8, 2014

Contributor

@qiangxue Another idea for this:

$items = MyModel::find()
    ->with([
        'relation1' => function ($query, $options) {
            $options->together = true;
            $options->joinType = 'INNER JOIN';
        },
        'relation3' => function ($query, $options) {
            $query->andWhere(...);
            $options->together = true;
        },
        'relation4',
        'relation5',
    ])

there is no proof that it would be worse than join

I remember times when CActiveFinder::together become true by default with opposite argument :) So to be honest the real reason to avoid joins IS NOT perfomance. Reason is easy caching and some NoSQL dbs troubles. If someone will not use caching he will garantee loose perfomance with NoJOIN approach. Can prepare proofs to show NoJOIN approach without caching VS JOIN approach without caching.

Contributor

creocoder commented Jan 8, 2014

@qiangxue Another idea for this:

$items = MyModel::find()
    ->with([
        'relation1' => function ($query, $options) {
            $options->together = true;
            $options->joinType = 'INNER JOIN';
        },
        'relation3' => function ($query, $options) {
            $query->andWhere(...);
            $options->together = true;
        },
        'relation4',
        'relation5',
    ])

there is no proof that it would be worse than join

I remember times when CActiveFinder::together become true by default with opposite argument :) So to be honest the real reason to avoid joins IS NOT perfomance. Reason is easy caching and some NoSQL dbs troubles. If someone will not use caching he will garantee loose perfomance with NoJOIN approach. Can prepare proofs to show NoJOIN approach without caching VS JOIN approach without caching.

@creocoder

This comment has been minimized.

Show comment
Hide comment
@creocoder

creocoder Jan 8, 2014

Contributor

@qiangxue And little more here:

However, not all DB solutions support join.

Fine, lets name this more abstractly as together. For RDBS its JOIN. I'm sure all NoSQL db solutions have possibility to make queries like that. Am i wrong? If not, where is the problem and why query strategy was changed to:

potential performance issue

strategy ?

Contributor

creocoder commented Jan 8, 2014

@qiangxue And little more here:

However, not all DB solutions support join.

Fine, lets name this more abstractly as together. For RDBS its JOIN. I'm sure all NoSQL db solutions have possibility to make queries like that. Am i wrong? If not, where is the problem and why query strategy was changed to:

potential performance issue

strategy ?

@qiangxue

This comment has been minimized.

Show comment
Hide comment
@qiangxue

qiangxue Jan 8, 2014

Member

@creocoder I'm interested to see your profiling results which compares non-join and join queries.

I have pretty much explained why we chose the current design and method names in my previous replies. I think we should spend time repeating the same debate again.

Member

qiangxue commented Jan 8, 2014

@creocoder I'm interested to see your profiling results which compares non-join and join queries.

I have pretty much explained why we chose the current design and method names in my previous replies. I think we should spend time repeating the same debate again.

@creocoder

This comment has been minimized.

Show comment
Hide comment
@creocoder

creocoder Jan 8, 2014

Contributor

@qiangxue Ok will create profiling tests then with different environments: local sql server and remote sql server.

Contributor

creocoder commented Jan 8, 2014

@qiangxue Ok will create profiling tests then with different environments: local sql server and remote sql server.

@qiangxue

This comment has been minimized.

Show comment
Hide comment
@qiangxue

qiangxue Jan 8, 2014

Member

Cool. Make sure you disable query cache when testing: http://dev.mysql.com/doc/refman/5.1/en/query-cache.html

Member

qiangxue commented Jan 8, 2014

Cool. Make sure you disable query cache when testing: http://dev.mysql.com/doc/refman/5.1/en/query-cache.html

@cebe

This comment has been minimized.

Show comment
Hide comment
@cebe

cebe Jan 9, 2014

Member

Fine, lets name this more abstractly as together. For RDBS its JOIN. I'm sure all NoSQL db solutions have possibility to make queries like that. Am i wrong?

No, most of them do not have the possibility to retrieve data in a join-like query. Querys do return records of one type in most cases and if it is multi type then the way the records are fetched is different from relation way but like cross type search or something like that. redis does not have it because it is key value only. elasticsearch can only fetch parent-child relation in one query. Not sure about Mongo.
In general you can not assume JOIN to be present in NoSQL DBMS Most of them have different goals and join is the most tricky part in querying data as far as I see so they leave it out as it may not easily be distributeable over a cluster for example.

Member

cebe commented Jan 9, 2014

Fine, lets name this more abstractly as together. For RDBS its JOIN. I'm sure all NoSQL db solutions have possibility to make queries like that. Am i wrong?

No, most of them do not have the possibility to retrieve data in a join-like query. Querys do return records of one type in most cases and if it is multi type then the way the records are fetched is different from relation way but like cross type search or something like that. redis does not have it because it is key value only. elasticsearch can only fetch parent-child relation in one query. Not sure about Mongo.
In general you can not assume JOIN to be present in NoSQL DBMS Most of them have different goals and join is the most tricky part in querying data as far as I see so they leave it out as it may not easily be distributeable over a cluster for example.

@konapaz

This comment has been minimized.

Show comment
Hide comment
@konapaz

konapaz Sep 18, 2016

Is there any way to load eager data "with" or 'joinWith' for ORM by another database ?
I use Yii 2.0.9

For example
$companies = Company::find()->joinWith('city')->all();

In model Company I have

public function getCity() {
        return $this->hasOne(\common\models\City::className(), ['id' => 'city_id']);
   }

City is a model that has already configured to another database using:

 public static function getDb() {
        return Yii::$app->get('db_another_database');
    }

But is not work, the generated query does not include the name of another_database.city_table.id in condition LEFT JOIN, so a database exception occurs

I tried

$companies = Company::find()->leftJoin('another_database.city_table AS city' , 'company.city_id=city.id')->all();

But although the generated query is corrected the data of city for each company loaded by lazy method

konapaz commented Sep 18, 2016

Is there any way to load eager data "with" or 'joinWith' for ORM by another database ?
I use Yii 2.0.9

For example
$companies = Company::find()->joinWith('city')->all();

In model Company I have

public function getCity() {
        return $this->hasOne(\common\models\City::className(), ['id' => 'city_id']);
   }

City is a model that has already configured to another database using:

 public static function getDb() {
        return Yii::$app->get('db_another_database');
    }

But is not work, the generated query does not include the name of another_database.city_table.id in condition LEFT JOIN, so a database exception occurs

I tried

$companies = Company::find()->leftJoin('another_database.city_table AS city' , 'company.city_id=city.id')->all();

But although the generated query is corrected the data of city for each company loaded by lazy method

@SilverFire

This comment has been minimized.

Show comment
Hide comment
Member

SilverFire commented Sep 19, 2016

@konapaz Duplicates #5470

@konapaz

This comment has been minimized.

Show comment
Hide comment
@konapaz

konapaz Sep 19, 2016

So the best solution I found is

$companies = Company::find()
            ->joinWith(['city' => function($query) { 
                   return $query->from('another_database.city_table')
            }])->all();

So what is your opinion ? @SilverFire

konapaz commented Sep 19, 2016

So the best solution I found is

$companies = Company::find()
            ->joinWith(['city' => function($query) { 
                   return $query->from('another_database.city_table')
            }])->all();

So what is your opinion ? @SilverFire

@SilverFire

This comment has been minimized.

Show comment
Hide comment
@SilverFire

SilverFire Sep 19, 2016

Member

If you want to JOIN on plain SQL - yes, it's the best solution.
If eager loading in multiple queries is fine for you - use Company::find()->with('city')->all();

Member

SilverFire commented Sep 19, 2016

If you want to JOIN on plain SQL - yes, it's the best solution.
If eager loading in multiple queries is fine for you - use Company::find()->with('city')->all();

@keeperevil

This comment has been minimized.

Show comment
Hide comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment