Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for virtual fields in AR #8316

Closed
SamMousa opened this issue May 7, 2015 · 40 comments
Closed

Support for virtual fields in AR #8316

SamMousa opened this issue May 7, 2015 · 40 comments

Comments

@SamMousa
Copy link
Contributor

SamMousa commented May 7, 2015

Has the Yii team ever considered implementing support for virtual fields in AR?

CakePHP has it here: http://book.cakephp.org/2.0/en/models/virtual-fields.html

We have implemented it for Yii2 here: https://github.com/MarketFlow/yii2-virtual-fields
(Documentation is there, the actual files will follow in a few mins / hours).

Advantage built-in of virtual field support over implementing stuff like "getPostCount()" in each model is that virtual fields can be greedily loaded, greatly reducing the number of queries required.

@klimov-paul
Copy link
Member

@fernandezekiel
Copy link
Contributor

@klimov-paul maybe he's talking about a field defined by SQLs not by php code... something similar to statistical queries in Yii1... although in Yii2 i just do this by adding a custom property + Query::select()

@klimov-paul
Copy link
Member

As far as I understand, @SamMousa proposes creation of model virtual property, which may be populated either by eager loading in case they are mentioned at SELECT or using lazy loading, when record has been fetched without it.

It seems this could be done in following way:

class Customer extends ActiveRecord
{
    private $_orderCount;

    public function setOrderCount($value)
    {
        $this->_orderCount = $value;
    }

    public function getOrderCount()
    {
        if ($this->_orderCount === null) {
            $this->_orderCount = Order::find()->where(['user_id' => $this->id])->count();
        }
        return $this->_orderCount;
    }
}

@SamMousa
Copy link
Contributor Author

SamMousa commented May 7, 2015

@klimov-paul, you are right about my goal, however I do not see how your example uses greedy loading.

The problem with lazy loading is that if I want to get a list of Authors with their postCount, it will do n+1 queries. My proposal reduces this to 1 query.

@klimov-paul
Copy link
Member

None prevents you to write eager loading for virtual property:

$customers = Customer::find()->select(['*', 'COUNT(orders.id) AS orderCount'])->joinWith('orders')->all();
foreach ($customers as $customer) {
    echo $customers->orderCount; // no extra query
}

$customer = Customer::find()->one();
echo $customer->orderCount; // + 1 query.

@SamMousa
Copy link
Contributor Author

SamMousa commented May 7, 2015

Yes I'm not doing stuff that is "impossible" however, you must agree your example is not DRY and will not support lazy loading.

The point of this proposal is that you can implement both if you want and use them everywhere like $customer->orderCount.

Also I think your example might actually not work since populateRecord will not set the orderCount property unless I add it to the model as well. So that means I have to edit the code in a lot of places.

-- note that nothing prevents me from defining my relations all over the code as well, nonetheless I think we all agree that defining model related stuff in the model is more DRY than doing it all over the place?
--2: I'm on IRC if you'd like to discuss this more in-depth, i'll then update this proposal with any remarks.

@klimov-paul
Copy link
Member

your example, is not DRY and will not support lazy loading.

Why? getOrderCount() will do lazy loading in case $_orderCount is not set ealier.

The point of this proposal is that you can implement both if you want and use them everywhere like

You can. If you don't like writing select() and joinWith() at several places you can compose this code into a ActiveQuery scope and then use it if needed.

your example might actually not work since populateRecord will not set the orderCount property unless I add it to the model as well.

It should set it, as far as I see:

public static function populateRecord($record, $row)
    {
        $columns = array_flip($record->attributes());
        foreach ($row as $name => $value) {
            if (isset($columns[$name])) {
                $record->_attributes[$name] = $value;
            } elseif ($record->canSetProperty($name)) { // we have setter, so we can set `orderCount`
                $record->$name = $value; // here `orderCount` will be set.
            }
        }
        $record->_oldAttributes = $record->_attributes;
    }

@SamMousa
Copy link
Contributor Author

SamMousa commented May 7, 2015

Ah, your 2nd example is added to the first. =)

So then I have to create a scope for each ActiveRecord to add custom fields, forcing me to define the same "relation" in 4 places, lazily in the model, greedily in the scope.

Places:

  1. Private / protected model property.
  2. Getter in model.
  3. Setter in model.
  4. Scope in query object.

Not to mention in a lot of cases the query object would have to be defined just to accomodate virtual fields.

I do not see how that is a better approach than an approach which allows you to simply define it in one place where it will always work.

@mikehaertl
Copy link
Contributor

@klimov-paul I tend to agree to @SamMousa. It could be very convenient to list SQL expressions for those virtual attributes. Using scopes is pretty cumbersome (and not sure if it works easily if you have more than one of those attributes anyway).

So having something like

public function virtuals()
{
    return [
        'cssClass' => "IF(x.value IS NULL, '',x.value)",
        'priceCut' => "IF(x.old_price>0, FLOOR(100*(1-x.price/x.old_price)), 0)",
    ];
}

in ActiveRecord could be very convenient. This would move some calculations to the DB (making these fields orderable) and avoid that you have to write too many getters / scopes.

@SamMousa
Copy link
Contributor Author

SamMousa commented May 7, 2015

Btw, here is the syntax i'm proposing:

public static function virtualFields() {
  return [
    'postCount' => [
      'lazy' => function($model) { return $this->getPosts()->count; }
      'greedy' => Post::model()->limit(1)->select('count(*)')->condition('author_id = author.id')
    ]
  ]; 
}

I think it's better if it's static since during greedy loading you don't have a model instance, (it will also be less error prone).
Lazy loading can be ommitted and it will throw an exception if you use it lazily.

@klimov-paul
Copy link
Member

forcing me to define the same "relation" in 4 places

??? - I can't see any relation definition involved here.

query object would have to be defined just to accomodate virtual fields.

I don't understand. If you with to use a scopes, which is common practice, you have to define a special Query class for your ActiveRecord. I can see no harm, if you need to do so for the virtual properties as well.

I do not see how that is a better approach than an approach which allows you to simply define it in one place where it will always work.

In return: I do not see why we should add an extra mechanism for class properties definition, while this can be easy done using existing one: setter/getter.
Creating custom setter/getter grands more flexibility: you can decide if you wish or not throw an exception on attempt to access not filled virtual property (no support for lazy load) and so on.

So having something like ... in ActiveRecord could be very convenient.

Yes, it will - but I am afraid this is not that simple. If you use an aggragation to query orderCount, you will need to add a GROUP BY statement, but your virtual property may be composed not only from aggragation and may not require a GROUP BY. There is no way some 'pretty' syntax be composed, just look at @SamMousa implementation: it is a getter composed into anunymous function and scope definition.

@SamMousa
Copy link
Contributor Author

SamMousa commented May 7, 2015

There are quotes over "relation" for a reason. I'm not talking about an AR relation but about the fact that the number of posts for an author is related to both Author and Post.

The whole point of these virtual fields is NOT to use joins, removing the need for group by.
Adding joins and group by can definitely break stuff.

Suppose we extend the example to say that Authors not only have Posts, but they also have Followers.
Now I need one overview to get all Authors with the number of posts and the number of followers. Clearly this cannot (easily) be done using joins and group by.
Instead using subqueries in the SELECT is intuitive and in most cases results in an extremely efficient execution plan.

I'm not sure on your last argument, definitely a syntax that resembles stuff you guys use in Yii for things like:

  • relations,
  • validation rules,
  • attribute labels
  • etc..

Is better than requiring changes in 4 places to implement 1 virtual field. I'll reiterate the places:

  1. Scope class
  2. Private / protected model property $_postCount;
  3. Getter in model,
  4. Setter in model.

And yes, my implementation for lazy loading is a getter composed into an anonymous function. And yes, the greedy loading can be defined in a scope. But isn't the whole purpose of the AR abstraction to make things DRY?

@lynicidn
Copy link
Contributor

lynicidn commented May 7, 2015

virtual attributes it group scopes for query? - create scope.
problem with static relation (eager) it not solve

@SamMousa SamMousa closed this as completed May 7, 2015
@SamMousa SamMousa reopened this May 7, 2015
@SamMousa
Copy link
Contributor Author

SamMousa commented May 9, 2015

Any of the other team members have some feedback on this?

@laszlovl
Copy link
Contributor

A way to eagerly load various "scalar" data such as what's generated by statistical queries would be very welcome.

It would be ideal though if we could load them in with() like regular relations, for example:

$baskets = Basket::find()->with(['customer', 'customer.orderCount'])->all();
foreach ($baskets as $basket) {
    echo $basket->customer->orderCount; // without additional query
}

I think this isn't possible with either of these two proposals?

@SamMousa
Copy link
Contributor Author

It actually is, i chose to implement it via the select; but doing it via
with would be just as easy and not change the syntax for defining it.
On May 10, 2015 22:05, "Laszlo" notifications@github.com wrote:

A way to eagerly load various "scalar" data such as what's generated by
statistical queries would be very welcome.

It would be ideal though if we could load them in with() like regular
relations, for example:

$baskets = Basket::find()->with(['customer', 'customer.orderCount']);foreach ($baskets as $basket) { echo $basket->customer->orderCount; // without additional query}

I think this isn't possible with either of these two proposals?


Reply to this email directly or view it on GitHub
#8316 (comment).

@laszlovl
Copy link
Contributor

@SamMousa I didn't check out your extension in detail yet, but if it is indeed possible, I would say it's a welcome change: it would make the orderCount "property" available from any query, instead of only when directly querying its parent class.

@SamMousa
Copy link
Contributor Author

Well, it is not a normal relation and since this is about adding subqueries in the select clause I'm not sure it will work if you query chained relations. I explicitly don't want to reintroduce stat relations; I'm assuming the yii team decided to omit it for q reason

@SamMousa
Copy link
Contributor Author

Thinking some more about it, @laszlovl.
It is definitely useful to be able to specify virtual fields even when the virtual field is not in the primary model. I'm unsure if this should be combined with relations though.
Instead we could add syntax for it; that way people don't get confused AND we have no backward compatiblity issues.

Currently my syntax using select would allow for this:

    Post::find()->with(['Author' => function($q) {
        $q->addSelect('postCount');
    });

Obviously this is a bit verbose and we could add some syntactic sugar.

    Author::find()->withField('postCount');
    Post::find()->with('author')->withField('author.postCount');

This clearly specifies that we are getting a virtual field; it is also easy to parse for active query (since it needs to add the virtual field definition to the select clause).

Regarding syntax I'm just making suggestions; I think this functionality should be added to the core even if the dev team decides on different syntax or integration with existing syntax (select or with for example).

Currently the only argument against is that "it can be done without", however that argument is obviously void, by definition a framework does not do anything PHP cannot do by itself, the goal is to add structure and avoid repetition which this certainly does!

@laszlovl
Copy link
Contributor

I've been playing with eagerly loading a virtual count attribute in the primary query like in the examples listed above. That didn't work out very well though, as the GROUP BY necessary for the count calculation would mess up the results in the primary query. Perhaps it would be safer and more consistent if the virtual attribute were loaded in a separate query, just like relations are.

That led me to this proof of concept:

class User {
    public $count;

    public function getPostCount()
    {
        return $this->hasOne(self::class, ['id' => 'id'])
            ->select('User.id, COUNT(Post.id) as count')->joinWith('posts', false)->groupBy('User.id');
    }

    public function getPosts()
    {
        return $this->hasMany(Post::class, ['user_id' => 'id']);
    }
}

It's not beautiful, but it is fast and DRY: the same definition can be used for both lazy and eager loading:

$baskets = Basket::find()->with(['user.postCount'])->all();
// SELECT * FROM `Basket`;
// SELECT * FROM `User` WHERE `User`.`id` IN (1, 2);
// SELECT `User`.`id`, COUNT(Post.id) as count FROM `User` LEFT JOIN `Post` ON `User`.`id` = `Post`.`user_id` WHERE `User`.`id` IN (1, 2) GROUP BY `User`.`id`

foreach ($baskets as $basket) {
    echo $basket->user->postCount->count;
}

$user = User::findOne(1);
echo $user->postCount->count;
// SELECT `User`.`id`, COUNT(Post.id) as count FROM `User` LEFT JOIN `Post` ON `User`.`id` = `Post`.`user_id` WHERE `User`.`id`=1 GROUP BY `User`.`id`

Perhaps if ActiveQuery would decouple the class used to load relations from the class that's instanciated, we could clean it up a bit. Instead of $user->postCount returning a User instance, one could then make it return a VirtualPropertyContainer or something, and even override __get to directly return that container's value instead of the container itself.

@SamMousa
Copy link
Contributor Author

@laszlovl If you had read my implementation / documentation linked in the first post you'd see that i'm not doing any joins on purpose. So don't worry about that.

The whole idea behind virtual fields is to add subqueries in the SELECT clause. So no joins no group by, nothing. This will always work and assuming you have the correct indexes it will be performant as well.
Especially when using pagination, since then the subquery will only be run for every record in the result set, not for every record in the join set.

@laszlovl
Copy link
Contributor

@SamMousa Right, I should have clarified I ment @klimov-paul's examples. I checked your approach and the concept works well.

@cebe
Copy link
Member

cebe commented May 27, 2015

@SamMousa the idea sounds interesting. Need to dig deeper into the details to say more but I would keep this open as I think it could be a useful feature.

@SamMousa
Copy link
Contributor Author

@cebe I'm available on IRC (sammousa) in case you want a more in-depth / live discussion.

Also I'd be up for creating an implementation and pull request.

We mainly need to think of a syntax for retrieval if we want to support this through relations as well. (Parsing the select is error prone and not very user friendly.)
A suggestion that was made to use this the with() function; alternatively we could add a new function which is probably the cleanest approach:

   Posts::find()->with('Author')->virtualFields(['likeCount', 'Author.postCount']);

Cheers,

@SamMousa
Copy link
Contributor Author

Hmm, was thinking on syntax:

    Posts::find()->with(['Author' => function($query) { 
        $query->withFields('postCount'); 
    }])->withFields(['likeCount', 'followerCount']);

Basically the same as the with syntax; easy to parse and no need to parse complex structures. Each withFields call applies directly and only to the query it is being called upon.

@cebe If you are happy with that syntax I can rework the implementation and create a PR.

@Faryshta
Copy link
Contributor

Faryshta commented Jul 2, 2015

@SamMousa this is how I do it in my models http://pastie.org/10270677

The findWithCantidad() method is what you are missing and that you can reuse anytime you want.

Alternatively you can make an ActiveQuery class to do this automatically.

Grouping fields like count(distinct id) are not doable just adding values to the select (which btw you can do using the ActiveRecord::addSelect()) you need joins and group by too

@SamMousa
Copy link
Contributor Author

SamMousa commented Jul 3, 2015

@Faryshta there is no need for joins. Joins will actually give you a lot if head aches and functionality is limited.

Your implementation will not work for multiple virtual fields for example (since you can't have multiple independent group by)

@Faryshta
Copy link
Contributor

Faryshta commented Jul 3, 2015

@SamMousa then how do you get counts and sums without grouping?

how do you get stuff like Product::category_name without joins?

@SamMousa
Copy link
Contributor Author

SamMousa commented Jul 3, 2015

subqueries in the select clause, if you scroll up you can find the implementation.

@Faryshta
Copy link
Contributor

Faryshta commented Jul 3, 2015

@SamMousa honestly i think thats complicating the ActiveRecord to a point where only a few people will use it and be able to debug that code.

Anyway, how do you set those subqueries without remaking the relations? In my example it is possible to add conditions to the counters (for example only likes from people in my country) and i don't see how do it in yours.

@SamMousa
Copy link
Contributor Author

SamMousa commented Jul 3, 2015

Relations and virtual fields are different functionalities.
They solve a different problem.
Anyway, read the CakePHP docs linked in my OP, they definitely have a use.

They do not complicate AR at all and any functions with joins are by definition more complicated. The Yii team even chose not to use joins by default for relations.

Try getting the count of 2 different relations using 1 query with joins, that's not gonna work. Using virtual fields it always works.

@Faryshta
Copy link
Contributor

Faryshta commented Jul 3, 2015

I think the main question is: how do you set those subqueries without remaking the relations?

@SamMousa
Copy link
Contributor Author

SamMousa commented Jul 6, 2015

Faryshta, that is not a strict requirement. Currently there is no efficient solution to get this to work at all.
My solution might require a little code duplication, but it is restricted to the model only and even there it's not much that is getting duplicated.

Also note that virtual fields are not the same as relations, they have more use cases than just aggregating relations. For example consider a table containing "squares" on a 2d plane. Typical properties would be:
x1, y1, x2, y2

Virtual fields that might come in handy here are width, height and area. The advantage of defining these is that you can:

  1. Query on these fields
  2. Use them as sortable and filterable attributes in grid views and other components.
  3. Cache the results easily by utilizing database caching.

@Faryshta
Copy link
Contributor

Faryshta commented Jul 6, 2015

Query on these fields

You can do that too with the current way of doing it. By adding andFilterWhere or andHaving to your active query

Use them as sortable and filterable attributes in grid views and other components.

You can do that too with the current way of doing it. By adding the new attributes to the Sort::attributes property in your DataProvider

Cache the results easily by utilizing database caching.

You can do that too with the current way of doing it. Actually your method doesn't always implies sql caching, you need to edit the select of the active query so that you don't use *

@SamMousa
Copy link
Contributor Author

SamMousa commented Jul 6, 2015

Faryshta, just read up and read the implementation...

The discussion about usability has already been had and there is use even if you do not see it.

@SamMousa
Copy link
Contributor Author

I've implemented the following syntax, which is easy to implement and supports nesting properly:

    $query = Portfolio::find()
        ->limit(1)
        ->with(['projects' => function($query) {
            $query->withField('cityName');
        }])

The ActiveQuery class will recalculate the select clause each time a field is added through addSelect(), select() or withField().
Currently ActiveQuery uses a public property $select, which is not ideal since the implementation of virtual fields requires me to keep it in sync all the time.
Ideally it would be better to make $select private / protected and instead provide a getSelect() to get the array. Note that the property is only written to in Yii core, by the query class itself (so protected makes more sense than public in any case).

So the changes required to Yii core, would be:

  1. Make Query::$select protected / private and rename to $_select.
  2. Add getter getSelect().
  3. Add function withField() to ActiveQuery or even Query (since this approach works for regular models as well not just active record).
  4. Add static Model::virtualFields() function with default implemention (return [];)
  5. Update attributes() function to include array_keys(self::virtualFields());
  6. Update __get, __set to include virtualFields.

@SamMousa
Copy link
Contributor Author

SamMousa commented Mar 6, 2016

Bump?

@solicomo
Copy link

I am interested in this proposal.
Or is there already another easier approach to add subqueries into the primary SELECT query?

@SamMousa
Copy link
Contributor Author

@solicomo this is what we created / use: https://github.com/MarketFlow/yii2-virtual-fields

We use it as is (I think) and it hasn't needed updating in a long time.
Feel free to create issues / PRs there if you like it but require some changes!

@solicomo
Copy link

@SamMousa Okay, then.

I think this is the best way to use this feature for now since the official team doesn't want to accept it.

Sometimes, we need to hack by ourselves.

Thank you. ^_^

@samdark samdark removed this from the 2.0.x milestone Dec 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants