Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for fetching Query results in batches #2409

Closed
wants to merge 2 commits into from

Conversation

nineinchnick
Copy link
Contributor

Resolves #2387 by adding the ability to process Query results in batches.

This PR introduces a next() method in db\Query and db\ActiveQuery, which is a wrapper over db\DataProvider. DataProvider can't be used directly, because in ActiveQuery private methods are used to perform lazy loading.

Executing following code, when there are 4 users:

$q = app\models\User::find()->with('userProfilePictures');
while(!empty($users=$q->next(2))) {
    // process
}

results in having the following queries:

SELECT * FROM `users`;
SELECT * FROM `user_profile_pictures` WHERE `user_id` IN (1, 2);
SELECT * FROM `user_profile_pictures` WHERE `user_id` IN (3, 4);

Should other Query implementations also introduce such method? Probably not if there's no cursor available thus there's no gain in batch processing.

@cebe
Copy link
Member

cebe commented Feb 13, 2014

This implementation does not allow re-using the Query object for different queries but I really like the idea! :) Needs some adjustments for reusing Query, have not checked how, yet.

$this->_dataReader = $this->createCommand($db)->query();
}
$count = 0;
$results = [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$results -> $result

@nineinchnick
Copy link
Contributor Author

@cebe you mean the scenario, when not all results are read from the cursor, the query is altered and another call to next will return unexpected results?

I don't like tracking the dataReader as a protected var but I want to keep it simple. What about passing it as an argument instead the $db? Then the developer would be responsible for tracking the dataReader and closing the cursor manually if not all results are read.

@qiangxue
Copy link
Member

It looks great to me. Yes, I have the same concern about having dataReader within a query object. The query classes are designed so that they only contain information needed to build and execute a query.

How about the following syntax?

foreach (Post::find()->with('author')->batch(10) as $posts) {
    // $posts is an array with maximum length 10
}

The method batch($limit, $db) returns an iterator object which contains the reference to the query object as well as a DataReader instance. The object can have a method to close the cursor.

@Ragazzo
Copy link
Contributor

Ragazzo commented Feb 13, 2014

such feature is in L4, can look how it is done there, but they make simple wrapper.

@qiangxue
Copy link
Member

Can't find it. What is its syntax?

@Ragazzo
Copy link
Contributor

Ragazzo commented Feb 13, 2014

Chunking Results - http://laravel.com/docs/eloquent. I also like there collections but not sure if it will be implemented in Yii2.

@qiangxue
Copy link
Member

It seems to be based on limit/offset, which we want to avoid as discussed in #2387.

@nineinchnick
Copy link
Contributor Author

@qiangxue I like the iterator idea but in case of ActiveQuery the prepareResults method (that I extracted from all) would have to be public. Is that ok?
Should the iterator class be named QueryReader or QueryIterator? I'm for the former one.

@Ragazzo I also like the collections idea. I remember this forum post and extension. An issue should be created for this.

@Ragazzo
Copy link
Contributor

Ragazzo commented Feb 13, 2014

I also like the collections idea. I remember this forum post and extension. An issue should be created for this.

ok, i think this can be an extension yii2-ar-collection. @qiangxue whats your thoughts about this one? of course it is out of scope 2.0GA.

It seems to be based on limit/offset, which we want to avoid as discussed in #2387.

yeah, can you point to sentence about not using limit/offset? cant find it.

@qiangxue
Copy link
Member

@nineinchnick Yes, we need to turn some private method into public in ActiveQuery. Would you like to work on this?

Regarding the collection idea, we have discussed about it at the very beginning when we were designing Yii2 AR. Our conclusion was to keep things simple, at least for 2.0. You may go ahead and create a ticket about this. In the ticket, we would like to see every possible use case example of the collection so that we have better understanding of the requirements.

@Ragazzo
Copy link
Contributor

Ragazzo commented Feb 14, 2014

every possible use case example

to be true, i dont think there will be a lot of use-cases, main of them is to not consume a lot of memory, for example when exproting to pdf/excel. So there will not be a lot of use-cases. Worth creating ticket?

@samdark
Copy link
Member

samdark commented Feb 14, 2014

@Ragazzo yes. It's requested quite often but use cases aren't clear each time so it definitely needs a good separate discussion.

@Ragazzo
Copy link
Contributor

Ragazzo commented Feb 14, 2014

Done - #2423. Feel free to add your use-cases and delete here comments to not to mess them with main discussion here.

@qiangxue qiangxue mentioned this pull request Feb 14, 2014
@qiangxue
Copy link
Member

Just FYI, I'm working on this currently. will finish soon.

@qiangxue qiangxue closed this Feb 14, 2014
@nineinchnick
Copy link
Contributor Author

Is it finished? I haven't seen a relevant commit. I've been working on this too.

@qiangxue
Copy link
Member

Not committed yet. Almost finished. Writing documentation now.

@nineinchnick nineinchnick deleted the 2387-batch-query branch February 14, 2014 20:19
@qiangxue
Copy link
Member

Will let you know. Please help review my change. Sorry, I should let you know earlier.

@Ragazzo
Copy link
Contributor

Ragazzo commented Feb 14, 2014

Great)

@qiangxue
Copy link
Member

Done (api doc to be finished). For the usage, please see:
https://github.com/yiisoft/yii2/blob/master/docs/guide/query-builder.md#batch-query
https://github.com/yiisoft/yii2/blob/master/docs/guide/active-record.md#querying-data-from-the-database

Please help review the code. Your feedback is very welcome. Thanks.

@Ragazzo
Copy link
Contributor

Ragazzo commented Feb 14, 2014

Looks fine to me, docs are good, have not tried it yet though. Also want to note that for implementing such feature there was added not so much code, that is because of great Yii2 AR design.
Thanks)

@qiangxue
Copy link
Member

Thank you for your comments and review. All work are done.
I kept the properties of BatchQueryResult public with the intention that they might be useful outside of the class. The API doc describes when the properties can be modified.

@cebe
Copy link
Member

cebe commented Feb 15, 2014

First I have to say I am impressed how great the AR design works and how extensible it is with different features :-)
But I thought of the implementation/usage in a bit different way:

usage:

foreach($query->batch(10) as $user) {
    // $user will be one single instance
}

This way the BatchQueryResult will only hold the datareader and will load next batch when I iterate over the size of the batch. This would be more like what I would expect from this method to do. Otherwise I would need nested foreach to handle all users in DB:

foreach($query->batch(10) as $users) {
    foreach($users as $user) {
        // $user will be one single instance here
    }
}

This approach would also make it simpler to switch from:

foreach($query->all() as $user) { //...

to:

foreach($query->batch(10) as $user) { //...

This would make the iterator implementation a bit more complicated but in general more easy to use interface.

@qiangxue
Copy link
Member

@cebe: the reason for this design is to support some functions or handlers that can handle a batch of data each time. For example,

foreach ($query->batch(10) as $users) {
    $handler->handle($users);
}

Otherwise, batch size 10 and batch size 1 really don't differ much.

If you want to get one user a time, you can use each():

foreach ($query->each() as $user) {
}

@cebe
Copy link
Member

cebe commented Feb 15, 2014

okay, makes sense.

@cebe
Copy link
Member

cebe commented Feb 15, 2014

Well... thinking a bit more about it this behavior is still bad for AR because of relational queries. each() should still be able to fetch a batch internally and populate relations for a number of records.
We could override each() method in ActiveQuery to add this feature.

@nineinchnick
Copy link
Contributor Author

I agree.

@qiangxue
Copy link
Member

You are right. 'Each' also needs batch size. Will fix it.

On Saturday, February 15, 2014, Jan Waś notifications@github.com wrote:

I agree but how to pass the batch size? Rename to eachInBatch(int)?


Reply to this email directly or view it on GitHubhttps://github.com//pull/2409#issuecomment-35151119
.

@qiangxue
Copy link
Member

Done: 9a068f5

@cebe cebe added this to the 2.0 Beta milestone Feb 15, 2014
@Koudy
Copy link
Contributor

Koudy commented Mar 5, 2014

I have a problem when using this:
foreach (Customer::find()->with('orders')->each() as $customer) {
I get false in $customer.

It started to work after I added this:
public function next()
{
if ($this->_batch === null || !$this->each || $this->each && next($this->_batch) === false) {
$this->_batch = $this->fetchData();
reset($this->_batch);
}

There must be some mistake or I do something wrong?

@qiangxue
Copy link
Member

qiangxue commented Mar 5, 2014

Fixed. Thanks!

qiansen1386 pushed a commit to qiansen1386/yii2 that referenced this pull request Mar 9, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataProviderIterator port
6 participants