Allow iterating over large data sets without running out of memory #545

Closed
phpnode opened this Issue · 3 comments

4 participants

@phpnode

It is often necessary to iterate over a large collection of models and perform some action on each one. Memory constraints can make it impossible to load them all at once, which leads to some nasty tricks to get the desired behavior.
One possible solution is to fetch the data through a data provider, but keeping track of the current page and fetching the next set of results by hand is a pain. I think we should add a class, CDataProviderIterator, that manages this and allows a syntax such as:

$dataProvider = new CActiveDataProvider("User");
$iterator = new CDataProviderIterator($dataProvider);
foreach($iterator as $user) {
    // do something with $user
}

There is already an extension that does this: https://github.com/allain/activerecorditerator/blob/master/ActiveRecordIterator.php
I think we should bring this into the core, as it is useful in many different applications.
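To make the paging idea concrete, here is a minimal sketch of how such an iterator could hide the page bookkeeping. It does not use Yii and is not the proposed CDataProviderIterator implementation: `PagedSource`, `RangeSource` and `PagedIterator` are hypothetical names invented for this illustration, and the sketch uses a PHP generator rather than a hand-written Iterator class. The point is only that one page of rows is held in memory at a time.

```php
<?php
// Hypothetical stand-in for a data provider (in Yii, CActiveDataProvider
// would play this role). It hands back one page of rows at a time.
interface PagedSource
{
    public function getTotalCount(): int;
    public function fetchPage(int $page, int $pageSize): array;
}

// In-memory demo source that fabricates rows with ids 1..$total on demand.
final class RangeSource implements PagedSource
{
    public $fetches = 0; // number of pages requested so far
    private $total;

    public function __construct(int $total)
    {
        $this->total = $total;
    }

    public function getTotalCount(): int
    {
        return $this->total;
    }

    public function fetchPage(int $page, int $pageSize): array
    {
        $this->fetches++;
        $rows = [];
        $end = min(($page + 1) * $pageSize, $this->total);
        for ($id = $page * $pageSize + 1; $id <= $end; $id++) {
            $rows[] = ['id' => $id];
        }
        return $rows;
    }
}

// Walks every row of the source, loading a single page at a time.
final class PagedIterator implements IteratorAggregate
{
    private $source;
    private $pageSize;

    public function __construct(PagedSource $source, int $pageSize)
    {
        $this->source = $source;
        $this->pageSize = $pageSize;
    }

    public function getIterator(): Generator
    {
        $total = $this->source->getTotalCount();
        for ($page = 0; $page * $this->pageSize < $total; $page++) {
            $rows = $this->source->fetchPage($page, $this->pageSize);
            if ($rows === []) {
                return; // the data shrank while iterating; stop early
            }
            foreach ($rows as $row) {
                yield $row; // the previous page can be freed once we move on
            }
        }
    }
}

// Usage: 7 rows read in pages of 3.
$source = new RangeSource(7);
$ids = [];
foreach (new PagedIterator($source, 3) as $row) {
    $ids[] = $row['id'];
}
```

The calling code is a plain foreach, as in the syntax proposed above, and the page size becomes the knob that trades memory for the number of queries.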

@phpnode phpnode referenced this issue from a commit in phpnode/yii
@phpnode phpnode Update changelog for #545 1090357
@phpnode phpnode referenced this issue from a commit in phpnode/yii
@phpnode phpnode #545 fix whitespace 34bac59
@phpnode phpnode referenced this issue from a commit in phpnode/yii
@phpnode phpnode #545 update changelog 4951b02
@phpnode phpnode referenced this issue from a commit in phpnode/yii
@phpnode phpnode #545 make protected members private 36b0acb
@samdark
Owner

@phpnode Performance tests would be a great help in deciding whether this feature should be included.

@mdomba
Collaborator

A comparison of this with CDbDataReader would be good too.

@phpnode

OK, I added some performance tests. Have a look at the code at: https://github.com/phpnode/yii/compare/cdataprovideriterator-performance-tests

Here are the results:

Test                      Time (seconds)    Memory (bytes)
CDbDataReader             4.9158580303192   28339952
CActiveRecord::findAll()  5.8891110420227   321388376
CDataProviderIterator     6.101970911026    31170504

From this comparison we can see that CDbDataReader is the fastest and most memory-efficient approach. CActiveRecord::findAll() is an order of magnitude worse in terms of memory consumption, using roughly 11 times as much as CDbDataReader. CDataProviderIterator is the slowest method, about 24% slower than CDbDataReader, but its memory consumption is reasonable at about 10% more.

However, this comparison with CDbDataReader misses one important aspect: it is impossible (or difficult and tedious) to retrieve relations with that method, which makes it impractical for many use cases. CDataProviderIterator is most useful when you need to iterate over models and access their relations.
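For anyone who wants to check the relative figures, this small sketch recomputes them from the numbers in the results table, using CDbDataReader as the baseline. The figures are copied from the table above; the variable names are only illustrative, and nothing here re-runs the benchmark itself.

```php
<?php
// Figures copied from the results table in this thread.
$results = [
    'CDbDataReader'            => ['time' => 4.9158580303192, 'memory' => 28339952],
    'CActiveRecord::findAll()' => ['time' => 5.8891110420227, 'memory' => 321388376],
    'CDataProviderIterator'    => ['time' => 6.101970911026,  'memory' => 31170504],
];

// Express each approach as a multiple of the CDbDataReader baseline.
$baseline = $results['CDbDataReader'];
$ratios = [];
foreach ($results as $name => $r) {
    $ratios[$name] = [
        'time'   => round($r['time'] / $baseline['time'], 2),
        'memory' => round($r['memory'] / $baseline['memory'], 2),
    ];
    printf(
        "%-26s time x%.2f  memory x%.2f\n",
        $name,
        $ratios[$name]['time'],
        $ratios[$name]['memory']
    );
}
```

Seen this way, the iterator's extra cost over CDbDataReader is mostly time, while findAll() pays mostly in memory.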

@phpnode phpnode referenced this issue from a commit in phpnode/yii
@phpnode phpnode #545 reformat code according to Yii style c1c87c8
@phpnode phpnode referenced this issue from a commit in phpnode/yii
@phpnode phpnode #545 more tiny code style changes b7dff4f
@cebe cebe referenced this issue from a commit
@cebe cebe Merge branch 'add-cdataprovideriterator' of https://github.com/phpnode/yii into phpnode-add-cdataprovideriterator

* 'add-cdataprovideriterator' of https://github.com/phpnode/yii:
  #545 more tiny code style changes
  #545 reformat code according to Yii style
  #545 dont use getters or setters if we can avoid it
  #545 dont use getters or setters if we can avoid it
  #545 make protected members private
  #545 update changelog
  #545 fix whitespace
  Update changelog for #545
  Add CDataProviderIterator to allow iterating over large data sets

Conflicts:
	CHANGELOG
93e0d8f
@cebe cebe was assigned
@cebe cebe closed this