Memory issues #5

Closed
guyko81 opened this issue Feb 11, 2016 · 6 comments

@guyko81

guyko81 commented Feb 11, 2016

Hi Trevor,

This is a very nice implementation. I had been searching for a solution like this for a long time, so thank you!

I have only one issue: with a long evolution (generations = some_huge_number, or population_size = some_huge_number combined with generations = some_number) the program runs out of memory. I checked the code and it saves every iteration's population. Do you think that is necessary? As I understand it, we only need the current population and, at the start of each iteration, the best programs of the previous one.

Could the code be changed to reset
self._programs = []
before every iteration and keep only the previous population in self._programs_prev (or something similar)?

@jamartinh

Hello, I have also experienced the same issue and can't run my experiments for many iterations.

I thought it was a problem with garbage collection.

I think this could be exposed as a parameter, e.g. num_generations_history.

Cheers,
Jose A.
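
The two proposals so far (resetting self._programs each iteration, or a num_generations_history parameter) amount to a bounded history. A minimal sketch of that idea follows, assuming a toy `Evolver` class and a placeholder `_breed` method that are purely illustrative and not the library's code; only the names `_programs` and `num_generations_history` come from this thread.

```python
from collections import deque


class Evolver:
    """Toy stand-in for the estimator; not the library's real class."""

    def __init__(self, generations, population_size, num_generations_history=2):
        self.generations = generations
        self.population_size = population_size
        # A deque with maxlen drops the oldest generation automatically
        # once the limit is reached; maxlen=None would keep everything.
        self._programs = deque(maxlen=num_generations_history)

    def fit(self):
        for gen in range(self.generations):
            parents = self._programs[-1] if self._programs else None
            population = [
                self._breed(parents, gen, i) for i in range(self.population_size)
            ]
            self._programs.append(population)
        return self

    def _breed(self, parents, gen, i):
        # Placeholder for real selection and mutation (hypothetical).
        return (gen, i)


model = Evolver(generations=50, population_size=10).fit()
print(len(model._programs))  # only the current and previous generations remain
```

With num_generations_history=2 this behaves like the self._programs_prev suggestion; the trade-off is that the lineage of the final program can no longer be traced back past the retained generations.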


@trevorstephens
Owner

Thanks for the report! I'll look into your hypothesis, @guyko81, but I suspect the issue is more likely numpy arrays being stored as the equations are recursively evaluated. These /should/ be garbage collected by Python, as they are never stored in the object, but I'll check that as well, @jamartinh.

I have seen this issue as well, and was thinking that an eval_size parameter might help by evaluating fewer samples at once rather than the whole dataset. I've been meaning to work on a v0.2 for a while now; this should be top of the list.
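
The eval_size idea can be illustrated with a chunked fitness evaluation. This is only a sketch under stated assumptions: `chunked` and `mean_absolute_error` are hypothetical helpers written for this example, a "program" is modelled as a plain callable on one row, and plain lists stand in for numpy arrays. Nothing here reflects how the library actually evaluates programs.

```python
def chunked(seq, size):
    """Yield consecutive slices of seq with at most size items each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def mean_absolute_error(program, X, y, eval_size=None):
    """Evaluate program on X chunk by chunk and accumulate the absolute error."""
    if not X:
        raise ValueError("X must contain at least one sample")
    if eval_size is None:
        eval_size = len(X)  # fall back to a single chunk
    total = 0.0
    for X_chunk, y_chunk in zip(chunked(X, eval_size), chunked(y, eval_size)):
        # Only one chunk of predictions is held in memory at a time.
        predictions = [program(row) for row in X_chunk]
        total += sum(abs(p - t) for p, t in zip(predictions, y_chunk))
    return total / len(X)


def program(row):
    """Hypothetical evolved program: the sum of the two features."""
    return row[0] + row[1]


X = [[float(i), float(i + 1)] for i in range(10)]
y = [row[0] * row[1] for row in X]

full = mean_absolute_error(program, X, y)
small = mean_absolute_error(program, X, y, eval_size=3)
print(full, small)  # the chunk size changes peak memory, not the score
```

Peak memory for intermediate results then scales with eval_size rather than with the number of samples, at the cost of more loop iterations.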

For now, you might find using n_jobs=1 more stable (fewer evaluations at once) or ramping up the parsimony to keep the programs smaller.

@guyko81
Author

guyko81 commented Feb 11, 2016

Thanks Trevor! Can't tell more, so thank you :)

@trevorstephens
Owner

I've located the main culprit. It is due almost entirely to saving the indices of X & y used for evaluating a program's fitness when max_samples is used. These indices are also retained when there is no under-sampling. I am working on a fix now, and can still retain all prior populations for inspecting the lineage of a final program.
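
One way to avoid holding an index array per program is to keep only the seed that produced the subsample and regenerate the indices when they are needed. The sketch below assumes that approach for illustration; it is not necessarily the fix that was applied, and `draw_indices`, `Program`, and `indices_seed` are hypothetical names.

```python
import random
import sys


def draw_indices(seed, n_samples, max_samples):
    """Recreate the same subsample on demand from a stored seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_samples), max_samples))


class Program:
    """Hypothetical program record that stores a seed, not the indices."""

    def __init__(self, seed):
        self.indices_seed = seed

    def get_indices(self, n_samples, max_samples):
        return draw_indices(self.indices_seed, n_samples, max_samples)


n_samples, max_samples = 100_000, 50_000
program = Program(seed=42)
indices = program.get_indices(n_samples, max_samples)

# The same seed always yields the same subsample, so nothing is lost.
assert indices == program.get_indices(n_samples, max_samples)

print(sys.getsizeof(indices), "bytes for the index list container alone")
print(sys.getsizeof(program.indices_seed), "bytes for the stored seed")
```

Multiplied by population_size and generations, the difference between an index container and a single integer per program is what makes the stored indices dominate memory use.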

@trevorstephens
Owner

I have also added a check at each evolution to see whether older generations are still relevant, i.e. whether any of their "dna" exists in the current generation. Any irrelevant programs will be removed from the old generation's population by marking them as None. This results in a massive reduction in the number of programs stored and should help significantly with memory use.
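
The relevance check described above can be sketched as a backward walk over the stored generations. This is an illustration under assumptions rather than the code from the fix: the `Program` record and its `parent_idx` and `donor_idx` fields are hypothetical, and each is taken to be an index into the previous generation's population.

```python
class Program:
    """Hypothetical program record; indices point into the previous generation."""

    def __init__(self, name, parent_idx=None, donor_idx=None):
        self.name = name
        self.parent_idx = parent_idx
        self.donor_idx = donor_idx


def prune_history(generations):
    """Set to None every old program with no descendant in the last generation."""
    for gen in range(len(generations) - 1, 0, -1):
        needed = set()
        for program in generations[gen]:
            if program is None:
                continue  # already pruned, so its parents are not needed
            for idx in (program.parent_idx, program.donor_idx):
                if idx is not None:
                    needed.add(idx)
        previous = generations[gen - 1]
        for idx in range(len(previous)):
            if idx not in needed:
                previous[idx] = None
    return generations


history = [
    [Program("a0"), Program("a1"), Program("a2")],
    [
        Program("b0", parent_idx=0),
        Program("b1", parent_idx=2),
        Program("b2", parent_idx=2, donor_idx=1),
    ],
    [
        Program("c0", parent_idx=0),
        Program("c1", parent_idx=0),
        Program("c2", parent_idx=1),
    ],
]
prune_history(history)
names = [[p.name if p is not None else None for p in gen] for gen in history]
print(names)
# [['a0', None, 'a2'], ['b0', 'b1', None], ['c0', 'c1', 'c2']]
```

Because the list slots are kept and only their contents are replaced with None, the stored indices stay valid and the lineage of any surviving program can still be followed.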

@trevorstephens
Owner

Mostly fixed by #19 ... Please re-open if problems still persist in the master branch or the next release.

@trevorstephens trevorstephens added this to the 0.2.0 milestone Mar 26, 2017