Regression trees cause GC churn #16

Closed
lukehutch opened this issue May 12, 2016 · 3 comments

@lukehutch

The regression tree methods in MLTK allocate and discard a very large number of objects, which causes GC churn and puts heavy strain on the VM. A large share of the run time is spent in garbage collection, and the impact is even worse when running several regressions in parallel, since the JVM doesn't do a good job of concurrent garbage collection. (The scalability issue alone will probably mean I can't use MLTK for my task.)

An object instance recycling scheme would help immensely with this problem.
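
To illustrate what such a recycling scheme could look like, here is a minimal sketch of a pool of fixed-length `double[]` scratch buffers. It is not MLTK code: `DoubleArrayPool`, `acquire`, `release`, and `allocations` are hypothetical names, and the buffer length of 400 only mirrors a plausible feature count. The pool is deliberately single-threaded, on the assumption that each worker thread would own its own instance.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/**
 * Hypothetical single-threaded pool of fixed-length double[] scratch buffers.
 * Illustrative only; none of these names exist in MLTK.
 */
final class DoubleArrayPool {
    private final ArrayDeque<double[]> free = new ArrayDeque<>();
    private final int length;
    private int allocations = 0; // number of arrays actually created

    DoubleArrayPool(int length) {
        this.length = length;
    }

    /** Returns a zeroed buffer, reusing a released one when available. */
    double[] acquire() {
        double[] buf = free.poll();
        if (buf == null) {
            allocations++;
            return new double[length];
        }
        Arrays.fill(buf, 0.0); // clear stale values from the previous user
        return buf;
    }

    /** Hands a buffer back to the pool; the caller must not use it afterwards. */
    void release(double[] buf) {
        if (buf.length != length) {
            throw new IllegalArgumentException("wrong buffer length");
        }
        free.push(buf);
    }

    int allocations() {
        return allocations;
    }

    public static void main(String[] args) {
        DoubleArrayPool pool = new DoubleArrayPool(400);
        for (int i = 0; i < 1000; i++) {
            double[] scratch = pool.acquire(); // instead of new double[400]
            scratch[0] = i;
            pool.release(scratch);
        }
        // Only the first acquire() allocates; the other 999 reuse that array.
        System.out.println("arrays allocated: " + pool.allocations()); // prints 1
    }
}
```

For parallel runs, one pool per thread (for example via `ThreadLocal.withInitial`) would avoid any locking. Whether pooling actually beats the JVM's own allocator depends on object size and lifetime, so it is worth measuring before and after.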

@yinlou
Owner

yinlou commented May 12, 2016

Which learner are you using? How large is your dataset? How much memory did you allocate?

@lukehutch
Author

I tried using RegressionTreeLearner with both LSBoostLearner and LADBoostLearner; both have the same problem.

I have up to about 124,000 training examples (with about 400 dimensions) and 10,000 test examples.

The memory used by a single thread is on the order of 4-8GB, but it fluctuates up and down by about 1-2GB every few seconds (the downward fluctuations are due to garbage collection).

Viewing CPU usage while running several regressors in different threads shows all the cores floating between about 20% and 60%, with major churn. This is indicative of heavy GC activity: CPU-bound multithreaded Java programs that allocate no new objects keep all the cores busy at 100%.
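
One way to check the GC hypothesis directly, rather than inferring it from CPU graphs, is to read the JVM's own collector counters through the standard `java.lang.management` API. The sketch below is illustrative: `GcProbe` and its methods are hypothetical names, and the allocation loop is only a stand-in for a real training call. Note that `getCollectionTime()` reports approximate accumulated elapsed time, which for concurrent collectors is not the same as application pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that sums the JVM's GC counters across all collectors. */
final class GcProbe {

    /** Total number of collections so far; collectors reporting -1 (undefined) are skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long n = gc.getCollectionCount();
            if (n > 0) {
                total += n;
            }
        }
        return total;
    }

    /** Approximate accumulated GC time in milliseconds; -1 (undefined) is skipped. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long ms = gc.getCollectionTime();
            if (ms > 0) {
                total += ms;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long gcCountBefore = totalGcCount();
        long gcTimeBefore = totalGcTimeMillis();
        long start = System.nanoTime();

        // Stand-in workload: many short-lived arrays, as a placeholder for training.
        double sink = 0;
        for (int i = 0; i < 200_000; i++) {
            double[] tmp = new double[400];
            tmp[i % 400] = i;
            sink += tmp[i % 400];
        }

        long wallMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("checksum:      " + sink);
        System.out.println("wall time ms:  " + wallMs);
        System.out.println("GC count:      " + (totalGcCount() - gcCountBefore));
        System.out.println("GC time ms:    " + (totalGcTimeMillis() - gcTimeBefore));
    }
}
```

Running the JVM with `-verbose:gc` gives a similar picture as a log, without any code changes.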

@yinlou
Owner

yinlou commented May 18, 2016

I just made some edits to save memory. Let me know if that helps. Thank you!

@yinlou yinlou closed this as completed Jun 26, 2016