Introduce cache to speed things up and cleanup method #171
Conversation
@@ -11,6 +11,13 @@ def self.database
  @database ||= "[#{facterdb_fact_files.map { |f| read_json_file(f) }.join(',')}]\n"
By the way, this should probably have been in a synchronized block, but that's another story.
This may make voxpupuli/rspec-puppet-facts@43bb17e less needed. One thing to note is that there I chose to use Marshal as a way to deep clone, in order to prevent in-place modifications from corrupting the cache.
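A minimal sketch of the two ideas in this comment, assuming a simplified module (the `FactDatabase` name and `build_database` helper are illustrative, not the gem's actual internals): a Mutex-guarded lazy initializer stands in for the "synchronized block", and a Marshal round-trip serves as a cheap deep clone so callers cannot mutate the cached structure in place.

```ruby
# Hypothetical sketch: Mutex-synchronized memoization plus Marshal deep clone.
module FactDatabase
  MUTEX = Mutex.new

  def self.database
    # Fast path once the cache is built; otherwise only one thread builds it.
    return @database if @database

    MUTEX.synchronize do
      @database ||= build_database
    end
  end

  def self.facts
    # Marshal round-trip returns an independent deep copy, so in-place
    # edits by a caller cannot corrupt @database.
    Marshal.load(Marshal.dump(database))
  end

  def self.build_database
    # Stand-in for reading the real fact files.
    [{ 'os' => { 'name' => 'Debian' } }]
  end
end
```

Marshal is not the fastest deep-clone mechanism, but it is dependency-free and handles arbitrarily nested hashes and arrays, which is what fact data looks like.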
That's an interesting thought. It looks like it'll require some fiddling in the gemspec (it currently uses git ls-files) and in the release step. This gem was only migrated to Voxpupuli last week and we haven't had a chance to convert from Travis to GitHub Actions yet, but the release actions we use should make it quite easy to add another file. I'll look into this.
Probably, on the other hand the
It might not speed things up much; I think the biggest offender is JGrep, but I might be wrong. If there is a chance to see a release after this is merged, we would be able to rebase our patch and use the
  filter_str = generate_filter_str(filter)
- JGrep.jgrep(database, filter_str).map { |hash| Hash[hash.map { |k, v| [k.to_sym, v] }] }
+ result = JGrep.jgrep(database, filter_str).map { |hash| Hash[hash.map { |k, v| [k.to_sym, v] }] }
Note to self: this is why only the top-level keys are symbols, while it should really be something like `deep_symbolize`,
or use something like https://api.rubyonrails.org/classes/ActiveSupport/HashWithIndifferentAccess.html
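For illustration, a `deep_symbolize` along the lines suggested above could look like this (the helper name comes from the comment; this is a sketch, not the gem's code). Unlike the one-level `k.to_sym` in the diff, it recurses into nested hashes and arrays so every key becomes a symbol.

```ruby
# Hypothetical recursive symbolizer: converts all hash keys, at every
# nesting level, to symbols, leaving non-hash values untouched.
def deep_symbolize(obj)
  case obj
  when Hash
    obj.each_with_object({}) { |(k, v), h| h[k.to_sym] = deep_symbolize(v) }
  when Array
    obj.map { |e| deep_symbolize(e) }
  else
    obj
  end
end

deep_symbolize('os' => { 'name' => 'Debian' })
# => { os: { name: 'Debian' } }
```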
I do not understand the implementation well enough, but my general rule of thumb is: for big dictionaries, stay away from symbolizing/indifferent access, as it can be quite slow. I suggest using plain strings, and converting all input to strings as well if possible.
Yes, but that's quite a big API change. That's why HashWithIndifferentAccess is probably the cleaner solution.
@lzap thanks for the PR. Can you please rebase? That should enable CI again.
Sure, done.
This looks good to me, but it would be awesome if someone like @alexjfisher or @ekohl, who really understand Ruby, could take a look as well.
This (caching only the last entry) does match most use cases, and where it doesn't, the extra CPU time is negligible. Thanks!
What's up folks! A colleague of mine added puppetdb to our tests and I am extremely happy to see more test coverage thanks to this wonderful effort. But our tests are now a minute slower. We can do better, come on! :-)
This patch helps with two problems: CPU and memory.
First, from what I understand, a given filter string should always produce the very same result. So why not cache both the input (filter string) and the output? I use a thread-local variable for that: it memorizes the last filter seen and, when there is a match, quickly returns the cached filtered data. This is also thread-safe.
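The single-entry, thread-local cache described above can be sketched like this (names such as `cached_facts` and `expensive_query` are illustrative stand-ins, not this PR's actual code; the real work would be the JGrep call):

```ruby
# Hypothetical sketch of a "remember the last filter" cache.
$query_count = 0

def expensive_query(filter_str)
  $query_count += 1                 # count real queries, for illustration
  "facts matching #{filter_str}"
end

def cached_facts(filter_str)
  # Each thread gets its own one-slot cache, so no locking is needed.
  cache = Thread.current[:facterdb_last_query] ||= {}
  return cache[:result] if cache[:filter] == filter_str

  cache[:filter] = filter_str
  cache[:result] = expensive_query(filter_str)
end
```

Because test suites tend to request the same filter many times in a row, even this one-slot cache avoids most of the repeated work, while keeping memory overhead to a single cached result per thread.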
Now, this only helps when facts are repeatedly requested with the same filter, but it looks like that is exactly what we do in our test suite: we construct the facterdb and then get facts for several combinations of OSes, testing one fact after another. This patch shaves off a lot of time in our case. And if you think our project might be some kind of special case, well, I measured the tests of facterdb itself.
Without this patch:
With this patch:
That's a solid 50% speedup if I do my math correctly. Not bad for a Friday. I am not stopping there, though.
Second, once facts are loaded into memory and stored in the `@database` field, there is no way to reclaim that memory. The files currently weigh about 15 MB and that will probably only grow. So I added a new method called `cleanup` which allows the Ruby VM to reclaim this memory. Not a problem for this project itself, but our project has about 4000 other tests to execute after it's done with facterdb. Typically users would call this method in a global teardown, `after(:suite)`, or similar.

I also have a suggestion which I am not going to work on: I understand that the database is assembled into one huge JSON string at runtime. Why isn't this part of the build process? That huge JSON could be built via a Rake task and distributed in the gem, so there would be no need to load it from individual files. Just a thought.
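A minimal sketch of the `cleanup` idea, under the assumption of a simplified memoizing module (the `FactDB` name and inline string are stand-ins; the real `@database` is the ~15 MB JSON blob built from the fact files):

```ruby
# Hypothetical sketch: memoized database plus a cleanup method that
# drops the reference so the Ruby GC can reclaim the memory.
module FactDB
  def self.database
    # Stand-in for the large JSON string built from the fact files.
    @database ||= '[{"os":"Debian"}]'
  end

  def self.cleanup
    @database = nil   # next access rebuilds; until then, GC can collect it
  end
end

# Typical use: call it once the suite no longer needs facterdb, e.g.
#   RSpec.configure { |c| c.after(:suite) { FactDB.cleanup } }
```

Note that `cleanup` only drops the reference; the actual reclamation happens whenever the GC next runs, which is usually good enough for a long test suite.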
Cheers!