Enable multi-workers, and make data loading faster #12
Conversation
The computation of the metrics data structures is currently the single biggest PITA when using graph-explorer with a large number of metrics. It is currently a single-threaded operation that could easily be parallelized (by splitting the metrics list into chunks). Anyway, looking at your PR:
thanks
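The chunking idea mentioned above could look roughly like this. This is a minimal sketch, not graph-explorer's actual code: `build_structures` is a hypothetical stand-in for the real per-metric work (regex matching etc.), and the merge step assumes the partial results are dicts.

```python
from multiprocessing import Pool


def build_structures(metrics_chunk):
    # Stand-in for the real per-metric computation (regex matching, etc.);
    # here we just wrap each metric name so the merge step is visible.
    return {m: {"name": m} for m in metrics_chunk}


def build_parallel(metrics, workers=4):
    # Split the metrics list into roughly equal chunks, one per worker.
    chunk_size = max(1, len(metrics) // workers)
    chunks = [metrics[i:i + chunk_size]
              for i in range(0, len(metrics), chunk_size)]
    # Each worker process computes the structures for its own chunk.
    with Pool(workers) as pool:
        partials = pool.map(build_structures, chunks)
    # Merge the per-chunk results back into one dict.
    merged = {}
    for part in partials:
        merged.update(part)
    return merged
```

The per-chunk results are independent, so the only serial work left is the final merge.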
Hi:
to 2): Gunicorn and Paste serve the same function in this case: managing workers, single or multiple. Without my commit I can also start multiple workers with Paste or Gunicorn, but those workers cannot hold the same data (targets_all, graphs_all) at the same time. For example, if I start 2 workers A and B with the Paste server and then "GET /refresh_data" to rebuild the data, worker A serves my request, so targets_all and graphs_all are rebuilt in worker A but remain stale in worker B, because B didn't serve that "GET /refresh_data".
to 3): The speedup is achieved ONLY when rebuilding the data (targets_all, graphs_all), via these steps: (1) Instead of rebuilding the data in a web-app worker, I build it in cron (update_metrics.py): when the cron job runs, it not only downloads the metrics but also rebuilds the two big data structures targets_all and graphs_all, then pickle.dumps them to disk files. (2) Instead of urllib.urlopen('/refresh_data'), on each request the workers TRY TO reload the data (the targets_all and graphs_all just dumped) by comparing the timestamp of the dumped files with the timestamp stored at the last reload. (3) Finally, the workers can cheaply RELOAD the data with pickle.loads of the dumped files. In my case this costs 3 seconds with 20,000 metrics.
PS: As pickle cannot dump lambdas, I copied targets_all and deleted the lambdas from the copy.
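The dump side of the approach described above could be sketched like this. It is an illustration under assumptions, not the PR's code: `targets_all` is assumed to be a dict of per-target dicts, and the lambda-stripping simply drops any callable values from a copy before pickling.

```python
import pickle


def strip_lambdas(targets_all):
    # pickle cannot serialize lambdas, so drop any callable values
    # from a copy of the structure before dumping.
    cleaned = {}
    for name, target in targets_all.items():
        cleaned[name] = {k: v for k, v in target.items()
                         if not callable(v)}
    return cleaned


def dump_data(targets_all, path):
    # Run from cron (e.g. update_metrics.py) after rebuilding the data.
    with open(path, "wb") as f:
        pickle.dump(strip_lambdas(targets_all), f)


def load_data(path):
    # Workers cheaply reload the prebuilt structure from disk.
    with open(path, "rb") as f:
        return pickle.load(f)
```

Loading a prebuilt pickle is what replaces the expensive in-worker rebuild.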
We have already used this patch in our production environment for a week; it seems good.
if not (os.path.isfile(config.targets_all_cache_file) and \
        os.path.isfile(config.graphs_all_cache_file) and \
        os.path.isfile(config.filename_metrics)):
    backend.update_data(s_metrics)
could this be a race condition with multiple workers/threads?
Well, as the workers are independent, I think they will work fine as long as the data (targets_all.cache & graphs_all.cache) is already prepared.
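One common way to sidestep this kind of race is to write the cache file atomically, so a worker never reads a half-written pickle. This is a generic sketch, not code from the PR: it writes to a temp file in the same directory and then renames it over the target, which is atomic on POSIX filesystems.

```python
import os
import pickle
import tempfile


def atomic_dump(obj, path):
    # Write to a temporary file in the same directory, then rename.
    # On POSIX, rename() is atomic, so a concurrent reader sees either
    # the old complete file or the new complete file, never a partial one.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
        os.rename(tmp_path, path)
    except Exception:
        os.unlink(tmp_path)
        raise
```

The temp file must live on the same filesystem as the target, otherwise the rename is no longer atomic.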
Hey, so like I mentioned earlier, I wanted to get rid of the lambda functions early on. As it turns out this was pretty easy to implement and I pushed it to master: Dieterbe/graph-explorer@c01a8e9, so now you can pickle the list directly. Thanks for the explanation, that clears up a lot for me. I'm personally very inexperienced with web servers such as Paste and Gunicorn. It probably makes sense to have a config option that specifies the number of workers, but that would probably be a different implementation for Paste vs Gunicorn.. maybe a note in the deployment section of the readme about multiple workers? I'm suffering a lot from GE being unresponsive, and I wonder if maybe that's because of the old refresh_data. Btw, I think there's a race condition; I added a comment to the code line.
Because the old 'refresh_data' needs a lot of computation (regex matching), it ties up the whole process as a result of the GIL in Python, so the worker becomes unresponsive. pickle.dumps & pickle.loads, or saving this data to a database as you said, will help make this faster. Good luck!
I merged it and applied some fixes (please check the commit log and have a look), but there are still some bugs that have to be fixed:
wow.. github closed this automatically and doesn't let me reopen it. @huoxy can you reopen?
I cannot reopen it either...
to 3): I have used Gunicorn. In my case, I edited app.py, appended 'app = bottle.default_app()' to it, and then ran app.py with Gunicorn directly, without graph-explorer.py:
or edited graph-explorer.py:
then it starts 4 processes. See this: http://blog.yprez.com/running-a-bottle-app-with-gunicorn.html
ok thanks, I created separate issues (see above) for these.
I cache this data (targets_all, graphs_all) to disk files when update_metrics.py runs, and I deleted the urlopen call from update_metrics.py, because app.py reloads the data by comparing the mtime of these files when responding to /index.
When I tested multiple workers with gunicorn, I found it would refresh_data in a SINGLE worker only. So I made app.py compare the mtime of the data files on each request to /index and reload them if the files were modified.
It also works in a single worker as before, but minutes faster :)
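The reload side of the per-request mtime check could be sketched like this. Again an illustration, not the PR's code: each worker keeps the last-seen mtime alongside the data and re-reads the pickle only when the file has changed on disk.

```python
import os
import pickle


class CachedData(object):
    """Reload a pickled data file only when its mtime changes."""

    def __init__(self, path):
        self.path = path
        self.mtime = None   # mtime observed at the last reload
        self.data = None

    def get(self):
        # Cheap stat() on every request; full pickle.load only when
        # the cron job has rewritten the file since the last reload.
        mtime = os.path.getmtime(self.path)
        if mtime != self.mtime:
            with open(self.path, "rb") as f:
                self.data = pickle.load(f)
            self.mtime = mtime
        return self.data
```

Each Gunicorn worker holds its own `CachedData` instance, so all workers converge on the new data without any cross-worker communication.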