Action plan for slooo #43

Closed
1 of 7 tasks
Essoz opened this issue Mar 5, 2022 · 6 comments

Comments

@Essoz
Collaborator

Essoz commented Mar 5, 2022

Based on #39 and #40, we propose the following steps for improving slooo.

We will start with easier improvements that make the code base more structured, make full use of the OOP model, and get rid of some legacy code inherited from the DepFast project.

  • Log useful information, such as node membership, for each experiment to provide a stronger basis for reasoning about results.
  • Error detection: check the execution status of commands in utility functions to avoid dumping meaningless output into the terminal.
  • Support easy switching between multiple levels of fail-slow faults, and allow users to specify multiple slowness configs.
  • Redesign the slooo code using the OOP model. We want easier feature integration and better code readability, since it is external users who adapt the tool to their quorum systems. The redesign will be considered in parallel with the three points above.
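The error-detection point could look something like the minimal Python sketch below, assuming utility functions shell out via `subprocess`. The names `run_checked` and `CommandError` are hypothetical and not part of the current slooo code base. The helper captures a command's output, checks its exit status, and raises one concise error on failure instead of letting raw output flood the terminal.

```python
import logging
import shlex
import subprocess

logger = logging.getLogger("slooo")


class CommandError(RuntimeError):
    """Hypothetical error type raised when a command run by a utility fails."""

    def __init__(self, cmd, returncode, stderr):
        super().__init__(f"{cmd!r} exited with status {returncode}: {stderr.strip()}")
        self.cmd = cmd
        self.returncode = returncode
        self.stderr = stderr


def run_checked(cmd, timeout=None):
    """Hypothetical helper: run `cmd` (a list of arguments), return its stdout,
    and raise CommandError if the exit status is non-zero.

    Output is captured rather than streamed to the terminal, so a failing
    command produces one short log line and exception instead of raw output.
    """
    pretty = shlex.join(cmd)
    logger.debug("running: %s", pretty)
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        logger.error("command failed (%d): %s", result.returncode, pretty)
        raise CommandError(pretty, result.returncode, result.stderr)
    return result.stdout
```

Callers that expect a command might legitimately fail can catch `CommandError` explicitly; everything else fails fast with the command, exit status, and stderr in a single message.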

Then we will work on

  • System data collection: use extra threads to record system usage. We may allow users to specify sampling rates, but the specifics need further discussion.
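A minimal sketch of threaded sampling in Python, assuming the probe is any zero-argument callable (for example `os.getloadavg` on Unix, or a per-process probe built on a library such as psutil). `UsageSampler` is a hypothetical name, and `interval` stands in for the user-specified sampling rate.

```python
import threading
import time


class UsageSampler:
    """Hypothetical sampler: calls probe() every `interval` seconds on a
    background daemon thread and records (monotonic_timestamp, value) pairs."""

    def __init__(self, probe, interval=1.0):
        if interval <= 0:
            raise ValueError("interval must be positive")
        self.probe = probe
        self.interval = interval
        self.samples = []
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop_event.is_set():
            self.samples.append((time.monotonic(), self.probe()))
            # Sleeps for one interval, but wakes immediately if stop() is called.
            self._stop_event.wait(self.interval)

    def start(self):
        self._thread.start()
        return self

    def stop(self):
        self._stop_event.set()
        self._thread.join()
        return self.samples

    # Context-manager support: sample for the duration of a `with` block.
    def __enter__(self):
        return self.start()

    def __exit__(self, exc_type, exc, tb):
        self.stop()
        return False
```

For example, `with UsageSampler(os.getloadavg, interval=0.5) as sampler:` wrapped around an experiment leaves the readings in `sampler.samples`. Waiting on a `threading.Event` instead of calling `time.sleep` lets `stop()` return promptly rather than waiting out a full interval.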

and,

  • The breaking-point (auto-tuning) feature. Our current thought is that the tool should run experiments using random levels of fail-slow faults. Then, based on previous results (a statistical approach), the tool narrows the range of fail-slow faults, runs a new round of experiments, and repeats the process until it narrows down to the "breaking" point. My concerns about this approach are that (1) running multiple rounds of experiments can be costly, and (2) the statistical approach may not work, in which case the tool may eventually find nothing.
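One way to make the narrowing loop concrete is plain randomized bracketing, sketched below in Python. This is a swapped-in technique, not the statistical approach described above. It assumes a hypothetical `breaks(level)` callback that runs one experiment and reports whether the system broke, assumes breakage is monotonic in the fault level, and assumes the lowest level survives while the highest one breaks. Each round samples random levels inside the current bracket and shrinks the bracket around the boundary.

```python
import random


def find_breaking_point(breaks, low, high, samples_per_round=5, rounds=6, seed=None):
    """Hypothetical sketch: narrow [low, high] toward the breaking point.

    breaks(level) -> bool runs one experiment at a fail-slow level and returns
    True if the system is considered broken (e.g. throughput below a threshold).
    Assumes monotonicity: if a level breaks, every higher level breaks too.
    Returns (highest level seen to survive, lowest level seen to break).
    """
    rng = random.Random(seed)
    ok, broken = low, high  # assumed: `low` survives and `high` breaks
    for _ in range(rounds):
        # Draw this round's random levels from the current bracket.
        levels = sorted(rng.uniform(ok, broken) for _ in range(samples_per_round))
        for level in levels:
            if not ok < level < broken:
                continue  # outcome already implied by an earlier result; skip the run
            if breaks(level):
                broken = level
            else:
                ok = level
    return ok, broken
```

The experiment budget is at most `rounds * samples_per_round` runs, and levels whose outcome is already implied are skipped, which speaks to concern (1). Noisy results (concern (2)) are not handled here; repeating each level and taking a majority vote would be one option.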

finally,

  • Rewrite the slooo documentation contributed by @tianyin.

@varshith15
Collaborator

@Essoz We can look at big projects like the ones listed at https://github.com/xonsh/xonsh#projects-that-use-xonsh for inspiration on writing cleaner code.

@varshith15
Collaborator

For process info collection, we can use https://github.com/astrofrog/psrecord and add our own plugins to it.

@Essoz
Collaborator Author

Essoz commented Mar 15, 2022

These are great suggestions, I will look into them later!

@varshith15
Collaborator

@Essoz We should also add test cases for the common code, such as fault injection, general_utils, and so on.

@Essoz
Collaborator Author

Essoz commented Apr 1, 2022

This is long overdue, but I have not figured out how we can verify the amount of slowness injected. Do you have any ideas?

@varshith15
Collaborator

We just check whether the PID has been added to the cgroup; that's all.
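A minimal sketch of that check in Python, assuming the fault injector places processes in a cgroup whose directory is known. `pid_in_cgroup` is a hypothetical helper. Both cgroup v1 and v2 expose a `cgroup.procs` file listing one process ID per line, so the check can read it directly. As noted above, this confirms that slowness was applied to the right process, not how much slowness was injected.

```python
from pathlib import Path


def pid_in_cgroup(pid, cgroup_dir):
    """Hypothetical helper: return True if `pid` is listed in the cgroup's
    cgroup.procs file.

    cgroup_dir is the cgroup's directory under the cgroup filesystem
    (commonly mounted at /sys/fs/cgroup); the exact path slooo uses is assumed.
    """
    procs = Path(cgroup_dir) / "cgroup.procs"
    try:
        # One PID per line; split() tolerates trailing newlines and blank lines.
        listed = procs.read_text().split()
    except FileNotFoundError:
        return False  # cgroup does not exist (yet), so the PID cannot be in it
    return str(pid) in listed
```

In a unit test, the cgroup directory can be replaced by a temporary directory holding a fake `cgroup.procs` file, so the check runs without root privileges.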

@tianyin closed this as completed Jun 23, 2022