Add the functionality required to run at least one clustal job #48

Merged
merged 38 commits into from
Nov 3, 2021

krs85 (Contributor) commented Oct 15, 2021

The name of the branch is a misnomer... I have been working on this branch for too long (obviously), and it is well past time to rebase. I don't want to squash the commits, as I think the history will prove valuable in the future. Basically, these commits add the functionality Process$ needs to cache basic programs and then skip them.

Relevant issues:
#47, #42

List of functionality added:

  • Add the functionality to skip an execution by intercepting execve: if the execve succeeds and we have the execution cached, the next system call is changed to an exit call, effectively skipping the execution.
  • If the execve fails, we still look it up in the cache to see whether we have seen it before (and expect it to fail). Then we just let it fail.
  • Add hashing functions. Hash input files for the following system calls: access, read, stat, fstat, newfstat, and the open variants. Hash output files for openat, open, creat, and write.
  • Generate input file hashes when we first see the file access.
  • Generate output file hashes at the end of the execution.
  • Add serialization and deserialization functions.
  • Basic cache lookup based on:
    ◦ Metadata: success or failure, executable name, env vars, cwd
    ◦ Matching input and output file hashes
  • Copy output files to the cache at the end of a successful recording run.
  • Overall simplification of data structures for: global cache of executions, individual execution record structs, and file access record structs.
  • Extension of data structures to handle multiprocess programs and nested execution trees.
  • Panic (panic!) in the following cases:
    ◦ trying to copy a file to the cache when it is already there;
    ◦ seeing another output file event for a file that has already been accessed as an output;
    ◦ a process calling execve multiple times (not handled right now, but we could treat this as one big execution: either all of the execve calls a single process makes are skippable, or none are);
    ◦ an execve that we expect to fail (because the cached version does) succeeds.
  • The current cache lookup is very conservative and is “all or nothing”: if a root process spawns many child processes, we check that ALL of them are skippable, and only skip if this is true.
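
The execve handling described above (skip on a cached success, let a cached failure fail again, panic when an expected failure succeeds) and the “all or nothing” tree check can be sketched as plain decision logic. This is a hedged illustration, not the Process$ code: the types `ExecOutcome`, `Action`, and `ExecNode` and the functions `decide` and `all_skippable` are hypothetical names, and the ptrace plumbing that actually rewrites the next system call into exit is left out.

```rust
/// Result of an intercepted execve, observed when the system call returns.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecOutcome {
    Success,
    Failure,
}

/// What the tracer does next (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// Cached success: rewrite the next system call into exit, skipping the program.
    SkipWithExit,
    /// Cached failure: let the execve fail as it did before.
    LetFail,
    /// Not in the cache: let the execution proceed and record it.
    Record,
}

/// `observed` is what execve just did; `cached` is the outcome stored
/// in a matching cache entry, if there is one.
fn decide(observed: ExecOutcome, cached: Option<ExecOutcome>) -> Action {
    match (observed, cached) {
        (_, None) => Action::Record,
        (ExecOutcome::Success, Some(ExecOutcome::Success)) => Action::SkipWithExit,
        // Mirrors the PR: an execve we expected to fail must not succeed.
        (ExecOutcome::Success, Some(ExecOutcome::Failure)) => {
            panic!("execve succeeded, but the cached execution expected it to fail")
        }
        // Assumption: any observed failure with a cache entry is simply allowed to fail.
        (ExecOutcome::Failure, Some(_)) => Action::LetFail,
    }
}

/// One execution in a (possibly nested) process tree.
struct ExecNode {
    skippable: bool,
    children: Vec<ExecNode>,
}

/// "All or nothing": the root is skipped only if it and every descendant are skippable.
fn all_skippable(node: &ExecNode) -> bool {
    node.skippable && node.children.iter().all(all_skippable)
}
```

Keeping the decision separate from the tracer makes the conservative policy easy to relax later, for example by replacing `all_skippable` with a per-subtree check.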

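A minimal sketch of the hash-based cache lookup, assuming an execution record that stores the metadata listed above plus one content hash per input and output file. `ExecMetadata`, `ExecRecord`, `hash_bytes`, `hash_file`, `add_input`, `add_output`, and `is_cache_hit` are hypothetical names. `DefaultHasher` is only a stand-in: it is not cryptographic and its algorithm may change between Rust releases, so a persistent cache would want a stable hash instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};

/// Hash raw file contents (stand-in hash, see the note above).
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hash a file on disk, e.g. an input file the first time it is accessed.
fn hash_file(path: &Path) -> io::Result<u64> {
    Ok(hash_bytes(&std::fs::read(path)?))
}

/// Metadata the lookup compares exactly, as listed in the PR.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ExecMetadata {
    succeeded: bool,
    executable: PathBuf,
    env_vars: Vec<(String, String)>,
    cwd: PathBuf,
}

#[derive(Debug, Clone)]
struct ExecRecord {
    metadata: ExecMetadata,
    /// Input file -> hash taken when the file was first accessed.
    input_hashes: HashMap<PathBuf, u64>,
    /// Output file -> hash taken at the end of the execution.
    output_hashes: HashMap<PathBuf, u64>,
}

impl ExecRecord {
    fn new(metadata: ExecMetadata) -> Self {
        ExecRecord { metadata, input_hashes: HashMap::new(), output_hashes: HashMap::new() }
    }

    /// Keep only the hash from the first access; later accesses are ignored.
    fn add_input(&mut self, path: PathBuf, hash: u64) {
        self.input_hashes.entry(path).or_insert(hash);
    }

    /// Panic on a second output event for the same file, as the PR describes.
    fn add_output(&mut self, path: PathBuf, hash: u64) {
        if self.output_hashes.contains_key(&path) {
            panic!("{:?} was already recorded as an output", path);
        }
        self.output_hashes.insert(path, hash);
    }
}

/// Cache hit: identical metadata, and every recorded input and output file
/// still hashes to the stored value. `current_hash` returns None for missing files.
fn is_cache_hit(
    cached: &ExecRecord,
    metadata: &ExecMetadata,
    current_hash: impl Fn(&Path) -> Option<u64>,
) -> bool {
    cached.metadata == *metadata
        && cached
            .input_hashes
            .iter()
            .chain(cached.output_hashes.iter())
            .all(|(path, hash)| current_hash(path.as_path()) == Some(*hash))
}
```

Passing the current-hash lookup in as a closure keeps the matching logic testable without touching the file system; in a tracer it would typically wrap something like `hash_file`.
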
krs85 added 30 commits July 14, 2021 15:42
krs85 requested a review from gatoWololo October 15, 2021 18:06
This was linked to issues Oct 15, 2021
gatoWololo (Contributor) left a comment

I'm currently seeing a compilation error because of this entry in Cargo.toml:

[[example]]
name = "hashes_are_equal"
path = "src/hashes_are_equal.rs"

but "src/hashes_are_equal.rs" doesn't exist anymore.

Your Rust code is great and fun to read! Very understandable! I think I know how to simplify the Execution enum so that we don't need Pending root and can avoid a lot of the extra logic that matches on the enum variant. I can tell you about it the next time we meet. I'd also be interested in implementing the change if you like it.

Resolved review threads:
src/execution.rs (5 threads, 1 outdated)
src/cache.rs (5 threads, all outdated)
krs85 merged commit b953a81 into master Nov 3, 2021
krs85 deleted the change-syscall branch January 10, 2023 21:35
Successfully merging this pull request may close these issues:

Caching Example Walkthrough
Skipping Those Pesky Executions