Caching Example Walkthrough #47

Closed
8 tasks done
krs85 opened this issue Jul 8, 2021 · 0 comments · Fixed by #48
Labels
documentation Improvements or additions to documentation read and discuss please

krs85 commented Jul 8, 2021

Let's try to write out a simple (if that's even possible) example to demonstrate the cache's workflow with P$.

Example Program:

  1. To start, say file1.txt exists and file2.txt does not exist.
  2. Program opens file1.txt for reading only and reads the contents.
  3. Program creates file2.txt (HOW it makes it is very important. Does it use creat or open? What mode does it open with? Does it use O_TRUNC? O_APPEND? Don't you just love this system call interface? Isn't this just so intuitive? 🧠)
  4. Program writes to file2.txt.
  5. Program exits.
  • file1.txt is an input, and its contents are read. We hash the file when we see it opened as read only.
  • I guess the executable is another input, should be hashed at the start as well?
  • Also all the usual suspects: cwd, environment variables, yada yada yada...
  • file2.txt is an output, as it is created and written to. We hash the file when the program exits. We would then copy the file to our cache.
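
To make step 3 concrete, here is a minimal sketch of the example program using raw open flags. It is only an illustration: the function name `run_example`, the choice of `O_CREAT | O_EXCL`, the `0o644` mode, and the decision to write the bytes that were just read are all assumptions, not something this issue specifies.

```python
import os


def run_example(workdir: str) -> None:
    """Hypothetical stand-in for the example program (steps 2 to 5)."""
    src = os.path.join(workdir, "file1.txt")
    dst = os.path.join(workdir, "file2.txt")

    # Step 2: open file1.txt read-only and read the contents.
    fd_in = os.open(src, os.O_RDONLY)
    try:
        data = b""
        while chunk := os.read(fd_in, 4096):
            data += chunk
    finally:
        os.close(fd_in)

    # Step 3: create file2.txt. O_CREAT | O_EXCL fails if the file already
    # exists, so a successful open tells us the program really created it.
    # (Assumed flags; O_TRUNC or O_APPEND would have to be handled differently.)
    fd_out = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        # Step 4: write to file2.txt, looping in case of a short write.
        view = memoryview(data)
        while view:
            written = os.write(fd_out, view)
            view = view[written:]
    finally:
        os.close(fd_out)
    # Step 5: the program exits (here, the function returns).
```

With `O_EXCL`, a second run in the same directory raises `FileExistsError`, which is one reason the exact creation flags matter for deciding whether file2.txt is purely an output.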

How do we know we can skip?
The hashes of file1.txt and the executable should match the ones we recorded, and the input file should be present in the file system at the same location as before.
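
A rough sketch of that check, assuming SHA-256 as the hash and a plain dict mapping each input's absolute path to its recorded hash. Both are placeholders for illustration; `hash_file` and `can_skip` are hypothetical names.

```python
import hashlib
import os
from typing import Dict


def hash_file(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def can_skip(recorded_inputs: Dict[str, str]) -> bool:
    """True if every recorded input still exists at its path with the same hash.

    recorded_inputs maps absolute path -> hash recorded on the earlier run.
    The executable is treated as just another input here (an assumption).
    """
    for path, old_hash in recorded_inputs.items():
        if not os.path.isfile(path):
            return False  # input moved or deleted
        if hash_file(path) != old_hash:
            return False  # input contents changed
    return True
```

This deliberately ignores cwd and environment variables, which would need their own comparison.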

How do we skip?
We skip the execution (#42).
We can then copy our cached file2.txt to the absolute path the execution would have written it to. This means we need to keep track of that path in case we have to copy the output file over.

Can we get away with not copying the output file over?
If the hashes of file1.txt and the executable matched, and also file2.txt matches and is in the right spot in the file system, we don't have to copy over the file.
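
Both cases (copy the output over, or leave it alone because it already matches) could look roughly like this. It is a sketch only: `restore_output` and `digest` are hypothetical names, SHA-256 is an assumed hash, and the cache is assumed to hold a plain copy of the output file.

```python
import hashlib
import os
import shutil
from pathlib import Path


def digest(path: str) -> str:
    """SHA-256 hex digest of a (small) file, read in one go."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def restore_output(cached_copy: str, target_path: str, recorded_hash: str) -> bool:
    """Put a cached output back at target_path.

    Returns True if a copy was made, False if the file at target_path
    already matched the recorded hash and nothing had to be done.
    """
    if os.path.isfile(target_path) and digest(target_path) == recorded_hash:
        return False  # output is already in the right spot with the right contents
    parent = os.path.dirname(target_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    shutil.copyfile(cached_copy, target_path)
    return True
```

Hashing the existing output costs a full read of the file, so whether this beats an unconditional copy probably depends on file size and on how often the output is already in place.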

Further thoughts:
What if the program only used one file file1.txt? It reads the contents. Then it writes to the file. I think we can handle this, whether it uses O_APPEND or O_TRUNC. This is a little in the weeds, probably represents edge cases, but important to think about and document nonetheless.

  • We hash the file when it's opened for reading; this is the input file.
  • We hash the file at the end of the execution; this is the output file.
  • When we see this execution again, if our input file matches the one the new execution is using, we can just replace this file by copying over the output file from the cache.
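
A small sketch of the single-file case, under a few assumptions: the hash is SHA-256, the cache is a directory of files named by output hash, and the "program" is a Python callable that modifies the file in place. The input is hashed just before the run rather than at the first read-only open, which is a simplification. All names here are hypothetical.

```python
import hashlib
import os
import shutil
from pathlib import Path
from typing import Callable, Dict


def digest(path: str) -> str:
    """SHA-256 hex digest of a (small) file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_in_place_run(path: str, cache_dir: str,
                        run: Callable[[str], None]) -> Dict[str, str]:
    """Hash the file as input, run the program, hash it again as output,
    and stash the output version in the cache."""
    input_hash = digest(path)    # file as the program first sees it
    run(path)                    # program reads, then writes (append or truncate)
    output_hash = digest(path)   # file as the program leaves it
    cached = os.path.join(cache_dir, output_hash)
    shutil.copyfile(path, cached)
    return {"path": path, "input_hash": input_hash,
            "output_hash": output_hash, "cached": cached}


def replay_in_place_run(entry: Dict[str, str]) -> bool:
    """If the file currently matches the recorded input, overwrite it with
    the cached output instead of running the program. Returns True on a hit."""
    if digest(entry["path"]) != entry["input_hash"]:
        return False
    shutil.copyfile(entry["cached"], entry["path"])
    return True
```

One consequence of this scheme: right after a replay the file holds the output version, so an immediate second replay is a miss, because the file no longer matches the recorded input hash.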

Roughly what I need to implement:

  • Alter data structures to include file name, full path, and hash of the file
  • Hash input files (access, openat, open, read, pread64, fstat, newfstatat, stat) when we first see the access.
  • Hash output files (creat, open, openat, write, writev) at the end of the execution.
  • Serialize the data structure to a file.
  • Deserialize the data structure.
  • Lookups in the data structure.
  • Copy output files to the "cache" at the end of execution.
  • Copy output files from the "cache".
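
The data structure items in this list could be sketched as below. This is not the project's actual layout: the class names, the JSON format, and the lookup key (executable hash plus a path-to-hash map of inputs) are assumptions chosen to keep the example short.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileRecord:
    name: str       # file name, e.g. "file1.txt"
    full_path: str  # absolute path at the time of the execution
    hash: str       # content hash (algorithm left open here)


@dataclass
class Execution:
    executable: FileRecord
    inputs: List[FileRecord] = field(default_factory=list)
    outputs: List[FileRecord] = field(default_factory=list)


def save(executions: List[Execution], path: str) -> None:
    """Serialize the records to a JSON file."""
    with open(path, "w") as f:
        json.dump([asdict(e) for e in executions], f, indent=2)


def load(path: str) -> List[Execution]:
    """Deserialize the records written by save()."""
    with open(path) as f:
        raw = json.load(f)
    return [
        Execution(
            executable=FileRecord(**e["executable"]),
            inputs=[FileRecord(**r) for r in e["inputs"]],
            outputs=[FileRecord(**r) for r in e["outputs"]],
        )
        for e in raw
    ]


def lookup(executions: List[Execution], executable_hash: str,
           input_hashes: Dict[str, str]) -> Optional[Execution]:
    """Find a recorded execution with the same executable hash and the same
    set of inputs (full path -> hash). Returns None on a cache miss."""
    for e in executions:
        if e.executable.hash != executable_hash:
            continue
        if {r.full_path: r.hash for r in e.inputs} == input_hashes:
            return e
    return None
```

A linear scan is fine for a walkthrough; a real cache would more likely index executions by some combined key.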