Feature: Ability to pause, save state and restore VM #480
Comments
Being worked on as part of #489. |
#489 is merged now; however, there is very little documentation on how to use it. |
@losfair could you please document, or point us to, how to use the Su engine to accomplish this? |
I am also interested in using Su, but I'm not sure how. I'm not even sure whether that code was removed during the 1.0 refactor. |
Is there currently a workaround for this? I can't find much on Su in the existing documentation. |
Any updates on this feature? |
+1 Any updates or plans for this? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
We will be able to use the functionality from #4263 for this. |
#4263 was merged recently, and I have been trying to implement automatic saving of the VM state. That is, at certain moments I want to send Sigstop to the VM and save the state to the journal (via
In my use case I need to save the state of the VM at arbitrary moments, not periodically (saving every 2-3 s is an option, albeit not a very good one). I am pretty new to Wasmer, though, so maybe I am just doing it wrong.
My attempts so far
macro_rules! setup_tokio_rt {
    () => {
        let tokio_rt = tokio::runtime::Builder::new_multi_thread()
            .enable_all()
            .build()
            .unwrap();
        let tokio_rt_handle = tokio_rt.handle().clone();
        let _tokio_rt_guard = tokio_rt_handle.enter();
    };
}
// set everything up manually
// 1. snapshot restoration doesn't work because function that does it seems to be private (or pub(crate) I am not sure)
// 2. sending Sigstop doesn't do anything but sending Sigint does work (but no snapshot is saved)
// 3. snapshots are not being taken every 100ms
// NOTE: because of 1. I can't be sure that snapshots are actually not being taken, but content of the journal file does not
// change significantly and my WASM module allocates around 15 kb of memory as Vec<u8> so it should be in a snapshot I am assuming (I am using debug profile so it shouldn't optimize it away)
pub fn test1_fully_manual() {
    setup_tokio_rt!();
    let mut store = Store::default();
    let file_path = Path::new("./output.wasm");
    let module = Module::from_file(&mut store, file_path).unwrap();
    let journal = Arc::new(LogFileJournal::new(Path::new("./test1.wasi-journal")).unwrap());
    // create store and wasi environment
    let mut wasi_env_builder = WasiEnv::builder("hello")
        .stdin(Box::new(Stdin::default()))
        .stdout(Box::new(Stdout::default()))
        .stderr(Box::new(Stderr::default()));
    wasi_env_builder.add_snapshot_trigger(SnapshotTrigger::Sigint);
    wasi_env_builder.add_snapshot_trigger(SnapshotTrigger::Sigstop);
    wasi_env_builder.with_snapshot_interval(Duration::from_millis(100));
    wasi_env_builder.add_journal(journal);
    wasi_env_builder.set_module_hash(ModuleHash::from_bytes([0, 0, 0, 0, 0, 0, 0, 0]));
    let wasi_env = wasi_env_builder.build().unwrap();
    let tasks = wasi_env.runtime.task_manager().clone();
    let mut wasi_fn_env = WasiFunctionEnv::new(&mut store, wasi_env);
    // imports
    let mut import_object = wasi_fn_env
        .import_object_for_all_wasi_versions(&mut store, &module)
        .unwrap();
    // TODO: add my own imports
    let mut store_mut = store.as_store_mut();
    let memory = tasks
        .build_memory(&mut store_mut, SpawnMemoryType::CreateMemory)
        .unwrap();
    if let Some(memory) = memory.as_ref() {
        import_object.define("env", "memory", memory.clone());
    }
    let instance = Instance::new(&mut store, &module, &import_object).unwrap();
    wasi_fn_env
        .initialize_with_memory(&mut store, instance.clone(), memory, true)
        .unwrap();
    let start_fn = instance.exports.get_function("_start").unwrap();
    let data = wasi_fn_env.data(&mut store).clone();
    std::thread::spawn(move || {
        std::thread::sleep(Duration::from_secs(1));
        data.process.signal_process(Signal::Sigint); // Sigstop does nothing, Sigint stops the process without taking a snapshot and then crashes the
    });
    let result = start_fn.call(&mut store, &[]);
    result.unwrap();
}
// very similar to test1 but now restoring from the journal actually works, except there is still
// no snapshot only syscalls like "println"s are being restored
// also SnapshotTrigger::FirstStdin outright crashes the WASI process when I call stdin().read_line() from within it
// # UPDATE: I figured out issue with the crash, you can ignore this example probably
pub fn test2_run_with_store_ext() {
    setup_tokio_rt!();
    let mut store = Store::default();
    let file_path = Path::new("./output.wasm");
    let module: Module = Module::from_file(&mut store, file_path).unwrap();
    let journal = Arc::new(LogFileJournal::new(Path::new("./test2.wasi-journal")).unwrap());
    let mut builder = WasiEnv::builder("hello")
        .stdin(Box::new(Stdin::default()))
        .stdout(Box::new(Stdout::default()))
        .stderr(Box::new(Stderr::default()));
    builder.add_journal(journal);
    builder.add_snapshot_trigger(SnapshotTrigger::FirstStdin); // THIS LINE crashes the VM when it gets to stdin().read_line()
    builder.with_snapshot_interval(Duration::from_millis(500));
    builder
        .run_with_store_ext(
            module,
            ModuleHash::from_bytes([0, 1, 2, 3, 4, 5, 6, 7]),
            &mut store,
        )
        .unwrap();
}
// test3: using wasi_runner - could not find a way to send a Sigstop, but
// periodical snapshots don't seem to work either
pub fn test3_wasi_runner() {
    let tokio_rt = tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .unwrap();
    let handle = tokio_rt.handle().clone();
    let _guard = handle.enter();
    let mut store = Store::default();
    let engine = store.engine().clone();
    let file_path = Path::new("./output.wasm");
    let module: Module = Module::from_file(&mut store, file_path).unwrap();
    let journal = CompactingLogFileJournal::new(Path::new("./test3.wasi-journal"))
        .unwrap()
        .with_compact_on_drop();
    let journal = Arc::new(journal);
    let task_manager = Arc::new(TokioTaskManager::new(tokio_rt));
    let mut rt = PluggableRuntime::new(task_manager.clone());
    rt.add_journal(journal.clone());
    // runtime.set_engine(Some(store.engine().clone()));
    rt.set_networking_implementation(virtual_net::UnsupportedVirtualNetworking::default());
    let tty = Arc::new(SysTty::default());
    tty.reset();
    rt.set_tty(tty);
    rt.set_engine(Some(engine));
    let rt = Arc::new(rt);
    let mut runner = WasiRunner::new()
        .with_args(Vec::<&'static str>::new())
        .with_forward_host_env(true)
        .with_capabilities(Capabilities::default());
    runner
        .add_journal(journal)
        .add_default_snapshot_triggers()
        .with_snapshot_interval(Duration::from_millis(500))
        .add_snapshot_trigger(SnapshotTrigger::Sigstop);
    runner
        .run_wasm(
            rt,
            "hello",
            &module,
            ModuleHash::from_bytes([0, 1, 2, 3, 4, 5, 6, 7]), // i am too lazy to calculate actual hash, sorry
            true,
        )
        .unwrap()
}
WASM module
WASM module is built for wasm32-wasi in Rust and then
WASM module:
use std::io::stdin;
use std::time::Duration;

fn main() {
    // allocate 15kb of memory with repeating pattern to be able to see it in the snapshot
    // easily
    let mut v = Vec::<u8>::new();
    let mut byte: u8 = 0;
    for _ in 0..15000 {
        v.push(byte);
        byte = if byte == 255 { 0 } else { byte + 1 };
    }
    // just waste some time
    for i in 0..10 {
        println!("{}", i);
        std::thread::sleep(Duration::from_millis(200));
    }
    for i in v {
        print!("{}", i);
    }
    // FirstStdin snapshot trigger
    let mut s = String::new();
    stdin().read_line(&mut s).unwrap();
    println!("you typed: {}", s);
}
On a related note: the only way I found to send Sigstop to WasmProcess is to set up WASI manually rather than use
EDIT: formatting
EDIT 2: by the way, it's a little weird that I have to
EDIT 3: I figured out the issue with the crash in example 2 (see code above). Wasmer assumes that the __stack_pointer global is exported and tries to access it, which is not the case for Rust where
To summarize:
Some more findings:
|
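One way to test the assumption in the NOTE above (that the 15 kB `Vec<u8>` should be visible in a snapshot) is to scan the journal file for the incrementing byte pattern the module writes. The sketch below is only a rough check: `longest_incrementing_run` and `journal_contains_pattern` are hypothetical helpers, not part of Wasmer, and the approach assumes the journal stores memory contents uncompressed and contiguous, which may not hold.

```rust
use std::{fs, io, path::Path};

/// Length of the longest run of bytes where each byte is the previous one
/// plus one (wrapping from 255 back to 0), matching the test module's pattern.
pub fn longest_incrementing_run(bytes: &[u8]) -> usize {
    if bytes.is_empty() {
        return 0;
    }
    let mut best = 1;
    let mut current = 1;
    for pair in bytes.windows(2) {
        if pair[1] == pair[0].wrapping_add(1) {
            current += 1;
            best = best.max(current);
        } else {
            current = 1;
        }
    }
    best
}

/// Reads a file and reports whether it contains a run of at least `min_run`
/// incrementing bytes. A long run hints that the Vec ended up in the file.
pub fn journal_contains_pattern(path: &Path, min_run: usize) -> io::Result<bool> {
    let bytes = fs::read(path)?;
    Ok(longest_incrementing_run(&bytes) >= min_run)
}
```

Calling `journal_contains_pattern(Path::new("./test1.wasi-journal"), 1024)` after a run gives a quick yes/no signal. A negative result is inconclusive if the journal compresses or chunks memory.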
@john-sharratt could you please help (sorry for pinging but since you're the original author of the journaling PR it might be our best shot)? |
Hopefully @john-sharratt can provide more context. Regarding the stack pointer, I think that is provided only in |
For capturing the stack to work the module needs to be compiled with wasix (see cargo-wasix / wasix.org), because it uses asyncify to enable unwinding and rewinding. |
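To give a feel for what unwinding and rewinding mean here, the following is a hand-written toy in plain Rust. It is not Asyncify output and not Wasmer code: `Mode`, `Ctx`, and `count_to` are names invented for this sketch. The idea it imitates is that an instrumented function can save its locals and return early (unwind), then be called again later and pick up from the saved locals (rewind).

```rust
/// Execution mode, loosely modelled on the normal/unwinding/rewinding states.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Normal,
    Unwinding,
    Rewinding,
}

/// Hypothetical context shared between the host and the instrumented function.
pub struct Ctx {
    pub mode: Mode,
    /// Buffer where the function stores its locals while unwinding.
    pub saved_locals: Vec<u32>,
    /// Loop index at which the host wants execution to pause, if any.
    pub pause_at: Option<u32>,
}

/// Pushes 0..limit into `out`. Returns false if it unwound before finishing.
pub fn count_to(limit: u32, ctx: &mut Ctx, out: &mut Vec<u32>) -> bool {
    // When rewinding, restore the saved local instead of starting from zero.
    let mut i = if ctx.mode == Mode::Rewinding {
        ctx.mode = Mode::Normal;
        ctx.saved_locals.pop().unwrap_or(0)
    } else {
        0
    };
    while i < limit {
        if ctx.pause_at == Some(i) {
            // Unwind: save the local, flag the mode, and return to the host.
            ctx.pause_at = None;
            ctx.saved_locals.push(i);
            ctx.mode = Mode::Unwinding;
            return false;
        }
        out.push(i);
        i += 1;
    }
    true
}
```

After an unwind, `ctx.saved_locals` is exactly the kind of data a snapshot would need to persist. The host sets `ctx.mode = Mode::Rewinding` and calls `count_to` again to resume.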
The first release of the journaling capability we merged was scope-limited to a number of use cases, in particular the DCGI runner. We could not keep going with it because the PR was becoming huge, so we had to merge it as is, with a stable test pass rate. That means the DProxy functionality will close out the remaining capability, which is now being worked on here:
In hindsight, the journaling functionality was of course going to attract quite some interest given its capability, so I can understand why you are jumping on it now, before GA. That is a good thing, since the more hands on it, the faster everyone gets the capability, but it does mean you are ahead on some of the use cases and will hit problems that others won't.
For the use case you have explicitly linked, there are some restrictions on how you build the app for it to work properly (which we should probably detect at runtime and issue warnings for).
When compiled in that way it will export the globals it needs to snapshot threads (in particular the main thread). While main-thread snapshotting (which is not needed for DCGI) should work, it has not been tested nearly as much as the more basic use case, due to the scope limitation of focusing on DCGI.
Some known issues when attempting to use journaling on full-blown apps:
Looking at the code you posted the
If you would like to really run at the absolute cutting edge you can jump on over to the
I do see your point about the triggers for snapshots being a problem. I will have to look into this some more and brainstorm some ideas, as having many robust and simple ways to trigger a snapshot is going to be important for it to be the most useful.
For a direct answer to the questions you asked:
Yes, more complex use cases will be added to the DProxy branch, along with examples that the guys will publish when an article about the feature is published.
Looks like you need to do step 1 and 2 when compiling
Yeah that makes sense, if you want to have a go at a PR we can review it and get it in earlier (otherwise I'll drop it in my PR at a later date)
Journaling does work with WASI but it will not be able to save or restore thread state as only WASIX has the extensions that make this possible. In practice that means WASI with journaling is mainly useful for saving and restoring the file system, which is all that DCGI uses it for at the moment. The good news is that WASIX is fully backwards compatible with WASI so you can take WASI code and straight up compile it to WASIX.
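As a rough picture of what "saving and restoring the file system" through a journal can look like, here is a minimal sketch of an append-only log that is replayed to rebuild state. It does not reflect Wasmer's journal format or API; `JournalEntry`, `ToyFs`, `apply`, and `replay` are hypothetical names for illustration only.

```rust
use std::collections::BTreeMap;

/// Hypothetical journal record: one file-system mutation per entry.
#[derive(Clone, Debug, PartialEq)]
pub enum JournalEntry {
    WriteFile { path: String, data: Vec<u8> },
    RemoveFile { path: String },
}

/// Toy in-memory file system: path -> contents.
pub type ToyFs = BTreeMap<String, Vec<u8>>;

/// Applies a single journal entry to the file system.
pub fn apply(fs: &mut ToyFs, entry: &JournalEntry) {
    match entry {
        JournalEntry::WriteFile { path, data } => {
            fs.insert(path.clone(), data.clone());
        }
        JournalEntry::RemoveFile { path } => {
            fs.remove(path);
        }
    }
}

/// Rebuilds the file system from scratch by replaying every entry in order.
pub fn replay(journal: &[JournalEntry]) -> ToyFs {
    let mut fs = ToyFs::new();
    for entry in journal {
        apply(&mut fs, entry);
    }
    fs
}
```

Because the state is derived purely from the ordered log, restoring after a restart amounts to calling `replay` on the persisted entries. Thread state does not fit this model, which is consistent with the limitation described above.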
This will be possible in the future. The triggers for this were added as placeholders, but they are not all wired up yet. It's not a big difficulty to add signal hooks and wire that up. If you want to have a crack at a PR you can; otherwise I'll take a look at it in the DProxy PR.
The stack pointer is used to unwind the stack when capturing it from memory. That means the Wasmer runtime needs that global in order to know where in the memory the stack starts and stops. When compiling to WASI it does not export that global, but when compiling to WASIX it does.
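The role of the stack pointer can be sketched with a toy model of linear memory. This is not how Wasmer implements stack capture. It assumes a stack that grows downward from a fixed base, so the live region is `stack_pointer..stack_base`, and `capture_stack` and `restore_stack` are hypothetical helpers. Without a known stack pointer there is no way to tell which bytes belong to the stack, which is why the exported global matters.

```rust
/// Copies the live stack region out of linear memory.
/// Returns None if the pointers are inconsistent or out of bounds.
pub fn capture_stack(memory: &[u8], stack_pointer: usize, stack_base: usize) -> Option<Vec<u8>> {
    if stack_pointer > stack_base {
        return None;
    }
    memory.get(stack_pointer..stack_base).map(|s| s.to_vec())
}

/// Writes a previously captured stack back below `stack_base` and returns
/// the stack pointer value that corresponds to the restored region.
pub fn restore_stack(memory: &mut [u8], stack_base: usize, saved: &[u8]) -> Option<usize> {
    let stack_pointer = stack_base.checked_sub(saved.len())?;
    memory
        .get_mut(stack_pointer..stack_base)?
        .copy_from_slice(saved);
    Some(stack_pointer)
}
```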
Seems this is not wired properly yet, will look into it in the DProxy PR. |
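For readers following the discussion of triggers and snapshot intervals, the sketch below shows one way a host could decide when a snapshot is due. It is a standalone toy, not the Wasmer implementation. `ToyTrigger` and `SnapshotPolicy` are hypothetical and mirror the `SnapshotTrigger` variants from the thread in name only.

```rust
use std::collections::HashSet;
use std::time::Duration;

/// Hypothetical trigger kinds, named after the ones discussed in the thread.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ToyTrigger {
    Sigint,
    Sigstop,
    FirstStdin,
}

/// Decides whether a snapshot should be taken for an event or a timer tick.
pub struct SnapshotPolicy {
    triggers: HashSet<ToyTrigger>,
    interval: Option<Duration>,
    /// Time of the last snapshot, measured from process start.
    last_snapshot: Duration,
}

impl SnapshotPolicy {
    pub fn new(interval: Option<Duration>) -> Self {
        SnapshotPolicy { triggers: HashSet::new(), interval, last_snapshot: Duration::ZERO }
    }

    pub fn add_trigger(&mut self, trigger: ToyTrigger) {
        self.triggers.insert(trigger);
    }

    /// Called when an event occurs; only enabled triggers cause a snapshot.
    pub fn on_event(&mut self, trigger: ToyTrigger, now: Duration) -> bool {
        if self.triggers.contains(&trigger) {
            self.last_snapshot = now;
            true
        } else {
            false
        }
    }

    /// Called periodically; snapshots once the interval has elapsed.
    pub fn on_tick(&mut self, now: Duration) -> bool {
        match self.interval {
            Some(interval) if now.saturating_sub(self.last_snapshot) >= interval => {
                self.last_snapshot = now;
                true
            }
            _ => false,
        }
    }
}
```

In this model an explicit trigger resets the interval timer, so a Sigstop snapshot is not immediately followed by a redundant periodic one. That is a design choice of the sketch, not a claim about Wasmer's behavior.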
Just to clarify, No need to do that manually. |
I think that step only runs in release builds |
I see, thank you for your responses. After following the steps you provided, it works. Some triggers still do not, but as you said, they are placeholders. I was testing with |
@MaratBR good news, we've made some excellent progress on this - it's not quite mainline ready yet, but the main parts are there. You should be able to use single threaded applications, take snapshots and resume them. Will update again when multi-threading is implemented and the patch hits mainline |
@john-sharratt Thanks for the clarification. |
@Wulfheart probably it will go into master this week I suspect |
@john-sharratt Thanks for the clarification. I am looking forward to this. |
Do you have any docs on how this works? |
@Wulfheart it's primarily initiated in the CLI, which has some limited documentation in the help commands. There is also a document here that describes it at a high level. |
This is now merged in
It will go out in the next release. |
Hi, |
The actual proposal is wasmerio/wasmer-go#22, forwarded here as requested by @Hywan.
Motivation
With the ability to pause the VM, save its state (memory, execution context), and restore it later,
it would be possible to create persistable sandboxed environments that can be migrated and restarted, and to perform maintenance on hosts with little to no interruption of user programs.
Proposal
Add the necessary functions (with checkpoints?) to pause and resume a VM, export its memory along with all of its state (registers, instruction pointer), and create a VM from imported state.
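To make the shape of the proposal concrete, here is a minimal, self-contained sketch of a pausable interpreter in plain Rust. None of it is a Wasmer API: `ToyVm`, `Snapshot`, `run_for`, `snapshot`, and `restore` are hypothetical names, and the "program" is just a list of bytes that get summed, standing in for real instructions.

```rust
/// Hypothetical saved state: linear memory plus the execution context.
#[derive(Clone, Debug, PartialEq)]
pub struct Snapshot {
    pub memory: Vec<u8>,
    pub instruction_pointer: usize,
    pub accumulator: u64,
}

/// Toy VM whose only instruction adds the current program byte to an accumulator.
pub struct ToyVm {
    program: Vec<u8>,
    state: Snapshot,
}

impl ToyVm {
    pub fn new(program: Vec<u8>) -> Self {
        let state = Snapshot { memory: vec![0; 8], instruction_pointer: 0, accumulator: 0 };
        ToyVm { program, state }
    }

    /// Runs at most `steps` instructions and then pauses.
    /// Returns true once the whole program has been executed.
    pub fn run_for(&mut self, steps: usize) -> bool {
        for _ in 0..steps {
            let op = match self.program.get(self.state.instruction_pointer) {
                Some(&op) => op,
                None => return true,
            };
            self.state.accumulator += u64::from(op);
            // Mirror the low byte into memory so memory is part of the saved state.
            self.state.memory[0] = self.state.accumulator as u8;
            self.state.instruction_pointer += 1;
        }
        self.state.instruction_pointer >= self.program.len()
    }

    /// Exports the full state of the paused VM.
    pub fn snapshot(&self) -> Snapshot {
        self.state.clone()
    }

    /// Creates a VM from imported state, possibly on a different host.
    pub fn restore(program: Vec<u8>, state: Snapshot) -> Self {
        ToyVm { program, state }
    }

    pub fn accumulator(&self) -> u64 {
        self.state.accumulator
    }
}
```

A host would call `run_for` with a step budget, take a `snapshot` while paused, persist it, and later call `restore` to continue where execution left off.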