
Optimizations #98

Merged: 35 commits merged into master on Jul 4, 2018

Conversation

@pepyakin (Collaborator) commented Jun 18, 2018:

Based on #97

Wasm isn't the greatest instruction set for interpretation: it is optimized for ease of validation and compilation rather than for fast dispatch.

The central theme of this PR is a transformation of Wasm's structured stack machine into a plain stack machine. This transformation is performed once, before interpretation, alongside validation.

See "isa.rs" for more details on this topic.

Beyond this, there are a couple of optimizations here and there.

With all of this, the overall increase in performance is almost 2x on some benchmarks.
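The structured-to-flat lowering described above can be pictured with a toy sketch (hypothetical names and a drastically simplified instruction set, not the real scheme in isa.rs, which also handles drop/keep, loops, and more): branches that target a label by relative depth are rewritten into jumps to absolute program counters, back-patched once the block's end offset is known.

```rust
// Toy sketch of the structured-to-flat lowering (hypothetical names, not
// wasmi's isa.rs): branches targeting a label by relative depth become
// jumps to absolute program counters, back-patched at the block's end.
#[derive(Debug, PartialEq)]
enum StructuredOp {
    Block(Vec<StructuredOp>), // a labeled region; branches target its end
    BrIfDepth(u32),           // branch to the end of the n-th enclosing block
    Nop,
}

#[derive(Debug, PartialEq)]
enum FlatOp {
    BrIf(usize), // absolute pc of the jump target
    Nop,
}

fn lower_block(ops: &[StructuredOp], out: &mut Vec<FlatOp>, pending: &mut Vec<Vec<usize>>) {
    for op in ops {
        match op {
            StructuredOp::Nop => out.push(FlatOp::Nop),
            StructuredOp::BrIfDepth(depth) => {
                // Target pc is not known yet: emit a placeholder and record
                // its position under the matching enclosing block.
                let at = out.len();
                out.push(FlatOp::BrIf(usize::MAX));
                let block = pending.len() - 1 - *depth as usize;
                pending[block].push(at);
            }
            StructuredOp::Block(body) => {
                pending.push(Vec::new());
                lower_block(body, out, pending);
                // The block's end is now known: patch every recorded branch.
                let end = out.len();
                for at in pending.pop().unwrap() {
                    out[at] = FlatOp::BrIf(end);
                }
            }
        }
    }
}
```

After lowering, the interpreter loop only ever moves a program counter; no label stack is needed at run time.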

@eira-fransham (Contributor) left a comment:

Really fantastic work here. I think it should be looked over by someone else too, since I've not spent much time on this code and I'm in a bit of a rush, but I've got a few points for you to look at.

self.values
.back()
.get(len - 1)
Contributor:

This should be .last and .last_mut
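For reference, the suggested slice accessors next to the manual indexing they would replace (a minimal illustration, not the PR's code):

```rust
// The suggested accessors, next to the manual indexing they replace.
fn top_of(values: &[i32]) -> Option<&i32> {
    // values.get(values.len() - 1) underflows on an empty slice before
    // `get` can even return None; `last` handles emptiness for free.
    values.last()
}

fn set_top(values: &mut [i32], v: i32) {
    // `last_mut` gives mutable access to the top of the stack.
    if let Some(top) = values.last_mut() {
        *top = v;
    }
}
```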

Collaborator (author):

Lol definitely!

src/isa.rs Outdated
/// Last one is the default.
///
/// Can be less than zero.
BrTable(Box<[Target]>),
Contributor:

I dislike the pointer indirection here, it's not important to fix now but I wonder if we can do something about it. We could have BrTable have one field (the length) and then have a series of Br instructions afterwards that act as the list of targets.
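The inline-encoding idea could look roughly like this (an illustrative sketch, not wasmi's actual encoding): a header instruction carrying the length, followed by one target entry per arm in the instruction stream, so resolving a branch is plain indexing with no pointer chasing.

```rust
// Illustrative sketch (not wasmi's actual encoding): branch-table targets
// stored inline in the instruction stream instead of behind a
// Box<[Target]>, so resolving a branch needs no pointer indirection.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Instr {
    BrTable { count: u32 }, // header; followed by `count` BrTableTarget entries
    BrTableTarget(u32),     // jump destination (pc); the last one is the default
    Nop,
}

fn br_table_target(code: &[Instr], pc: usize, index: u32) -> u32 {
    let count = match code[pc] {
        Instr::BrTable { count } => count,
        _ => panic!("pc must point at a BrTable header"),
    };
    // Out-of-range indices fall through to the last (default) target.
    let slot = index.min(count - 1) as usize;
    match code[pc + 1 + slot] {
        Instr::BrTableTarget(dest) => dest,
        _ => panic!("malformed inline branch table"),
    }
}
```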

Collaborator (author):

#100 !

/// Return from current function block.
Return,
Return(isa::DropKeep),
Contributor:

Can the drop/keep instructions be encoded as extra instructions at the end of the function (so that we don't have to have special-case handling)? That would mean we can make Branch and Return the same here.

Collaborator (author):

What kind of special-case handling do you mean? Also, btw, drop/keep are not instructions per se, but immediate arguments of these instructions.

Contributor:

Special case handling of returning vs branching. Especially in a stack machine you shouldn't need to have a difference.

Collaborator (author):

Oh wow, that's an interesting idea; I hadn't thought about it. It requires some substantial changes, though.

Firstly, all branches are currently static, i.e. they encode the branch destination (and drop/keep) as immediates. So we would need an instruction that pops a return address and branches to it. (It seems that's the only instruction that needs this kind of indirect jump, so why not keep calling it return? : ) )

Secondly, all code is currently contained in separate vectors, so you can't branch from one function and land in the code of another. Keeping all code in a single vec would be more efficient and desirable, but I didn't go that way since it might be more error-prone, and decided to leave it for later.

Also, when I thought about making drop/keep separate instructions, I decided not to, because a large part of an interpreter's overhead comes from dispatch. So it seems profitable to fuse several instructions into one.
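The fused drop/keep immediate discussed here can be pictured as follows (a standalone sketch with hypothetical types, not wasmi's code): on a branch or return, the kept result is copied down over the dropped slots and the stack is truncated, all inside the one dispatched instruction.

```rust
// Standalone sketch of the drop/keep immediate (hypothetical types, not
// wasmi's code): on branch/return, optionally copy the kept top value
// down over the dropped slots, then truncate. Fusing this into the
// branch/return instruction saves one dispatch per branch.
#[derive(Clone, Copy)]
enum Keep {
    None,
    Single,
}

#[derive(Clone, Copy)]
struct DropKeep {
    drop: u32,
    keep: Keep,
}

fn apply_drop_keep(stack: &mut Vec<u64>, dk: DropKeep) {
    if let Keep::Single = dk.keep {
        // Move the kept result below the values about to be dropped.
        let top = *stack.last().expect("validation ensures a result is present");
        let len = stack.len();
        stack[len - 1 - dk.drop as usize] = top;
    }
    let new_len = stack.len() - dk.drop as usize;
    stack.truncate(new_len);
}
```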

Contributor:

When this gets merged I'll write a PR to try to convert it to use a flat vector. We can even generate the bytecode for each individual function in parallel and then flatten it at the end.

You've convinced me that this is fine to keep as a special case, though.


let mut function_stack = VecDeque::new();
function_stack.push_back(context);
let mut call_stack = Vec::new();
Contributor:

Can we cache this vec in self so that we don't need to realloc every call to start_execution? Especially useful because we start empty. We just call Vec::clear after run_interpreter_loop.
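The reuse pattern being suggested would look something like this (a minimal sketch with a hypothetical Interpreter struct, not wasmi's): the vector lives in the struct, and a clear after the interpreter loop resets the length while retaining the allocation for the next call.

```rust
// Minimal sketch of the suggested reuse (hypothetical Interpreter struct):
// the call stack lives in `self`, and `clear` after the interpreter loop
// resets the length while retaining the allocation for the next call.
struct Interpreter {
    call_stack: Vec<u32>, // stand-in for real call-stack frames
}

impl Interpreter {
    fn new() -> Interpreter {
        Interpreter { call_stack: Vec::new() }
    }

    fn start_execution(&mut self, depth: u32) -> usize {
        for frame in 0..depth {
            self.call_stack.push(frame);
        }
        let max_depth = self.call_stack.len();
        // The equivalent of calling Vec::clear after run_interpreter_loop:
        // length goes to 0, capacity is kept.
        self.call_stack.clear();
        max_depth
    }
}
```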

Collaborator (author):

That will not make any difference since, for now, an Interpreter is disposable. That is, every call to FuncInvoke::invoke creates a new instance of the Interpreter and then drops it after execution finishes.

Thankfully, this happens only once per invocation of a wasm module's exported function, which should be pretty infrequent.

src/runner.rs Outdated
#[inline(always)]
fn run_instruction(&mut self, context: &mut FunctionContext, instruction: &isa::Instruction) -> Result<InstructionOutcome, TrapKind> {
match instruction {
&isa::Instruction::Unreachable => self.run_unreachable(context),
Contributor:

Are these run_* functions inlined by LLVM? That'd be a really good source of easy wins if not.

Collaborator (author):

Not sure about all of them, but I experimented with inlining of hot functions (such as set_local/get_local for fac-opt) and it didn't make a difference.

@@ -1252,65 +1108,92 @@ pub fn check_function_args(signature: &Signature, args: &[RuntimeValue]) -> Resu
Ok(())
}

#[derive(Debug)]
struct ValueStack {
Contributor:

Why Box<[RuntimeValue]> + usize and not Vec? This seems basically equivalent to a Vec.

Collaborator (author):

That's actually for a reason. This way it's more explicit that we don't use Vec and, in particular, that we are trying to avoid the Vec::push method.

Contributor:

I feel like that's unnecessary and just leads to having to write many 0s into the Box<[RuntimeValue]>. If we used Vec we could use with_capacity and have the extra space be transparently lazily allocated by jemalloc.

@pepyakin (Collaborator, author) commented Jun 21, 2018:

I feel like that's unnecessary

What exactly? Do you mean avoiding Vec::push?
AFAIR, using Vec in this case actually slows things down quite a lot, but I'm not sure.

just leads to having to write many 0s into the Box<[RuntimeValue]>

But if we use Vec we still need to use resize, which is basically the same?

Contributor:

Why do we need to use resize instead of with_capacity? Surely one of the validation steps of the bytecode is to make sure that it doesn't read uninitialised stack data.

Collaborator (author):

If we use with_capacity only and leave len unchanged, then we basically can't use get, right?

Hm, I think I lost it somewhere and don't follow...

Contributor:

I'm thinking something like this. It makes the code a lot simpler and jemalloc can lazily allocate the space for the values:

#[derive(Debug)]
struct ValueStack {
    buf: Vec<RuntimeValue>,
}

impl ValueStack {
    fn with_limit(limit: usize) -> ValueStack {
        let buf = Vec::with_capacity(limit);

        ValueStack { buf: buf }
    }

    #[inline]
    fn drop_keep(&mut self, drop_keep: isa::DropKeep) {
        if drop_keep.keep == isa::Keep::Single {
            let top = *self.top();
            *self.pick_mut(drop_keep.drop as usize + 1) = top;
        }

        let cur_stack_len = self.len();
        self.buf.truncate(cur_stack_len - drop_keep.drop as usize);
    }

    #[inline]
    fn pop_as<T>(&mut self) -> T
    where
        T: FromRuntimeValue,
    {
        let value = self.pop();
        value
            .try_into()
            .expect("Due to validation stack top's type should match")
    }

    #[inline]
    fn pop_pair_as<T>(&mut self) -> (T, T)
    where
        T: FromRuntimeValue,
    {
        let right = self.pop_as();
        let left = self.pop_as();
        (left, right)
    }

    #[inline]
    fn pop_triple(&mut self) -> (RuntimeValue, RuntimeValue, RuntimeValue) {
        let right = self.pop();
        let mid = self.pop();
        let left = self.pop();
        (left, mid, right)
    }

    #[inline]
    fn top(&self) -> &RuntimeValue {
        self.pick(1)
    }

    fn pick(&self, depth: usize) -> &RuntimeValue {
        &self.buf[self.buf.len() - depth]
    }

    #[inline]
    fn pick_mut(&mut self, depth: usize) -> &mut RuntimeValue {
        let old_len = self.buf.len();
        &mut self.buf[old_len - depth]
    }

    #[inline]
    fn pop(&mut self) -> RuntimeValue {
        // TODO: Use `get_unchecked` since we always have at least one value
        self.buf.pop().unwrap()
    }

    #[inline]
    fn push(&mut self, value: RuntimeValue) -> Result<(), TrapKind> {
        if self.buf.len() == self.buf.capacity() {
            return Err(TrapKind::StackOverflow);
        }
        self.buf.push(value);
        Ok(())
    }

    #[inline]
    fn len(&self) -> usize {
        self.buf.len()
    }
}

Collaborator (author):

I just re-ran the benchmarks with this version, and they show that it is indeed slower (though not by as much as I expected).

context.sink.emit(isa::Instruction::SetGlobal(index));
}

I32Load(align, offset) => {
Contributor:

Could you generate this with a macro? It seems error-prone.
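A macro along the suggested lines might look like this sketch (illustrative names and a tiny instruction set, not wasmi's actual code): one macro_rules! invocation stamps out an identical match arm per load variant, so the variants can't silently diverge.

```rust
// Hypothetical sketch of generating repetitive per-instruction match arms
// with macro_rules!, as suggested; names are illustrative, not wasmi's.
#[derive(Debug, PartialEq)]
enum Op {
    I32Load(u32, u32), // (align, offset)
    I64Load(u32, u32),
    Other,
}

#[derive(Debug, PartialEq)]
enum Emitted {
    I32Load(u32),
    I64Load(u32),
}

macro_rules! emit_loads {
    ($op:expr, $sink:expr, { $($variant:ident),* }) => {
        match $op {
            // One identical arm per listed variant: drop the alignment
            // hint, emit the offset immediate.
            $( Op::$variant(_align, offset) => $sink.push(Emitted::$variant(offset)), )*
            _ => {}
        }
    };
}

fn translate(op: Op, sink: &mut Vec<Emitted>) {
    // A single invocation covers every load variant identically, so
    // adding a variant can't silently diverge from the others.
    emit_loads!(op, sink, { I32Load, I64Load });
}
```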

Collaborator (author):

Yeah, you're right, but I would rather not touch this code! All instructions that happened to be placed after TeeLocal were mechanically and carefully edited with the help of the multiline-cursor feature :D

Contributor:

If this code ever needs to be changed, we'll have to convert it to a macro. Will tests fail if we mess it up?

Collaborator (author):

If this code ever needs to be changed we have to convert it to be a macro.

Agree.

Will tests fail if we mess it up?

I guess so

acc = acc
.checked_add(locals_group.count())
.ok_or_else(||
Error(String::from("Locals range no in 32-bit range"))
Contributor:

Typo: should say "not"

Collaborator (author):

Thanks!

}

impl ValueStack {
fn with_limit(limit: usize) -> ValueStack {
let mut buf = Vec::new();
buf.resize(limit, RuntimeValue::I32(0));
Contributor:

If we unify the value types as in #99 we can make this use calloc

Collaborator (author):

Yeah, that's the plan! (And because of this I've asked you about zero-ing FPs the other day : ) )

Contributor:

Great! I don't know whether it will work with a union but we can do Vec::<u64>::new() and then convert that to a Vec<RuntimeValue>. That'll definitely use calloc.
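The calloc idea can be sketched as follows, under the assumption (from #99) of a single 64-bit untagged value representation; the ValueCell union and zeroed_stack function here are illustrative, and the raw-parts conversion is only sound because the layouts match exactly:

```rust
use std::mem;

// Hypothetical untagged 64-bit value cell, standing in for the unified
// representation discussed in #99; names are illustrative, not wasmi's.
#[derive(Clone, Copy)]
union ValueCell {
    raw: u64,
    #[allow(dead_code)]
    float: f64,
}

fn zeroed_stack(limit: usize) -> Vec<ValueCell> {
    // vec![0u64; n] lowers to a zeroed allocation (calloc under the hood),
    // so the OS can hand back lazily-mapped zero pages.
    let zeros: Vec<u64> = vec![0; limit];
    // Reinterpreting the buffer is only sound because ValueCell has
    // exactly the size and alignment of u64:
    assert_eq!(mem::size_of::<ValueCell>(), mem::size_of::<u64>());
    assert_eq!(mem::align_of::<ValueCell>(), mem::align_of::<u64>());
    let mut zeros = mem::ManuallyDrop::new(zeros);
    unsafe {
        Vec::from_raw_parts(
            zeros.as_mut_ptr() as *mut ValueCell,
            zeros.len(),
            zeros.capacity(),
        )
    }
}
```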

@@ -7,3 +7,6 @@ authors = ["Sergey Pepyakin <s.pepyakin@gmail.com>"]
wasmi = { path = ".." }
assert_matches = "1.2"
wabt = "0.3"

[profile.bench]
debug = true
@NikVolf (Contributor) commented Jun 25, 2018:

why?

Contributor:

For cachegrind, although this probably shouldn't be committed.

Collaborator (author):

Yeah, or Instruments in my case : )

src/runner.rs Outdated
.pop_as();
Ok(InstructionOutcome::Branch(table.get(index as usize).cloned().unwrap_or(default) as usize))

let dst =
Contributor:

strange indentation below

Contributor:

Maybe we can use rustfmt from here on? It would have to start with a new PR

@pepyakin (Collaborator, author) commented Jun 25, 2018:

I'd like to!

src/runner.rs Outdated
let table_func_idx: u32 = context
.value_stack_mut()
let table_func_idx: u32 = self
.value_stack
Contributor:

new line for this is overkill?

src/runner.rs Outdated
.value_stack_mut()
fn run_drop(&mut self) -> Result<InstructionOutcome, TrapKind> {
let _ = self
.value_stack
Contributor:

ditto too many lines

.expect("Due to validation stack should contain pair of values");
let (left, right) = self
.value_stack
.pop_pair_as::<T>();
@NikVolf (Contributor) commented Jun 25, 2018:

is ::<T> needed here?

@pepyakin (Collaborator, author) commented Jun 25, 2018:

In this particular place, yes, it is needed. Rustc has problems inferring the types here.
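A stand-in illustration of the inference problem (hypothetical signatures, not wasmi's): when the popped pair flows straight into another generic function, no call site pins down T, so the turbofish is required.

```rust
// A stand-in for pop_pair_as, to show why the turbofish is needed.
fn pop_pair_as<T: From<u8>>(stack: &mut Vec<u8>) -> (T, T) {
    let right = T::from(stack.pop().unwrap());
    let left = T::from(stack.pop().unwrap());
    (left, right)
}

// A generic consumer: many T satisfy its bound, so it pins nothing down.
fn widen<T: Into<u64>>(v: T) -> u64 {
    v.into()
}
```

Calling `let (l, r) = pop_pair_as(&mut stack);` and passing the results only to `widen` leaves T ambiguous; `pop_pair_as::<u16>(&mut stack)` resolves it.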

let drop_keep = drop_keep_return(
&context.locals,
&context.value_stack,
&context.frame_stack
Contributor:

missing trailing comma ;)

@pepyakin pepyakin merged commit f6657ba into master Jul 4, 2018
@pepyakin pepyakin deleted the flat-stack branch July 4, 2018 07:08
@eira-fransham (Contributor):

🎉
