Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wasm-parser/wasm-gen or wast-parser/wast-printer incorrectly handle "end" instruction in globals initialization #955

Open
mitschabaude opened this issue Aug 20, 2021 · 4 comments

Comments

@mitschabaude
Copy link
Contributor

If I take the following .wat:

(module
  (global i32 (i32.const 0))
)

...and then

  • compile it with wat2wasm,
  • parse it to an AST with wasm-parser
  • and print it back to text format with wast-printer,

then I get the following, modified .wat code back:

(module
  (global i32 (i32.const 0)(end))
)

Notice the weird added (end) statement, which makes this code invalid (wat2wasm doesn't compile it).

Here is an end-to-end reproduction where I typed in the generated wasm by hand and also commented it. The wasm itself seems to be spec-compliant.

import {decode} from '@webassemblyjs/wasm-parser';
import {print} from '@webassemblyjs/wast-printer';

// prettier-ignore
let wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic header
  0x01, 0x00, 0x00, 0x00, // version
  0x06, 0x06, 0x01, // globals section with 6 bytes, 1 global
  // GLOBAL
    0x7f, // type i32
    0x00, // not mutable
    // initialiation
    0x41, 0x00, // i32.const 0
    0x0b, // end of initialization
]);

// decode with wasm-parser
let ast = decode(wasmBytes, {});

// print with wast-printer
console.log(print(ast));

/* OUTPUT:
(module
  (global i32 (i32.const 0)(end))
)
*/

I'm not 100% sure what goes wrong here, but looking at the AST, I noticed that it also contains an explicit "end" instruction node. Here is the value of ast.body[0].fields[0] in the code above:

{
  type: 'Global',
  globalType: { type: 'GlobalType', valtype: 'i32', mutability: 'const' },
  init: [
    {
      type: 'Instr',
      id: 'const',
      args: [Array],
      object: 'i32',
      loc: [Object]
    },
    { type: 'Instr', id: 'end', args: [], loc: [Object] } // <--- IS THIS INTENDED?
  ],
  name: { type: 'Identifier', value: 'global_0', raw: '' },
  loc: { start: { line: 3, column: 5 }, end: { line: 3, column: 28 } }
 }

This AST is different from the one which is produced when I take my original correct wat and parse it with this repo's wast-parser. In that case I don't get the end instruction:

import {parse} from '@webassemblyjs/wast-parser';
let ast = parse('(module (global i32 (i32.const 0)))');
console.log(ast.body[0].fields[0]);
{
  type: 'Global',
  globalType: { type: 'GlobalType', valtype: 'i32', mutability: 'const' },
  init: [
    {
      type: 'Instr',
      id: 'const',
      args: [Array],
      object: 'i32',
      loc: [Object]
    }
    // SHOULD THERE BE AND "END" HERE?
  ],
  name: { type: 'Identifier', value: 'global_0', raw: '' },
  loc: { start: { line: 3, column: 5 }, end: { line: 3, column: 28 } }
}

In other words, parse(...wat...) creates a different Global node than wat2wasm followed by decode(...wasm...).

As an additional data point, if I take this repo's wasm-gen and encode the Global node above (the one without the end instruction), the generated wasm bytes don't match with the one from the original (correct) wasm:

import {parse} from '@webassemblyjs/wast-parser';
import {encodeNode} from '@webassemblyjs/wasm-gen';
let ast = parse('(module (global i32 (i32.const 0)))');
console.log(encodeNode(ast.body[0].fields[0]));

This results in the following bytes for the global:

0x7f 0x00 0x41 0x00

While in the original wasm it had an additional byte at the end, the 0x0b which is the end instruction:

0x7f 0x00 0x41 0x00 0x0b

According to the spec, "each global is initialized with an init value given by a constant initializer expression" 1, and expressions "are sequences of instructions terminated by an end marker." 2.

So that 0x0b / end byte is definitely supposed to be there! That implies one of the following:

  • Either the AST is supposed to feature that end instruction in the global initialization. That would mean that both wast-parser creates the wrong AST, and wast-printer takes a correct AST (with the end instruction) and turn it into wrong wat (with an unnecessary (end) token).

  • Or the AST format should not feature the end instruction (i.e. the instruction could be implicit). That would mean that wasm-parser creates the wrong AST, and wasm-gen takes a correct AST and turns it into wrong bytecode.

To summarize, either the wast pipeline or the wasm pipeline is treating the end byte in the init part of a global declaration incorrectly :)

Thanks for this great library, btw! ❤️

@mitschabaude
Copy link
Contributor Author

I want to add that I'd be happy to dig into the code and make a PR that fixes this, however I don't know whether it's the wast pipeline or the wasm pipeline that should be fixed.

IMO, the end instruction should show up in the AST like every other instruction (even if it isn't part of the text format), which would mean that the wasm pipeline is fine and the wast pipeline should be fixed. In any case, I'm pretty sure that these two pipelines should be able to work together and be consistent in what kind of AST they create/assume.

@xtuc
Copy link
Owner

xtuc commented Aug 28, 2021

Thanks for the the detailed report. At first glace I think the wast printer needs to changed to not print the end instruction where's not legal.

@mitschabaude
Copy link
Contributor Author

mitschabaude commented Aug 30, 2021

@xtuc another thing: I also noticed that the wasm-parser -> wast-printer pipeline generates wast code with "u32" and "u64" instead of "i32", "i64", e.g. the code will contain stuff like "u32.const" which then cannot be parsed by other tools.

If you want I could try to fix this as well in wast-printer in one PR together with the (end) issue

@xtuc
Copy link
Owner

xtuc commented Aug 30, 2021

It would be nice if you can fix that inconsistency as well, but please in a sperated PR, that's going to be easier to review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants