Skip to content

investigate creating the sourcemap for the optimized-bytecode #215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mr-zwets opened this issue Oct 6, 2024 · 10 comments
Closed

investigate creating the sourcemap for the optimized-bytecode #215

mr-zwets opened this issue Oct 6, 2024 · 10 comments
Labels
breaking breaking change cashc-compiler Relates to the cashc compiler
Milestone

Comments

@mr-zwets
Copy link
Member

mr-zwets commented Oct 6, 2024

currently the debug tooling operates on the unoptimized bytecode which has been a source of confusion for developers familiar with the low level workings.

For better support for hand-optimization of contracts it would be great if the Bitauth IDE debugging showed the final, optimized bytecode for the contract.

@mr-zwets mr-zwets added the cashc-compiler Relates to the cashc compiler label Oct 14, 2024
@mr-zwets mr-zwets added this to the v0.11 milestone Oct 29, 2024
@mr-zwets mr-zwets modified the milestones: v0.11, v0.12.0 Jan 15, 2025
@mr-zwets mr-zwets added the breaking breaking change label Feb 11, 2025
@mr-zwets
Copy link
Member Author

mr-zwets commented Mar 5, 2025

So to address this issue we need to change the optimiseBytecode() call in compileString to also take in the full traversal obj and to update the mapping accordingly in a new optimisedTraversal obj

// instead of
const optimisedBytecode = optimiseBytecode(traversal.output);

// rework to this
const { optimisedBytecode , optimisedTraversal } = optimiseBytecode(traversal);

The optimiseBytecode function is defined in script.ts and loops over the script to perform optimizations

  for (let i = 0; i < runs; i += 1) {
    const oldScript = script;
    script = replaceOps(script, optimisations);

    // Break on fixed point
    if (scriptToAsm(oldScript) === scriptToAsm(script)) break;
  }

after each optimization (replaceOps) we should update the traversal object to reflect the latest two-way mapping I think.

@mr-zwets
Copy link
Member Author

One note is that the Artifact interface would change, from version v0.10.0 the Artifact had a debugging.bytecode key.

After this change there wouldn't be a separate 'debugging' bytecode, meaning that we'd remove the debugging.bytecode key.
We can still make sure that the v0.10.x artifacts work with the unoptimized-bytecode or we can add a warning to recompile the artifact with the latest cashc

@mr-zwets mr-zwets modified the milestones: v0.12.0, v0.11.1, v0.11 Mar 25, 2025
@rkalis
Copy link
Member

rkalis commented Apr 11, 2025

@mr-zwets started with a PR that works on this. Today we discussed the steps that we think we should take to achieve this fully. After every step we should make sure that the resulting optimised bytecode should stay the exact same as before.

  1. Create 2 separate "optimise" functions, 1 that stays the same as the old, 1 that we use to apply our changes. Call both of these functions inside the "compileString" function and check that they are the exact same, otherwise throw an error/warning.
  2. Move the old cashproof-optimisations.ts file to the format in the PR's extra-optimizations.ts file and merge these into 1 file (separate arrays so we remember which ones were which).
  3. Change the old "regex" replace functionality to a "granular" match+replace method, where we can see where in the code (by index) the replacement occurs.
  4. In addition to replacing the opcodes, also update the corresponding location data and ip data.
  5. Check if tests still pass.
  6. Add multicontract debug test that introspects own bytecode.

@rkalis
Copy link
Member

rkalis commented Apr 15, 2025

Today we started off with quickly implementing steps 1 & 2 without too much effort.

With step 3, we decided to still use regex to find the matches, and then count spaces in the strings to convert a string match index to a script index.

We got started on step 4, which is still work in progress. What we did is when we find a match, we retrieve the first and last location corresponding to that matched pattern and merge them into a single location entry. We then apply that location entry to every opcode in the replacement and splice the locationData array to remove the old location entries and add the new location entries.

We believe this is not yet entirely accurate, because there are certain cases where the last location entry might have a column or even line number that is earlier than the first location entry, for example:

require(checkSig(s, pk));
// results in => CHECKSIG VERIFY
// first location = CHECKSIG, which starts at column 8
// last location = VERIFY, which starts at column 0

So we probably need to look at the min/max values within the replaced location entries. We also just applied the positionHint of the last location entry, but we didn't really give that a lot of thought, so we will need to check if that is actually correct.

Then after fixing those last remaining issues with location data mapping, we still need to do a similar thing for instruction pointer mapping, so that the require and console.log entries in the artifact are still correct.

@rkalis
Copy link
Member

rkalis commented Apr 17, 2025

Today we continued on step 4. We are now correctly updating the location data / sourcemap whenever we apply optimisations. We got started on updating the instruction pointers for require and console.log statements. Next session we'll have to finish that step. There are some tests failing in our WIP code. It should be pretty straightforward to make it work though.

After finishing that step, then we should update all the fixtures / tests that are affected by these changes, and we should add a test for multicontract introspection to make sure that our changes have fixed that use-case.

@rkalis
Copy link
Member

rkalis commented Apr 22, 2025

Today we managed to make some progress with making sure the ip are updated correctly for console.log statements. We managed to fix a bug where certain console.log statements were incorrectly executed.

However, we ran into a difficult problem with console.log + optimisations, because during optimisation, we are "losing" intermediate states, e.g.:

bool x = 1 == 2;
console.log(x);
require(x);

This example compiles to OP_1 OP_2 OP_EQUAL <console.log here> OP_VERIFY, but gets optimised to OP_1 OP_2 OP_EQUALVERIFY, so the intermediate result is no longer available anywhere to be logged.

We considered to keep using the unoptimised bytecode just for the console.logs, but decided that we really do want it to work with introspection properly, which requires us to use the optimised bytecode.

We brainstormed about this issue and we came up with the following solution:

We move the ip of the data point within the log (not of the console.log statement itself) to the start of the optimisation (e.g. OP_EQUAL <log> OP_VERIFY becomes <log> OP_EQUAL OP_VERIFY). Then we add the opcodes between the original and new position to a "transformations" field on the data point (e.g. "transformations": "OP_EQUAL"). Then, when we reach the console.log in debugging.ts, before we log the value, we first use Libauth to execute the script within "transformations" using the stack at the correct debug step. The resulting stack after that execution can then be used to retrieve the intermediate value.

Next session, we want to implement this plan:

  1. Extend interface for StackItem to include an optional "transformations" property
  2. Add logic in the optimisations code to:
    a. move the ip of the StackIndex to before the optimisation
    b. add the "skipped" opcodes to the "transformations" field (keeping previously added transformations)
  3. Add logic in the debugging code to execute the transformations where needed, and apply the previous "stack getting" logic to the result of those transformations

@rkalis
Copy link
Member

rkalis commented Apr 25, 2025

Today we tried to get started on the plan above, but we ran into some issues with our previous work. When updating log/require IPs, we did not take the constructor arguments into account (which contribute to the IP number, but not to the scriptIndex in the optimisations code). We've now accounted for this difference and added a test case for it. We also took the time to commit that work neatly.

Next session, we will need to implement the plan that we mentioned in the previous comment. When implementing step 2a, this will have to change some of the ternary statements that we've added as part of today's commit.

Additionally, we noticed that somehow for some console.log entries have strange type properties. For example in the debugging tests contract CONTRACT_TEST_CONSECUTIVE_LOGS there is a console.log statement console.log(beef, 1, "test", true);. In there, it lists the type of beef as bytes2 beef, but this should of course be only bytes2.

When updating the stack cleanup code in the debugging file, we added OP_DROP and OP_2DROP as potential cleanup opcodes, since OP_1 OP_NIP OP_NIP may get optimised to OP_2DROP OP_1. However, it is unclear to us what to do about this OP_1 opcode. We will need to create a test case where this occurs and go from there.

@rkalis
Copy link
Member

rkalis commented Apr 29, 2025

Today we implemented most of the "transformations" logic as specified in the comments above. There is still one issue remaining where there is a OP_CHECKSIG in the transformations, even though the intermediate checksig result is not used anywhere, and looking at the test case, the console log should happen before the entire OP_CHECKSIGVERIFY.

We haven't figured out the cause of this bug yet, so we've planned another session for tomorrow to figure it out.

@rkalis
Copy link
Member

rkalis commented Apr 30, 2025

The OP_CHECKSIG issue was due to duplicate whitespace in the processedAsm due to our replacement logic. We updated our replacement logic so that there cannot be duplicate / trailing / leading whitespace at any time. This caused a new error where OP_SWAP OP_EQUALVERIFY got changed to OP_EQUAL VERIFY rather than OP_EQUALVERIFY. We solved this by making our pattern matching disallow partial matches and explicitly adding OP_SWAP OP_(NUM)EQUALVERIFY as a separate optimisation.

We also updated the docs and merged the changes into the next branch. We plan to cut a new @next release today or tomorrow.

We still need to expand our test cases to make sure there are no edge cases that still contain some bugs. We also still need to fix the type bytes2 beef issue and update the cashc generation fixtures. So we will leave this issue open until those are done

  • Types issue
  • Update fixtures
  • Extra test cases

@rkalis
Copy link
Member

rkalis commented Jun 10, 2025

We'll close this issue as completed even though there are still a few remaining test cases. We will document these test cases in #230 and implement / track them there alongside the other related test cases for advanced transaction builder debugging.

@rkalis rkalis closed this as completed Jun 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking breaking change cashc-compiler Relates to the cashc compiler
Projects
None yet
Development

No branches or pull requests

2 participants