Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support address code attribute #2583

Merged

Conversation

hi-ogawa
Copy link
Contributor

@hi-ogawa hi-ogawa commented Jan 3, 2022

What I did

Support code attribute for address type as described in #2427
Accordingly, I updated the documentation (address type section in docs/types.rst).

How I did it

  • Validation

    the usage of <address>.code is restricted to be in slice(<address>.code, start, length) where length is a constant.
    this validation is done during ExpressionAnnotationVisitor.visit_Attribute.
    I noticed that slice(msg.data, ...) validation was implemented in FunctionNodeVisitor.visit_Attribute,
    but the type information is not available during FunctionNodeVisitor whereas ExpressionAnnotationVisitor already makes use of get_exact_type_from_node, so I added a validation logic for <address>.code in ExpressionAnnotationVisitor.

  • Code generation

    • Step 1.
      During Expr.parse_Attribute, detect <address>.code and generate (temporary) LLL node ~extcode.

    • Step 2.
      During Slice.build_LLL, detect if the first argument of slice built-in call is ~extcode LLL node. If so, replace it with codecopy or extcodecopy evm opcode depending on whether <address> is self or not.

How to verify it

Added new test cases in tests/parser/syntax/test_address_code.py, which shows the three cases of <address>.code where

  • extcodecopy is used,
  • codecopy is used during deployment (i.e. in constructor),
  • codecopy is used after deployed.

Description for the changelog

Support code attribute for address type. Add code as a reserved keyword.

Cute Animal Picture

Put a link to a cute animal picture inside the parenthesis-->

Concerns/Questions

  1. It seems that the attributes of address type (e.g. balance, codesize) is included in RESERVED_KEYWORDS, so I added code into that as well. However, I believe this would be a breaking change for the people who used code as variable, user defined struct members, etc...
    It might be possible not to include it in RESERVED_KEYWORDS, but, at least, the use of self.code won't be available to users anymore, e.g. if there is an existing code with code: public(uint256), that would become a compile error.
    Please let me know if there's any guideline for how such breaking changes could be treated.

  2. The use of temporary LLL node with ~extcode might be a bit strange. I noticed that msg.data uses an LLL node with value=0, location="calldata" to denote the special case. Compared to that, the use of the LLL node ~extcode felt more explit/concise way, so I went with it, but probably there might be a nicer way to do it. I would appreciate if someone could advice me on that.

  3. (EDIT) I noticed that slice(msg.data, ..., length) has an assertion to check the bound, but I didn't put that check for slice(<address>.code, ..., length). This is because, as far as I can tell (at least in official go-ethereum evm), codecopy and extcodecopy would zero fill the data when the length exceeds the actual bytecode (cf. https://github.com/ethereum/go-ethereum/blob/356bbe343a30789e77bb38f25983c8f2f2bfbb47/core/vm/instructions.go#L357-L371). Please let me know if I'm missing some significance of such bound check.

  4. (EDIT) I think I should add something about "dynamic gas" for EXTCODECOPY (cf. https://github.com/ethereum/go-ethereum/blob/356bbe343a30789e77bb38f25983c8f2f2bfbb47/core/vm/jump_table.go#L446-L453). I'm still on the way of research and hopefully I can figure out what to do. (dynamic gas cost estimation is done in 1be8c2c)

Thanks for the review!

@codecov-commenter
Copy link

codecov-commenter commented Jan 3, 2022

Codecov Report

Merging #2583 (13f93a3) into master (23ed82c) will increase coverage by 0.34%.
The diff coverage is 97.87%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2583      +/-   ##
==========================================
+ Coverage   86.26%   86.61%   +0.34%     
==========================================
  Files          90       91       +1     
  Lines        9351     9413      +62     
  Branches     2370     2354      -16     
==========================================
+ Hits         8067     8153      +86     
+ Misses        796      775      -21     
+ Partials      488      485       -3     
Impacted Files Coverage Δ
vyper/semantics/namespace.py 96.87% <ø> (ø)
vyper/semantics/validation/annotation.py 93.15% <ø> (-0.10%) ⬇️
vyper/utils.py 87.23% <ø> (+3.79%) ⬆️
vyper/codegen/expr.py 80.62% <85.71%> (+0.03%) ⬆️
vyper/semantics/validation/local.py 92.23% <98.52%> (+1.39%) ⬆️
vyper/builtin_functions/functions.py 88.26% <100.00%> (+0.08%) ⬆️
vyper/codegen/lll_node.py 94.02% <100.00%> (+0.04%) ⬆️
vyper/semantics/types/value/address.py 93.10% <100.00%> (+1.43%) ⬆️
vyper/lll/compile_lll.py 94.10% <0.00%> (-0.43%) ⬇️
vyper/semantics/types/user/interface.py 81.57% <0.00%> (-0.41%) ⬇️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23ed82c...13f93a3. Read the comment docs.

@hi-ogawa hi-ogawa force-pushed the 2427-feat-address-code-attribute branch from 138845b to a35c233 Compare January 3, 2022 06:49
@hi-ogawa
Copy link
Contributor Author

hi-ogawa commented Jan 3, 2022

The CI failure could be due to some flakiness of deadline setting (https://github.com/vyperlang/vyper/runs/4688866148?check_suite_focus=true#step:5:85). I cannot reproduce the test failure locally.

Copy link
Member

@fubuloubu fubuloubu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!

Will have to give a more thorough review later.

vyper/semantics/namespace.py Show resolved Hide resolved
@charles-cooper
Copy link
Member

@hi-ogawa I think the approach looks good, I can review in more detail later. I like your use of ~extcode to denote the special case.

Regarding the bounds check

(EDIT) I noticed that slice(msg.data, ..., length) has an assertion to check the bound, but I didn't put that check for slice(

.code, ..., length). This is because, as far as I can tell (at least in official go-ethereum evm), codecopy and extcodecopy would zero fill the data when the length exceeds the actual bytecode (cf. https://github.com/ethereum/go-ethereum/blob/356bbe343a30789e77bb38f25983c8f2f2bfbb47/core/vm/instructions.go#L357-L371). Please let me know if I'm missing some significance of such bound check.

The bounds check is used for safety reasons. Calldata also zero fills when the length exceeds the actual calldata, but we check the length anyways. I think for consistency, we should include the bounds check here.

vyper/builtin_functions/functions.py Outdated Show resolved Hide resolved
vyper/builtin_functions/functions.py Outdated Show resolved Hide resolved
@@ -96,6 +97,23 @@ def visit_For(self, node):
self.expr_visitor.visit(node.iter)


def validate_address_code_attribute(node: vy_ast.Attribute, type_: BaseTypeDefinition) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this functionality is more appropriate in vyper/semantics/validation/local.py. It would be very clean if this functionality were put in the following function:

def visit_Attribute(self, node):
if node.get("value.id") == "msg" and node.attr == "data":
parent = node.get_ancestor()
if parent.get("func.id") not in ("slice", "len"):
raise SyntaxException(
"msg.data is only allowed inside of the slice or len functions",
node.node_source_code,
node.lineno,
node.col_offset,
)
.

Copy link
Contributor Author

@hi-ogawa hi-ogawa Jan 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the validation from ExpressionAnnotationVisitor to FunctionNodeVisitor (5a5798d).

Copy link
Contributor Author

@hi-ogawa hi-ogawa Jan 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems I have a bug whichever ExpressionAnnotationVisitor or FunctionNodeVisitor I choose.
With ExpressionAnnotationVisitor, I think some attribute nodes won't get traversed and miss the validation.
With FunctionNodeVisitor, as showed up on CI, get_exact_type_from_node will fail on the second visit since it doesn't setup correct namespaces for e.g. for loop.

def visit(self, node):
super().visit(node)
self.annotation_visitor.visit(node)
attr_descendants = node.get_descendants(vy_ast.Attribute)
for attr_descendant in attr_descendants:
self.visit(attr_descendant)

As @charles-cooper suggested, it surely makes sense to do the validation in FunctionNodeVisitor, but I believe I need to make some change. Let me think about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any typechecking should be done in the annotation visitor since that merely propagates types to nodes. I'm a bit confused why you're code wouldn't work in FunctionNodeVisitor. Is there some bug regarding typechecking in loops?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked into it but I was just in the process of fixing some typechecking for list literals. cf. #2584 (comment), #2584 (comment) and #2587

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hi-ogawa could you merge in master now that #2556 is merged and see if that fixes the issue

Copy link
Contributor Author

@hi-ogawa hi-ogawa Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for letting me know about that! I can get back to this tomorrow and I will check that out if it works. Sorry for the delay.
Regarding the reason why FunctionNodeVisitor doesn't work, I think that's because of the structure around this:

 def visit(self, node): 
     super().visit(node)   #  <<<=== 1st visit (possibly `Attribute` won't be visited this time)
     self.annotation_visitor.visit(node) 
  
     attr_descendants = node.get_descendants(vy_ast.Attribute) 
     for attr_descendant in attr_descendants: 
         self.visit(attr_descendant)    # <<<=== 2nd visit (I think, this is to make sure to visit all the `Attribute` nodes missed on 1st visit above)

The way visiting the Attribute on 2nd time doesn't setup namespace properly. For example, from CI,

https://github.com/vyperlang/vyper/runs/4707801642?check_suite_focus=true#step:5:6264

for i in ...:
  ... _baz[i].x ...  # <<<=== try to visit this `Attribute` node

here, this visits Attribute node _baz[i].x without setting up the namespace for the for-loop variable i and thus get_exact_type_from_node on the value node _baz[i] is failing.

I haven't actually debug it, so I might be wrong, but this is my intuition of the issue.

Copy link
Contributor Author

@hi-ogawa hi-ogawa Jan 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charles-cooper Sorry for the break and thanks the patience.
I pushed the commit f13e4f7 for the fix. I modified FunctionNodeVisitor (with FunctionNodeExpressionVisitor) to visit all the nodes up to the leaf nodes in the single pass.
I found this more natural approach than grabbing vy_ast.Attribute descendent nodes and validating certain properties.
I might be missing some reasons why it hasn't implemented in this way, so I would appreciate if you could review this change for potential drawbacks.
Thanks a lot!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your approach is natural and makes sense. It probably wasn't implemented that way because we didn't really have context-dependent checks like that. The only thing I would ask is to change the name of the class to indicate that it is private to this module - maybe _LocalExprVisitor

@hi-ogawa
Copy link
Contributor Author

hi-ogawa commented Jan 4, 2022

@charles-cooper @fubuloubu Thanks for the feedback! I'll address your comments as soon as I can.

The bounds check is used for safety reasons.

May I ask the detail of this "safety reasons"? Probably I'm missing some intuition and use cases, but I cannot see what could be "unsafe" when users get zero-filled slice.

@fubuloubu
Copy link
Member

May I ask the detail of this "safety reasons"? Probably I'm missing some intuition and use cases, but I cannot see what could be "unsafe" when users get zero-filled slice.

Interpreting filler bytes as "true zero" is a frequent source of errors

@hi-ogawa
Copy link
Contributor Author

hi-ogawa commented Jan 4, 2022

Interpreting filler bytes as "true zero" is a frequent source of errors

Ah, that indeed makes sense to me. Thanks for the clarification!

vyper/semantics/namespace.py Show resolved Hide resolved
Copy link
Member

@charles-cooper charles-cooper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hi-ogawa I think this is almost ready to merge. Please make the name change for FunctionNodeExpressionVisitor.

Also I think for clarity in the LLL (since the introduced opcodes are a little ad-hoc to begin with), we should use ~extcode_slice, ~selfcode_slice and ~calldata_slice. (Sorry I didn't suggest it before, I had to think about it for awhile before coming to the conclusion). Would you be willing to make this change for the msg.data slice code as well?

@hi-ogawa
Copy link
Contributor Author

hi-ogawa commented Jan 9, 2022

@charles-cooper Thanks for the suggestions! I agree that ~extcode is a bit hacky and it should be more explicit as you suggests. I will work on it and let you know when it's ready.

By the way, I was wondering but forgot about this. For these uses of adhoc lll opcodes (~extcode_slice, etc...), should they be included in VALID_LLL_MACROS? It seems to affect the behavior of LLLnode.cache_when_complex, but I couldn't tell which is preferred (include or not include) for "adhoc" lll opcodes.
Do you think it should be included there?

@charles-cooper
Copy link
Member

charles-cooper commented Jan 9, 2022

By the way, I was wondering but forgot about this. For these uses of adhoc lll opcodes (~extcode_slice, etc...), should they be included in VALID_LLL_MACROS? It seems to affect the behavior of LLLnode.cache_when_complex, but I couldn't tell which is preferred (include or not include) for "adhoc" lll opcodes. Do you think it should be included there?

Yes - they should be included in VALID_LLL_MACROS

Comment on lines +197 to +199
"~extcode",
"~selfcode",
"~calldata",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charles-cooper These are not exactly suggested in #2583 (review), but I chose these since I noticed that msg.data could be also used as len(msg.data), so ~calldata_slice and ~calldata_len will be needed, which would required to be distinguished in Expr.parse_Attribute by looking up its parent node etc...
I thought such process would be unnecessary complication.
What do you think about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charles-cooper These are not exactly suggested in #2583 (review), but I chose these since I noticed that msg.data could be also used as len(msg.data), so ~calldata_slice and ~calldata_len will be needed, which would required to be distinguished in Expr.parse_Attribute by looking up its parent node etc... I thought such process would be unnecessary complication. What do you think about this?

Right I saw that! I think your reasoning is good.

elif self.expr.attr == "code":
addr = Expr.parse_value_expr(self.expr.value, self.context)
if is_base_type(addr.typ, "address"):
# These adhoc nodes will be replaced with a valid node in `Slice.build_LLL`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice comment

arg = Expr(node.args[0], context).lll_node
if arg.value == "~calldata":
return LLLnode.from_list(["calldatasize"], typ="uint256")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while we are here should we add len(<address>.code) too? 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never mind -- @skellet0r points out we already have <address>.codesize!

Copy link
Member

@charles-cooper charles-cooper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work. thank you

@charles-cooper charles-cooper merged commit 3a9a342 into vyperlang:master Jan 9, 2022
@hi-ogawa hi-ogawa deleted the 2427-feat-address-code-attribute branch January 9, 2022 04:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VIP: Accessing deployed byte code
4 participants