VIP: stateful singleton modules with ownership hierarchy #3722

Closed
charles-cooper opened this issue Jan 9, 2024 · 9 comments

Comments

@charles-cooper
Member

charles-cooper commented Jan 9, 2024

Simple Summary

extend the import system by allowing "stateful modules" (that is, modules with top-level state variables). introduce a constraint system on the import system which maximizes safety + usability.

this is one of two proposals exploring the stateful module design space; the other is #3723.

Motivation

re-using code which encapsulates state is in general a useful feature to have for a language! however, in a contract-oriented programming context, this is a double-edged sword because reasoning about storage is fundamentally difficult, especially when storage accesses are hidden behind a layer of abstraction. consider two basic approaches to the problem:

  1. each module gets a singleton instantiation in the storage allocator. this follows the One-Def Rule for modules and is probably the most intuitive for programmers. however, this hurts reasonability because a module's storage could be changed anywhere inside the import graph, as in the following example:
import dep1  # has a storage variable, counter
import dep2  # imports dep1. dep2.bar() modifies counter

@external
def foo():
    dep1.counter += 1
    dep2.bar()  # tramples dep1.counter!

this has a further issue, which we will discuss below: access to dep1's __init__() function is uncontrolled, so it could be called multiple times in the import graph. this is a correctness problem, because programmers expect constructors to be called at most once.

  2. the user controls instantiations by explicitly creating instances of a module. each of these is a fresh instantiation in the storage allocator. this has several benefits: if you instantiate a module, you are guaranteed that nobody else in the import graph can modify it. however, it hurts sharing of global state, which is a design consideration for some use cases. the simplest example of this would be a library which encapsulates re-entrancy protection (note this is a straw-man, because vyper already has a builtin for reentrancy protection).
import Lock  # Lock.acquire() and Lock.release() modify Lock._key
import Foo  # Foo.foo() uses Lock by calling Lock.acquire()/Lock.release()

_lock: Lock  # fresh instance of Lock
_foo: Foo  # fresh instance of Foo

export _foo.foo  # hypothetical syntax, cf. https://github.com/vyperlang/vyper/pull/3698

@external
def bar():
    self._lock.acquire()
    ...
    call SomeContract  # !! can re-enter to Foo.foo, because Foo.Lock._key and Lock._key are separate
    self._lock.release()

the other benefit here would be clear access to imported __init__() functions. since each instantiation is local, it is straightforward to enforce that __init__() is called exactly once for each instantiation (in the above example, self._lock.__init__(...) and self._foo.__init__(...) would have to be called in the main __init__() function).
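To make the re-entrancy hazard concrete, here is a minimal model in plain Python rather than Vyper. `Lock`, `Foo` and `bar_then_reenter` are hypothetical stand-ins for the modules above, and the sketch assumes the external call in `bar()` re-enters through the exported `Foo.foo`. With fresh instances the re-entrant call gets through; with a single shared `Lock` it is rejected.

```python
class Lock:
    """Hypothetical model of the Lock module: each instance has its own `_key`."""

    def __init__(self) -> None:
        self._key = False

    def acquire(self) -> None:
        if self._key:
            raise RuntimeError("re-entrancy detected")
        self._key = True

    def release(self) -> None:
        self._key = False


class Foo:
    """Hypothetical model of Foo: foo() is guarded by whichever Lock it holds."""

    def __init__(self, lock: Lock) -> None:
        self.lock = lock

    def foo(self) -> None:
        self.lock.acquire()
        # ... body of foo() ...
        self.lock.release()


def bar_then_reenter(contract_lock: Lock, foo: Foo) -> bool:
    """Simulate bar(): take the lock, then an external call re-enters foo().

    Returns True if the re-entrant call got through (protection failed).
    """
    contract_lock.acquire()
    try:
        foo.foo()  # the external call re-enters via the exported Foo.foo
        return True
    except RuntimeError:
        return False
    finally:
        contract_lock.release()


# Fresh instances: Foo's Lock is separate from the contract's Lock.
separate = bar_then_reenter(Lock(), Foo(Lock()))

# Singleton: the contract and Foo share one Lock, hence one `_key`.
shared_lock = Lock()
shared = bar_then_reenter(shared_lock, Foo(shared_lock))

print(separate, shared)  # True False
```

This is the failure mode the singleton design avoids: the library author intended `_key` to be global, and only a shared instance honors that intent.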

enumerated, the issues brought up above are:

  • state trampling
  • constraints on __init__()
  • state sharing

this proposal offers a third option, which draws inspiration from linear type systems and the rust borrow checker.

the design proposed here is to enforce the one-def rule, but to address the issues above, additionally introduce an ownership system which allows the compiler to enforce constraints on how module state is written and initialized.

note on a design choice:

  • the new top-level statement type, owns: some_module, is a design requirement: it allows the programmer to control where the module is laid out in storage.

Useful Definitions/Terminology

  • an affine type is one that can be used at most once
  • a linear type is one that must be used exactly once
  • the import graph is a directed acyclic graph which is traversed during import resolution
  • declaring variables produces a compile-time side effect in the storage allocator
  • a "nested import" is an import within an import
  • a "region" is an area of storage which can be touched by an effect
  • a module is a bundle of code-and-storage-layout functionality. there is currently a 1-to-1 correspondence in vyper between files and modules.
  • a compilation target is the module which is passed to the compiler as the "main" module.

Specification

Final Specification.

this proposal introduces an effects hierarchy for interacting with modules: initializes and uses. these correspond to the terminology owns and borrows from linear type systems, respectively.

the basic rules here are:

  1. ownership is modeled as an affine constraint, which is promoted to a linear constraint if any other effects are used from the module. that is,
  • a module might be imported but no stateful functions are accessed, so initialization is allowed but not required.
  • if a stateful function is reachable from the compilation target, then it must be initialized exactly one time in the import graph.
  2. there is a one-to-one correspondence between ownership and initialization. that is, if module initializes module2, then module2.__init__() must be called in module.__init__(). declaring ownership "seals off" access to module2.__init__(). it is envisioned that it will probably be used sparingly or near the top of the import graph.
  3. you cannot touch modules from an __init__() function unless they are already owned.
  4. you cannot touch state from a module unless it is used (declared with uses:).
  5. initializes implies uses.
  6. the initializer declaration for a module must include all of its direct dependencies, e.g. if module1 declares uses: module2, then the initializer for module1 must be declared like initializes: module1[module2 := module2].
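The global part of these rules (a used module is initialized exactly once, and every initializer binds the initialized module's direct dependencies) can be sketched as a walk over the import graph. This is plain Python, not the vyper compiler's implementation; `Module` and `check` are hypothetical names, module names are treated as global identifiers (no `import ... as` aliasing), and `uses:`/`initializes:` are modeled as plain sets and dicts.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical summary of one module's declarations."""
    uses: set[str] = field(default_factory=set)
    # `initializes: lib1[library2 := lib2]` is modeled as
    # {"lib1": {"library2": "lib2"}}: child -> {dependency: bound module}
    initializes: dict[str, dict[str, str]] = field(default_factory=dict)


def check(modules: dict[str, Module], target: str) -> list[str]:
    """Return constraint violations for the program rooted at `target`."""
    errors: list[str] = []
    init_count = {name: 0 for name in modules}
    needed: set[str] = set()  # modules whose state is reachable from the target

    stack, seen = [target], set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        mod = modules[name]
        # `initializes` implies `uses`, so both kinds of edge count as use.
        for dep in mod.uses | set(mod.initializes):
            needed.add(dep)
            stack.append(dep)
        for child, bindings in mod.initializes.items():
            init_count[child] += 1
            # The initializer must bind every direct `uses:` dependency.
            missing = modules[child].uses - set(bindings)
            if missing:
                errors.append(f"{name}: initializes {child} without binding {sorted(missing)}")

    # Affine -> linear: anything used must be initialized exactly once.
    # Modules that are imported but never used are not checked.
    for name in sorted(needed):
        if init_count[name] != 1:
            errors.append(f"{name} must be initialized exactly once, found {init_count[name]}")
    return errors


# lib1 `uses: lib2`; the target initializes both and binds lib1's dependency.
modules = {
    "contract": Module(initializes={"lib2": {}, "lib1": {"lib2": "lib2"}}),
    "lib1": Module(uses={"lib2"}),
    "lib2": Module(),
}
print(check(modules, "contract"))  # []

# Forgetting `initializes: lib2` and the binding for lib1's dependency:
bad = dict(modules, contract=Module(initializes={"lib1": {}}))
print(check(bad, "contract"))
```

The last call reports both the missing binding and the uninitialized `lib2`; initializing `lib2` from two places would instead report a count of 2.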

Original Specification

for historical/research purposes, the original spec is below. this was the design with seals: but not uses:. this original design is superseded by the design described here: #3722 (comment).

this proposal introduces an effects hierarchy for interacting with modules: owns and seals. an alternative name for owns could be initializes. owns is used here since it is the terminology used in linear type systems.

the basic rules here are:

  1. ownership is modeled as an affine constraint, which is promoted to a linear constraint if any other effects are used from the module. that is,
  • a module might be imported but no stateful functions are accessed, so initialization is allowed but not required.
  • if a stateful function is reachable from the compilation target, then it must be owned exactly one time in the import graph.
  2. there is a one-to-one correspondence between ownership and initialization. that is, if module owns module2, then module2.__init__() must be called in module.__init__(). declaring ownership "seals off" access to module2.__init__(). it is envisioned that it will probably be used sparingly or near the top of the import graph.
  3. you cannot touch modules from an __init__() function unless they are already owned.
  4. if a module seals module2, no other modules can write to it (or directly call mutating functions on module2).
  5. a module can only be owned once. seals: implies ownership.

note that seals: can be considered as an extension to the ownership system. in other words, the seals: semantics is not required to be implemented.
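The sealing and single-ownership rules of this original design can be expressed as a check over declarations and writes. This is a plain Python sketch with a hypothetical `check_seals` function; it assumes the set of modules each module writes to (directly, or by calling a mutating function) has already been computed for code reachable from the compilation target. A module writing to its own state is always allowed.

```python
def check_seals(owns: dict[str, set[str]], seals: dict[str, set[str]],
                writes: dict[str, set[str]]) -> list[str]:
    """Check the original spec's `owns:` / `seals:` rules.

    Each argument maps a module name to the modules it owns, seals,
    or writes to, respectively.
    """
    errors = []
    owner_of: dict[str, str] = {}
    for module in sorted(set(owns) | set(seals)):
        # `seals:` implies ownership, and a module can only be owned once.
        for child in sorted(owns.get(module, set()) | seals.get(module, set())):
            if child in owner_of:
                errors.append(f"{child} is owned by both {owner_of[child]} and {module}")
            else:
                owner_of[child] = module
    # Only the sealing module (or the sealed module itself) may write to it.
    for sealer, sealed in seals.items():
        for writer, targets in writes.items():
            for child in sorted(sealed & targets):
                if writer not in (sealer, child):
                    errors.append(f"{writer} writes to {child}, which is sealed by {sealer}")
    return errors


# The dep1/dep2 example: main seals dep1, then foo() and foo1() write to it.
seals = {"main": {"dep1"}}
ok_writes = {"main": {"dep1"}, "dep1": {"dep1"}}
print(check_seals({}, seals, ok_writes))  # []

# foo2() calls dep2.bar(), which modifies dep1.
bad_writes = {**ok_writes, "dep2": {"dep1"}}
print(check_seals({}, seals, bad_writes))
```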

some examples, with a tentative syntax:

import dep1  # has a storage variable, counter
import dep2  # imports dep1. dep2.bar() modifies counter

seals: dep1

def __init__():
    dep1.__init__(...)

@external
def foo():
    dep1.counter += 1

@external
def foo1():
    dep1.update_counter()

# counterfactual example, this does not compile:
@external
def foo2():
    dep1.counter += 1
    dep2.bar()  # not allowed! dep2.bar() modifies dep1

# Bar.vy

import Lock
import Foo

x: uint256

# declare ownership of Lock!
# this would be an error if Foo declared ownership of Lock
# this statement also controls the location of Lock in the storage layout -- it comes after `x`.
owns: Lock  # own, but do not seal lock

exports: Foo.foo

def __init__():
    Lock.__init__(...)  # omitting this would be an error!

@external
def bar():
    Lock.acquire()
    ...  # do stuff, maybe call an external contract
    Lock.release()

an obligatory token example:

###
# Owned.vy
owner: address

def __init__():
    self.owner = msg.sender

def check_owner():
    assert msg.sender == self.owner
###

###
# BaseToken.vy
totalSupply: uint256
balances: HashMap[address, uint256]

def __init__(initial_supply: uint256):
    self.totalSupply += initial_supply
    self.balances[msg.sender] += initial_supply

@external
def transfer(recipient: address, amount: uint256):
    self.balances[msg.sender] -= amount  # safesub
    self.balances[recipient] += amount
###

###
# Mint.vy
import BaseToken
import Owned

@external
def mint(recipient: address, amount: uint256):
    Owned.check_owner()
    self._mint_to(recipient, amount)

@internal
def _mint_to(recipient: address, amount: uint256):
    BaseToken.totalSupply += amount
    BaseToken.balances[recipient] += amount
###

###
# Contract.vy
import Owned
import Mint
import BaseToken

owns: Owned
owns: BaseToken
seals: Mint  # hygiene - seal Mint

def __init__():
    BaseToken.__init__(100)  # required by `owns: BaseToken`
    Owned.__init__()  # required by `owns: Owned`
    Mint.__init__()  # required by `seals: Mint`

export: Mint.mint
export: BaseToken.transfer

note an alternative design for this hypothetical project could be for Mint to own: Owned and be responsible for calling its constructor. then Contract.vy would not be able to own: Owned. this is left as a design choice to library writers, when to "seal" ownership of modules and when to leave them open. for illustration, this is what that design would look like:

# Owned and BaseToken look the same.
###
# Mint.vy
import Owned
import BaseToken

owns: Owned
owns: BaseToken

def __init__(initial_supply: uint256):
    Owned.__init__()
    BaseToken.__init__(initial_supply)


@external
def mint(recipient: address, amount: uint256):
    Owned.check_owner()
    self._mint_to(recipient, amount)

@internal
def _mint_to(recipient: address, amount: uint256):
    BaseToken.totalSupply += amount
    BaseToken.balances[recipient] += amount
###

###
# Contract.vy
import Mint
import BaseToken

owns: Mint

owns: BaseToken  # this line will raise an error!

def __init__():
    BaseToken.__init__()  # error! Mint already initializes BaseToken
    Owned.__init__()  # error! Mint already initializes Owned

    Mint.__init__(100)  # that's better
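The one-to-one correspondence between ownership and initialization is a local check on each module's `__init__()`: every owned module's `__init__()` is called exactly once, and no other module's `__init__()` is called. The Python sketch below uses a hypothetical `check_init_calls` function, takes the list of `__init__()` calls as already extracted, and leaves the duplicate-ownership error on `owns: BaseToken` out of scope.

```python
from collections import Counter


def check_init_calls(owns: set[str], init_calls: list[str]) -> list[str]:
    """Check one module's __init__() against its owns:/seals: declarations.

    owns: modules declared with `owns:` (or `seals:`, which implies ownership).
    init_calls: modules whose __init__() is called from this __init__(), in order.
    """
    errors = []
    counts = Counter(init_calls)
    for module in sorted(owns):
        if counts[module] != 1:
            errors.append(f"{module}.__init__() must be called exactly once, found {counts[module]}")
    for module in sorted(counts.keys() - owns):
        errors.append(f"{module}.__init__() called, but {module} is not owned")
    return errors


# Bar.vy: `owns: Lock`, and __init__() calls Lock.__init__(...).
print(check_init_calls({"Lock"}, ["Lock"]))  # []
# Bar.vy with the Lock.__init__(...) call omitted.
print(check_init_calls({"Lock"}, []))
# Alternative Contract.vy, ignoring the rejected `owns: BaseToken` line.
print(check_init_calls({"Mint"}, ["BaseToken", "Owned", "Mint"]))
```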

Backwards Compatibility

does not change any existing language features, fully backwards compatible

Dependencies

References

Copyright

Copyright and related rights waived via CC0

@pcaversaccio
Collaborator

pcaversaccio commented Jan 10, 2024

Having intensively discussed the different tradeoffs with @charles-cooper over the last months I think this proposal is the "better" one of the two (#3723). The "ownership hierarchy" concept is pretty common in languages that deal with memory management and resource allocation (e.g. C, C++, or Rust) and has shown its virtues. I think this can be carried over to contract-oriented programming.

Some open questions:

  • Python doesn't allow for private instance variables. I think the module design could benefit a lot from such a feature. Using a private variable allows encapsulating a module contract in a way that an importing contract can't change a variable in a bad way and the imported contract can expose a getter, or an internal setter if it's ok to modify.
  • Proxies are completely oblivious to the storage trie changes that are performed by the constructor. Upgradability might be a desirable feature (unfortunately) for some use cases, and the __init__ logic using owns or seals can be circumvented since dedicated initialiser functions are usually called. What are your thoughts on this?
  • Should it be possible to remove seals mode for certain specific functions? I.e. a more granular control access structure than completely switching off any writes from other modules.

@charles-cooper
Member Author

@DanielSchiavini suggests that ownership should be the default, whereas borrowing needs to be marked

@charles-cooper
Member Author

@DanielSchiavini suggests that ownership should be the default, whereas borrowing needs to be marked

as an example:

# library1.vy
import Library2

library2: borrows(Library2)

def __init__():
    pass

# contract
import Library1
import Library2

library1: Library1[library2]
library2: Library2

def __init__():
    self.library2.__init__()
    self.library1.__init__()

@charles-cooper
Member Author

charles-cooper commented Jan 10, 2024

Having intensively discussed the different tradeoffs with @charles-cooper over the last months I think this proposal is the "better" one of the two (#3723). The "ownership hierarchy" concept is pretty common in languages that deal with memory management and resource allocation (e.g. C, C++, or Rust) and has shown its virtues. I think this can be carried over to contract-oriented programming.

i kind of agree (at least at this moment in time -- the two approaches both have their merits and i have gone back and forth on them many times). typically when people put state in a module, they intend for it to be global. this is especially familiar for people coming from a python background.

put another way, the multiple instantiation paradigm is more elegant, but actually more error prone if you consider the global lock use case. it's too easy to forget to tie two instances together when the library designer intended for a piece of state to be global (which is the default design attitude for somebody who is writing a module).

Some open questions:

  • Python doesn't allow for private instance variables. I think the module design could benefit a lot from such a feature. Using a private variable allows encapsulating a module contract in a way that an importing contract can't change a variable in a bad way and the imported contract can expose a getter, or an internal setter if it's ok to modify.

i think private variable declarations hurt composability. more generally, my current design philosophy is that importers should set constraints, not the importees. this design philosophy maximizes composability.

  • Proxies are completely oblivious to the existence of constructors. Upgradability might be a desirable feature (unfortunately) for some use cases, and the __init__ logic using owns or seals can be circumvented since dedicated initialiser functions are usually called. What are your thoughts on this?

i will need to think about this more, but i think that delegatecall use cases are kind of orthogonal to this proposal.

  • Should it be possible to remove seals mode for certain specific functions? I.e. a more granular control access structure than completely switching off any writes from other modules.

i don't think so. if a library designer or (probably more commonly) a contract author seals another module, i would assume they mean it.
EDIT: i misread this -- i don't think it's particularly useful to seal only specific parts of a module.

@pcaversaccio
Collaborator

put another way, the multiple instantiation paradigm is more elegant, but actually more error prone if you consider the global lock use case. it's too easy to forget to tie two instances together when the library designer intended for a piece of state to be global (which is the default design attitude for somebody who is writing a module).

Exactly - generally, it's not a straightforward exercise to immediately understand the implications of multiple instances and stateful actions. Singletons optimise better IMHO for safety & reasonability, which again, is an important design principle for any Vyper feature.

i think private variable declarations hurt composability. more generally, my current design philosophy is that importers should set constraints, not the importees. this design philosophy maximizes composability.

Hmm, I don't think private variables hurt composability. On the contrary, applied correctly it can even be an enabler for better composability without shooting yourself. But I don't wanna abuse this issue for that discussion 😄; so let's stick with the as-is situation for the moment.

@charles-cooper
Member Author

charles-cooper commented Jan 13, 2024

@DanielSchiavini suggests that ownership should be the default, whereas borrowing needs to be marked

after ruminating on this for a few days, i favor a system which marks ownership and additionally requires annotation of write dependencies as is proposed in #3723. borrowship may also be marked, although this is a relatively small detail and can be added or removed in the future. in the following examples, i renamed the keywords owns: and borrows to initializes: and uses:, respectively. so the above example would look like:

# Library1.vy
import Library2 as library2

uses: library2

def __init__():
    pass

# contract
import Library1 as lib1
import Library2 as lib2

initializes: lib2
initializes: lib1[library2 := lib2]

def __init__():
    lib2.__init__()
    lib1.__init__()

more formally, the ownership hierarchy as exposed to the user is therefore:

- NO_OWNERSHIP
- WRITES
- INITIALIZES

as an implementation detail, i settled on using the walrus operator (:=) for dependency annotation, since the assignment operator (=) is not allowed inside of brackets.

as a larger example, i wrote up the token example using this syntax here: https://gist.github.com/charles-cooper/fb5caff4eee8bbf92ed86cefaa39a855

@fubuloubu
Member

fubuloubu commented Jan 13, 2024

How about require instead of uses (which is a weaker verb and a protected keyword in solidity, ensuring less of a need for it as a state variable name)?

Also, is initializes required when you end up initializing the module anyways?

Your example could look like:

import Library1
import Library2

def __init__():
    Library2()
    Library1(library2=Library2)

extends may also be a nicer word for initializes too

@charles-cooper
Member Author

charles-cooper commented Jan 14, 2024

How about require instead of uses (which is a weaker verb and a protected keyword in solidity, ensuring less of a need for it as a state variable name)?

i don't think we need to restrict ourselves to solidity protected keywords, we should rather choose the word which best represents the semantics. the biggest "dent" to UX (if you can call it that) here is that programmers won't be able to have state variables named uses. that said, just to have them all in one place -- the current possibilities/suggestions for the keyword names are:

  • ownership:

    • owns
    • initializes
    • controls
    • extends
  • borrowship:

    • borrows
    • uses
    • requires
    • writes

Also, is initializes required when you end up initializing the module anyways?

Your example could look like:

import Library1
import Library2

def __init__():
    Library2()
    Library1(library2=Library2)

i considered not requiring the initializes: statement. however, the initializes: statement provides two important benefits.

one, it lets the programmer control where the library goes in the storage layout. i think this is important, since the other options are to (somewhat arbitrarily) either choose storage layout order depending on import order, or where the initializations occur in the source code. this way the storage layout is clear from the order in which storage variable declarations and initializes: statements are laid out in the source code.
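As a rough illustration of this first benefit, the sketch below lays out storage in declaration order and recurses into a child module at the point where its `initializes:` statement appears, so moving that statement moves the child's variables. It is plain Python, not vyper's allocator; `Var`, `Initializes` and `allocate` are hypothetical, and each variable's size in slots is given directly rather than derived from its type.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str
    slots: int  # storage slots the variable occupies (given, not computed)


@dataclass
class Initializes:
    module: str  # an `initializes:` statement for this module


def allocate(modules: dict[str, list], name: str, start: int = 0,
             prefix: str = "") -> tuple[dict[str, int], int]:
    """Assign a starting slot to every variable, in declaration order.

    At an `initializes:` statement, the child module's variables are laid
    out right there (recursively). Returns (layout, next free slot).
    """
    layout: dict[str, int] = {}
    slot = start
    for item in modules[name]:
        if isinstance(item, Var):
            layout[prefix + item.name] = slot
            slot += item.slots
        else:
            child_layout, slot = allocate(modules, item.module, slot,
                                          prefix + item.module + ".")
            layout.update(child_layout)
    return layout, slot


# Bar.vy declares `x`, then `initializes: Lock`, so Lock comes after `x`.
modules = {"Bar": [Var("x", 1), Initializes("Lock")], "Lock": [Var("_key", 1)]}
layout, end = allocate(modules, "Bar")
print(layout, end)  # {'x': 0, 'Lock._key': 1} 2

# Moving the `initializes:` statement above `x` moves Lock to the front.
layout2, _ = allocate({**modules, "Bar": [Initializes("Lock"), Var("x", 1)]}, "Bar")
print(layout2)  # {'Lock._key': 0, 'x': 1}
```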

second, it allows compile-time resolution of Library1's dependencies at the ownership declaration site. that's one reason why i chose the bracket notation -- it looks like a compile-time type parametrization.

the dependency resolution could be done in source code, but it starts to get weird once source code is not just straight-line, e.g.:

def __init__():
    if block.number % 2 == 0:
        Library2()
        Library1(param1, param2, library2=Library2)
    else:
        Library1(param2, param1, library2=Libraryyy2)  # probably a user typo, need to throw an error even though the first call to Library1() is well-formed.
        Library2()  # how does this affect storage layout?

extends may also be a nicer word for initializes too

added to the list above, although i am not really a fan of extends, as it has connotations for people coming from an OOP (especially Java/C#) background.

charles-cooper added a commit that referenced this issue Feb 10, 2024
this commit implements "singleton modules with ownership hierarchy" as
described in #3722.

to accomplish this, two new language constructs are added: `UsesDecl`
and `InitializesDecl`. these are exposed to the user as `uses:` and
`initializes:`. they are also accompanied by new `AnalysisResult` data
structures: `UsesInfo` and `InitializesInfo`.

`uses` and `initializes` can be thought of as a constraint system on the
module system. a `uses: my_module` annotation is required if
`my_module`'s state is accessed (read or written), and
`initializes: my_module` is required to call `my_module.__init__()`. a
module can be `use`d any number of times; it can only be `initialize`d
once. a module which has been used (directly, or transitively) by the
compilation target (main entry point module), must be `initialize`d
exactly once. `initializes:` is also required to declare which modules
it has been `initialize`d with. for example, if `mod1` declares it
`uses: mod2`, then any `initializes: mod1` statement must declare
*which* instance of `mod2` it has been initialized with. although there
is only ever a single instance of `mod2`, this user-facing requirement
improves readability by forcing the user to be aware of what the state
access dependencies are for a given, `initialize`d module.

the `NamedExpr` node ("walrus operator") has been added to the AST to
support the initializer syntax. (note: the walrus operator is used,
because the originally proposed syntax, `mod1[mod2 = mod2]` is rejected
by the python parser).

a new compiler pass, `vyper/semantics/analysis/global.py`, has been
added to implement the global initializer constraint, as it cannot be
defined recursively (without a global context).

since `__init__()` functions can now be called from other `__init__()`
functions (which is not allowed for normal `@external` functions!), a
new `@deploy` visibility has been added to vyper's visibility system.
`@deploy` functions can be called from other `@deploy` functions, and
never from `@external` or `@internal` functions. they also have special
treatment in the ABI relative to other `@external` functions.

`initializes:` is useful since it also serves the purpose of being a
storage allocator directive. wherever `initializes:` is placed, is where
the module will be placed in storage (and code, transient storage, or
any other future storage locations).

this commit refactors the storage allocator so that it recurses into
child modules whenever it sees an `initializes:` statement. it refactors
several data structures surrounding the storage allocator, including
removing inheritance on the `DataPosition` data structure (which has
also been renamed to `VarOffset`). some utility functions have been
added for calculating the size of a given variable, which also get used
in codegen (`get_element_ptr()`).

additional work/refactoring in this commit:
- new analysis machinery for detecting reads/writes for all `ExprInfo`s
- dynamic programming on the `get_expr_info()` routine
- refactoring of `visit_Expr`, which fixes call mutability analysis
- move `StringEnum` back to vyper/utils.py
- remove the "TYPE_DEFINITION" kludge in certain builtins, replace with
  usage of `TYPE_T`
- improve `tag_exceptions()` formatting
- remove `Context.globals`, as we rely on the results of the front-end
  analyser now.
- remove dead variable: `Context.in_assertion`
- refactor `generate_ir_for_function` into
  `generate_ir_for_external_function` and
  `generate_ir_for_internal_function`
- move `get_nonreentrant_lock` to `function_definitions/common.py`
- simplify layout allocation across locations into single function
- add `VyperType.get_size_in()` and `VarInfo.get_size()` helper
  functions so we don't need to do as much switch/case in implementation
  functions
- refactor `codegen/core.py` functions to use `VyperType.get_size()`
- fix interfaces access from `.vyi` files
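The `@deploy` call rule described in this commit can be modeled as a check over a call graph. In this Python sketch, `ALLOWED` and `check_calls` are hypothetical; the pairs involving `@deploy` and the ban on calling `@external` functions from other functions come from the commit text, while treating `@internal` functions as callable from every visibility is an assumption of the sketch.

```python
# Allowed (caller visibility, callee visibility) pairs.
# From the commit text: @deploy may call @deploy; @external and @internal
# may never call @deploy; @external functions are not called internally.
# Assumption: @internal functions may be called from any visibility.
ALLOWED = {
    ("deploy", "deploy"),
    ("deploy", "internal"),
    ("external", "internal"),
    ("internal", "internal"),
}


def check_calls(visibility: dict[str, str], calls: list[tuple[str, str]]) -> list[str]:
    """Return an error for every (caller, callee) edge that breaks the rule."""
    errors = []
    for caller, callee in calls:
        pair = (visibility[caller], visibility[callee])
        if pair not in ALLOWED:
            errors.append(f"@{pair[0]} {caller}() cannot call @{pair[1]} {callee}()")
    return errors


visibility = {
    "__init__": "deploy",
    "lib2.__init__": "deploy",
    "mint": "external",
    "_mint_to": "internal",
}
calls = [
    ("__init__", "lib2.__init__"),  # constructor initializing a child module: ok
    ("mint", "_mint_to"),           # external -> internal: ok
    ("mint", "lib2.__init__"),      # external -> deploy: rejected
]
print(check_calls(visibility, calls))
```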
@charles-cooper
Member Author

implemented in #3729

3 participants