Skip to content

Add a utility for principal types #7642

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Add a utility for principal types #7642

wants to merge 8 commits into from

Conversation

tlively
Copy link
Member

@tlively tlively commented Jun 6, 2025

WebAssembly has the property that every valid sequence of instructions
has a principal type, i.e. a unique most-specific type. However, due to
various kinds of polymorphism, these principal types may contain type
variables. Specifically, the syntaxes for value types, heap types,
nullability, sharedness, and exactness are all augmented with type
variables.

Add a PrincipalType utility for representing and composing these
principal types. This can be seen as a more powerful version of the
existing StackSignature utility. Follow-up PRs will update ChildTyper to
use principal types to represent type constraints and use the principal
types of outlined instruction sequences to detect polymorphism and
determine the types of outlined functions. There are also further
potential applications in simplifying validation, simplifying expression
typing, and performing spec-compliant validation.

WebAssembly has the property that every valid sequence of instructions
has a principal type, i.e. a unique most-specific type. However, due to
various kinds of polymorphism, these principal types may contain type
variables. Specifically, the syntaxes for value types, heap types,
nullability, sharedness, and exactness are all augmented with type
variables.

Add a PrincipalType utility for representing and composing these
principal types. This can be seen as a more powerful version of the
existing StackSignature utility. Follow-up PRs will update ChildTyper to
use principal types to represent type constraints and use the principal
types of outlined instruction sequences to detect polymorphism and
determine the types of outlined functions. There are also further
potential applications in simplifying validation, simplifying expression
typing, and performing spec-compliant validation.
@tlively tlively requested review from kripken and ashleynh June 6, 2025 18:16
const auto* null = std::get_if<Nullability>(&ref.null);
if (!null) {
return ref;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These helpers might benefit from comments, as I feel I'm missing the point. What sense of "canonicalization" is meant here? Is the canonical representation defined somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added comments for all the helpers. It might be helpful to start reading the implementation of PrincipalType::compose at the end of the file to get a sense for all the helpers are used, too.

return VarRef{null, VarDefHeapType{type.getExactness(), ht}};
}

// join: Update `value` to the least upper bound of `value` and `other`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// join: Update `value` to the least upper bound of `value` and `other`.
// join: Update `value` to the least upper bound of `value` and `other`. Returns
// true if a change was made.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or does the return value have some other meaning? Actually after reading a bit further down, I am starting to think it does.

};

void countVar(Index* countp, Index i) {
// Variables must be introduced sequentialy starting at 0 so their count is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Variables must be introduced sequentialy starting at 0 so their count is
// Variables must be introduced sequentially starting at 0 so their count is

// variable. If a reference type can be expressed in the normal syntax rather
// than the extended syntax, it should be represented here as a `Type`, not a
// `VarRef`.
using VarType = std::variant<Type, VarRef, Index>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read about half of the cpp and think I have some feel for what's going on here. But I have a question about the design. Rather than VarType, VarRef, VarHeapType, VarDefHeapType, VarAbsHeapType, could we have a single class:

struct VarStar {
  std::optional<VarNullability> null;
  std::optional<VarExactness> exact;
  std::optional<VarShare> share;
  std::optional<VarHeapType> heap;  
};

Not all those 4 fields would be needed in all cases, but there would be a natural/canonical way to represent things in each situation. That is, those 4 attributes are all we can track: nullability, exactness, shareability, and heap type, and each of them can be present or absent, and can also be a variable.

I feel like a single class could avoid much of the code in the cpp file? E.g. joining would just join each of the 4 attributes. Am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually started out with a design that was similar to your suggestion. The problem I saw is that there are many nonsensical configurations to ignore and that the different members cannot be handled independently. For example, it is impossible to correctly join exactness without also considering heap types, and there is double-bookkeeping between the explicit sharedness and the sharedness that is part of a heap type. Having several types that more precisely model the allowed syntax lets us break what would have been one very large and messy function into several much smaller and simpler functions that only consider one case each. Having just one type holding everything would reduce the number of functions, but not the overall amount of code, and it would make the code much more complex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, makes sense. Good to know you considered that option.

return true;
}
// Make sure we have a class for i.
classes.info.reserve(i + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK no previous user of DisjointSets uses the internal .info in this manner. Is the API there not sufficient? (if not should we extend it?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I guess we could add a reserve method that just calls reserve on the underlying vector. I'll do that.

}
}

// matchBottom: join the bottom value into the assignemnt in `assignments` for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// matchBottom: join the bottom value into the assignemnt in `assignments` for
// matchBottom: join the bottom value into the assignment in `assignments` for


// matchBottom: join the bottom value into the assignemnt in `assignments` for
// any variables appearing in the second parameter. Returns `true` if this
// succeeds. TODO: This currently always succeeded, but if we supported repeated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// succeeds. TODO: This currently always succeeded, but if we supported repeated
// succeeds. TODO: This currently always succeeds, but if we supported repeated

return matchBottom(assignments, ref.null) && matchBottom(assignments, ref.ht);
}

// match: Record the variable assignemnts necessary to make `a` match `b` (i.e.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// match: Record the variable assignemnts necessary to make `a` match `b` (i.e.
// match: Record the variable assignments necessary to make `a` match `b` (i.e.

// variables in outputs properly, we would need to avoid mixing left assignment
// and right assignemnt, so this would be able to fail.

bool matchBottom(VarAssignments& assignments, const VarNullability& null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused as to where the word "match" comes from, here and below. Here, can this not be assignBottom? Looks like that is what it does, assign the bottom type if the input is an index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matches is the word the spec uses for "is a subtype of," so here I'm using match for "make into a subtype of."

}

// match: Record the variable assignemnts necessary to make `a` match `b` (i.e.
// to ensure `a` <: `b`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuing the above comment about the word "match", can we say ensureSub, as that IIUC what the comment describes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's value in matching the spec wording, especially for something that is so directly taken from the spec, and also when the spec wording so conveniently pithy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants