Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overloading constructors: who is right? #502

Closed
PaulKlint opened this issue Feb 16, 2014 · 26 comments
Closed

Overloading constructors: who is right? #502

PaulKlint opened this issue Feb 16, 2014 · 26 comments

Comments

@PaulKlint
Copy link
Member

Consider the overloading of int as constructor in Type and constructor in the syntax rule below. The type checker forbids this, the evaluator allows it. Who is right? @jurgenvinju @tvdstorm @mahills @Anastassija opinions?

[Note there are quite some syntax definitions that depend on this kind of overloading so it would be nice if we can allow it]:

module experiments::Compiler::Examples::Tst

import Type;

lexical Num = \int: [0-9]+;

gives [@mahills : I added a print statement and made the error message more informative]

addProduction: prod(label("int",lex("Num")),[iter(\char-class([range(48,57)]))],{})
|rascal://lang::rascal::types::CheckTypes|(44044,73,<873,58>,<873,131>): "Invalid addition: cannot add production RSimpleName(\"int\") into scope, it clashes with non-constructor variable or function names"
@PaulKlint PaulKlint added this to the Rascal compiler milestone Feb 16, 2014
@jurgenvinju
Copy link
Member

I believe it should be allowed, since we allow this overloading between two datatypes as well. If ambiguous at the call site, qualified names are required. —
Jurgen J. Vinju
CWI SWAT
INRIA Lille
UvA master software engineering
http://jurgen.vinju.org

On Sun, Feb 16, 2014 at 9:47 PM, Paul Klint notifications@github.com
wrote:

Consider the overloading of int as constructor in Type and constructor in the syntax rule below. The type checker forbids this, the evaluator allows it. Who is right? @jurgenvinju @tvdstorm @mahills @Anastassija opinions?
[Note there are quite some syntax definitions that depend on this kind of overloading so it would be nice if we can allow it]:

module experiments::Compiler::Examples::Tst
import Type;
lexical Num = \int: [0-9]+;

gives [@mahills : I added a print statement and made the error message more informative]

addProduction: prod(label("int",lex("Num")),[iter(\char-class([range(48,57)]))],{})
|rascal://lang::rascal::types::CheckTypes|(44044,73,<873,58>,<873,131>): "Invalid addition: cannot add production RSimpleName(\"int\") into scope, it clashes with non-constructor variable or function names"

Reply to this email directly or view it on GitHub:
#502

@mahills
Copy link
Member

mahills commented Feb 17, 2014

The main challenge with this is the following: say we have the case given above. If we see a use of \int, we currently can always assign it a type. If we allow this, that will no longer be possible. At the same time, I guess we already have this problem with constructors (if you reuse a name between different types), so maybe this isn't a big deal, we just need the overload to include the various ADT or nonterminal types that could be assigned to that name. It looks like I'll need to make the name introduction rules more forgiving and the name use rules more involved for these types of cases.

@mahills
Copy link
Member

mahills commented Aug 14, 2014

I definitely need to address this now, since this pops up all over the place when checking the type checker itself (since we import the Rascal grammar). This is still the same issue, though -- if we just see something like \int(), we need to use the context to determine whether that particular use is a constructor or a production/concrete syntax type so we can assign the proper type to it. In some cases this will be straight-forward (e.g., we are returning it, and we know the return type), while in others it will be quite challenging (we assign it to a variable of inferred type).

@PaulKlint
Copy link
Member Author

I find it hard to asess the pros and cons of the alternatives:

  • from a user perspective we probably want to allows this, but the price is added complexity in the type checker.
  • if we forbid this overloading, we restrict the user but simplify the typechecker.

So there are really two questions:

  • What are the complications in the type checker if we select the first option?
  • How bad would it be if we select the second option? How inconvenient will this be? Is it consistent?

@mahills
Copy link
Member

mahills commented Aug 14, 2014

I think the main issue right now is not with type names, but with constructors for abstract versus concrete types. The behavior right now is not to import a name if this will cause a clash -- so, if there is a constructor named int, and a production named int, both imported, the name isn't brought into scope (although it is still available in its fully qualified form). This is confusing, since you think "but I imported it!", and may not know why it was not imported.

What I can do is keep a set of these around to make better warning messages, which at least would help, and I guess we could think of adding a quick fix later that would allow users to just repair it rapidly. Another option would be to not import production names by default -- I think they are not used as often as constructor names, so this would eliminate many of the clashes unless you really needed the production names to be available.

To address your specific questions:

  • the complication is that we have to pass around an "undecided" type containing various options, which may never get properly resolved, in which case we would have terms in the program that do not have a defined type. We would also have to add support for "resolving" these in a number of instances, such as when passed to another function (even if we start saying #f instead of just f, as we discussed earlier, this is still an issue), when returned, when applied, etc...
  • I think this is consistent, but it is also inconvenient, especially in programs like the checker, custom imploders, etc that make use of concrete syntax definitions which may clash on certain constructors but not on others. The main problem here is that the current interpreter allows this, since it can just resolve it dynamically in most cases.

Another option would be to not check on names, but on the entire signature, so you could have an int with arity 0 and an int with arity 1. Assuming we don't add varargs constructors, this would help, but would still be an issue if we just use the name without giving arguments (and still wouldn't help in this specific case, since the version of int defined as a Symbol and the version of int defined as a BasicType both have arity 0).

@PaulKlint
Copy link
Member Author

Great analysis! Maybe we should first try to answer the question: who needs these production names anyway?

The only answers I can come up with are (but maybe I am missing something?):

  • implode
  • meta programs that manipulate Rascal code.

I guess that for implode the production names need not be visible in the importing module since it gets them via the grammar (in the interpreter from the module environment in the compiler via the reified type).

For meta-programs the same holds. _This would imply that we don't need to import the production names at all._ Can anybody confirm/refute this reasoning?

@DavyLandman
Copy link
Member

Are you talking specifically for the Rascal grammar's production names?

Else, I think production names are usefull: if you want to match on a parse tree without using concrete syntax. Or if you would want to use a non terminal from another grammar and use the Statement!jump!goto feature to remove some productions from it?

@jurgenvinju
Copy link
Member

@PaulKlint we do need these names to avoid writing complex concrete patterns when a simple test will do as in myStat is if and if(_,_,_) := myStat as opposed to (Statement) if { <Expression e> } <Statement* _> else { <Statement* _> } := myStat. I don't think Rascal meta programs are a special case in that sense. We needed them first for Rascal meta programs to get ourselves out of a nasty bootstrap situation, but they are needed in all programs manipulating syntax trees.

Two solutions to this problem:

  • solving: have a different notation for constructor names from parse trees: If(_,_) instead of if. We could let parse tree constructor names start with capital letters and abstract constructor names with lowercase. Problem solved for the constructor names, which leaves a solution wanting for the type names; something like syntax[Expression] instead of Expression could be used to disambiguate type names. We might make this optional.
  • mitigating: do not have a mirror AST definition for a grammar, instead have full pattern matching and construction capabilities with abstract syntax on concrete trees (already a feature request for a long time)

I am for having both solutions.

@jurgenvinju
Copy link
Member

@DavyLandman that Statement!jump notation is definitely also undoable without production names indeed.

@PaulKlint
Copy link
Member Author

Ok indeed use cases I missed, thanks for mentioning them. The next question is then: do we really need to bring them in scope? Could a library function getProductionNames(Statement) be a solution?

@PaulKlint
Copy link
Member Author

@jurgenvinju case distinction is a solution but maybe it is a bit too subtle?

Indeed we should aim for completely transparent mixing of concrete and abstract matching.

@jurgenvinju
Copy link
Member

No the library function solution looks ugly to me.

Listen we have this same issue for a much simpler case, namely two functions in two modules who happen to have the same name, same amount of arguments, but different return type:

module A
  int f();
module B
  Exp f();
module C
import A;
import B;

void main() {
  println(f());
}

There we ask to disambiguate the function by qualifying it. Why can't we do this for this case as well? At the moment we use f anywhere, the type checker can either resolve it uniquely or not, and if not the programmer should disambiguate. Is a similar thing not doable for constructor/production names?

Sorry for asking stupid questions; its pretty hard to oversee all the implications.

@jurgenvinju
Copy link
Member

@PaulKlint instead of case distinction pick an operator? ^if(_,_) := myStat, myStat is ^if

The case distinction trick is less space and "big letters = big trees, small letters is small trees"

@mahills
Copy link
Member

mahills commented Aug 15, 2014

Hi @jurgenvinju, qualification works fine. The main issue relates more to volume that capability -- with pattern-based dispatch we could have a large number of cases where type names need to be qualified, where this is less likely with function names (unless you just give every function the same name) or type names (which do not appear as often, since they can be left out in most cases other than top-level variables, aliases, function signatures, etc). Also, with production and constructor names, there are a lot more of them, so it is more likely that one will have an inadvertent clash between them.

@PaulKlint
Copy link
Member Author

If I remember correctly, we do not syntactically allow qualified names in patterns.

@mahills
Copy link
Member

mahills commented Aug 15, 2014

From Slack:

Thinking on this more, it may be possible to add them both in at the same time. We do this now with functions, and we just say you need to qualify them if there is an ambiguity. So, in theory, we really should be able to do the same thing with constructors and productions.

@mahills
Copy link
Member

mahills commented Aug 15, 2014

@PaulKlint at least based on the pattern syntax, I think we can use them except as type names inside concrete syntax patterns, where the hole just contains a standard Name instead of a Type or QualifiedName.

@PaulKlint
Copy link
Member Author

Ok I will turn this into a new issue.

@jurgenvinju
Copy link
Member

I believe we've arrived to conclude that this issue is the same as #493 (see discussion in #638 ).

@jurgenvinju
Copy link
Member

So @mahills, if I understand right, right now in case of conflict none of the names are brought in scope, which may result in lots of error messages? Could we still bring them in scope, but explicitly labeled ambiguous in order to make sure we can give good error messages (this is what you suggested in your comment from Slack). Sounds like a great solution and also more consistent with the way we deal with function names.

Even though in the future we may syntactically separate data and syntax types and constructors, it may still happen due to importing modules that we import ambiguously defined constructors, so a solution here will be necessary regardless.

@mahills
Copy link
Member

mahills commented Aug 17, 2014

That is what I will try to do -- earlier we had tentatively decided to just not import them, but still allow them to be accessed with their fully qualified names. I think it should be possible to import them anyway, but will need to put in more code to check for ambiguities. I still think we should disallow declarations of constructors and productions (and of abstract and concrete types) with the same name within the same module, though -- that gives no way to actually disambiguate them, since the fully qualified name could then refer to either.

@PaulKlint
Copy link
Member Author

@mahills this essentially consolidates our current practice: put concrete and abstract syntax in separate modules and this is fine.

However, precisely for educational purposes it is heavy weight and there I would prefer a solution where both can live together in a single module. But let's postpone this until we separate these name spaces in some way.

@jurgenvinju
Copy link
Member

Agreed, within the same module we do not allow ambiguous overloading, not for functions, not for constructors and not for syntax. The only exceptions being default functions can overload non-default functions and constructors or productions may overload functions.

@mahills
Copy link
Member

mahills commented Aug 18, 2014

@jurgenvinju agreed for constructors and syntax -- for functions it is a more complicated story, since we naturally allow a large number of overlapping overloads because of patterns in the signature. This makes it quite challenging to detect an actual overlap. Long term, I need to add pattern coverage checking to see if there are missing or overlapping patterns.

@PaulKlint agreed that it would be nice to have everything in one module, and I also agree that we should wait until the namespaces are separate to do this.

@PaulKlint
Copy link
Member Author

My impression is that this issue is solved.

@mahills
Copy link
Member

mahills commented Jan 13, 2015

I believe so as well -- in general, it will look for a concrete version of the type in cases where a) no unqualified type of that name is in scope, and b) it can tell that we attempted to load the name but it would have caused a collision between concrete and abstract type versions. This should prevent "leaking" concrete type names, since we always keep them around in the background so the compiler can use them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants