New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
misc: add Toy chapter 1 python code, examples and notebook #354
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Most of the code is as similar as makes sense to the MLIR implementation of Toy frontend |
Codecov ReportBase: 88.49% // Head: 87.91% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #354 +/- ##
==========================================
- Coverage 88.49% 87.91% -0.58%
==========================================
Files 64 70 +6
Lines 7847 8546 +699
Branches 1285 1364 +79
==========================================
+ Hits 6944 7513 +569
- Misses 645 756 +111
- Partials 258 277 +19
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I can't help but feel that this really is an MLIR tutorial as I feel like they could not have taken an example that needs more prior knowledge than that, but I guess it makes sense with the infrastructure as MLIR is very focused on tensors and stuff...
docs/Toy Tutorial/examples/ast.toy
Outdated
# CHECK-NEXT: Params: [] | ||
# CHECK-NEXT: Block { | ||
# CHECK-NEXT: VarDecl a<> @{{.*}}ast.toy:11:3 | ||
# CHECK-NEXT: Literal: <2, 3>[ <3>[ 1.000000e+00, 2.000000e+00, 3.000000e+00], <3>[ 4.000000e+00, 5.000000e+00, 6.000000e+00]] @{{.*}}ast.toy:11:11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure how good/consistent floating point prints work in Python. I know there can be issues with comparisons with floats generally, so maybe printing can be dangerous as well.
I think here it makes sense as this is literally a cast from 1
to a float print of 1, but this is something we have to keep in mind for the future, especially when checking for floats that had arithmetic performed on them.
@@ -0,0 +1,31 @@ | |||
# RUN: toyc-ch2 %s -emit=mlir 2>&1 | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already chapter 2, right? Is there a reason why this is contained in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added all the toy examples, and didn't bother removing any of the comments, it's not like we're running filecheck on them. Happy to remove the comments if you think they might be confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we should right? Can we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We technically can't right now because we don't match the MLIR syntax exactly, even in MLIR mode on the printer. I think it would be a nice goal, but a bit orthogonal to this PR
docs/Toy Tutorial/toy/parser.py
Outdated
TokenT = TypeVar('TokenT', bound=Token) | ||
|
||
|
||
class Parser: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we maybe somehow unify this with our parser infrastructure? Or maybe even autogenerate at some point (@AntonLydike). I guess not for this PR though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how helpful that would be, this is a sort of "toy" parser, that we don't need to be particularly efficient or good to work on, more for didactic purposes. Might be a bit of a distraction for people to try to understand the parser generation infrastructure instead of xDSL itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to set up proper testing for this PR.
Also, what about placing the implementation out of docs?
This reverts commit b75fcd0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that not all files of this PR are necessary to run this notebook.
Is that right? If so can we incrementally add only the necessary things, in every one of the forthcoming PRs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, a gentle reminder that docs/ folder is not under codecov!
Which files aren't necessary? Can't codecov what we don't test ;) |
This line here: https://github.com/xdslproject/xdsl/blob/main/.coveragerc#L8 Are all the files in docs/Toy used/covered in this PR? |
I'm not sure what the benefit would be of adding the folder to codecov, if it's not covered by any unit tests, which we probably don't want, at least in the short term. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I added minor comments.
You are often using toy-ch1
or toy-ch2
, where are these defined?
@@ -39,3 +39,4 @@ jobs: | |||
run: | | |||
pytest --nbval-lax docs/irdl.ipynb --maxfail 1 -vv | |||
pytest --nbval docs/tutorial.ipynb --maxfail 1 -vv | |||
pytest --nbval docs/Toy/Toy_Ch1.ipynb -vv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't have --maxfail 1, does that mean that we could potentially break toy without seeing it on the CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that how maxfail works? I thought it was that we wouldn't see it on the CI if there were 0 or 1 failures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at all. maxfail just stops testing at the first failure. By default testing stops at 5 failures. It can be dropped
"## The Language\n", | ||
"\n", | ||
"This tutorial will be illustrated with a toy language that we’ll call “Toy”\n", | ||
"(naming is hard...). Toy is a tensor-based language that allows you to define\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I do agree with the joke, I'm not sure about keeping it here :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The text is almost entirely lifted word-for-word from the page. We could also write our own tutorial, which would probably be a bit safer from the copyright perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can add a citation at the last cell to refer/cite to the page
"to the LLVM Kaleidoscope equivalent that are detailed in the first two chapters of the\n", | ||
"[Kaleidoscope Tutorial](https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl02.html).\n", | ||
"\n", | ||
"The next chapter will demonstrate how to convert this AST into MLIR." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The next chapter will demonstrate how to convert this AST into MLIR." | |
"The next chapter will demonstrate how to convert this AST into xDSL." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I'm not sure which is correct, since it's going to MLIR text representation, through xDSL the framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convert this AST, first to xDSL and next to MLIR ?.
docs/Toy/toy/ast.py
Outdated
self.loc = loc | ||
print(self.dump()) | ||
|
||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make this an abstract class? Or is this class instantiated at some point without being a child class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you make it abstract?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You inherit from ABC :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😱
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that was for isinstance dark magic. Does it do things like prevent instantiation? I guess we could make this class an abstract class, not sure if it's worth doing
docs/Toy/toy/ast.py
Outdated
|
||
@dataclass | ||
class VarType: | ||
'A variable type with shape information.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use """
everywhere:
https://peps.python.org/pep-0257/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Convert to using """
docs/Toy/toy/ast.py
Outdated
@dataclass | ||
class PrototypeAST: | ||
''' | ||
This class represents the "prototype" for a function, which captures its |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably say in the comments that this is a function declaration.
I guess the name prototype is only used in the C/C++ world? (though I may be wrong)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments are also lifted word-for-word from the MLIR tutorial. In my mind keeping it almost identical was a win, but you're right that as a standalone tutorial it's not made stronger by using some of the cpp concepts
@@ -0,0 +1,76 @@ | |||
# RUN: toyc-ch1 %s -emit=ast 2>&1 | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are using toyc-ch1
and toyc-ch2
, however no such file exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't really use them, I just copied the files including this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a note that they are not used for future reference?
self.tokens = tokenize(file, program) | ||
self.pos = 0 | ||
|
||
def getToken(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we document the few next functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- add documentation for
getToken
and token precedence functions
Co-authored-by: Fehr Mathieu <mathieu.fehr@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are one step before merging. Great work. Left a few minor comments, I am fine if they addressed at some other PR..
"## The Language\n", | ||
"\n", | ||
"This tutorial will be illustrated with a toy language that we’ll call “Toy”\n", | ||
"(naming is hard...). Toy is a tensor-based language that allows you to define\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can add a citation at the last cell to refer/cite to the page
"to the LLVM Kaleidoscope equivalent that are detailed in the first two chapters of the\n", | ||
"[Kaleidoscope Tutorial](https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl02.html).\n", | ||
"\n", | ||
"The next chapter will demonstrate how to convert this AST into MLIR." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convert this AST, first to xDSL and next to MLIR ?.
@@ -0,0 +1,76 @@ | |||
# RUN: toyc-ch1 %s -emit=ast 2>&1 | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a note that they are not used for future reference?
@@ -0,0 +1,120 @@ | |||
from dataclasses import dataclass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import re
from .location import Location
from dataclasses import dataclass
from pathlib import Path
from typing import List
# 1-indexed | ||
col += 1 | ||
if char == '#': | ||
# Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, although I see where you're coming from :) Comments start with a #
in Toy like in Python, which makes this comment look a bit weird
docs/Toy/toy/parser.py
Outdated
return self.tokens[self.pos] | ||
|
||
def getTokenPrecedence(self) -> int: | ||
'''Returns precedence if the current token is a binary operation, -1 otherwise''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don;t we want """ quotes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we? I guess I'm used to single quotes, but consistency is best. It looks like I'm the only one who's added triple single quotes in the whole repo, I'll undo that in this one, and a followup for the code that I've already checked in
|
||
def pop(self) -> Token: | ||
self.peek() | ||
self.pos += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do not we just do return self.tokens[self.pos] ? Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I need to change pos before returning it. I could do something like res = self.tokens[self.pos]; self.pos += 1; return res
but I prefer how I wrote it in the code
|
||
def test_parse_ast(): | ||
ast_toy = Path() / 'docs' / 'Toy' / 'examples' / 'ast.toy' | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can;t we do:
dir_to_scan = "......xxx.x.x.x.x.x.x..examples/ast_toy"
ast_toy = Path(dir_to_scan)
as in: https://pbpython.com/pathlib-intro.html
from pathlib import Path
dir_to_scan = "/media/chris/KINGSTON/data_analysis"
p = Path(dir_to_scan)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could if you prefer it
This is the first installment of the Toy tutorial saga, and the one that least depends on xDSL. The next chapter will include xDSL IR generation