Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc: add Toy chapter 1 python code, examples and notebook #354

Merged
merged 27 commits into from Jan 24, 2023

Conversation

superlopuh
Copy link
Member

This is the first installment of the Toy tutorial saga, and the one that least depends on xDSL. The next chapter will include xDSL IR generation

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@superlopuh
Copy link
Member Author

Most of the code is as similar as makes sense to the MLIR implementation of Toy frontend

@codecov
Copy link

codecov bot commented Jan 18, 2023

Codecov Report

Base: 88.49% // Head: 87.91% // Decreases project coverage by -0.58% ⚠️

Coverage data is based on head (0787e65) compared to base (2846e92).
Patch coverage: 76.42% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #354      +/-   ##
==========================================
- Coverage   88.49%   87.91%   -0.58%     
==========================================
  Files          64       70       +6     
  Lines        7847     8546     +699     
  Branches     1285     1364      +79     
==========================================
+ Hits         6944     7513     +569     
- Misses        645      756     +111     
- Partials      258      277      +19     
Impacted Files Coverage Δ
docs/Toy/toy/toy_ast.py 64.11% <64.11%> (ø)
docs/Toy/toy/parser.py 79.73% <79.73%> (ø)
docs/Toy/toy/lexer.py 88.75% <88.75%> (ø)
docs/Toy/toy/location.py 100.00% <100.00%> (ø)
docs/Toy/toy/tests/test_toy.py 100.00% <100.00%> (ø)
xdsl/utils/exceptions.py 68.62% <0.00%> (-23.69%) ⬇️
tests/test_parser_error.py 93.02% <0.00%> (-6.98%) ⬇️
xdsl/xdsl_opt_main.py 87.95% <0.00%> (-0.49%) ⬇️
xdsl/parser.py 83.60% <0.00%> (-0.09%) ⬇️
xdsl/ir.py 84.69% <0.00%> (-0.06%) ⬇️
... and 18 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Collaborator

@webmiche webmiche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I can't help but feel that this really is an MLIR tutorial as I feel like they could not have taken an example that needs more prior knowledge than that, but I guess it makes sense with the infrastructure as MLIR is very focused on tensors and stuff...

# CHECK-NEXT: Params: []
# CHECK-NEXT: Block {
# CHECK-NEXT: VarDecl a<> @{{.*}}ast.toy:11:3
# CHECK-NEXT: Literal: <2, 3>[ <3>[ 1.000000e+00, 2.000000e+00, 3.000000e+00], <3>[ 4.000000e+00, 5.000000e+00, 6.000000e+00]] @{{.*}}ast.toy:11:11
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure how good/consistent floating point prints work in Python. I know there can be issues with comparisons with floats generally, so maybe printing can be dangerous as well.

I think here it makes sense as this is literally a cast from 1 to a float print of 1, but this is something we have to keep in mind for the future, especially when checking for floats that had arithmetic performed on them.

@@ -0,0 +1,31 @@
# RUN: toyc-ch2 %s -emit=mlir 2>&1 | FileCheck %s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already chapter 2, right? Is there a reason why this is contained in this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added all the toy examples, and didn't bother removing any of the comments, it's not like we're running filecheck on them. Happy to remove the comments if you think they might be confusing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we should right? Can we?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We technically can't right now because we don't match the MLIR syntax exactly, even in MLIR mode on the printer. I think it would be a nice goal, but a bit orthogonal to this PR

TokenT = TypeVar('TokenT', bound=Token)


class Parser:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we maybe somehow unify this with our parser infrastructure? Or maybe even autogenerate at some point (@AntonLydike). I guess not for this PR though :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how helpful that would be, this is a sort of "toy" parser, that we don't need to be particularly efficient or good to work on, more for didactic purposes. Might be a bit of a distraction for people to try to understand the parser generation infrastructure instead of xDSL itself

Copy link
Collaborator

@PapyChacal PapyChacal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@superlopuh superlopuh added this to the Convolve 2023-02 milestone Jan 19, 2023
@superlopuh superlopuh mentioned this pull request Jan 19, 2023
Copy link
Contributor

@georgebisbas georgebisbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to set up proper testing for this PR.
Also, what about placing the implementation out of docs?

docs/Toy/examples/ast.toy Show resolved Hide resolved
Copy link
Contributor

@georgebisbas georgebisbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that not all files of this PR are necessary to run this notebook.
Is that right? If so can we incrementally add only the necessary things, in every one of the forthcoming PRs?

Copy link
Contributor

@georgebisbas georgebisbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a gentle reminder that docs/ folder is not under codecov!

@superlopuh
Copy link
Member Author

Which files aren't necessary? Can't codecov what we don't test ;)

@georgebisbas
Copy link
Contributor

Which files aren't necessary? Can't codecov what we don't test ;)

This line here: https://github.com/xdslproject/xdsl/blob/main/.coveragerc#L8
contains what is currently under codecov.

Are all the files in docs/Toy used/covered in this PR?

@superlopuh
Copy link
Member Author

I'm not sure what the benefit would be of adding the folder to codecov, if it's not covered by any unit tests, which we probably don't want, at least in the short term.

Copy link
Collaborator

@math-fehr math-fehr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I added minor comments.
You are often using toy-ch1 or toy-ch2, where are these defined?

@@ -39,3 +39,4 @@ jobs:
run: |
pytest --nbval-lax docs/irdl.ipynb --maxfail 1 -vv
pytest --nbval docs/tutorial.ipynb --maxfail 1 -vv
pytest --nbval docs/Toy/Toy_Ch1.ipynb -vv
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't have --maxfail 1, does that mean that we could potentially break toy without seeing it on the CI?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that how maxfail works? I thought it was that we wouldn't see it on the CI if there were 0 or 1 failures

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at all. maxfail just stops testing at the first failure. By default testing stops at 5 failures. It can be dropped

"## The Language\n",
"\n",
"This tutorial will be illustrated with a toy language that we’ll call “Toy”\n",
"(naming is hard...). Toy is a tensor-based language that allows you to define\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I do agree with the joke, I'm not sure about keeping it here :/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text is almost entirely lifted word-for-word from the page. We could also write our own tutorial, which would probably be a bit safer from the copyright perspective.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can add a citation at the last cell to refer/cite to the page

"to the LLVM Kaleidoscope equivalent that are detailed in the first two chapters of the\n",
"[Kaleidoscope Tutorial](https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl02.html).\n",
"\n",
"The next chapter will demonstrate how to convert this AST into MLIR."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"The next chapter will demonstrate how to convert this AST into MLIR."
"The next chapter will demonstrate how to convert this AST into xDSL."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I'm not sure which is correct, since it's going to MLIR text representation, through xDSL the framework.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convert this AST, first to xDSL and next to MLIR ?.

docs/Toy/Toy_Ch1.ipynb Outdated Show resolved Hide resolved
docs/Toy/toy/ast.py Outdated Show resolved Hide resolved
self.loc = loc
print(self.dump())

@property
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this an abstract class? Or is this class instantiated at some point without being a child class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you make it abstract?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You inherit from ABC :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that was for isinstance dark magic. Does it do things like prevent instantiation? I guess we could make this class an abstract class, not sure if it's worth doing


@dataclass
class VarType:
'A variable type with shape information.'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use """ everywhere:
https://peps.python.org/pep-0257/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Convert to using """

@dataclass
class PrototypeAST:
'''
This class represents the "prototype" for a function, which captures its
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably say in the comments that this is a function declaration.
I guess the name prototype is only used in the C/C++ world? (though I may be wrong)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments are also lifted word-for-word from the MLIR tutorial. In my mind keeping it almost identical was a win, but you're right that as a standalone tutorial it's not made stronger by using some of the cpp concepts

@@ -0,0 +1,76 @@
# RUN: toyc-ch1 %s -emit=ast 2>&1 | FileCheck %s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are using toyc-ch1 and toyc-ch2, however no such file exist?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't really use them, I just copied the files including this comment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note that they are not used for future reference?

self.tokens = tokenize(file, program)
self.pos = 0

def getToken(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document the few next functions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • add documentation for getToken and token precedence functions

Copy link
Contributor

@georgebisbas georgebisbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are one step before merging. Great work. Left a few minor comments, I am fine if they addressed at some other PR..

"## The Language\n",
"\n",
"This tutorial will be illustrated with a toy language that we’ll call “Toy”\n",
"(naming is hard...). Toy is a tensor-based language that allows you to define\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can add a citation at the last cell to refer/cite to the page

"to the LLVM Kaleidoscope equivalent that are detailed in the first two chapters of the\n",
"[Kaleidoscope Tutorial](https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangImpl02.html).\n",
"\n",
"The next chapter will demonstrate how to convert this AST into MLIR."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convert this AST, first to xDSL and next to MLIR ?.

@@ -0,0 +1,76 @@
# RUN: toyc-ch1 %s -emit=ast 2>&1 | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note that they are not used for future reference?

@@ -0,0 +1,120 @@
from dataclasses import dataclass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import re

from .location import Location

from dataclasses import dataclass
from pathlib import Path
from typing import List

# 1-indexed
col += 1
if char == '#':
# Comment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, although I see where you're coming from :) Comments start with a # in Toy like in Python, which makes this comment look a bit weird

return self.tokens[self.pos]

def getTokenPrecedence(self) -> int:
'''Returns precedence if the current token is a binary operation, -1 otherwise'''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don;t we want """ quotes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we? I guess I'm used to single quotes, but consistency is best. It looks like I'm the only one who's added triple single quotes in the whole repo, I'll undo that in this one, and a followup for the code that I've already checked in


def pop(self) -> Token:
self.peek()
self.pos += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do not we just do return self.tokens[self.pos] ? Am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I need to change pos before returning it. I could do something like res = self.tokens[self.pos]; self.pos += 1; return res but I prefer how I wrote it in the code

docs/Toy/toy/parser.py Show resolved Hide resolved

def test_parse_ast():
ast_toy = Path() / 'docs' / 'Toy' / 'examples' / 'ast.toy'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can;t we do:

dir_to_scan = "......xxx.x.x.x.x.x.x..examples/ast_toy"
ast_toy = Path(dir_to_scan)

as in: https://pbpython.com/pathlib-intro.html

from pathlib import Path

dir_to_scan = "/media/chris/KINGSTON/data_analysis"
p = Path(dir_to_scan)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could if you prefer it

@superlopuh superlopuh merged commit 2d69772 into main Jan 24, 2023
@superlopuh superlopuh deleted the sasha/toy-ch1 branch January 24, 2023 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
Development

Successfully merging this pull request may close these issues.

None yet

6 participants