-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: Support hex denseAttr parsing for integers, f32, f64 #1974
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #1974 +/- ##
==========================================
+ Coverage 89.83% 89.95% +0.12%
==========================================
Files 284 284
Lines 35531 35649 +118
Branches 5241 5259 +18
==========================================
+ Hits 31918 32069 +151
+ Misses 2840 2822 -18
+ Partials 773 758 -15 ☔ View full report in Codecov by Sentry. |
xdsl/parser/attribute_parser.py
Outdated
# Splat attribute case, same value everywhere | ||
if chunk_size == len(byte_list) and type.get_shape() != (1,): | ||
return data_values, [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@math-fehr @superlopuh @compor this might not be a sufficient check here. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if len(data_values) == 1
should be a check that is enough?
Any 1x...
dense attribute is a splat anyway
85f5a54
to
de15ff3
Compare
xdsl/parser/attribute_parser.py
Outdated
|
||
if isa(element_type, Float32Type | Float64Type): | ||
for i in range(num_chunks): | ||
parsed_float = struct.unpack( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use unpack_from
or iter_unpack
? feels like we could avoid the iteration in Python, and instead check that the number was what we expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to using unpack_from
, thanks for the suggestion!
xdsl/parser/attribute_parser.py
Outdated
values, shape = [], None | ||
else: | ||
# Expect a tensor literal instead | ||
values, shape = self._parse_tensor_literal() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not put the str literal logic inside the _parse_tensor_literal
code? Do you know how MLIR structures this parsing logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at the parser code of MLIR tbh, I just parsed some denseAttrs in compiler explorer and took a look at what was produced. The MLIR codebase scares me a bit haha, but it seems that they put it inside the TensorLiteralParser?
https://github.com/llvm/llvm-project/blob/d85df3f2d6e8687c44e6802dcc0e59c14ff32c9b/mlir/lib/AsmParser/AttributeParser.cpp#L456
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should not be in _parse_tensor_literal
, as the hex string can only be at the toplevel in MLIR
xdsl/parser/attribute_parser.py
Outdated
hex_string = self.parse_optional_str_literal() | ||
if hex_string is not None: | ||
# Can not determine values without type yet | ||
values, shape = [], None | ||
else: | ||
# Expect a tensor literal instead | ||
values, shape = self._parse_tensor_literal() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have trouble understanding what hex_string
, values
, shape
are expected to be at each line here.
Can this be simplified for the case where values, shape = [], None
?
Can this be combined with the _parse_builtin_dense_attr_hex
below?
You seem to assume this is a hex, but you don't really know that yet, wouldn't determining this as fast as possible make sense and potentially elide the extra check if hex_string is not None:
below?
Ideally, I'd like to come out of the block knowing that if hex exists is valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @compor I agree, that this is a bit of an ugly way of going about this.
I think it looks ugly because of the following problems I had:
- The problem is that I can not really extract the shape information out of the hex_string since the information is already flattened inside the hex attribute, so there are a few checks you can not perform on hex values.
- The added code kind of breaks the previous flow because you already need your type info to be able to parse the hex string in the first place. In the previous code this was not a problem, because you can already parse the ints and the floats purely on the format of the numbers
- another problem is that I was afraid of emitting
_TensorLiteralElement
s because they seem to require a span, and I was unsure on how this works. @superlopuh would this solve your question as well?
Does this make sense? Thanks for your help!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To answer your actual question @compor:
Originally:
shape = []
for splat attribute, list of ints for bigger attributes, e.g. [[2,3],[3,2]] would give a shape = [2,2]
values
= a list of _TensorLiteralElement
s which contains parsed TensorLiteral
elements
Now:
hex_string
The value of the hex string, if present, None
if not present
shape = []
for splat attribute, list of ints for bigger attributes, e.g. [[2,3],[3,2]] would give a shape = [2,2]
.
For hex attributes I can only give the flattened shape as a single-element list (e.g. shape = [4]
) because of a lack of information.
values
= a list of _TensorLiteralElement
s which contains parsed TensorLiteral
elements
In the case of a hex attribute, the values
are already in their python counterparts, so values is already of type list[int]
or list[float]
and doesn't need to be converted from TensorLiteralElement anymore.
I agree that this is a lot of info necessary for just 3 variables 😅 , any suggestions very much welcome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that what you're saying makes sense, our parser API might be a little awkward for this, hopefully we can make it a little less awkward to deal with it better. I'd say that the span can be the span of the string literal for all the elements for your last bullet.
@math-fehr, what are your thoughts on all this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a similar issue with this BTW, when I tried to implement memref.global
custom syntax: https://xdsl.zulipchat.com/#narrow/stream/368604-Framework/topic/printing.20and.20parsing.20attributes/near/410844828
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could just put the entire span for the string, so we don't have list[int]
or list[float]
and have the list of literal elements, I would be fine with that!
We won't use these errors anyway, so there is no problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To simplify the logic actually, can we just split the following code in two functions? One function to deal with the hex, and the other to deal with the lists?
Because currently the logic gets quite complex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @math-fehr I'll try to refactor to using the literals now and I'll see what I'm left with 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought @math-fehr it doesn't make a lot of sense to make the _TensorLiteralElements, because then you have to convert them to a list of ints going out of the function anyways :(
Okay it seems there's some further work necessary beyond your comments.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it's going to the right direction, the only problem is the code getting complex now
Co-authored-by: Chris Vasiladiotis <cvassiladiotis@gmail.com>
e2377ad
to
9f85c08
Compare
xdsl/parser/attribute_parser.py
Outdated
# Get values and shape in case of hex_string (requires parsed type) | ||
if hex_string is not None: | ||
values, shape = self._parse_builtin_dense_attr_hex(hex_string, type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something, but what changes if this line is inlined above, and the hex_string
doesn't escape the else:
branch above? I have a feeling that this method would be easier to understand with an is_hex_string
bool
value instead of the string, just to communicate that it's not used for anything once its parsed
xdsl/parser/attribute_parser.py
Outdated
# Convert list of elements to a list of values. | ||
if shape != []: | ||
data_values = [value.to_type(self, element_type) for value in values] | ||
# For hex string, the data values are already constructed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true? Isn't one of your examples a 2xi32
array where there's a single repeated value decoded?
Co-authored-by: Chris Vasiladiotis <cvassiladiotis@gmail.com>
2276424
to
6c13c64
Compare
Definitely looks much neater, thank you! |
I'd still recommend inlining the helpers that are only called once in your new version |
Here's what I came to after a bit of fiddling, it's not exactly what I recommended before, but also seems like a direction to look into: |
@superlopuh I like your patch! Can you add it to this PR? |
pushed! @math-fehr, can you please review my latest patch? |
Got the ok from Mathieu, thank you very much @JosseVanDelm ! |
Attempt at solving #1971
This was a bit more tricky than I initially expected due to endianess issues.
Some things I noticed going back and forth between
xdsl-opt
andmlir-opt
:mlir-opt
has the tendency to print signless values as signed, whereas xdsl seems to print them as unsigned.here i'm adding the integer values by their signed value, is that okay?
mlir-opt
prints floats slightly different from xdsl, i made sure the values are same-ish manuallyCurrently still draft:
I'm leaving other float types for future work, these seem not possible to be unpacked with python's
struct
package.