Define the maximum number of operand dimensions (maximum rank) #456
Yeah, we really could use better diagnosability of graph-creation failures related to backend limitations, either:
Do we know the max dims required by the models we definitely want to support (e.g. MediaPipe, MobileNet, diffusion/LLM)? Perhaps that's a good middle ground for now. I guess most models are fine with that.

Exposing limits early to clients is okay; we can expose the limit on MLContext (before graph building). Based on the current spec, MLContext is required to use MLGraphBuilder, so I think it's a reasonable place.

Taking a step back, I'd prefer we limit max_dims to the "greatest common divisor" across the backends we want to support (referring to Google's feedback on the API in #453: XNNPACK, DML<3, Apple M1/M2, upcoming Intel VPU). Progressively adding features (i.e. higher dims) is much easier than asking API users to feature-detect and handle failures from the beginning.

If we don't want to limit max_dims in the spec, can we at least provide a guideline (e.g. "for best interoperability, don't use more than X dimensions") based on our survey?

Me as a naive developer: knowing the model can run on the backend before downloading a multi-GB weight file is useful (don't waste bandwidth on things that can't be used).
📚 @huningxin: Adding a few more references:
I would ignore older DML versions before 3.0, because WebNN needs the DML_GRAPH anyway.
🌍 @wacky6: The largest “real world” models we’ve seen have 7 dimensions, a few have 6, many have 5, and of course the rest have 4 (I have no idea what model would use 12 or 32 dimensions, which seems excessive given how rapidly the element count grows with each added dimension). Given these models, and considering GPU architecture (where a natural vector size for many GPUs is 4 × 32 bits), a reasonable upper limit would be 8D, which fits nicely as two `uint32_t4`s and which, coincidentally, is what DirectML, cuDNN, and BNNS all settled on.
🤔 Note the limitations of a backend need not completely constrain the frontend though. There will be differing backend limitations, but it turns out that because WebNN does not support arbitrary tensor strides anyway, and all elements are contiguous in memory, one can fold higher dimensional input into lower dimensional input. For example, any elementwise ND operation (add, relu, elementwiseIf...) is treatable as a large 1D array, and similar folding logic is applicable to nearly every other class of operator. Pure elementwise operators can be flattened to a simple 1D array:
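As a shape-only sketch of that elementwise case (pure Python, helper name hypothetical), collapsing a contiguous ND operand to 1D is just a product over the shape:

```python
from math import prod

def flatten_elementwise(shape):
    # A contiguous ND tensor can be reinterpreted as 1D for any
    # elementwise operator, since element order is irrelevant.
    return [prod(shape)]

# A 7D add collapses to a 1D add over the same contiguous buffer:
print(flatten_elementwise([2, 3, 4, 5, 1, 2, 3]))  # -> [720]
```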
Operators taking a single axis can be collapsed into [collapsed left side, middle axis, collapsed right side]:
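For instance, a shape-only sketch of that folding (helper name hypothetical) multiplies out everything to the left and right of the kept axis:

```python
from math import prod

def collapse_around_axis(shape, axis):
    # Fold everything left of the axis into one dim and everything
    # right of it into another: [left, axis, right].
    return [prod(shape[:axis]), shape[axis], prod(shape[axis + 1:])]

# A 6D softmax over axis 3 becomes an equivalent 3D softmax:
print(collapse_around_axis([2, 3, 4, 5, 6, 7], 3))  # -> [24, 5, 42]
```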
Operators with split points can flatten all dimensions into two partitions before and after the axis split point:
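A shape-only sketch of the split-point case (helper name hypothetical), producing two partitions:

```python
from math import prod

def flatten_at_split(shape, split_axis):
    # Two partitions: all dims before the split point, all dims from it on.
    return [prod(shape[:split_axis]), prod(shape[split_axis:])]

# A 5D operand split at axis 2 becomes a 2D operand:
print(flatten_at_split([2, 3, 4, 5, 6], 2))  # -> [6, 120]
```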
Operators with multiple axes can coalesce adjacent axes:
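A shape-only sketch of coalescing a contiguous run of axes into one (helper name hypothetical):

```python
from math import prod

def coalesce_adjacent(shape, first, last):
    # Merge the contiguous run of axes [first..last] into a single axis.
    return shape[:first] + [prod(shape[first:last + 1])] + shape[last + 1:]

# Reducing over adjacent axes 1 and 2 of a 4D tensor needs only 3D support:
print(coalesce_adjacent([2, 3, 4, 5], 1, 2))  # -> [2, 12, 5]
```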
Then there are operators where some dimensions are fixed but all the rest can be flattened (e.g. BatchNorm locks the first two axes but flattens all other dimensions to the right, whereas GEMM and Softmax lock the right two but flatten all the leading batch dimensions to the left). All of these are just reshapes and operator-description adjustments; no tensor memory copy is needed.

Now, some operators are not achievable via a single simple reshape from, say, 7D to 4D, like a 7D Tile or Resample with non-collapsible repetition values, but they can still be collapsed and implemented in terms of lower-dimensional implementations with just two 4D calls. Then pretty much everything else (except potentially ND gather and scatter 🤨) can be implemented in terms of transpose plus more than one call of that operator.

Some background experience: because the earliest versions of DirectML were limited to only 4 dimensions (5 in the rare case of 3D convolution), we needed to implement this dimension-collapsing logic in the TensorFlow fork atop DirectML. Later this kind of logic was moved directly into DirectML, so any API caller could get up to 8D. Interestingly, XNNPack at 6D is only 2 dimensions away from most of the PACK (BNNS, DML, cuDNN 😉), but then XNNPack technically already supports any elementwise operator on contiguous-memory tensors of 32 dimensions, if one just reinterprets the tensor as a 1D array before calling XNNPack 😎.
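The fixed-dimension cases above can be sketched the same way (shape-only, pure Python; helper names and the (N, C) layout assumption are mine, not from any spec):

```python
from math import prod

def collapse_batchnorm(shape):
    # BatchNorm keeps the first two axes (N, C) and flattens
    # the remaining spatial dims to the right.
    return shape[:2] + [prod(shape[2:])]

def collapse_gemm_batches(shape):
    # GEMM keeps the last two axes (rows, cols) and flattens
    # all leading batch dims to the left.
    return [prod(shape[:-2])] + shape[-2:]

print(collapse_batchnorm([2, 16, 3, 4, 5]))    # -> [2, 16, 60]
print(collapse_gemm_batches([2, 3, 4, 8, 16])) # -> [24, 8, 16]
```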
@wacky6: So the caller can avoid wasted time and energy (or, like you mention, download cost) building a graph that is doomed to fail later anyway, I would expose dimensionality limits as early as possible, per MLContext (similar to WebGPU's limits concept). Now, there are still cases where individual operators may not support up to that general maximum (for example, convolution is typically 4D or 5D even if the general operator limit is 8D), meaning failure could still occur later. So reporting limits early doesn't completely obviate the need for good error reporting after graph construction/execution. Additionally (although not the topic of this post), other matters like absent data-type support could cause a failure. So I suppose my (a)/(b)/(c) above are not strictly orthogonal.
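Such an early feasibility check could look roughly like this sketch (the operator names, rank values, and helper are purely illustrative assumptions, not taken from the WebNN spec or any backend):

```python
# Hypothetical per-operator rank limits; values are illustrative only.
OP_MAX_RANK = {"conv2d": 4, "conv3d": 5, "add": 8, "gemm": 8}

def graph_is_buildable(ops, context_max_rank):
    # ops: (operator name, operand rank) pairs gathered from the model,
    # checked before downloading weights or building the graph.
    return all(
        rank <= min(context_max_rank, OP_MAX_RANK.get(op, context_max_rank))
        for op, rank in ops
    )

print(graph_is_buildable([("conv2d", 4), ("add", 5)], 8))  # -> True
print(graph_is_buildable([("conv2d", 6)], 8))              # -> False
```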
FYI, we plan to increase this limit.
* Bug fix: Fix MLGraphBuilder.input()'s handling of scalars. Fixes #502. MLOperandDescriptor was updated in c320472 to always have dimensions, defaulted to an empty list for scalars. That makes the current prose for input() incorrect. Issue #502 already tracked correcting it, so let's simplify: just change the logic for "is a scalar?" and drop the bogus assert.
* Remove conditional and fix check dimensions to allow 0
* Factor byte length check into checking dimensions
* Oops, this should go too.
* type -> dataType
* (1) simplify (2) move into MLOperandDescriptor (3) link to #456
* Restore dimension check and reorder byte length to last
* Fix build for missing dimensions

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
@Maratyszcza: Is this still in mind? Spelunking all the usage sites (https://github.com/search?type=code&q=XNN_MAX_TENSOR_DIMS+repo%3Agoogle%2FXNNPack), it appears increasing 6 -> 8 would have little collateral impact on the code, mainly increasing the size of various local variables (e.g. …).
I no longer work on XNNPack; inviting @alankelly & @fbarchard to answer.
We are planning to add support for more dimensions this year. We are working on various runtime changes now, and this may be integrated as part of them.
Regarding the current definition of `MLOperandDescriptor`, there is no definition of the maximum length of the `dimensions` sequence in the spec. However, for implementation, the native ML APIs usually have a maximum supported size. For example:
There may also be per-operator definitions, such as a maximum dimension count of 5 for the convolution operator and 8 for the element-wise add operator. Thanks @fdwr for sharing this information!
In a Chromium CL review, @RafaelCintron (Thanks!) mentions:
Rafael also shared that WebGPU solved a similar problem with "limits" https://gpuweb.github.io/gpuweb/#limits.