Summary
Follow up on #1 and #2 by explicitly tracking the downstream-required complex math surface for the CUDA backend.
#1 and #2 establish the main direction:
- interleaved Complex32 / Complex64
- first-class IR type support
- CUDA lowering via thrust::complex<T>
- baseline frontend / macro integration
That is the right architectural foundation.
The next step should be tracked separately: make the complex transcendental / math operator surface explicitly complete and reliable for downstream GPU tensor runtimes.
For downstream users such as tensor4all/tenferro-rs, the practical requirement is:
- abs
- exp
- log
- sin
- cos
- tanh
- sqrt
- pow
Why this deserves its own issue
The broad complex-number issue (#1) is about making interleaved complex values a first-class CubeCL concept.
This issue is narrower and more execution-oriented:
- audit which complex math ops already work cleanly through the current IR + CUDA codegen path
- fill the remaining gaps, especially tanh and pow, if they are not already complete
- make the supported complex math contract explicit enough that downstream runtimes can remove their current restrictions
In other words:
- #1 / #2 = "complex numbers exist in CubeCL"
- this issue = "the CUDA backend has a complete, documented, tested complex math surface needed by downstream tensor backends"
Downstream motivation
tensor4all/tenferro-rs currently tracks this gap in tensor4all/tenferro-rs#724.
That downstream currently restricts complex GPU execution because it cannot yet rely on full complex math coverage for elementwise kernels / fusion.
The goal of this issue is to make CubeCL the single source of truth for those operations, so downstreams do not need ad hoc decompositions or per-project workarounds.
Required operator set
For Complex32 and Complex64, the CUDA backend should provide a clear support story for:
Must support
- abs
- exp
- log
- sin
- cos
- tanh
- sqrt
- pow
- existing baseline arithmetic: add, sub, mul, div, neg, conj
Explicitly unsupported / rejected
These should remain rejected centrally with a clear policy, not left ambiguous:
- ordering comparisons: <, <=, >, >=
- ordering-based min/max semantics if no mathematically sound complex policy is intended
- bitwise ops
- integer-only saturating / high-word arithmetic
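The supported / rejected split above could be expressed as one central validation function. The following is only a minimal sketch: `ComplexOp`, `Rejection`, and `validate_complex_op` are hypothetical names invented for illustration, not CubeCL's actual IR types, and min/max are rejected here under the assumption that no complex ordering policy is adopted.

```rust
/// Hypothetical operator tags for illustration; CubeCL's real IR has its own types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ComplexOp {
    Add, Sub, Mul, Div, Neg, Conj,
    Abs, Exp, Log, Sin, Cos, Tanh, Sqrt, Pow,
    Less, LessEqual, Greater, GreaterEqual,
    Min, Max,
    BitAnd, BitOr, BitXor,
    SaturatingAdd, MulHi,
}

/// Why an operation is refused for complex operands (hypothetical categories).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rejection {
    NoOrdering,
    BitwiseOnComplex,
    IntegerOnly,
}

/// Single place that decides whether an op is valid on Complex32 / Complex64.
pub fn validate_complex_op(op: ComplexOp) -> Result<(), Rejection> {
    use ComplexOp::*;
    match op {
        // Baseline arithmetic.
        Add | Sub | Mul | Div | Neg | Conj => Ok(()),
        // Required math surface from this issue.
        Abs | Exp | Log | Sin | Cos | Tanh | Sqrt | Pow => Ok(()),
        // Assumption: min/max stay rejected because they rely on ordering.
        Less | LessEqual | Greater | GreaterEqual | Min | Max => Err(Rejection::NoOrdering),
        BitAnd | BitOr | BitXor => Err(Rejection::BitwiseOnComplex),
        SaturatingAdd | MulHi => Err(Rejection::IntegerOnly),
    }
}
```

Because the match is exhaustive, adding a new operator tag forces an explicit accept-or-reject decision at compile time rather than leaving it to backend code.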
Design expectations
The implementation should stay aligned with the clean architecture introduced in #1 / #2.
That means:
- no downstream-specific hacks
- no tenferro-specific decomposition logic pushed into CubeCL
- no split-complex workaround
- keep interleaved complex as the primary model
- keep invalid operations rejected in central validation, not in scattered backend code
CUDA backend guidance
Use thrust::complex<float> / thrust::complex<double> wherever possible.
For each op, one of these should be made explicit:
- supported directly via thrust overloads / functions
- implemented via a small custom __device__ helper in CUDA codegen
- intentionally rejected as unsupported for complex
Examples:
- exp(z)
- log(z)
- sin(z)
- cos(z)
- tanh(z)
- sqrt(z)
- pow(z, w)
- abs(z) returning the expected real-valued magnitude or a clearly documented complex-backend convention
The issue is not only whether each formula exists in principle, but whether the current CubeCL IR/frontend/codegen pipeline can express it cleanly and consistently.
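One way to make the per-op decision explicit is a small lookup table on the codegen side. The sketch below is illustrative only: `Lowering` and `lowering_for` are hypothetical names, and the string keys stand in for whatever operator representation the IR really uses. Thrust does provide these function overloads for thrust::complex<T>; whether CubeCL's codegen should route each op through them is what the audit would need to confirm.

```rust
/// The three explicit outcomes named in this issue (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lowering {
    /// Emit a call to an existing thrust::complex function.
    Thrust(&'static str),
    /// Emit a call to a small custom __device__ helper (helper names would be project-defined).
    Helper(&'static str),
    /// Intentionally unsupported for complex operands.
    Rejected,
}

/// Assumed lowering table for complex math ops; `None` means "not covered by this table".
pub fn lowering_for(op: &str) -> Option<Lowering> {
    let lowering = match op {
        "abs" => Lowering::Thrust("thrust::abs"),
        "exp" => Lowering::Thrust("thrust::exp"),
        "log" => Lowering::Thrust("thrust::log"),
        "sin" => Lowering::Thrust("thrust::sin"),
        "cos" => Lowering::Thrust("thrust::cos"),
        "tanh" => Lowering::Thrust("thrust::tanh"),
        "sqrt" => Lowering::Thrust("thrust::sqrt"),
        "pow" => Lowering::Thrust("thrust::pow"),
        // Ordering comparisons are rejected rather than lowered.
        "lt" | "le" | "gt" | "ge" => Lowering::Rejected,
        _ => return None,
    };
    Some(lowering)
}
```

The `Helper` variant is unused in this table; it is kept so that an op found to need a custom __device__ function during the audit has an explicit slot.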
Suggested work breakdown
1. Audit the current #2 implementation
For each required op above, confirm whether it is already:
- represented in IR/frontend
- lowered correctly by CUDA codegen
- type-checked correctly for complex operands/results
- tested for both Complex32 and Complex64
2. Fill the remaining operator gaps
Likely follow-up candidates include at least:
- tanh
- pow
If any of abs, exp, log, sin, cos, sqrt are still partial or only implicitly supported, make them explicit and test-backed.
3. Clarify result-type policy
Document and enforce the return-type contract for operations like:
- abs(complex)
- real(complex)
- imag(complex)
Downstreams need to know whether these return real scalars, complex values with zero imaginary part, or something else.
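To show what an enforced return-type contract might look like, here is a minimal sketch of one candidate policy, in which abs, real, and imag return the matching real scalar type. This is an assumption for illustration, not a decision recorded in this issue; `Elem`, `UnaryOp`, and `result_elem` are hypothetical names.

```rust
/// Hypothetical element types for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Elem {
    F32,
    F64,
    Complex32,
    Complex64,
}

/// A few unary ops whose result type matters for the contract.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnaryOp {
    Abs,
    Real,
    Imag,
    Exp,
    Sqrt,
}

/// Candidate policy: abs/real/imag map complex to the matching real scalar,
/// while other math ops preserve the input element type.
pub fn result_elem(op: UnaryOp, input: Elem) -> Elem {
    let real_of = |e: Elem| match e {
        Elem::Complex32 => Elem::F32,
        Elem::Complex64 => Elem::F64,
        other => other,
    };
    match op {
        UnaryOp::Abs | UnaryOp::Real | UnaryOp::Imag => real_of(input),
        UnaryOp::Exp | UnaryOp::Sqrt => input,
    }
}
```

If the alternative convention were chosen (complex results with zero imaginary part), only the first match arm would change, which is one reason to keep the rule in a single function.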
4. Strengthen tests
Add focused runtime tests for:
- Complex32
- Complex64
- representative nontrivial values, not just purely real inputs
- branch-sensitive cases for log, sqrt, and pow
- agreement with a host reference such as num_complex
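The branch-sensitive cases can be sketched on the host side as follows. To stay dependency-free, this example uses a small hand-rolled reference type (`C64`, hypothetical) instead of num_complex; a real test suite would more likely compare GPU output against num_complex directly. The inputs on the negative real axis differ only in the sign of the zero imaginary part, which is where log and sqrt implementations tend to disagree.

```rust
/// Minimal host-side reference complex type (illustrative stand-in for num_complex).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct C64 {
    pub re: f64,
    pub im: f64,
}

impl C64 {
    pub fn new(re: f64, im: f64) -> Self {
        Self { re, im }
    }

    /// Magnitude |z|.
    pub fn abs(self) -> f64 {
        self.re.hypot(self.im)
    }

    /// Principal argument in [-pi, pi]; atan2 honors the sign of a zero imaginary part.
    pub fn arg(self) -> f64 {
        self.im.atan2(self.re)
    }

    /// Principal logarithm: ln|z| + i*arg(z).
    pub fn ln(self) -> C64 {
        C64::new(self.abs().ln(), self.arg())
    }

    /// Principal square root via polar form.
    pub fn sqrt(self) -> C64 {
        let r = self.abs().sqrt();
        let half = self.arg() / 2.0;
        C64::new(r * half.cos(), r * half.sin())
    }
}

/// Component-wise comparison with an absolute tolerance.
pub fn close(a: C64, b: C64, tol: f64) -> bool {
    (a.re - b.re).abs() <= tol && (a.im - b.im).abs() <= tol
}

/// Inputs chosen to exercise the branch cut and a generic non-real value.
pub fn branch_cases() -> Vec<C64> {
    vec![
        C64::new(-4.0, 0.0),  // on the cut, approached from above
        C64::new(-4.0, -0.0), // on the cut, approached from below (negative zero)
        C64::new(0.3, -1.7),  // generic value with nonzero imaginary part
    ]
}
```

A runtime test would evaluate each case on the GPU for both Complex32 and Complex64 and compare against the reference with a tolerance suited to the precision.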
Acceptance criteria
This issue is done when all of the following are true:
- The CUDA backend has an explicit supported-complex-math contract for the operator set listed above.
- Complex32 and Complex64 both have runtime coverage for the supported ops.
- Remaining invalid complex ops are rejected centrally and intentionally.
- A downstream such as tenferro can point to CubeCL as the authoritative provider of these complex math ops instead of carrying project-local restrictions.
- Any remaining exclusions are documented clearly in the issue or follow-up issues.
Non-goals
- WGPU backend completion
- CPU backend completion
- complex GEMM / linalg in CubeCL itself
- downstream-specific workarounds in tenferro or other consumers
Those can stay in separate issues.
Related