follow-up: complete complex transcendental/math coverage for CUDA backend (abs, exp, log, sin, cos, tanh, sqrt, pow) #4

@shinaoka

Summary

Follow up on #1 and #2 by explicitly tracking the downstream-required complex math surface for the CUDA backend.

#1 and #2 establish the main direction:

  • interleaved Complex32 / Complex64
  • first-class IR type support
  • CUDA lowering via thrust::complex<T>
  • baseline frontend / macro integration

That is the right architectural foundation.

The next step should be tracked separately: make the complex transcendental / math operator surface explicitly complete and reliable for downstream GPU tensor runtimes.

For downstream users such as tensor4all/tenferro-rs, the practical requirement is:

  • abs
  • exp
  • log
  • sin
  • cos
  • tanh
  • sqrt
  • pow

Why this deserves its own issue

The broad complex-number issue (#1) is about making interleaved complex values a first-class CubeCL concept.

This issue is narrower and more execution-oriented:

  • audit which complex math ops already work cleanly through the current IR + CUDA codegen path
  • fill the remaining gaps, especially tanh and pow if they are not already complete
  • make the supported complex math contract explicit enough that downstream runtimes can remove their current restrictions

In other words:

  • #1 / #2 = "complex numbers exist in CubeCL"
  • this issue = "the CUDA backend has a complete, documented, tested complex math surface needed by downstream tensor backends"

Downstream motivation

tensor4all/tenferro-rs currently tracks this gap in:

  • tensor4all/tenferro-rs#724

That downstream currently restricts complex GPU execution because it cannot yet rely on full complex math coverage for elementwise kernels / fusion.

The goal of this issue is to make CubeCL the single source of truth for those operations, so downstreams do not need ad hoc decompositions or per-project workarounds.

Required operator set

For Complex32 and Complex64, the CUDA backend should provide a clear support story for:

Must support

  • abs
  • exp
  • log
  • sin
  • cos
  • tanh
  • sqrt
  • pow
  • existing baseline arithmetic: add, sub, mul, div, neg, conj

Explicitly unsupported / rejected

These should remain rejected centrally with a clear policy, not left ambiguous:

  • ordering comparisons: <, <=, >, >=
  • ordering-based min / max, unless a mathematically sound policy for complex operands is explicitly defined
  • bitwise ops
  • integer-only saturating / high-word arithmetic

Design expectations

The implementation should stay aligned with the clean architecture introduced in #1 / #2.

That means:

  • no downstream-specific hacks
  • no tenferro-specific decomposition logic pushed into CubeCL
  • no split-complex workaround
  • keep interleaved complex as the primary model
  • keep invalid operations rejected in central validation, not in scattered backend code

CUDA backend guidance

Use thrust::complex<float> / thrust::complex<double> wherever possible.

For each op, one of these should be made explicit:

  1. supported directly via thrust overloads / functions
  2. implemented via a small custom __device__ helper in CUDA codegen
  3. intentionally rejected as unsupported for complex

Ops that need one of these classifications:

  • exp(z)
  • log(z)
  • sin(z)
  • cos(z)
  • tanh(z)
  • sqrt(z)
  • pow(z, w)
  • abs(z) returning the expected real-valued magnitude or a clearly documented complex-backend convention

The issue is not only whether each formula exists in principle, but whether the current CubeCL IR/frontend/codegen pipeline can express it cleanly and consistently.
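
As an illustration of option 2 above, here is a minimal host-side sketch of what a small helper could look like. It uses std::complex as a stand-in for thrust::complex, which is modeled on the same interface. The name complex_tanh_via_exp is hypothetical and not part of CubeCL; a CUDA version would take thrust::complex<T> and be marked __device__. If thrust's own tanh overload turns out to be sufficient, option 1 applies and no helper is needed.

```cpp
#include <complex>

// Hypothetical helper (not an existing CubeCL or thrust function).
// Uses the identity tanh(z) = (e^{2z} - 1) / (e^{2z} + 1).
// Caveat: exp(2z) overflows for large Re(z), so a production helper
// would need a guarded path for that range.
template <typename T>
std::complex<T> complex_tanh_via_exp(std::complex<T> z) {
    const std::complex<T> e2z = std::exp(T(2) * z);
    return (e2z - T(1)) / (e2z + T(1));
}
```

For moderate inputs this should agree with std::tanh to within rounding, which gives a cheap cross-check for whichever lowering is chosen.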

Suggested work breakdown

1. Audit the current #2 implementation

For each required op above, confirm whether it is already:

  • represented in IR/frontend
  • lowered correctly by CUDA codegen
  • type-checked correctly for complex operands/results
  • tested for both Complex32 and Complex64

2. Fill the remaining operator gaps

Likely follow-up candidates include at least:

  • tanh
  • pow

If any of abs, exp, log, sin, cos, sqrt are still partial or only implicitly supported, make them explicit and test-backed.
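
If pow does need a helper rather than a direct overload, the principal-branch definition pow(z, w) = exp(w * log(z)) is the usual starting point. The sketch below is host-side C++ with std::complex standing in for thrust::complex. The function name and the zero-base policy are assumptions made for this example, not existing CubeCL behavior.

```cpp
#include <complex>

// Hypothetical helper: principal-branch complex power.
// The zero-base handling is a placeholder policy for this sketch
// (0^0 -> 1, 0^w -> 0 otherwise); the real contract should be
// decided and documented as part of this issue.
template <typename T>
std::complex<T> complex_pow_via_exp_log(std::complex<T> z, std::complex<T> w) {
    const std::complex<T> zero(T(0), T(0));
    if (z == zero) {
        return (w == zero) ? std::complex<T>(T(1), T(0)) : zero;
    }
    // Inherits the branch cut of log along the negative real axis.
    return std::exp(w * std::log(z));
}
```

Because the result inherits the branch cut of log, pow belongs in the branch-sensitive test set alongside log and sqrt.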

3. Clarify result-type policy

Document and enforce the return-type contract for operations like:

  • abs(complex)
  • real(complex)
  • imag(complex)

Downstreams need to know whether these return real scalars, complex values with zero imaginary part, or something else.
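
For reference, the C++ standard library answers this question by returning real scalars from abs, real, and imag, and a complex value from conj. The sketch below records that convention as compile-time checks. ComplexResultTypes and complex_abs are hypothetical names for this example only; whether CubeCL adopts the same contract is exactly what this step should decide.

```cpp
#include <complex>
#include <type_traits>
#include <utility>

// Hypothetical trait: captures the result type the host standard
// library uses for each operation on std::complex<T>.
template <typename T>
struct ComplexResultTypes {
    using C = std::complex<T>;
    using Abs = decltype(std::abs(std::declval<C>()));
    using Real = decltype(std::declval<C>().real());
    using Imag = decltype(std::declval<C>().imag());
    using Conj = decltype(std::conj(std::declval<C>()));
};

// abs, real, imag produce the underlying real scalar type.
static_assert(std::is_same<ComplexResultTypes<float>::Abs, float>::value, "abs -> real");
static_assert(std::is_same<ComplexResultTypes<double>::Real, double>::value, "real -> real");
static_assert(std::is_same<ComplexResultTypes<double>::Imag, double>::value, "imag -> real");
// conj stays complex.
static_assert(std::is_same<ComplexResultTypes<float>::Conj, std::complex<float>>::value,
              "conj -> complex");

// Hypothetical wrapper making the real-valued return type explicit.
template <typename T>
T complex_abs(std::complex<T> z) {
    return std::abs(z);
}
```

Encoding the contract as type-level assertions means a change in policy fails at build time instead of surfacing as a silent dtype mismatch downstream.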

4. Strengthen tests

Add focused runtime tests for:

  • Complex32
  • Complex64
  • representative nontrivial values, not just purely real inputs
  • branch-sensitive cases for log, sqrt, and pow
  • agreement with a host reference such as num_complex
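
A few branch-sensitive reference values that such tests could pin down are sketched below in host-side C++. The helper names approx_eq and branch_cases_ok are hypothetical. In CubeCL itself the reference side would more likely be num_complex, with the device results read back and compared against the same expected values.

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper: absolute-error comparison of two complex values.
template <typename T>
bool approx_eq(std::complex<T> got, std::complex<T> want, T tol) {
    return std::abs(got - want) <= tol;
}

// Checks principal-branch results on and off the negative real axis.
template <typename T>
bool branch_cases_ok(T tol) {
    using C = std::complex<T>;
    const T pi = std::acos(T(-1));
    const C minus_one(T(-1), T(0));
    bool ok = true;
    // log(-1) = i*pi on the principal branch.
    ok = ok && approx_eq(std::log(minus_one), C(T(0), pi), tol);
    // sqrt(-1) = i on the principal branch.
    ok = ok && approx_eq(std::sqrt(minus_one), C(T(0), T(1)), tol);
    // (-1)^(1/2) = i on the principal branch.
    ok = ok && approx_eq(std::pow(minus_one, C(T(0.5), T(0))), C(T(0), T(1)), tol);
    // A non-real input: sqrt(3 + 4i) = 2 + i.
    ok = ok && approx_eq(std::sqrt(C(T(3), T(4))), C(T(2), T(1)), tol);
    return ok;
}
```

A tolerance around 1e-12 for the double-precision type and 1e-5 for single precision is a reasonable starting point. Inputs with a negative-zero imaginary part approach the cut from the other side and are worth covering separately once the intended signed-zero behavior is decided.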

Acceptance criteria

This issue is done when all of the following are true:

  • The CUDA backend has an explicit supported-complex-math contract for the operator set listed above.
  • Complex32 and Complex64 both have runtime coverage for the supported ops.
  • Remaining invalid complex ops are rejected centrally and intentionally.
  • A downstream such as tenferro can point to CubeCL as the authoritative provider of these complex math ops instead of carrying project-local restrictions.
  • Any remaining exclusions are documented clearly in this issue or in follow-up issues.

Non-goals

  • WGPU backend completion
  • CPU backend completion
  • complex GEMM / linalg in CubeCL itself
  • downstream-specific workarounds in tenferro or other consumers

Those can stay in separate issues.
