follow-up: complete complex transcendental/math coverage for CUDA backend (abs, exp, log, sin, cos, tanh, sqrt, pow)

## Summary

Follow up on #1 and #2 by explicitly tracking the **downstream-required complex math surface** for the CUDA backend.

#1 and #2 establish the main direction:
- interleaved `Complex32` / `Complex64`
- first-class IR type support
- CUDA lowering via `thrust::complex<T>`
- baseline frontend / macro integration

That is the right architectural foundation.

The next step should be tracked separately: make the complex transcendental / math operator surface **explicitly complete and reliable** for downstream GPU tensor runtimes.

For downstream users such as `tensor4all/tenferro-rs`, the practical requirement is:

- `abs`
- `exp`
- `log`
- `sin`
- `cos`
- `tanh`
- `sqrt`
- `pow`

## Why this deserves its own issue

The broad complex-number issue (#1) is about making interleaved complex values a first-class CubeCL concept.

This issue is narrower and more execution-oriented:
- audit which complex math ops already work cleanly through the current IR + CUDA codegen path
- fill the remaining gaps, especially `tanh` and `pow` if they are not already complete
- make the supported complex math contract explicit enough that downstream runtimes can remove their current restrictions

In other words:
- #1 / #2 = "complex numbers exist in CubeCL"
- this issue = "the CUDA backend has a complete, documented, tested complex math surface needed by downstream tensor backends"

## Downstream motivation

`tensor4all/tenferro-rs` currently tracks this gap in:
- `tensor4all/tenferro-rs#724`

That project currently restricts complex GPU execution because it cannot yet rely on full complex math coverage for elementwise kernels and fusion.

The goal of this issue is to make CubeCL the single source of truth for those operations, so downstreams do not need ad hoc decompositions or per-project workarounds.

## Required operator set

For `Complex32` and `Complex64`, the CUDA backend should provide a clear support story for:

### Must support
- `abs`
- `exp`
- `log`
- `sin`
- `cos`
- `tanh`
- `sqrt`
- `pow`
- existing baseline arithmetic: `add`, `sub`, `mul`, `div`, `neg`, `conj`

### Explicitly unsupported / rejected
These should remain rejected centrally with a clear policy, not left ambiguous:
- ordering comparisons: `<`, `<=`, `>`, `>=`
- ordering-based `min` / `max`, unless a mathematically sound ordering policy for complex values is deliberately defined
- bitwise ops
- integer-only saturating / high-word arithmetic
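
A minimal sketch of what "rejected centrally" could look like: one policy function that every backend consults, instead of per-backend checks. It is written in C++ purely for illustration. `Op`, `ComplexSupport`, and `complex_policy` are hypothetical names, not CubeCL's actual IR or validation API, and the op list is an assumption drawn from the two lists above.

```cpp
#include <cassert>

// Hypothetical op enumeration; names are illustrative, not CubeCL IR variants.
enum class Op {
    Add, Sub, Mul, Div, Neg, Conj,                 // baseline arithmetic
    Abs, Exp, Log, Sin, Cos, Tanh, Sqrt, Pow,      // required math surface
    Lt, Le, Gt, Ge, Min, Max,                      // ordering-based ops
    BitAnd, BitOr, BitXor,                         // bitwise ops
    SaturatingAdd, SaturatingSub, MulHi            // integer-only arithmetic
};

enum class ComplexSupport { Supported, Rejected };

// Single source of truth for "may this op take complex operands?".
ComplexSupport complex_policy(Op op) {
    switch (op) {
        case Op::Add: case Op::Sub: case Op::Mul: case Op::Div:
        case Op::Neg: case Op::Conj:
        case Op::Abs: case Op::Exp: case Op::Log: case Op::Sin:
        case Op::Cos: case Op::Tanh: case Op::Sqrt: case Op::Pow:
            return ComplexSupport::Supported;
        default:
            // Ordering, bitwise, and integer-only ops fall through here.
            return ComplexSupport::Rejected;
    }
}

// Small self-check mirroring the two lists in this section.
void check_policy() {
    assert(complex_policy(Op::Pow) == ComplexSupport::Supported);
    assert(complex_policy(Op::Lt) == ComplexSupport::Rejected);
}
```

Defaulting to `Rejected` means a newly added op stays unavailable for complex operands until someone opts it in, which matches the "intentional, not ambiguous" goal.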

## Design expectations

The implementation should stay aligned with the clean architecture introduced in #1 / #2.

That means:
- no downstream-specific hacks
- no tenferro-specific decomposition logic pushed into CubeCL
- no split-complex workaround
- keep interleaved complex as the primary model
- keep invalid operations rejected in central validation, not in scattered backend code

## CUDA backend guidance

Use `thrust::complex<float>` / `thrust::complex<double>` wherever possible.

For each op, one of these should be made explicit:
1. supported directly via `thrust` overloads / functions
2. implemented via a small custom `__device__` helper in CUDA codegen
3. intentionally rejected as unsupported for complex

Ops that need this classification:
- `exp(z)`
- `log(z)`
- `sin(z)`
- `cos(z)`
- `tanh(z)`
- `sqrt(z)`
- `pow(z, w)`
- `abs(z)` returning the expected real-valued magnitude or a clearly documented complex-backend convention

The issue is not only whether each formula exists in principle, but whether the current CubeCL IR/frontend/codegen pipeline can express it cleanly and consistently.
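
As a rough host-side illustration of the target surface, the sketch below evaluates every required op once. It uses `std::complex<T>` as a stand-in for `thrust::complex<T>`, on the assumption that Thrust exposes equivalently named functions; that assumption is exactly what the audit should confirm. `SurfaceResults`, `eval_surface`, and `approx_eq` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Results of the required op set for one input pair (hypothetical helper type).
template <typename T>
struct SurfaceResults {
    T abs_z;  // abs yields a real magnitude here, not a complex value
    std::complex<T> exp_z, log_z, sin_z, cos_z, tanh_z, sqrt_z, pow_zw;
};

// Evaluates each required op on z (and pow on z, w) with the host library.
template <typename T>
SurfaceResults<T> eval_surface(std::complex<T> z, std::complex<T> w) {
    return {std::abs(z),  std::exp(z),  std::log(z),  std::sin(z),
            std::cos(z),  std::tanh(z), std::sqrt(z), std::pow(z, w)};
}

// Absolute-error comparison for complex values.
template <typename T>
bool approx_eq(std::complex<T> a, std::complex<T> b, T tol) {
    return std::abs(a - b) <= tol;
}

// Sanity checks on z = 3 + 4i, w = 2.
void check_surface() {
    using C = std::complex<double>;
    const SurfaceResults<double> r = eval_surface<double>(C(3.0, 4.0), C(2.0, 0.0));
    assert(std::abs(r.abs_z - 5.0) < 1e-12);               // |3 + 4i| = 5
    assert(approx_eq(r.sqrt_z, C(2.0, 1.0), 1e-12));       // (2 + i)^2 = 3 + 4i
    assert(approx_eq(r.pow_zw, C(-7.0, 24.0), 1e-9));      // (3 + 4i)^2
}
```

The same identities could serve as first-pass checks for the generated device code, with the host values acting as the reference.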

## Suggested work breakdown

### 1. Audit the current #2 implementation
For each required op above, confirm whether it is already:
- represented in IR/frontend
- lowered correctly by CUDA codegen
- type-checked correctly for complex operands/results
- tested for both `Complex32` and `Complex64`

### 2. Fill the remaining operator gaps
Likely follow-up candidates include at least:
- `tanh`
- `pow`

If any of `abs`, `exp`, `log`, `sin`, `cos`, `sqrt` are still partial or only implicitly supported, make them explicit and test-backed.

### 3. Clarify result-type policy
Document and enforce the return-type contract for operations like:
- `abs(complex)`
- `real(complex)`
- `imag(complex)`

Downstreams need to know whether these return real scalars, complex values with zero imaginary part, or something else.
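
One way to pin such a contract down is with compile-time checks. The sketch below shows the "real scalar" option using `std::complex<T>` on the host as a stand-in for the device type. Whether CubeCL adopts this policy is an open question for this issue; `Cx` and `follows_real_scalar_policy` are hypothetical names.

```cpp
#include <cassert>
#include <complex>
#include <type_traits>
#include <utility>

// Host-side stand-in for the device complex type (assumption for illustration).
template <typename T>
using Cx = std::complex<T>;

// True when abs/real/imag yield the real scalar T while sqrt stays complex.
template <typename T>
constexpr bool follows_real_scalar_policy() {
    return std::is_same<decltype(std::abs(std::declval<Cx<T>>())), T>::value &&
           std::is_same<decltype(std::real(std::declval<Cx<T>>())), T>::value &&
           std::is_same<decltype(std::imag(std::declval<Cx<T>>())), T>::value &&
           std::is_same<decltype(std::sqrt(std::declval<Cx<T>>())), Cx<T>>::value;
}

// Enforced at compile time for both precisions.
static_assert(follows_real_scalar_policy<float>(), "Complex32 stand-in");
static_assert(follows_real_scalar_policy<double>(), "Complex64 stand-in");

// Runtime illustration: the results are plain real scalars.
void check_real_results() {
    const Cx<double> z(3.0, 4.0);
    const double re = std::real(z);
    const double im = std::imag(z);
    assert(re == 3.0 && im == 4.0);
}
```

An equivalent check on the CubeCL side would make any later change to the return type a compile error rather than a silent behavior change for downstreams.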

### 4. Strengthen tests
Add focused runtime tests for:
- `Complex32`
- `Complex64`
- representative nontrivial values, not just purely real inputs
- branch-sensitive cases for `log`, `sqrt`, and `pow`
- agreement with a host reference such as `num_complex`
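
The branch-sensitive cases could be collected in a small table like the one sketched below. It computes reference values with `std::complex<double>` on the host; in the actual test suite the reference would more likely be `num_complex`, and the `got` column would come from the GPU kernel. `C64`, `BranchCase`, `branch_cases`, and `all_within` are hypothetical names, and the expected values are the standard principal-branch results.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using C64 = std::complex<double>;  // stand-in for CubeCL's Complex64

struct BranchCase {   // hypothetical test-table entry
    const char* name;
    C64 got;          // value produced by the host reference
    C64 expected;     // principal-branch value
};

std::vector<BranchCase> branch_cases() {
    const double pi = std::acos(-1.0);
    return {
        // log: the cut runs along the negative real axis.
        {"log just above cut", std::log(C64(-1.0, 0.0)), C64(0.0, pi)},
        {"log just below cut", std::log(C64(-1.0, -1e-300)), C64(0.0, -pi)},
        // sqrt: the principal root has a non-negative real part.
        {"sqrt(-4)", std::sqrt(C64(-4.0, 0.0)), C64(0.0, 2.0)},
        {"sqrt(2i)", std::sqrt(C64(0.0, 2.0)), C64(1.0, 1.0)},
        // pow: inherits its branch from log via exp(w * log(z)).
        {"pow(-1, 0.5)", std::pow(C64(-1.0, 0.0), C64(0.5, 0.0)), C64(0.0, 1.0)},
        {"pow(i, 2)", std::pow(C64(0.0, 1.0), C64(2.0, 0.0)), C64(-1.0, 0.0)},
    };
}

// Returns true when every case matches its expected value within tol.
bool all_within(const std::vector<BranchCase>& cases, double tol) {
    for (const auto& c : cases) {
        if (std::abs(c.got - c.expected) > tol) return false;
    }
    return true;
}
```

A `Complex32` variant would use the same table with a looser tolerance.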

## Acceptance criteria

This issue is done when all of the following are true:

- The CUDA backend has an explicit supported-complex-math contract for the operator set listed above.
- `Complex32` and `Complex64` both have runtime coverage for the supported ops.
- Remaining invalid complex ops are rejected centrally and intentionally.
- A downstream such as tenferro can point to CubeCL as the authoritative provider of these complex math ops instead of carrying project-local restrictions.
- Any remaining exclusions are documented clearly in this issue or in follow-up issues.

## Non-goals

- WGPU backend completion
- CPU backend completion
- complex GEMM / linalg in CubeCL itself
- downstream-specific workarounds in tenferro or other consumers

Those can stay in separate issues.

## Related

- #1
- #2
- tensor4all/tenferro-rs#724

