feat: add interleaved complex number support (Complex32/64) by shinaoka · Pull Request #2 · tensor4all/cubecl

shinaoka · 2026-04-14T00:21:36Z

Summary

Add first-class interleaved complex number (Complex<f32> / Complex<f64>) support to CubeCL IR, CUDA backend, and frontend.

- IR: ComplexKind { C32, C64 } + ElemType::Complex(ComplexKind) variant
- CUDA backend: CF32/CF64 mapped to thrust::complex<float/double>, binary/unary ops work via thrust operator overloading
- Frontend: Independent Complex trait (parallel to Float/Int), arithmetic/transcendental expand macros, Conj/Real/Imag IR ops
- CubeElement: Pod/CubeElement impl for num_complex::Complex<f32/f64>
- #[cube] macro: Complex recognized as trait bound
- Runtime tests: testgen_complex! macro with add/mul/conj tests
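
To make the IR and layout items above concrete, here is a minimal standalone sketch. Only `ComplexKind`, its `C32`/`C64` variants, and `ElemType::Complex` are names taken from the summary; the float variants, the `size_bytes` helper, the `Complex32` struct, and `interleave` are illustrative assumptions and not CubeCL's actual definitions.

```rust
// Standalone sketch; not CubeCL's real IR. Only `ComplexKind`, its two
// variants, and `ElemType::Complex` are names taken from the PR summary.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ComplexKind {
    C32, // Complex<f32>: two f32 components
    C64, // Complex<f64>: two f64 components
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ElemType {
    F32, // hypothetical stand-in for the existing float kinds
    F64, // hypothetical stand-in for the existing float kinds
    Complex(ComplexKind),
}

impl ElemType {
    /// Size of one element in bytes (hypothetical helper).
    pub fn size_bytes(self) -> usize {
        match self {
            ElemType::F32 => 4,
            ElemType::F64 => 8,
            ElemType::Complex(ComplexKind::C32) => 8,
            ElemType::Complex(ComplexKind::C64) => 16,
        }
    }
}

/// Hypothetical stand-in for `num_complex::Complex<f32>` (real part first).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Complex32 {
    pub re: f32,
    pub im: f32,
}

/// Flattens complex values into the interleaved layout
/// [re0, im0, re1, im1, ...].
pub fn interleave(values: &[Complex32]) -> Vec<f32> {
    values.iter().flat_map(|z| [z.re, z.im]).collect()
}
```

In this sketch a complex element occupies exactly twice the size of its component type, which is what "interleaved" implies: real and imaginary parts sit next to each other in one buffer instead of in two separate planes.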

Design doc

docs/plans/2026-04-14-complex-design.md

Scope

Elementwise ops (add/sub/mul/div, abs/exp/log/sin/cos/sqrt, conj), structural ops, reduction. WGPU/CPU backends deferred. Batched complex GEMM tracked in separate issue.

Test plan

cargo check --workspace  # passes
cargo test -p cubecl-cuda test_complex  # requires CUDA GPU

Add CudaServer::raw_stream() to get the raw CUstream for a given StreamId, enabling external libraries (cuBLAS/cuSOLVER/cuTENSOR) to execute on the same CUDA stream without event synchronization. Add an ffi_interop module re-exporting CudaServer, GpuResource, and Stream for downstream crates that need a zero-copy FFI bridge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Provides a general-purpose escape hatch to execute closures with mutable access to the underlying ComputeServer. This enables backend-specific operations like extracting raw CUDA streams for FFI interop without needing backend-specific methods on ComputeClient.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
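
The closure-based escape hatch described in this commit message can be sketched roughly as follows. This is a standalone illustration of the pattern only: `FakeServer`, `Client`, `with_server`, and the `usize` stream handle are hypothetical names chosen for the example, and the real ComputeClient/ComputeServer signatures may differ.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a backend server that owns a stream handle.
pub struct FakeServer {
    pub stream_handle: usize,
    pub launches: u32,
}

impl FakeServer {
    /// Stand-in for a backend-specific accessor such as `raw_stream()`.
    pub fn raw_stream(&mut self) -> usize {
        self.stream_handle
    }
}

/// Hypothetical generic client; it knows nothing about the concrete server.
pub struct Client<S> {
    server: Mutex<S>,
}

impl<S> Client<S> {
    pub fn new(server: S) -> Self {
        Self { server: Mutex::new(server) }
    }

    /// Runs `f` with exclusive, mutable access to the server and returns
    /// whatever the closure returns.
    pub fn with_server<R>(&self, f: impl FnOnce(&mut S) -> R) -> R {
        let mut guard = self.server.lock().expect("server mutex poisoned");
        f(&mut *guard)
    }
}
```

With this shape, a caller that knows the concrete server type can write `client.with_server(|s| s.raw_stream())`, while the generic client needs no backend-specific methods of its own.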

thrust/complex.h requires C++ standard headers (<cmath>, <complex>) that are unavailable in NVRTC. Switch to cuComplex.h (C API) with inline operator overloading wrappers for +, -, *, /, ==, !=, and unary -.

- cuFloatComplex replaces thrust::complex<float>
- cuDoubleComplex replaces thrust::complex<double>
- conj/real/imag use .x/.y struct fields
- IsNan/IsInf use .x/.y instead of .real()/.imag()

All 4 complex runtime tests pass on CUDA 12.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
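
For readers unfamiliar with what such operator wrappers have to compute, the sketch below spells out the arithmetic on a struct with `.x` (real) and `.y` (imaginary) fields. It is written in Rust purely as an illustration: the actual wrappers in this commit are CUDA C++ and are not reproduced here, `Cf32` and the free functions are hypothetical names, and the division uses the plain textbook formula without the rescaling that production implementations typically apply to avoid overflow.

```rust
use std::ops::{Add, Div, Mul, Neg, Sub};

/// Hypothetical mirror of a cuComplex-style struct: x = real, y = imaginary.
/// Deriving PartialEq gives component-wise `==` and `!=`.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Cf32 {
    pub x: f32,
    pub y: f32,
}

impl Add for Cf32 {
    type Output = Self;
    fn add(self, r: Self) -> Self {
        Cf32 { x: self.x + r.x, y: self.y + r.y }
    }
}

impl Sub for Cf32 {
    type Output = Self;
    fn sub(self, r: Self) -> Self {
        Cf32 { x: self.x - r.x, y: self.y - r.y }
    }
}

impl Mul for Cf32 {
    type Output = Self;
    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    fn mul(self, r: Self) -> Self {
        Cf32 { x: self.x * r.x - self.y * r.y, y: self.x * r.y + self.y * r.x }
    }
}

impl Div for Cf32 {
    type Output = Self;
    // (a + bi)/(c + di) = ((ac + bd) + (bc - ad)i) / (c^2 + d^2)
    // Textbook formula only; no rescaling against overflow.
    fn div(self, r: Self) -> Self {
        let denom = r.x * r.x + r.y * r.y;
        Cf32 {
            x: (self.x * r.x + self.y * r.y) / denom,
            y: (self.y * r.x - self.x * r.y) / denom,
        }
    }
}

impl Neg for Cf32 {
    type Output = Self;
    fn neg(self) -> Self {
        Cf32 { x: -self.x, y: -self.y }
    }
}

/// Complex conjugate: flip the sign of the imaginary field.
pub fn conj(z: Cf32) -> Cf32 {
    Cf32 { x: z.x, y: -z.y }
}

/// A complex value counts as NaN if either component is NaN.
pub fn is_nan(z: Cf32) -> bool {
    z.x.is_nan() || z.y.is_nan()
}
```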

- ConstantValue::Complex(f64, f64) variant with ordering/hash/display/cast
- From<Complex<f32/f64>> for Variable via impl_into_variable
- from_const! for Complex<f32>, Complex<f64>
- IntoRuntime: complex scalars now lower through native constants
- CubePrimitive::from_const_value: reconstruct Complex from ConstantValue
- CUDA C++ codegen: make_cuFloatComplex/make_cuDoubleComplex for constants
- Scalar packing in launcher.rs for 16-byte complex types
- Wide grid-constant scalar handling in kernel.rs + server.rs
- Runtime test: kernel_complex_scale with cf32/cf64 scalar binding

All existing complex tests pass plus new constant/scalar tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
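
A constant that carries two `f64` components cannot simply derive `Eq`, `Ord`, or `Hash`, because `f64` implements none of them. The sketch below shows one common way to get there, via raw bit patterns and `f64::total_cmp`, together with a 16-byte packing helper. Only the `Complex(f64, f64)` variant shape comes from the commit message; the reduced enum, the `key` helper, `pack_c64`, and the choice of native byte order are assumptions made for illustration, and the actual implementation may differ.

```rust
use std::cmp::Ordering;
use std::hash::{Hash, Hasher};

/// Hypothetical, reduced mirror of a constant enum; only the
/// `Complex(f64, f64)` variant shape is taken from the commit message.
#[derive(Clone, Copy, Debug)]
pub enum ConstantValue {
    Float(f64),
    Complex(f64, f64),
}

impl ConstantValue {
    /// Bit-level key: variant tag plus raw IEEE-754 bits of each component.
    fn key(&self) -> (u8, u64, u64) {
        match *self {
            ConstantValue::Float(v) => (0, v.to_bits(), 0),
            ConstantValue::Complex(re, im) => (1, re.to_bits(), im.to_bits()),
        }
    }
}

impl PartialEq for ConstantValue {
    fn eq(&self, other: &Self) -> bool {
        self.key() == other.key()
    }
}

impl Eq for ConstantValue {}

impl Hash for ConstantValue {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.key().hash(state);
    }
}

impl Ord for ConstantValue {
    /// Floats sort before complex values; complex values compare by real
    /// part, then imaginary part, using the IEEE-754 total order.
    fn cmp(&self, other: &Self) -> Ordering {
        use ConstantValue::*;
        match (self, other) {
            (Float(a), Float(b)) => a.total_cmp(b),
            (Complex(ar, ai), Complex(br, bi)) => ar.total_cmp(br).then(ai.total_cmp(bi)),
            (Float(_), Complex(..)) => Ordering::Less,
            (Complex(..), Float(_)) => Ordering::Greater,
        }
    }
}

impl PartialOrd for ConstantValue {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Packs a complex f64 scalar into 16 bytes in native byte order:
/// real part first, then imaginary part (interleaved layout).
pub fn pack_c64(re: f64, im: f64) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&re.to_ne_bytes());
    out[8..].copy_from_slice(&im.to_ne_bytes());
    out
}
```

One consequence of comparing bit patterns is that `0.0` and `-0.0` are treated as distinct constants, which keeps equality, ordering, and hashing mutually consistent.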

nathanielsimard and others added 9 commits April 13, 2026 09:49

Feat/vector sum (tracel-ai#1286)

bf02c69

Fix UB in arena dropping (tracel-ai#1287)

438d733

Fix performance regression rocm (tracel-ai#1284)

1568a84

docs: add complex number support design

b51eae0

docs: add complex implementation plan

c011596

feat: add ComplexKind and ElemType::Complex to IR

532a322

feat: add CF32/CF64 complex element support in CUDA backend

8ea4d2e

feat: add CubeElement impl for Complex32/64

e7bc3df

feat: add Complex frontend, Conj/Real/Imag ops, runtime tests, #[cube] macro

03ace97

shinaoka mentioned this pull request Apr 14, 2026

feat: batched complex GEMM (C = alpha*A*B + beta*C) #3

Open

ArthurBrussee and others added 7 commits April 14, 2026 09:30

Fix one case of unsoundness, and two other potential bugs. (tracel-ai#1289)

b23caf4

Fix/tuner group (tracel-ai#1291)

34625c9

Simplify async tuning (tracel-ai#1292)

33390d5

This was referenced Apr 17, 2026

follow-up: complete complex transcendental/math coverage for CUDA backend (abs, exp, log, sin, cos, tanh, sqrt, pow) #4

Closed

cubecl fork: add complex math functions (exp, log, sin, cos, tanh, sqrt, abs, pow) tensor4all/tenferro-rs#724

Closed

shinaoka added 4 commits April 17, 2026 10:14

Add complex CUDA math helpers and tests

9163c50

docs: add complex math resume notes

3a578b0

test: add cf64 complex coverage

2691630

Merge upstream/main into feat/complex-numbers

282fe5d

shinaoka merged commit f3fc4ed into main Apr 17, 2026

shinaoka mentioned this pull request Apr 18, 2026

Add interleaved complex number support (Complex32/Complex64) to CubeCL #1

Closed

shinaoka merged 20 commits into main from feat/complex-numbers

3 participants
