
feat: add interleaved complex number support (Complex32/64) #2

Merged: shinaoka merged 20 commits into main from feat/complex-numbers on Apr 17, 2026

@shinaoka (Member)

Summary

Add first-class interleaved complex number (Complex<f32> / Complex<f64>) support to CubeCL IR, CUDA backend, and frontend.

  • IR: ComplexKind { C32, C64 } + ElemType::Complex(ComplexKind) variant
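  • CUDA backend: CF32/CF64 mapped to cuFloatComplex/cuDoubleComplex from cuComplex.h, with binary/unary ops provided by inline operator-overloading wrappers (thrust::complex<float/double> was used first, but it needs C++ standard headers that NVRTC does not provide)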
  • Frontend: Independent Complex trait (parallel to Float/Int), arithmetic/transcendental expand macros, Conj/Real/Imag IR ops
  • CubeElement: Pod/CubeElement impl for num_complex::Complex<f32/f64>
  • #[cube] macro: Complex recognized as trait bound
  • Runtime tests: testgen_complex! macro with add/mul/conj tests
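
For orientation, the IR addition can be pictured as one extra enum plus one more `ElemType` variant. The sketch below is a hypothetical, cut-down mirror, not CubeCL's actual definitions: `ComplexKind { C32, C64 }` and `ElemType::Complex` come from the summary, while the `Float32` placeholder variant and the `size_bytes`/`cuda_type` helpers are invented for illustration. The CUDA type names follow the cuComplex.h mapping described in the commit log.

```rust
// Hypothetical mirror of the IR additions; not CubeCL's actual definitions.
// `ComplexKind { C32, C64 }` and `ElemType::Complex` come from the PR summary;
// `Float32`, `size_bytes`, and `cuda_type` are illustrative assumptions.

/// Precision of an interleaved complex element.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ComplexKind {
    C32, // [re: f32, im: f32]
    C64, // [re: f64, im: f64]
}

/// Element type of a value in the IR (heavily cut down).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ElemType {
    Float32,              // placeholder for the pre-existing scalar variants
    Complex(ComplexKind), // the new variant
}

impl ElemType {
    /// Storage size in bytes; an interleaved complex is two scalars back to back.
    pub fn size_bytes(self) -> usize {
        match self {
            ElemType::Float32 => 4,
            ElemType::Complex(ComplexKind::C32) => 2 * std::mem::size_of::<f32>(),
            ElemType::Complex(ComplexKind::C64) => 2 * std::mem::size_of::<f64>(),
        }
    }

    /// CUDA C++ type a code generator could emit for this element type.
    pub fn cuda_type(self) -> &'static str {
        match self {
            ElemType::Float32 => "float",
            ElemType::Complex(ComplexKind::C32) => "cuFloatComplex",
            ElemType::Complex(ComplexKind::C64) => "cuDoubleComplex",
        }
    }
}
```

Keeping precision in a separate `ComplexKind` lets code match on "any complex" with a single `ElemType::Complex(_)` arm.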

Design doc

docs/plans/2026-04-14-complex-design.md

Scope

Elementwise ops (add/sub/mul/div, abs/exp/log/sin/cos/sqrt, conj), structural ops, reduction. WGPU/CPU backends are deferred. Batched complex GEMM is tracked in a separate issue.
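
As a host-side reference for what the elementwise transcendental ops should compute, here is a sketch using the standard principal-branch formulas. The `C64` struct and `approx_eq` helper are hypothetical stand-ins, not `num_complex::Complex` or CubeCL's frontend type, and the kernels are assumed (not confirmed by this PR) to follow the same branch conventions.

```rust
// Host-side reference formulas for the complex transcendental ops in scope.
// `C64` is a hypothetical stand-in type, not num_complex's or CubeCL's.

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct C64 {
    pub re: f64,
    pub im: f64,
}

impl C64 {
    pub fn new(re: f64, im: f64) -> Self {
        Self { re, im }
    }

    /// conj(a + bi) = a - bi
    pub fn conj(self) -> Self {
        Self::new(self.re, -self.im)
    }

    /// |z|, computed with hypot to avoid overflow in re^2 + im^2
    pub fn abs(self) -> f64 {
        self.re.hypot(self.im)
    }

    /// arg(z) via atan2, in [-pi, pi]
    pub fn arg(self) -> f64 {
        self.im.atan2(self.re)
    }

    /// exp(a + bi) = e^a (cos b + i sin b)
    pub fn exp(self) -> Self {
        let m = self.re.exp();
        Self::new(m * self.im.cos(), m * self.im.sin())
    }

    /// Principal log: ln|z| + i arg(z)
    pub fn ln(self) -> Self {
        Self::new(self.abs().ln(), self.arg())
    }

    /// Principal sqrt: sqrt(|z|) (cos(arg/2) + i sin(arg/2))
    pub fn sqrt(self) -> Self {
        let r = self.abs().sqrt();
        let half = self.arg() / 2.0;
        Self::new(r * half.cos(), r * half.sin())
    }
}

/// Component-wise closeness check, e.g. for comparing against kernel output.
pub fn approx_eq(a: C64, b: C64, tol: f64) -> bool {
    (a.re - b.re).abs() <= tol && (a.im - b.im).abs() <= tol
}
```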

Test plan

cargo check --workspace  # passes
cargo test -p cubecl-cuda test_complex  # requires CUDA GPU

ArthurBrussee and others added 7 commits April 14, 2026 09:30
Add CudaServer::raw_stream() to get the raw CUstream for a given
StreamId, enabling external libraries (cuBLAS/cuSOLVER/cuTENSOR) to
execute on the same CUDA stream without event synchronization.

Add ffi_interop module re-exporting CudaServer, GpuResource, and
Stream for downstream crates that need a zero-copy FFI bridge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Provides a general-purpose escape hatch to execute closures with
mutable access to the underlying ComputeServer. This enables
backend-specific operations like extracting raw CUDA streams for
FFI interop without needing backend-specific methods on ComputeClient.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
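
The escape-hatch idea, together with a `raw_stream()`-style lookup, can be illustrated with a self-contained mock: a client owns the server behind a mutex and gives a closure `&mut` access to it, so the backend-specific call lives in the closure rather than on the client. All names here (`ComputeClient::with_server`, `MockCudaServer`, `StreamId(u64)`, the `usize` handle) are hypothetical; the PR text does not show the real signatures.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical mock of the escape-hatch pattern. `ComputeClient::with_server`,
// `MockCudaServer`, `StreamId(u64)`, and the `usize` handle are illustrative
// names; the real CubeCL/CUDA types are not shown in the PR text.

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct StreamId(pub u64);

/// Stand-in for a backend server that owns per-stream native handles.
#[derive(Default)]
pub struct MockCudaServer {
    streams: HashMap<StreamId, usize>, // a real server would store CUstream values
    next_handle: usize,
}

impl MockCudaServer {
    /// Analogue of a `raw_stream()` accessor: return the native handle for `id`,
    /// creating it on first use so later calls reuse the same stream.
    pub fn raw_stream(&mut self, id: StreamId) -> usize {
        let next = &mut self.next_handle;
        *self.streams.entry(id).or_insert_with(|| {
            *next += 1;
            *next
        })
    }
}

/// Backend-agnostic client; clones share one server.
pub struct ComputeClient<S> {
    server: Arc<Mutex<S>>,
}

impl<S> Clone for ComputeClient<S> {
    fn clone(&self) -> Self {
        Self { server: Arc::clone(&self.server) }
    }
}

impl<S> ComputeClient<S> {
    pub fn new(server: S) -> Self {
        Self { server: Arc::new(Mutex::new(server)) }
    }

    /// Run `f` with exclusive mutable access to the server. Backend-specific
    /// calls go inside the closure, so the client needs no per-backend methods.
    pub fn with_server<R>(&self, f: impl FnOnce(&mut S) -> R) -> R {
        let mut guard = self.server.lock().expect("server mutex poisoned");
        f(&mut *guard)
    }
}
```

Because the client is generic over `S`, only code that already knows the concrete server type can reach backend-only methods, which keeps the shared client API free of CUDA specifics.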

thrust/complex.h requires C++ standard headers (<cmath>, <complex>)
unavailable in NVRTC. Switch to cuComplex.h (C API) with inline
operator overloading wrappers for +, -, *, /, ==, !=, unary -.

- cuFloatComplex replaces thrust::complex<float>
- cuDoubleComplex replaces thrust::complex<double>
- conj/real/imag use .x/.y struct fields
- IsNan/IsInf use .x/.y instead of .real()/.imag()
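
A Rust analogue of the cuComplex.h approach: `cuFloatComplex` is a typedef of CUDA's `float2`, a plain 8-byte-aligned struct whose real and imaginary parts live in `.x` and `.y`, so the operators have to be supplied as thin wrappers. The type and function names below are illustrative. cuComplex.h itself exposes C functions such as `cuCaddf`, `cuCmulf`, and `cuCdivf`, but whether the PR's wrappers delegate to them is not stated. The rule that a value is NaN (or infinite) when either component is NaN (or infinite) is an assumption about how `IsNan`/`IsInf` are lowered.

```rust
use std::ops::{Add, Div, Mul, Neg, Sub};

// Rust analogue of the cuComplex.h approach; names are illustrative.
// Mirrors CUDA's float2: fields x (real) and y (imaginary), 8-byte aligned.
#[repr(C, align(8))]
#[derive(Clone, Copy, Debug, PartialEq)] // derived ==/!= compare x and y component-wise
pub struct CuFloatComplex {
    pub x: f32,
    pub y: f32,
}

/// Counterpart of make_cuFloatComplex.
pub fn make_cu_float_complex(x: f32, y: f32) -> CuFloatComplex {
    CuFloatComplex { x, y }
}

impl Add for CuFloatComplex {
    type Output = Self;
    fn add(self, o: Self) -> Self {
        make_cu_float_complex(self.x + o.x, self.y + o.y)
    }
}

impl Sub for CuFloatComplex {
    type Output = Self;
    fn sub(self, o: Self) -> Self {
        make_cu_float_complex(self.x - o.x, self.y - o.y)
    }
}

impl Mul for CuFloatComplex {
    type Output = Self;
    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    fn mul(self, o: Self) -> Self {
        make_cu_float_complex(self.x * o.x - self.y * o.y, self.x * o.y + self.y * o.x)
    }
}

impl Div for CuFloatComplex {
    type Output = Self;
    // Textbook (a + bi)(c - di) / (c^2 + d^2), without cuCdivf's rescaling
    fn div(self, o: Self) -> Self {
        let d = o.x * o.x + o.y * o.y;
        let n = self * conj(o);
        make_cu_float_complex(n.x / d, n.y / d)
    }
}

impl Neg for CuFloatComplex {
    type Output = Self;
    fn neg(self) -> Self {
        make_cu_float_complex(-self.x, -self.y)
    }
}

// conj/real/imag read the struct fields directly.
pub fn conj(z: CuFloatComplex) -> CuFloatComplex {
    make_cu_float_complex(z.x, -z.y)
}
pub fn real(z: CuFloatComplex) -> f32 {
    z.x
}
pub fn imag(z: CuFloatComplex) -> f32 {
    z.y
}

// Assumption: a complex value is NaN / infinite if either component is.
pub fn is_nan(z: CuFloatComplex) -> bool {
    z.x.is_nan() || z.y.is_nan()
}
pub fn is_inf(z: CuFloatComplex) -> bool {
    z.x.is_infinite() || z.y.is_infinite()
}
```

The division uses the textbook formula. cuComplex.h's `cuCdivf` first scales by |c| + |d| to reduce overflow, so results can differ for extreme magnitudes.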

All 4 complex runtime tests pass on CUDA 12.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- ConstantValue::Complex(f64, f64) variant with ordering/hash/display/cast
- From<Complex<f32/f64>> for Variable via impl_into_variable
- from_const! for Complex<f32>, Complex<f64>
- IntoRuntime: complex scalars now lower through native constants
- CubePrimitive::from_const_value: reconstruct Complex from ConstantValue
- CUDA C++ codegen: make_cuFloatComplex/make_cuDoubleComplex for constants
- Scalar packing in launcher.rs for 16-byte complex types
- Wide grid-constant scalar handling in kernel.rs + server.rs
- Runtime test: kernel_complex_scale with cf32/cf64 scalar binding
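
Storing a complex constant as `(f64, f64)` raises the usual float-key problem: `f64` implements neither `Eq` nor `Hash`. One common way to get the ordering/hash behavior listed above is to compare with `f64::total_cmp` and hash `to_bits()`, which keeps `Eq` and `Hash` consistent. The cut-down `ConstantValue` below is hypothetical apart from the `Complex(f64, f64)` variant name; the `Float` variant, the display format, and the `to_complex32` cast are illustrative choices, not CubeCL's.

```rust
use std::cmp::Ordering;
use std::fmt;
use std::hash::{Hash, Hasher};

// Hypothetical, cut-down ConstantValue. Only the `Complex(f64, f64)` variant
// name comes from the commit; `Float`, the display format, and `to_complex32`
// are illustrative.
#[derive(Clone, Copy, Debug)]
pub enum ConstantValue {
    Float(f64),
    Complex(f64, f64), // (re, im)
}

impl Ord for ConstantValue {
    // total_cmp gives a total order over all f64 bit patterns (NaN included);
    // complex values compare lexicographically by (re, im).
    fn cmp(&self, other: &Self) -> Ordering {
        match (*self, *other) {
            (Self::Float(a), Self::Float(b)) => a.total_cmp(&b),
            (Self::Complex(ar, ai), Self::Complex(br, bi)) => {
                ar.total_cmp(&br).then(ai.total_cmp(&bi))
            }
            (Self::Float(_), Self::Complex(..)) => Ordering::Less,
            (Self::Complex(..), Self::Float(_)) => Ordering::Greater,
        }
    }
}

impl PartialOrd for ConstantValue {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for ConstantValue {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for ConstantValue {}

impl Hash for ConstantValue {
    // Equal under total_cmp means identical bits, so hashing bits agrees with Eq.
    fn hash<H: Hasher>(&self, state: &mut H) {
        match *self {
            Self::Float(v) => {
                0u8.hash(state);
                v.to_bits().hash(state);
            }
            Self::Complex(re, im) => {
                1u8.hash(state);
                re.to_bits().hash(state);
                im.to_bits().hash(state);
            }
        }
    }
}

impl fmt::Display for ConstantValue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match *self {
            Self::Float(v) => write!(f, "{v}"),
            Self::Complex(re, im) if im.is_sign_negative() => write!(f, "{re}-{}i", -im),
            Self::Complex(re, im) => write!(f, "{re}+{im}i"),
        }
    }
}

impl ConstantValue {
    /// Cast to a single-precision complex constant: a real v becomes (v, 0),
    /// and both parts are rounded through f32.
    pub fn to_complex32(self) -> Self {
        let (re, im) = match self {
            Self::Float(v) => (v, 0.0),
            Self::Complex(re, im) => (re, im),
        };
        Self::Complex(re as f32 as f64, im as f32 as f64)
    }
}
```

A side effect of bitwise equality is that `0.0` and `-0.0` are distinct keys, which is usually what a constant-deduplication table wants.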

All existing complex tests pass plus new constant/scalar tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
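
One reason 16-byte complex scalars need dedicated packing is alignment: `cuDoubleComplex` is CUDA's `double2`, which is declared with 16-byte alignment, so a packed argument buffer has to pad up to a 16-byte boundary before it. The packer below is a hypothetical sketch of that rule, not the code in launcher.rs; `push_scalar`, `complex64_bytes`, and `CF64_ALIGN` are invented names, and little-endian encoding is used because CUDA devices are little-endian.

```rust
// Hypothetical scalar-argument packer; not the code in launcher.rs.
// `push_scalar`, `complex64_bytes`, and `CF64_ALIGN` are invented names.
// Each scalar is placed at the next offset that is a multiple of its
// alignment, as in a C struct.

/// Alignment of cuDoubleComplex (CUDA's double2).
pub const CF64_ALIGN: usize = 16;

/// Pad `buf` with zeros up to a multiple of `align`, append `bytes`,
/// and return the offset where the scalar starts.
pub fn push_scalar(buf: &mut Vec<u8>, bytes: &[u8], align: usize) -> usize {
    assert!(align > 0, "alignment must be non-zero");
    let offset = (buf.len() + align - 1) / align * align;
    buf.resize(offset, 0);
    buf.extend_from_slice(bytes);
    offset
}

/// Interleaved (re, im) encoding of a complex<f64>: 16 bytes, little-endian.
pub fn complex64_bytes(re: f64, im: f64) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&re.to_le_bytes());
    out[8..].copy_from_slice(&im.to_le_bytes());
    out
}
```

For example, packing an `f32` followed by a cf64 puts the complex value at offset 16, with 12 zero bytes of padding after the `f32`.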