feat: add interleaved complex number support (Complex32/64) #2
Merged
Add CudaServer::raw_stream() to get the raw CUstream for a given StreamId, enabling external libraries (cuBLAS/cuSOLVER/cuTENSOR) to execute on the same CUDA stream without event synchronization. Add an ffi_interop module re-exporting CudaServer, GpuResource, and Stream for downstream crates that need a zero-copy FFI bridge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Provides a general-purpose escape hatch to execute closures with mutable access to the underlying ComputeServer. This enables backend-specific operations like extracting raw CUDA streams for FFI interop without needing backend-specific methods on ComputeClient.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
thrust/complex.h requires C++ standard headers (<cmath>, <complex>) that are unavailable in NVRTC. Switch to cuComplex.h (C API) with inline operator overloading wrappers for +, -, *, /, ==, !=, and unary -.

- cuFloatComplex replaces thrust::complex<float>
- cuDoubleComplex replaces thrust::complex<double>
- conj/real/imag use .x/.y struct fields
- IsNan/IsInf use .x/.y instead of .real()/.imag()

All 4 complex runtime tests pass on CUDA 12.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
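The arithmetic those operator wrappers have to provide can be sketched in Rust on a struct with the same `.x`/`.y` layout. This is an illustration only, not the CUDA C++ emitted by the backend: `FloatComplex`, `conj`, `is_nan`, and `is_inf` are hypothetical names, the "either component" rule for NaN/Inf is an assumption, and the division uses the textbook formula without the operand scaling a production version may need to avoid overflow.

```rust
use std::ops::{Add, Div, Mul, Neg, Sub};

/// Hypothetical stand-in for a cuFloatComplex-style struct:
/// real part in `x`, imaginary part in `y`.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FloatComplex {
    pub x: f32,
    pub y: f32,
}

impl Add for FloatComplex {
    type Output = FloatComplex;
    fn add(self, r: FloatComplex) -> FloatComplex {
        FloatComplex { x: self.x + r.x, y: self.y + r.y }
    }
}

impl Sub for FloatComplex {
    type Output = FloatComplex;
    fn sub(self, r: FloatComplex) -> FloatComplex {
        FloatComplex { x: self.x - r.x, y: self.y - r.y }
    }
}

impl Mul for FloatComplex {
    type Output = FloatComplex;
    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    fn mul(self, r: FloatComplex) -> FloatComplex {
        FloatComplex {
            x: self.x * r.x - self.y * r.y,
            y: self.x * r.y + self.y * r.x,
        }
    }
}

impl Div for FloatComplex {
    type Output = FloatComplex;
    // (a + bi)/(c + di) = ((ac + bd) + (bc - ad)i) / (c^2 + d^2)
    // Textbook form: no scaling against overflow or underflow.
    fn div(self, r: FloatComplex) -> FloatComplex {
        let denom = r.x * r.x + r.y * r.y;
        FloatComplex {
            x: (self.x * r.x + self.y * r.y) / denom,
            y: (self.y * r.x - self.x * r.y) / denom,
        }
    }
}

impl Neg for FloatComplex {
    type Output = FloatComplex;
    fn neg(self) -> FloatComplex {
        FloatComplex { x: -self.x, y: -self.y }
    }
}

/// Complex conjugate: flip the sign of the imaginary field.
pub fn conj(z: FloatComplex) -> FloatComplex {
    FloatComplex { x: z.x, y: -z.y }
}

/// Assumed rule: a value is NaN if either component is NaN.
pub fn is_nan(z: FloatComplex) -> bool {
    z.x.is_nan() || z.y.is_nan()
}

/// Assumed rule: a value is infinite if either component is infinite.
pub fn is_inf(z: FloatComplex) -> bool {
    z.x.is_infinite() || z.y.is_infinite()
}
```

`==` and `!=` come from the derived `PartialEq`, which compares both fields, so a value with a NaN component never equals itself.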
- ConstantValue::Complex(f64, f64) variant with ordering/hash/display/cast
- From<Complex<f32/f64>> for Variable via impl_into_variable
- from_const! for Complex<f32>, Complex<f64>
- IntoRuntime: complex scalars now lower through native constants
- CubePrimitive::from_const_value: reconstruct Complex from ConstantValue
- CUDA C++ codegen: make_cuFloatComplex/make_cuDoubleComplex for constants
- Scalar packing in launcher.rs for 16-byte complex types
- Wide grid-constant scalar handling in kernel.rs + server.rs
- Runtime test: kernel_complex_scale with cf32/cf64 scalar binding

All existing complex tests pass plus new constant/scalar tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
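Two of the items above raise a practical question: `f64` implements neither `Eq`, `Ord`, nor `Hash`, so a constant holding two `f64` parts needs hand-written impls, and a 16-byte complex scalar has to be laid out as bytes before launch. The following is a minimal sketch of one way to do both, assuming bit-pattern equality and `f64::total_cmp` ordering. `ComplexConst` and `pack_c64` are hypothetical names, not the types or functions in this PR.

```rust
use std::cmp::Ordering;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a complex constant stored as two f64 parts.
#[derive(Clone, Copy, Debug)]
pub struct ComplexConst {
    pub re: f64,
    pub im: f64,
}

// Equality on bit patterns, so NaN == NaN and +0.0 != -0.0.
// This keeps Eq, Hash, and Ord mutually consistent.
impl PartialEq for ComplexConst {
    fn eq(&self, other: &ComplexConst) -> bool {
        self.re.to_bits() == other.re.to_bits() && self.im.to_bits() == other.im.to_bits()
    }
}

impl Eq for ComplexConst {}

impl Hash for ComplexConst {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.re.to_bits().hash(state);
        self.im.to_bits().hash(state);
    }
}

// Lexicographic total order: real part first, then imaginary part.
impl Ord for ComplexConst {
    fn cmp(&self, other: &ComplexConst) -> Ordering {
        self.re
            .total_cmp(&other.re)
            .then_with(|| self.im.total_cmp(&other.im))
    }
}

impl PartialOrd for ComplexConst {
    fn partial_cmp(&self, other: &ComplexConst) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Pack a complex scalar into 16 bytes in interleaved order:
/// real part first, then imaginary part, native byte order.
pub fn pack_c64(c: ComplexConst) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&c.re.to_ne_bytes());
    out[8..].copy_from_slice(&c.im.to_ne_bytes());
    out
}
```

Comparing bit patterns rather than numeric values means a NaN constant can still be used as a key in a hash map or deduplicated in a set, at the cost of treating `+0.0` and `-0.0` as distinct constants.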
Summary
Add first-class interleaved complex number (Complex<f32>/Complex<f64>) support to CubeCL IR, CUDA backend, and frontend.

- ComplexKind { C32, C64 } + ElemType::Complex(ComplexKind) variant
- CF32/CF64 mapped to thrust::complex<float/double>, binary/unary ops work via thrust operator overloading
- Complex trait (parallel to Float/Int), arithmetic/transcendental expand macros, Conj/Real/Imag IR ops
- CubeElement: Pod/CubeElement impl for num_complex::Complex<f32/f64>
- #[cube] macro: Complex recognized as trait bound
- testgen_complex! macro with add/mul/conj tests

Design doc
docs/plans/2026-04-14-complex-design.md

Scope
Elementwise ops (add/sub/mul/div, abs/exp/log/sin/cos/sqrt, conj), structural ops, reduction. WGPU/CPU backends deferred. Batched complex GEMM tracked in separate issue.
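To make the interleaved layout concrete, here is a host-side Rust sketch of two elementwise ops and one reduction over a buffer stored as `[re0, im0, re1, im1, ...]`. It is a plain CPU illustration under that layout assumption, not a CubeCL kernel, and the function names are hypothetical.

```rust
/// Conjugate every element of an interleaved buffer in place
/// by negating each imaginary slot.
pub fn conj_interleaved(buf: &mut [f32]) {
    assert!(buf.len() % 2 == 0, "interleaved buffer needs an even length");
    for pair in buf.chunks_exact_mut(2) {
        pair[1] = -pair[1];
    }
}

/// Magnitude of every element: one real output per (re, im) pair.
pub fn abs_interleaved(buf: &[f32]) -> Vec<f32> {
    assert!(buf.len() % 2 == 0, "interleaved buffer needs an even length");
    buf.chunks_exact(2).map(|p| p[0].hypot(p[1])).collect()
}

/// Sum reduction: real and imaginary parts accumulate independently.
pub fn sum_interleaved(buf: &[f32]) -> (f32, f32) {
    assert!(buf.len() % 2 == 0, "interleaved buffer needs an even length");
    buf.chunks_exact(2)
        .fold((0.0, 0.0), |(re, im), p| (re + p[0], im + p[1]))
}
```

Because each complex value occupies two adjacent slots, ops such as `conj` and sum touch the parts independently, while `abs` (like `mul` and `div`) needs both parts of the same element at once.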
Test plan