v0.10.0-pre.3
Pre-release
What's Changed
- fix: Remove optimized casts because it's not supported (#1269) @wingertge
- fix: Remove `const __restrict__` from atomic pointers in CUDA (#1273) @wingertge
- feat: Add `Validate` execution mode (#1268) @wingertge
- feat(wgpu): support zero-sized resources (#1256) @ArthurBrussee
- Fix/cuda err all reduce (#1266) @Charles23R
- fix: Improve portability of Vulkan compiler (#1263) @wingertge
- fix: Use out item for atomic index so it works properly on Metal (#1265) @wingertge
- Try all options as fallback when autotuning (#1247) @ArthurBrussee
- fix: Fix metal again, make features not mutually exclusive (#1262) @wingertge
- fix: Fix metal compile error (#1261) @wingertge
- feat: Atomic vector (#1253) @wingertge
- Adds arena + refactor stream id (#1259) @nathanielsimard
- Document unsafe code in cubecl-hip/cubecl-cuda (#1258) @nathanielsimard
- Fix GPU hangs on integrated AMD GPUs by increasing drop queue flush frequency (#1257) @nathanielsimard
- Replace bincode with ciborium for compilation cache (#1254) @Veercodeprog
- Fix: defer CPU staging buffer drops with PendingDropQueue (#1255) @nathanielsimard
- chore: Clean up `MetadataBindingInfo` (#1248) @wingertge
- refactor: Scalars/Metadata (#1244) @wingertge
- Switch to effective_size (#1245) @nathanielsimard
- fix(cubecl-runtime): PersistentPool HashMap key mismatch and reuse safety (#1241) @Veercodeprog
- remove nonexistent field (#1242) @louisfd
- fix(wgpu): flush staging buffers periodically during bulk writes (#1204) @holg
- Fix 7 more cases of UB, fix flaky test (#1238) @ArthurBrussee
- feat: gitignore .DS_Store (#1240) @syl20bnr
- Fix UB in memory handle location, fix cloning CubeCount::Dynamic (#1239) @ArthurBrussee
- refactor: Merge `compilation_arg` and `register` (#1237) @wingertge
- chore: Update to wgpu v29, enable 64-bit buffers for Vulkan (#1236) @wingertge
- revert removing f32 atomic from metal (#1235) @louisfd
- Nccl all reduce (#1226) @Charles23R
- Remove atomic ptr (#1228) @nathanielsimard
- feat: Allow view layouts to infer launch info from buffer metadata (#1231) @wingertge
- rm f32 float from metal (#1233) @louisfd
- refactor: Rename and refactor dynamic types (#1229) @wingertge
- refactor: Line size generic (#1221) @wingertge
- Fix multiple bugs (#1225) @nathanielsimard
- Remove critical section (#1223) @nathanielsimard
- fix: Fix for loops with breaks (#1222) @paulzhng
- Fix: Benchmarking and Profiling (#1220) @nathanielsimard
- Fix/memory management (#1214) @nathanielsimard
- Fix wasm compilation error (#1206) @ArthurBrussee
- feat: Runtime enum (#1208) @wingertge
- Mma inplace version & 16x8x8 support (#1213) @louisfd
- Fix/no std device + improve channel device handle performance (#1209) @nathanielsimard
- Refactor device communication channel (#1199) @nathanielsimard
Full Changelog: v0.10.0-pre.2...v0.10.0-pre.3