Release Version 6 · vosen/ZLUDA

What's Changed

Remove trailing whitespaces and add missing newlines at EOF by @trueNAHO in #530
Fix LD_AUDIT segfaulting in certain configurations by @vosen in #533
Fix post-merge tests by @vosen in #535
Update devcontainer Dockerfile, bump CUDA version to 13, split cuDNN into v8 and v9 by @vosen in #536
Regenerate LLVM tests by @vosen in #538
Add test for mma by @zluda-violet in #540
Add MIOpen and connect it to zluda_dnn by @vosen in #539
Add mma x2 test by @zluda-violet in #547
Improve performance of instruction_mode_to_global_mode compiler pass by @vosen in #544
Fix regression in instruction_mode_to_global_mode by @vosen in #552
Add more missing BLAS by @vosen in #553
Implement functionality required by katago by @vosen in #541
Fix LLVM tests failing because of a variable order of declarations by @vosen in #557
Fix bugs exposed by rust-cuda by @vosen in #563
Small fix for Rust-CUDA support by @stevefan1999-personal in #560
Build and distribute LLVM by @zluda-violet in #555
Use different HiGHS solver (fixes crashes in some ptx modules) by @vosen in #565
Emit llvm.zluda.mma intrinsic for MMA by @zluda-violet in #546
Improve quality of instruction_mode_to_global_mode pass by @vosen in #567
Implement more low precision ML instructions by @vosen in #554
Update LLVM output by @zluda-violet in #568
Implement fallback MMA for RDNA1 and RDNA2 by @vosen in #551
Various LLVM fixes and improvements by @vosen in #570
Try to pick a more appropriate ptx from fatbin by @vosen in #569
Add a precompilation tool by @vosen in #558
Support ld.global.v8 by @zluda-violet in #572
Support st.global.v8 by @zluda-violet in #573
Improve Windows loader (zluda.exe) by @vosen in #550
Add parser support for .noreturn by @zluda-violet in #575
Add createpolicy.fractional as nop by @zluda-violet in #576
Add error message for PtxError::Todo to make debugging easier by @zluda-violet in #577
Additional NVML functionality by @zluda-violet in #574
Use LLVM for optimized i8 MMAs by @vosen in #571
Enable ROCm7 support on Linux and Windows by @vosen in #579
Fix bug where ignored directives are treated as invalid by @zluda-violet in #578
Allow implicit conversion from vec to bit scalar for ld by @zluda-violet in #580
Swap type and state space in variable printing for pass tests by @zluda-violet in #581
[CLEANUP] Rename refactored modules, update test golden files by @zluda-violet in #582
Increase sccache max frame length (fixes Windows builds) by @zluda-violet in #593
Implement bmsk.clamp.b32 by @zluda-violet in #590
Update SCCACHE_MAX_FRAME_LENGTH for post-merge builds by @vosen in #594
Implement cuMemHostGetDevicePointer_v2 by @Knogle in #595
Fix conflict for initial rounding and denormal mode for non-kernel functions by @zluda-violet in #596
Implicitly convert constants from float to bit type by @zluda-violet in #583
Allow implicit conversion from bit scalar to vec for st by @zluda-violet in #585
Correctly zero-initialize globals by @vosen in #588
When building ZLUDA in CI, make sure we build Linux binaries compatible with both ROCm 6 and ROCm 7 by @vosen in #589
Add CUDA 13.1 compatibility by @vosen in #599
Stop failing on bf16 uint_to_fp on amdgpu < gfx11 by @vosen in #601
Update docs (add llama.cpp, zluda_precompile sections) by @vosen in #602
Use partial parsing result in release mode by @vosen in #603
Add sad and dp2a instructions by @vosen in #605
Host functions for vLLM by @zluda-violet in #606
Implement extended precision integer addition by @zluda-violet in #607
ctlz bug fix by @zluda-violet in #610
Fix loading CUDA modules from dark api and add a tool to verify Windows library loading by @vosen in #612
Finish extended precision arithmetic by @zluda-violet in #609
Enable handle from cublasCreate to be used in cublasLt calls by @zluda-violet in #587
Support cuMemAllocPitch_v2 and cuMemcpy2D_v2 by @vosen in #616
Add various bits and pieces required by pytorch by @vosen in #615
Support some cublaslt settings required by COEIROINK by @vosen in #619
Add minimal cuSPARSE by @vosen in #621
PyTorch fixes and improvements by @vosen in #620
Initial textures support by @vosen in #625
Add more cuSPARSE functions by @vosen in #624
Support vshr.u32.u32.u32.clamp.add by @vosen in #629
Refactor emit_brev to use emit_intrinsic helper by @hemangjoshi37a in #631
Update tests by @vosen in #632
Fix typo: vec_acccess -> vector_read in emit_vector_read by @hemangjoshi37a in #633
Remove redundant map_err(CompilerError::from) calls in compiler by @hemangjoshi37a in #635
32 bit support in the compiler by @vosen in #637
Fix typo: compatiblity -> compatibility in xtask comment by @hemangjoshi37a in #639
Fix typo: overriden -> overridden in zluda_redirect by @hemangjoshi37a in #641
Implement match.any.sync, fix popc.b64 by @vosen in #642
Fix clz.b64 and add bfind.shiftamt by @vosen in #644
Add cuDeviceGetPCIBusId by @vosen in #645
Minor improvements for PyTorch by @vosen in #643
Update compiler to ROCm 7.2 and make some minor compiler fixes by @vosen in #649
More minor compiler improvements by @vosen in #650

New Contributors

@trueNAHO made their first contribution in #530
@stevefan1999-personal made their first contribution in #560
@Knogle made their first contribution in #595
@hemangjoshi37a made their first contribution in #631

Full Changelog: v5...v6
