Go implementation of BLAS (Basic Linear Algebra Subprograms)
Latest commit 5854889 Nov 12, 2013 @ziutek Merge pull request #3 from garyburd/patch-1
Update documentation link in README.md


Every function is implemented in generic Go and, where justified, is also optimized for AMD64 using SSE2 instructions.

The AMD64 implementation uses MOVUPS/MOVUPD instructions when all strides are equal to 1, so it runs fast on Nehalem, Sandy Bridge, and newer processors but relatively slowly on older processors.
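
To illustrate what "stride" means here, the following is a minimal sketch of a strided dot product in plain Go. The function `dotStrided` and its parameter names are hypothetical and are not taken from this package; the sketch also assumes positive strides. With both strides equal to 1 the elements are contiguous in memory, which is the case the packed SSE2 loads mentioned above apply to.

```go
package main

import "fmt"

// dotStrided is a hypothetical reference routine, not this package's code.
// It multiplies n element pairs taken from x and y, advancing through
// each slice by incX and incY respectively (both assumed positive).
func dotStrided(n int, x []float64, incX int, y []float64, incY int) float64 {
	var sum float64
	xi, yi := 0, 0
	for i := 0; i < n; i++ {
		sum += x[xi] * y[yi]
		xi += incX
		yi += incY
	}
	return sum
}

func main() {
	x := []float64{1, 2, 3, 4}
	y := []float64{5, 6, 7, 8}

	// Unit strides: every element is used and memory access is contiguous.
	fmt.Println(dotStrided(4, x, 1, y, 1)) // 1*5 + 2*6 + 3*7 + 4*8 = 70

	// Stride 2 on x: only x[0] and x[2] are used.
	fmt.Println(dotStrided(2, x, 2, y, 1)) // 1*5 + 3*6 = 23
}
```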

Every implemented function has its own unit test and benchmark.

Implemented functions

Level 1

Sdsdot, Sdot, Ddot, Snrm2, Dnrm2, Sasum, Dasum, Isamax, Idamax, Sswap, Dswap, Scopy, Dcopy, Saxpy, Daxpy, Sscal, Dscal, Srotg, Drotg, Srot, Drot
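
For readers unfamiliar with the Level 1 names, here is a rough sketch of what a few of these operations compute, written as simplified unit-stride helpers. The lowercase functions below are hypothetical illustrations, not this package's signatures. The `nrm2` shown is the naive formula without overflow-safe scaling, and returning a zero-based index from `iamax` is an assumption made for this example.

```go
package main

import (
	"fmt"
	"math"
)

// axpy computes y[i] += alpha*x[i] (the operation behind Saxpy/Daxpy).
func axpy(alpha float64, x, y []float64) {
	for i := range x {
		y[i] += alpha * x[i]
	}
}

// asum returns the sum of absolute values (Sasum/Dasum).
func asum(x []float64) float64 {
	var s float64
	for _, v := range x {
		s += math.Abs(v)
	}
	return s
}

// nrm2 returns the Euclidean norm (Snrm2/Dnrm2), computed naively.
func nrm2(x []float64) float64 {
	var s float64
	for _, v := range x {
		s += v * v
	}
	return math.Sqrt(s)
}

// iamax returns the zero-based index of the first element with the
// largest absolute value (Isamax/Idamax), or -1 for an empty slice.
func iamax(x []float64) int {
	best := -1
	var max float64
	for i, v := range x {
		if a := math.Abs(v); best == -1 || a > max {
			best, max = i, a
		}
	}
	return best
}

func main() {
	x := []float64{3, -4}
	y := []float64{1, 1}
	axpy(2, x, y)
	fmt.Println(y)        // [7 -7]
	fmt.Println(asum(x))  // 7
	fmt.Println(nrm2(x))  // 5
	fmt.Println(iamax(x)) // 1
}
```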

Level 2

not implemented

Level 3

not implemented

Example benchmarks

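| Function | Generic Go | Optimized for AMD64 |
|----------|------------|---------------------|
| Ddot     | 2825 ns/op | 895 ns/op           |
| Dnrm2    | 2787 ns/op | 597 ns/op           |
| Dasum    | 3145 ns/op | 560 ns/op           |
| Sdsdot   | 3133 ns/op | 1733 ns/op          |
| Sdot     | 2832 ns/op | 508 ns/op           |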
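
Figures in ns/op like these come from Go's benchmark machinery. As a minimal sketch, the standalone program below times a plain Go dot product with `testing.Benchmark` from the standard library. The `dot` helper and the vector length of 1000 are assumptions made for this example; the README does not state the sizes used for its measurements, and results will vary by machine.

```go
package main

import (
	"fmt"
	"testing"
)

// dot is a hypothetical generic-Go dot product used only as a benchmark target.
func dot(x, y []float64) float64 {
	var sum float64
	for i := range x {
		sum += x[i] * y[i]
	}
	return sum
}

// sink keeps the compiler from discarding the benchmarked call.
var sink float64

func main() {
	// Vector length is an assumption; the README does not give one.
	x := make([]float64, 1000)
	y := make([]float64, 1000)
	for i := range x {
		x[i] = float64(i)
		y[i] = 1
	}

	// testing.Benchmark picks b.N automatically and reports the timing.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = dot(x, y)
		}
	})
	fmt.Printf("dot: %d ns/op\n", res.NsPerOp())
}
```

Inside a package, the same loop would normally live in a `BenchmarkXxx(b *testing.B)` function in a `_test.go` file and be run with `go test -bench .`.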

Documentation

http://godoc.org/github.com/ziutek/blas
